Population size predicts lexical diversity, but so does the mean sea level - why it is important to correctly account for the structure of temporal data
In order to demonstrate why it is important to correctly account for the (serially dependent) structure of temporal data, we document an apparently spectacular relationship between population size and lexical diversity: for five out of seven investigated languages, there is a strong relationship between population size and the lexical diversity of the primary language spoken in the respective country. We show that this relationship is the result of a misspecified model that ignores the temporal structure of the data, by presenting a similar but nonsensical relationship between the global annual mean sea level and lexical diversity. Since several recent studies report surprising links between economic, cultural, political and (socio-)demographic variables on the one hand and cultural or linguistic characteristics on the other, yet appear to suffer from exactly this problem, we explain the cause of the misspecification and show that it has profound consequences. We demonstrate how simple transformations of the time series can often solve problems of this type, and argue that evaluating the plausibility of a relationship is important in this context. We hope that our paper will help both researchers and reviewers to understand why it is important to use special models for the analysis of data with a natural temporal ordering.
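The pitfall the abstract describes, and the kind of simple transformation that can remove it, can be seen in a toy simulation. The sketch below uses two independent random walks and first-differencing as the transformation; both are illustrative assumptions, not the authors' actual data or analysis:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two independent random walks: by construction there is no causal link,
# yet both trend over time, so their levels often appear strongly correlated.
x = np.cumsum(rng.normal(size=200))   # e.g. stand-in for "population size"
y = np.cumsum(rng.normal(size=200))   # e.g. stand-in for "mean sea level"

print(np.corrcoef(x, y)[0, 1])        # frequently large in magnitude

# First-differencing removes the shared trend; once the serial structure is
# accounted for, the spurious relationship largely disappears.
print(np.corrcoef(np.diff(x), np.diff(y))[0, 1])   # close to zero
```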
Profiling of OCR'ed Historical Texts Revisited
In the absence of ground truth it is not possible to automatically determine
the exact spectrum and occurrences of OCR errors in an OCR'ed text. Yet, for
interactive postcorrection of OCR'ed historical printings it is extremely
useful to have a statistical profile available that provides an estimate of
error classes with associated frequencies, and that points to conjectured
errors and suspicious tokens. The method introduced in Reffle (2013) computes
such a profile, combining lexica, pattern sets and advanced matching techniques
in a specialized Expectation Maximization (EM) procedure. Here we improve this
method in three respects. First, the method in Reffle (2013) is not adaptive: user feedback obtained during actual postcorrection cannot be used to compute refined profiles. We introduce a variant of the method that supports adaptivity by taking the user's correction steps into account; this leads to higher precision in recognizing erroneous OCR tokens. Second, new historical patterns are often found during postcorrection. We show that adding these patterns to the linguistic background resources yields a second kind of improvement, enabling even higher precision by telling historical spellings apart from OCR errors. Third, the method in Reffle (2013) makes no active use of tokens that cannot be interpreted in the underlying channel model. We show that adding these uninterpretable tokens to the set of conjectured errors leads to a significant improvement in recall for error detection while at the same time improving precision.
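As a rough illustration of the kind of procedure involved, the following is a heavily simplified, hypothetical sketch of an EM-style profiling loop with an adaptivity hook. The data structures, scores, and update rule are invented for the example and do not reproduce Reffle's actual method:

```python
from collections import defaultdict

def profile(tokens, candidates, user_feedback=None, iterations=10):
    """Estimate relative frequencies of interpretation patterns.

    tokens: iterable of token ids.
    candidates: dict token id -> list of (pattern, base_score) readings,
        where a pattern is e.g. an OCR error pattern or a historical variant.
    user_feedback: dict token id -> pattern confirmed by the user (adaptivity).
    """
    user_feedback = user_feedback or {}
    patterns = {p for cands in candidates.values() for p, _ in cands}
    freq = {p: 1.0 / len(patterns) for p in patterns}   # uniform initialisation
    for _ in range(iterations):
        counts = defaultdict(float)
        for tok in tokens:
            if tok in user_feedback:                     # adaptive step: pin the
                counts[user_feedback[tok]] += 1.0        # user-confirmed reading
                continue
            cands = candidates[tok]
            weights = [freq[p] * s for p, s in cands]    # E-step: posterior over
            total = sum(weights) or 1.0                  # the candidate readings
            for (p, _), w in zip(cands, weights):
                counts[p] += w / total
        norm = sum(counts.values()) or 1.0               # M-step: re-estimate
        freq = {p: counts[p] / norm for p in patterns}   # pattern frequencies
    return freq
```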
TCP throughput guarantee in the DiffServ Assured Forwarding service: what about the results?
Since the IETF proposed its Quality of Service architectures, the interaction between TCP and the QoS services has been studied intensively. This paper revisits the results obtained in terms of TCP throughput guarantees in the DiffServ Assured Forwarding (DiffServ/AF) service and presents an overview of the different proposals to solve the problem. It has been demonstrated that the standardized IETF DiffServ conditioners, such as the token bucket color marker and the time sliding window color marker, are not good TCP traffic descriptors. Starting from this point, several proposals have been made, most of them presenting new marking schemes intended to replace or improve the traditional token bucket color marker. The main problem is that TCP congestion control is not designed to work with the AF service; indeed, the two mechanisms are antagonistic. TCP shares the bottleneck bandwidth fairly between flows, while a DiffServ network provides a controllable and predictable level of service. In this paper, we build a classification of the proposals made in recent years and compare them. As a result, we will see that these conditioning schemes can be separated into three levels of action and that conditioning at the network edge is the most widely accepted approach. We conclude that the problem is still unsolved and that TCP, conditioned or not, remains ill-suited to the DiffServ/AF service.
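For readers unfamiliar with the conditioner mentioned above, here is a minimal sketch of a single-rate three-color token bucket marker in the spirit of RFC 2697 (color-blind mode). The parameter values are illustrative, and this is not any of the surveyed marking schemes:

```python
class SingleRateColorMarker:
    """Single-rate three-color marker: two token buckets are refilled at rate
    CIR, capped at CBS (committed burst) and EBS (excess burst) bytes."""

    def __init__(self, cir, cbs, ebs):
        self.cir, self.cbs, self.ebs = cir, cbs, ebs
        self.tc, self.te = float(cbs), float(ebs)   # both buckets start full
        self.last = 0.0

    def mark(self, size, now):
        # Refill: CIR tokens per second go to the committed bucket; any
        # overflow spills into the excess bucket.
        tokens = self.cir * (now - self.last)
        self.last = now
        spill = max(0.0, self.tc + tokens - self.cbs)
        self.tc = min(self.cbs, self.tc + tokens)
        self.te = min(self.ebs, self.te + spill)
        # Color the packet and consume tokens from the matching bucket.
        if size <= self.tc:
            self.tc -= size
            return "green"
        if size <= self.te:
            self.te -= size
            return "yellow"
        return "red"

marker = SingleRateColorMarker(cir=125_000, cbs=10_000, ebs=20_000)  # bytes/s, bytes
print(marker.mark(1500, now=0.01))   # "green": the committed bucket starts full
```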
Adaptive Latency Insensitive Protocols and Elastic Circuits with Early Evaluation: A Comparative Analysis
Latency Insensitive Protocols (LIP) and Elastic Circuits (EC) solve the same problem of rendering a design tolerant to additional latencies caused by wires or computational elements. They are performance-limited by a firing semantics that enforces coherency through a lazy evaluation rule: computation is enabled only if all inputs to a block are simultaneously available. Adaptive LIPs (ALIP) and ECs with early evaluation (ECEE) increase performance by relaxing the evaluation rule: computation is enabled as soon as the subset of inputs needed at a given time is available. Their differences in implementation and in behavior in selected cases justify the comparative analysis reported in this paper. Results have been obtained through simple examples, through a single representative case study already used in the context of both LIPs and ECs, and through extensive simulations over a suite of benchmarks.
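As a rough illustration of the two firing rules the abstract contrasts, the toy sketch below compares the lazy rule (all inputs must be valid) with early evaluation, where only the inputs actually needed, such as the branch selected by a multiplexer, must be valid. The names and the mux example are illustrative assumptions, not the paper's circuits:

```python
def lazy_fire(valid):
    """Lazy rule: a block may compute only when every input is valid."""
    return all(valid.values())

def early_fire_mux(valid, select):
    """Early evaluation for a 2-way multiplexer: only the selected data
    input (plus the select signal itself) needs to be valid."""
    needed = "a" if select else "b"
    return valid["sel"] and valid[needed]

# Input "b" is late, but the mux selects "a": the lazy rule stalls,
# while early evaluation lets the computation proceed.
valid = {"sel": True, "a": True, "b": False}
print(lazy_fire(valid))                    # False
print(early_fire_mux(valid, select=True))  # True
```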
Adaptive Load Balancing: A Study in Multi-Agent Learning
We study the process of multi-agent reinforcement learning in the context of
load balancing in a distributed system, without use of either central
coordination or explicit communication. We first define a precise framework in
which to study adaptive load balancing, important features of which are its
stochastic nature and the purely local information available to individual
agents. Given this framework, we show illuminating results on the interplay
between basic adaptive behavior parameters and their effect on system
efficiency. We then investigate the properties of adaptive load balancing in
heterogeneous populations, and address the issue of exploration vs.
exploitation in that context. Finally, we show that naive use of communication
may not improve, and might even harm, system efficiency.
Comment: See http://www.jair.org/ for any accompanying files
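The flavor of the adaptive, purely local setting described above can be conveyed by a toy simulation. This is an illustrative sketch with made-up parameters and a simplified reward (the inverse of a resource's load), not the framework defined in the paper:

```python
import random

N_AGENTS, N_RESOURCES, EPS, ALPHA = 20, 4, 0.1, 0.2

# Each agent keeps its own value estimate per resource and learns only from
# its own observed reward: no central coordination, no communication.
q = [[0.0] * N_RESOURCES for _ in range(N_AGENTS)]

for step in range(1000):
    choices = []
    for a in range(N_AGENTS):
        if random.random() < EPS:
            choices.append(random.randrange(N_RESOURCES))                     # explore
        else:
            choices.append(max(range(N_RESOURCES), key=q[a].__getitem__))     # exploit
    load = [choices.count(r) for r in range(N_RESOURCES)]
    for a, r in enumerate(choices):
        reward = 1.0 / load[r]                     # less crowded -> higher reward
        q[a][r] += ALPHA * (reward - q[a][r])      # purely local update
```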
MinoanER: Schema-Agnostic, Non-Iterative, Massively Parallel Resolution of Web Entities
Entity Resolution (ER) aims to identify different descriptions in various
Knowledge Bases (KBs) that refer to the same entity. ER is challenged by the
Variety, Volume and Veracity of entity descriptions published in the Web of
Data. To address them, we propose the MinoanER framework that simultaneously
fulfills full automation, support of highly heterogeneous entities, and massive
parallelization of the ER process. MinoanER leverages a token-based similarity
of entities to define a new metric that derives the similarity of neighboring
entities from the most important relations, as they are indicated only by
statistics. A composite blocking method is employed to capture different
sources of matching evidence from the content, neighbors, or names of entities.
The search space of candidate pairs for comparison is compactly abstracted by a
novel disjunctive blocking graph and processed by a non-iterative, massively
parallel matching algorithm that consists of four generic, schema-agnostic
matching rules that are quite robust with respect to their internal
configuration. We demonstrate that the effectiveness of MinoanER is comparable
to existing ER tools over real KBs exhibiting low Variety, but it outperforms
them significantly when matching KBs with high Variety.
Comment: Presented at EDBT 2019
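To give a concrete feel for the token-based, schema-agnostic ingredients mentioned above, here is a tiny illustrative sketch of token blocking followed by a token (Jaccard) similarity over the candidate pairs. The entity descriptions are invented, and this is not MinoanER's actual algorithm:

```python
from collections import defaultdict

entities = {
    "kb1:e1": "Barack Obama president United States",
    "kb2:e7": "Obama Barack 44th president USA",
    "kb2:e9": "Paris capital France",
}

def tokens(desc):
    return set(desc.lower().split())

# Token blocking: candidate pairs are entities that share at least one token,
# regardless of which schema or attribute the token came from.
blocks = defaultdict(set)
for eid, desc in entities.items():
    for t in tokens(desc):
        blocks[t].add(eid)

candidates = {(a, b)
              for ids in blocks.values() if len(ids) > 1
              for a in ids for b in ids if a < b}

# Schema-agnostic token similarity (Jaccard) over the candidate pairs.
for a, b in sorted(candidates):
    ta, tb = tokens(entities[a]), tokens(entities[b])
    print(a, b, round(len(ta & tb) / len(ta | tb), 2))
```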
Sympathy Begins with a Smile, Intelligence Begins with a Word: Use of Multimodal Features in Spoken Human-Robot Interaction
Recognition of social signals, from human facial expressions or prosody of
speech, is a popular research topic in human-robot interaction studies. There
is also a long line of research in the spoken dialogue community that
investigates user satisfaction in relation to dialogue characteristics.
However, very little research relates a combination of multimodal social
signals and language features detected during spoken face-to-face human-robot
interaction to the resulting user perception of a robot. In this paper we show
how different emotional facial expressions of human users, in combination with
prosodic characteristics of human speech and features of human-robot dialogue,
correlate with users' impressions of the robot after a conversation. We find
that happiness in the user's recognised facial expression strongly correlates
with likeability of a robot, while dialogue-related features (such as number of
human turns or number of sentences per robot utterance) correlate with
perceiving a robot as intelligent. In addition, we show that facial expression,
emotional features, and prosody are better predictors of human ratings related
to perceived robot likeability and anthropomorphism, while linguistic and
non-linguistic features more often predict perceived robot intelligence and
interpretability. As such, these characteristics may in future be used as an
online reward signal for in-situ Reinforcement Learning based adaptive
human-robot dialogue systems.
Comment: Robo-NLP workshop at ACL 2017. 9 pages, 5 figures, 6 tables
A Flexible Network Approach to Privacy of Blockchain Transactions
For preserving privacy, blockchains can be equipped with dedicated mechanisms to anonymize participants. However, these mechanisms often take only the abstraction layer of blockchains into account, whereas observations of the underlying network traffic can reveal the originator of a transaction request. Previous solutions either provide topological privacy that can be broken by attackers controlling a large number of nodes, or offer strong cryptographic privacy but are so inefficient as to be practically unusable. Further, there is no flexible way to trade privacy against efficiency to adjust to practical needs. We propose a novel approach that combines existing mechanisms to obtain quantifiable and adjustable cryptographic privacy, further improved by augmented statistical measures that prevent frequent attacks mounted with fewer resources. This approach achieves flexibility for the privacy and efficiency requirements of different blockchain use cases.
Comment: 6 pages, 2018 IEEE 38th International Conference on Distributed Computing Systems (ICDCS)
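The general idea of trading network-layer privacy against efficiency can be illustrated with a toy relay scheme in which a transaction takes a configurable number of random hops before being broadcast (in the spirit of diffusion- or Dandelion-style relaying). This is a generic illustration with invented node names and delays, not the mechanism proposed in the paper:

```python
import random

def relay_then_broadcast(origin, peers, hops, hop_delay=0.05):
    """Forward a transaction along `hops` random relays before broadcasting.
    More hops -> higher latency (efficiency cost) but a larger set of nodes
    that could plausibly be the originator (rough privacy proxy)."""
    path = [origin]
    for _ in range(hops):
        path.append(random.choice(peers))
    latency = hops * hop_delay
    anonymity_set = len(set(path))
    return path[-1], latency, anonymity_set

peers = [f"node{i}" for i in range(100)]
for k in (0, 2, 8):
    _, latency, anon = relay_then_broadcast("node0", peers, hops=k)
    print(f"hops={k}: latency~{latency:.2f}s, anonymity set~{anon}")
```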