Optimization of transport protocols with path-length constraints in complex networks
We propose a protocol optimization technique that is applicable to both
weighted and unweighted graphs. Our aim is to explore by how much a small
variation around the Shortest Path or Optimal Path protocols can enhance
protocol performance. Such an optimization strategy can be necessary because
even though some protocols can achieve very high traffic tolerance levels, this
is commonly done by enlarging the path-lengths, which may jeopardize
scalability. We use ideas borrowed from Extremal Optimization to guide our
algorithm, which proves to be an effective technique. Our method exploits the
degeneracy of the paths or their close-weight alternatives, which significantly
improves the scalability of the protocols in comparison to Shortest Paths or
Optimal Paths protocols, keeping at the same time almost intact the length or
weight of the paths. This characteristic ensures that the optimized routing
protocols are composed of paths that are quick to traverse, avoiding negative
effects in data communication due to path-length increases that can become
especially relevant when information losses are present.
Comment: 8 pages, 8 figure
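The path degeneracy that the abstract above relies on can be illustrated with a short sketch. It only counts how many distinct shortest paths exist between node pairs in an unweighted graph; it is not the authors' optimization algorithm, and the function name `count_shortest_paths` and the adjacency-list input are assumptions made for illustration.

```python
from collections import deque


def count_shortest_paths(adj, source):
    """Breadth-first search that returns, for every reachable node,
    its hop distance from `source` and the number of distinct
    shortest paths leading to it (the path degeneracy)."""
    dist = {source: 0}
    count = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                # First visit: v inherits the path count of u.
                dist[v] = dist[u] + 1
                count[v] = count[u]
                queue.append(v)
            elif dist[v] == dist[u] + 1:
                # Another shortest route into v, via u.
                count[v] += count[u]
    return dist, count


# Hypothetical example graph: a 4-cycle 0-1-2-3-0.
square = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
dist, count = count_shortest_paths(square, 0)
```

In this toy graph, node 2 is reachable from node 0 by two equally short routes, so a routing protocol could spread traffic over both without lengthening any path.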
Word forms are structured for efficient use
Zipf famously stated that, if natural language lexicons are structured for efficient communication, the words that are used the most frequently should require the least effort. This observation explains the famous finding that the most frequent words in a language tend to be short. A related prediction is that, even within words of the same length, the most frequent word forms should be the ones that are easiest to produce and understand. Using orthographics as a proxy for phonetics, we test this hypothesis using corpora of 96 languages from Wikipedia. We find that, across a variety of languages and language families and controlling for length, the most frequent forms in a language tend to be more orthographically well‐formed and have more orthographic neighbors than less frequent forms. We interpret this result as evidence that lexicons are structured by language usage pressures to facilitate efficient communication.
Keywords: Lexicon; Word frequency; Phonology; Communication; Efficiency
National Science Foundation (Grant ES/N0174041/1
Collective emotions online and their influence on community life
E-communities, social groups interacting online, have recently become an
object of interdisciplinary research. As with face-to-face meetings, Internet
exchanges may not only include factual information but also emotional
information - how participants feel about the subject discussed or other group
members. Emotions are known to be important in affecting interaction partners
in offline communication in many ways. Could emotions in Internet exchanges
affect others and systematically influence quantitative and qualitative aspects
of the trajectory of e-communities? The development of automatic sentiment
analysis has made large scale emotion detection and analysis possible using
text messages collected from the web. It is not clear if emotions in
e-communities primarily derive from individual group members' personalities or
if they result from intra-group interactions, and whether they influence group
activities. We show the collective character of affective phenomena on a large
scale as observed in 4 million posts downloaded from Blogs, Digg and BBC
forums. To test whether the emotions of a community member may influence the
emotions of others, posts were grouped into clusters of messages with similar
emotional valences. The frequency of long clusters was much higher than it
would be if emotions occurred at random. Distributions for cluster lengths can
be explained by preferential processes because conditional probabilities for
consecutive messages grow as a power law with cluster length. For BBC forum
threads, average discussion lengths were higher for larger values of absolute
average emotional valence in the first ten comments and the average amount of
emotion in messages fell during discussions. Our results prove that collective
emotional states can be created and modulated via Internet communication and
that emotional expressiveness is the fuel that sustains some e-communities.
Comment: 23 pages including Supporting Information, accepted to PLoS ON
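The clustering step described in the abstract above (grouping consecutive messages with the same emotional valence and asking how likely a cluster is to continue) can be sketched as follows. This is a minimal illustration under assumptions: valences are taken to be already classified as -1, 0 or 1, and the continuation probability is estimated as the fraction of clusters of length at least n that reach length n + 1. The function names and the sample thread are hypothetical, not taken from the paper.

```python
from itertools import groupby


def cluster_lengths(valences):
    """Collapse a sequence of valence labels into (label, run_length)
    pairs, one per cluster of consecutive identical labels."""
    return [(label, sum(1 for _ in run)) for label, run in groupby(valences)]


def continuation_probability(lengths, n):
    """Estimate the chance that a cluster of length n grows to n + 1.
    Returns None when no cluster reaches length n."""
    at_least_n = sum(1 for k in lengths if k >= n)
    longer = sum(1 for k in lengths if k >= n + 1)
    return longer / at_least_n if at_least_n else None


# Hypothetical thread: -1 negative, 0 neutral, 1 positive.
thread = [-1, -1, -1, 0, 1, 1, -1, -1, -1, -1]
clusters = cluster_lengths(thread)
negative_lengths = [k for label, k in clusters if label == -1]
p_continue = continuation_probability(negative_lengths, 3)
```

Comparing such estimates against the same quantity computed on a shuffled copy of the thread is one way to check whether long clusters occur more often than chance would predict.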
Decoding billions of integers per second through vectorization
In many important applications -- such as search engines and relational
database systems -- data is stored in the form of arrays of integers. Encoding
and, most importantly, decoding of these arrays consumes considerable CPU time.
Therefore, substantial effort has been made to reduce costs associated with
compression and decompression. In particular, researchers have exploited the
superscalar nature of modern processors and SIMD instructions. Nevertheless, we
introduce a novel vectorized scheme called SIMD-BP128 that improves over
previously proposed vectorized approaches. It is nearly twice as fast as the
previously fastest schemes on desktop processors (varint-G8IU and PFOR). At the
same time, SIMD-BP128 saves up to 2 bits per integer. For even better
compression, we propose another new vectorized scheme (SIMD-FastPFOR) that has
a compression ratio within 10% of a state-of-the-art scheme (Simple-8b) while
being two times faster during decoding.
Comment: For software, see https://github.com/lemire/FastPFor; for data, see
http://boytsov.info/datasets/clueweb09gap
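The idea of binary packing behind a scheme such as SIMD-BP128 can be shown with a scalar sketch: every integer in a block is stored with the same bit width, chosen as the width of the largest value. The code below is an assumption-laden illustration only. It uses no SIMD instructions, uses a block of 8 values rather than 128, and does not reproduce the API of the FastPFor library; `pack_block` and `unpack_block` are hypothetical names.

```python
def pack_block(values):
    """Pack non-negative integers into one Python int using a single
    shared bit width. Returns (width, packed)."""
    width = max(v.bit_length() for v in values)
    packed = 0
    for i, v in enumerate(values):
        # Value i occupies bits [i * width, (i + 1) * width).
        packed |= v << (i * width)
    return width, packed


def unpack_block(width, packed, count):
    """Recover `count` integers stored at a fixed bit width."""
    mask = (1 << width) - 1
    return [(packed >> (i * width)) & mask for i in range(count)]


# Hypothetical block of small integers.
block = [3, 7, 1, 5, 0, 2, 6, 4]
width, packed = pack_block(block)
restored = unpack_block(width, packed, len(block))
```

Here each value fits in 3 bits, so the block needs 24 bits instead of 8 × 32. Decoding is a fixed pattern of shifts and masks with no data-dependent branches, which is the property that makes this layout amenable to vectorization.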
The placement of the head that maximizes predictability. An information theoretic approach
The minimization of the length of syntactic dependencies is a
well-established principle of word order and the basis of a mathematical theory
of word order. Here we complete that theory from the perspective of information
theory, adding a competing word order principle: the maximization of
predictability of a target element. These two principles are in conflict: to
maximize the predictability of the head, the head should appear last, which
maximizes the costs with respect to dependency length minimization. The
implications of such a broad theoretical framework to understand the
optimality, diversity and evolution of the six possible orderings of subject,
object and verb are reviewed.
Comment: in press in Glottometric
The optimality of word lengths. Theoretical foundations and an empirical study
Zipf's law of abbreviation, namely the tendency of more frequent words to be
shorter, has been viewed as a manifestation of compression, i.e. the
minimization of the length of forms -- a universal principle of natural
communication. Although the claim that languages are optimized has become
trendy, attempts to measure the degree of optimization of languages have been
rather scarce. Here we present two optimality scores that are dually normalized,
namely, they are normalized with respect to both the minimum and the random
baseline. We analyze the theoretical and statistical pros and cons of these and
other scores. Harnessing the best score, we quantify for the first time the
degree of optimality of word lengths in languages. This indicates that
languages are optimized to 62 or 67 percent on average (depending on the
source) when word lengths are measured in characters, and to 65 percent on
average when word lengths are measured in time. In general, spoken word
durations are more optimized than written word lengths in characters. Our work
paves the way to measure the degree of optimality of the vocalizations or
gestures of other species, and to compare them against written, spoken, or
signed human languages.
Comment: On the one hand, the article has been reduced: analyses of the law of
abbreviation and some of the methods have been moved to another article;
appendix B has been reduced. On the other hand, various parts have been
rewritten for clarity; new figures have been added to ease the understanding
of the scores; new citations added. Many typos have been correcte
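A score normalized with respect to both the minimum and the random baseline, as described in the abstract above, might look like the sketch below. The abstract does not state the formula, so the form used here, (L_random - L) / (L_random - L_min), is an assumption, as are the function names. L is the mean word length weighted by word probability, L_min pairs the shortest lengths with the most probable words, and L_random is the expected mean length when lengths are shuffled at random across words.

```python
def mean_length(probs, lengths):
    """Probability-weighted mean word length."""
    return sum(p * l for p, l in zip(probs, lengths))


def optimality_score(probs, lengths):
    """Assumed dually normalized score: 1 when lengths are optimally
    assigned, about 0 at the random baseline, negative when worse
    than random. Undefined if all lengths are equal."""
    observed = mean_length(probs, lengths)
    # Minimum baseline: shortest lengths go to the most probable words.
    minimum = mean_length(sorted(probs, reverse=True), sorted(lengths))
    # Random baseline: expectation over all permutations of the lengths.
    random_baseline = sum(probs) * sum(lengths) / len(lengths)
    return (random_baseline - observed) / (random_baseline - minimum)


# Hypothetical three-word lexicon.
probs = [0.5, 0.3, 0.2]
best = optimality_score(probs, [1, 2, 3])   # frequent words are short
worst = optimality_score(probs, [3, 2, 1])  # frequent words are long
```

With this toy lexicon, `best` evaluates to 1 and `worst` to approximately -1, which shows how the two baselines anchor the scale.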