1,748 research outputs found
Long-Range Correlation Underlying Childhood Language and Generative Models
Long-range correlation, a property of time series exhibiting long-term
memory, is mainly studied in the statistical physics domain and has been
reported to exist in natural language. Using a state-of-the-art method for such
analysis, long-range correlation is first shown to occur in long CHILDES data
sets. To understand why, Bayesian generative models of language, originally
proposed in the cognitive scientific domain, are investigated. Among
representative models, the Simon model was found to exhibit surprisingly good
long-range correlation, but not the Pitman-Yor model. Since the Simon model is
known not to correctly reflect the vocabulary growth of natural language, a
simple new model is devised as a conjunct of the Simon and Pitman-Yor models,
such that long-range correlation holds with a correct vocabulary growth rate.
The investigation overall suggests that uniform sampling is one cause of
long-range correlation and could thus have a relation with actual linguistic
processes
Finding community structure in very large networks
The discovery and analysis of community structure in networks is a topic of
considerable recent interest within the physics community, but most methods
proposed so far are unsuitable for very large networks because of their
computational cost. Here we present a hierarchical agglomeration algorithm for
detecting community structure which is faster than many competing algorithms:
its running time on a network with n vertices and m edges is O(m d log n) where
d is the depth of the dendrogram describing the community structure. Many
real-world networks are sparse and hierarchical, with m ~ n and d ~ log n, in
which case our algorithm runs in essentially linear time, O(n log^2 n). As an
example of the application of this algorithm we use it to analyze a network of
items for sale on the web-site of a large online retailer, items in the network
being linked if they are frequently purchased by the same buyer. The network
has more than 400,000 vertices and 2 million edges. We show that our algorithm
can extract meaningful communities from this network, revealing large-scale
patterns present in the purchasing habits of customers
Сравнение сложность алгоритмов вставкой и быстрой сортировки
A sorting algorithm is an algorithm that puts elements of a list in a certain order. The most used orders are numerical order and lexicographical order. Efficient sorting is important for optimizing the use of other algorithms (such as search and merge algorithms) which require input data to be in sorted lists; it is also often useful for canonicalizing data and for producing human readable output. Sorting algorithms are prevalent in introductory computer science classes, where the abundance of algorithms for the problem provides a gentle introduction to a variety of core algorithm concepts, such as big O notation, divide and conquer algorithms, data structures such as heaps and binary trees, randomized algorithms, best, worst and average case analysis, time-space tradeoffs, and upper and lower bounds
Energy-Efficient Algorithms
We initiate the systematic study of the energy complexity of algorithms (in
addition to time and space complexity) based on Landauer's Principle in
physics, which gives a lower bound on the amount of energy a system must
dissipate if it destroys information. We propose energy-aware variations of
three standard models of computation: circuit RAM, word RAM, and
transdichotomous RAM. On top of these models, we build familiar high-level
primitives such as control logic, memory allocation, and garbage collection
with zero energy complexity and only constant-factor overheads in space and
time complexity, enabling simple expression of energy-efficient algorithms. We
analyze several classic algorithms in our models and develop low-energy
variations: comparison sort, insertion sort, counting sort, breadth-first
search, Bellman-Ford, Floyd-Warshall, matrix all-pairs shortest paths, AVL
trees, binary heaps, and dynamic arrays. We explore the time/space/energy
trade-off and develop several general techniques for analyzing algorithms and
reducing their energy complexity. These results lay a theoretical foundation
for a new field of semi-reversible computing and provide a new framework for
the investigation of algorithms.Comment: 40 pages, 8 pdf figures, full version of work published in ITCS 201
Optimizing Properties of Balanced Words
In the past few decades there has been a good deal of papers which are
concerned with optimization problems in different areas of mathematics (along
0-1 words, finite or infinite) and which yield - sometimes quite unexpectedly -
balanced words as optimal. In this note we list some key results along these
lines known to date.Comment: In Proceedings WORDS 2011, arXiv:1108.341
Route Planning in Transportation Networks
We survey recent advances in algorithms for route planning in transportation
networks. For road networks, we show that one can compute driving directions in
milliseconds or less even at continental scale. A variety of techniques provide
different trade-offs between preprocessing effort, space requirements, and
query time. Some algorithms can answer queries in a fraction of a microsecond,
while others can deal efficiently with real-time traffic. Journey planning on
public transportation systems, although conceptually similar, is a
significantly harder problem due to its inherent time-dependent and
multicriteria nature. Although exact algorithms are fast enough for interactive
queries on metropolitan transit systems, dealing with continent-sized instances
requires simplifications or heavy preprocessing. The multimodal route planning
problem, which seeks journeys combining schedule-based transportation (buses,
trains) with unrestricted modes (walking, driving), is even harder, relying on
approximate solutions even for metropolitan inputs.Comment: This is an updated version of the technical report MSR-TR-2014-4,
previously published by Microsoft Research. This work was mostly done while
the authors Daniel Delling, Andrew Goldberg, and Renato F. Werneck were at
Microsoft Research Silicon Valle
- …