Zipf and Heaps laws from dependency structures in component systems
Complex natural and technological systems can be considered, on a
coarse-grained level, as assemblies of elementary components: for example,
genomes as sets of genes, or texts as sets of words. On one hand, the joint
occurrence of components emerges from architectural and specific constraints in
such systems. On the other hand, general regularities may unify different
systems, such as the broadly studied Zipf and Heaps laws, respectively
concerning the distribution of component frequencies and their number as a
function of system size. Dependency structures (i.e., directed networks
encoding the dependency relations between the components in a system) were
proposed recently as a possible organizing principle underlying some of the
regularities observed. However, the consequences of this assumption were
explored only in binary component systems, where solely the presence or absence
of components is considered, and multiple copies of the same component are not
allowed. Here, we consider a simple model that generates, from a given ensemble
of dependency structures, a statistical ensemble of sets of components,
allowing for components to appear with any multiplicity. Our model is a minimal
extension that is memoryless, and therefore accessible to analytical
calculations. A mean-field analytical approach (analogous to the "Zipfian
ensemble" in the linguistics literature) captures the relevant laws describing
the component statistics, as we show by comparison with numerical computations.
In particular, we recover a power-law Zipf rank plot, with a set of core
components, and a Heaps law displaying three consecutive regimes (linear,
sub-linear and saturating) that we characterize quantitatively.
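The paper's model is defined over a given ensemble of dependency structures; purely as an illustration, the sketch below (hypothetical parameters, with a random layered DAG standing in for that ensemble) generates component counts by memoryless draws and collects the data for a Zipf rank plot and a Heaps curve.

```python
import random
from collections import Counter

def random_dependency_dag(n_components, p_edge, rng):
    # Toy stand-in for the dependency ensemble: component i may depend on any j < i.
    return {i: [j for j in range(i) if rng.random() < p_edge]
            for i in range(n_components)}

def dependency_closure(root, deps):
    # All components reachable from `root` along dependency links, including root.
    seen, stack = set(), [root]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(deps[c])
    return seen

def generate_realization(deps, n_draws, rng):
    # Memoryless draws: each draw adds one random component together with all its
    # dependencies, so components can appear with any multiplicity.
    counts = Counter()
    for _ in range(n_draws):
        counts.update(dependency_closure(rng.randrange(len(deps)), deps))
    return counts

rng = random.Random(0)
deps = random_dependency_dag(1000, 0.002, rng)
counts = generate_realization(deps, 5000, rng)
zipf_plot = sorted(counts.values(), reverse=True)  # frequency vs. rank
heaps_point = (sum(counts.values()), len(counts))  # (system size, distinct components)
```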
On Verifying Complex Properties using Symbolic Shape Analysis
One of the main challenges in the verification of software systems is the
analysis of unbounded data structures with dynamic memory allocation, such as
linked data structures and arrays. We describe Bohne, a new analysis for
verifying data structures. Bohne verifies data structure operations and shows
that 1) the operations preserve data structure invariants and 2) the operations
satisfy their specifications expressed in terms of changes to the set of
objects stored in the data structure. During the analysis, Bohne infers loop
invariants in the form of disjunctions of universally quantified Boolean
combinations of formulas. To synthesize loop invariants of this form, Bohne
uses a combination of decision procedures for Monadic Second-Order Logic over
trees, SMT-LIB decision procedures (currently CVC Lite), and an automated
reasoner within the Isabelle interactive theorem prover. This architecture
shows that synthesized loop invariants can serve as a useful communication
mechanism between different decision procedures. Using Bohne, we have verified
operations on data structures such as linked lists with iterators and back
pointers, trees with and without parent pointers, two-level skip lists, array
data structures, and sorted lists. We have deployed Bohne in the Hob and Jahob
data structure analysis systems, enabling us to combine Bohne with analyses of
data structure clients and apply it in the context of larger programs. This
report describes the Bohne algorithm as well as techniques that Bohne uses to
reduce the number of required annotations and the running time of the analysis.
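Bohne itself is a static analysis over annotated programs; only to make the verified properties concrete, the sketch below restates, as executable Python checks, a back-pointer invariant (a universally quantified property: every node n with n.next != None satisfies n.next.prev == n) and a specification phrased as a change to the set of stored objects. All names here are illustrative, not Bohne's input language.

```python
class Node:
    def __init__(self, data):
        self.data, self.next, self.prev = data, None, None

def insert_front(head, node):
    # Operation under verification: insert `node` at the front of a doubly-linked list.
    node.next, node.prev = head, None
    if head is not None:
        head.prev = node
    return node  # new head

def back_pointer_invariant(head):
    # Data structure invariant: for every node n in the list, n.next.prev is n.
    n = head
    while n is not None and n.next is not None:
        if n.next.prev is not n:
            return False
        n = n.next
    return True

def contents(head):
    # The set of stored objects, in terms of which specifications are stated,
    # e.g. contents(insert_front(h, x)) == contents(h) | {x.data}.
    out = set()
    while head is not None:
        out.add(head.data)
        head = head.next
    return out
```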
Log-log Convexity of Type-Token Growth in Zipf's Systems
It is traditionally assumed that Zipf's law implies the power-law growth of
the number of different elements with the total number of elements in a system
- the so-called Heaps' law. We show that a careful definition of Zipf's law
leads to the violation of Heaps' law in random systems, and obtain alternative
growth curves. These curves fulfill universal data collapses that only depend
on the value of Zipf's exponent. We observe that real books behave very
much in the same way as random systems, despite the presence of burstiness in
word occurrence. We advance an explanation for this unexpected correspondence.
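A minimal numerical check of the setup (hypothetical corpus sizes and exponent): draw tokens i.i.d. from a truncated Zipf distribution and record the type-token growth curve; on log-log axes it bends away from a pure power law.

```python
import numpy as np

def zipf_tokens(n_tokens, n_types, alpha, rng):
    # i.i.d. tokens from a truncated Zipf distribution p(r) ~ r**(-alpha).
    ranks = np.arange(1, n_types + 1)
    p = ranks.astype(float) ** (-alpha)
    p /= p.sum()
    return rng.choice(ranks, size=n_tokens, p=p)

def type_token_curve(tokens):
    # Number of distinct types seen after each successive token.
    seen, curve = set(), []
    for t in tokens:
        seen.add(t)
        curve.append(len(seen))
    return np.array(curve)

rng = np.random.default_rng(0)
curve = type_token_curve(zipf_tokens(100_000, 50_000, 1.2, rng))
# Plotting curve against np.arange(1, len(curve) + 1) on log-log axes shows
# the convexity discussed above, rather than a straight (Heaps) line.
```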
Do Neural Nets Learn Statistical Laws behind Natural Language?
The performance of deep learning in natural language processing has been
spectacular, but the reasons for this success remain unclear because of the
inherent complexity of deep learning. This paper provides empirical evidence of
its effectiveness and of a limitation of neural networks for language
engineering. Specifically, we demonstrate that a neural language model based on
long short-term memory (LSTM) effectively reproduces Zipf's law and Heaps' law,
two representative statistical properties underlying natural language. We
discuss the quality of reproducibility and the emergence of Zipf's law and
Heaps' law as training progresses. We also point out that the neural language
model has a limitation in reproducing long-range correlation, another
statistical property of natural language. This understanding could provide a
direction for improving the architectures of neural networks.Comment: 21 pages, 11 figure
Semantics of Separation-Logic Typing and Higher-order Frame Rules for Algol-like Languages
We show how to give a coherent semantics to programs that are well-specified
in a version of separation logic for a language with higher types: Idealized
Algol extended with heaps (but with immutable stack variables). In particular,
we provide simple sound rules for deriving higher-order frame rules, allowing
for local reasoning.
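For orientation, in standard separation-logic notation (not necessarily the paper's exact formulation), the first-order frame rule and the shape of its higher-order generalization read:

```latex
% First-order frame rule: an invariant R on disjoint state can be framed in.
\frac{\{P\}\; C\; \{Q\}}{\{P \ast R\}\; C\; \{Q \ast R\}}
% Higher-order frame rules lift this to specifications \theta of higher type,
% schematically \theta \Longrightarrow \theta \otimes R, where on base
% specifications (\{P\}\, C\, \{Q\}) \otimes R = \{P \ast R\}\; C\; \{Q \ast R\}.
```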
Statistics of shared components in complex component systems
Many complex systems are modular. Such systems can be represented as
"component systems", i.e., sets of elementary components, such as LEGO bricks
in LEGO sets. The bricks found in a LEGO set reflect a target architecture,
which can be built following a set-specific list of instructions. In other
component systems, instead, the underlying functional design and constraints
are not obvious a priori, and their detection is often a challenge of both
scientific and practical importance, requiring a clear understanding of
component statistics. Importantly, some quantitative invariants appear to be
common to many component systems, most notably a common broad distribution of
component abundances, which often resembles the well-known Zipf's law. Such
"laws" affect in a general and non-trivial way the component statistics,
potentially hindering the identification of system-specific functional
constraints or generative processes. Here, we specifically focus on the
statistics of shared components, i.e., the distribution of the number of
components shared by different system-realizations, such as the common bricks
found in different LEGO sets. To account for the effects of component
heterogeneity, we consider a simple null model, which builds
system-realizations by random draws from a universe of possible components.
Under general assumptions on abundance heterogeneity, we provide analytical
estimates of component occurrence, which quantify exhaustively the statistics
of shared components. Surprisingly, this simple null model can positively
explain important features of empirical component-occurrence distributions
obtained from data on bacterial genomes, LEGO sets, and book chapters. Specific
architectural features and functional constraints can be detected from
occurrence patterns as deviations from these null predictions, as we show for
the illustrative case of the "core" genome in bacteria.
Comment: 18 pages, 7 main figures, 7 supplementary figures
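As a minimal sketch of such a null model (hypothetical sizes and exponent; the paper's analytical estimates are more general), the code below builds realizations by random draws from a Zipf-like universe of components and estimates each component's occurrence, i.e., the fraction of realizations containing it:

```python
import numpy as np

def occurrence_fractions(n_realizations, set_size, universe, alpha, rng):
    # Null model: each realization is `set_size` random draws from a universe
    # whose abundances follow p(r) ~ r**(-alpha) (component heterogeneity).
    p = np.arange(1, universe + 1, dtype=float) ** (-alpha)
    p /= p.sum()
    occ = np.zeros(universe)
    for _ in range(n_realizations):
        draw = rng.choice(universe, size=set_size, p=p)
        occ[np.unique(draw)] += 1
    return occ / n_realizations  # occurrence of each component across realizations

rng = np.random.default_rng(1)
occ = occurrence_fractions(200, 500, 10_000, 1.0, rng)
# Empirical occurrence distributions (genes, bricks, words) can then be compared
# against this baseline; systematic excesses flag candidate "core" components.
```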