Criticality in Formal Languages and Statistical Physics
We show that the mutual information between two symbols, as a function of the
number of symbols between the two, decays exponentially in any probabilistic
regular grammar, but can decay like a power law for a context-free grammar.
This result about formal languages is closely related to a well-known result in
classical statistical mechanics that there are no phase transitions in
fewer than two dimensions. It is also related to the emergence of power-law
correlations in turbulence and cosmological inflation through recursive
generative processes. We elucidate these physics connections and comment on
potential applications of our results to machine learning tasks like training
artificial recurrent neural networks. Along the way, we introduce a useful
quantity which we dub the rational mutual information and discuss
generalizations of our claims involving more complicated Bayesian networks.
Comment: Replaced to match final published version. Discussion improved,
references added.
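The exponential half of the first claim can be checked numerically. The following is a minimal sketch, assuming the simplest probabilistic regular grammar: a hypothetical two-symbol stationary Markov chain that flips its symbol with probability p (the chain, p = 0.1 and all function names are illustrative choices, not taken from the paper). The joint probability of symbols a and b that are d steps apart is π_a (M^d)_{ab}, and the mutual information computed from that joint shrinks by a roughly constant factor per step. The context-free, power-law side of the claim is not sketched here.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def matpow(m: Matrix, d: int) -> Matrix:
    """M^d by repeated multiplication (fine for small toy chains)."""
    n = len(m)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(d):
        result = matmul(result, m)
    return result


def mutual_information(m: Matrix, pi: List[float], d: int) -> float:
    """I(X_t; X_{t+d}) in bits for a stationary chain with transition matrix m
    and stationary distribution pi, using P(a, b) = pi[a] * (m^d)[a][b]."""
    md = matpow(m, d)
    total = 0.0
    for a, pa in enumerate(pi):
        for b, pb in enumerate(pi):
            joint = pa * md[a][b]
            if joint > 0:
                total += joint * math.log2(joint / (pa * pb))
    return total


# Hypothetical symmetric two-symbol chain: flip the symbol with probability p.
p = 0.1
M = [[1 - p, p], [p, 1 - p]]
pi = [0.5, 0.5]  # stationary distribution of this symmetric chain
for d in (1, 5, 10, 20):
    print(d, mutual_information(M, pi, d))
```

For this chain the ratio of successive values approaches (1 - 2p)^2 = 0.64, the square of the second eigenvalue of M, which is the exponential decay the abstract describes for regular grammars.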
Self-adaptive exploration in evolutionary search
We address a primary question of computational as well as biological research
on evolution: How can an exploration strategy adapt in such a way as to exploit
the information gained about the problem at hand? We first introduce an
integrated formalism of evolutionary search which provides a unified view on
different specific approaches. On this basis we discuss the implications of
indirect modeling (via a "genotype-phenotype mapping") on the exploration
strategy. Notions such as modularity, pleiotropy and functional phenotypic
complexes are discussed as implications. Then, rigorously reflecting the notion
of self-adaptability, we introduce a new definition that captures
self-adaptability of exploration: different genotypes that map to the same
phenotype may represent (also topologically) different exploration strategies;
self-adaptability requires a variation of exploration strategies along such a
"neutral space". By this definition, the concept of neutrality becomes a
central concern of this paper. Finally, we present examples of these concepts:
For a specific grammar-type encoding, we observe a large variability of
exploration strategies for a fixed phenotype, and a self-adaptive drift towards
short representations with highly structured exploration strategy that matches
the "problem's structure".
Comment: 24 pages, 5 figures
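The central definition above, that genotypes mapping to the same phenotype can still encode different exploration strategies, can be illustrated with a toy. The paper uses a grammar-type encoding; as a simpler stand-in, the sketch below assumes a hypothetical redundant encoding in which each phenotype bit is the majority vote of a block of three genotype bits, with uniform single-bit mutation as the variation operator. The names `phenotype` and `exploration_distribution` are illustrative only.

```python
from collections import Counter
from typing import Tuple

Genotype = Tuple[int, ...]


def phenotype(g: Genotype, block: int = 3) -> Tuple[int, ...]:
    """Hypothetical redundant encoding: each phenotype bit is the majority
    of one block of genotype bits (len(g) assumed divisible by block)."""
    return tuple(int(sum(g[i:i + block]) * 2 > block) for i in range(0, len(g), block))


def exploration_distribution(g: Genotype) -> Counter:
    """Phenotypes reached by every single-bit mutation of g, i.e. the
    phenotypic search distribution induced by uniform point mutation."""
    counts: Counter = Counter()
    for i in range(len(g)):
        mutant = list(g)
        mutant[i] ^= 1
        counts[phenotype(tuple(mutant))] += 1
    return counts


# Two genotypes on the same neutral set (both map to phenotype (1, 0)).
g_a = (1, 1, 1, 0, 0, 0)  # every block is "saturated": mutations are all neutral
g_b = (1, 1, 0, 0, 0, 1)  # blocks sit at the boundary: mutations explore widely
print(phenotype(g_a), exploration_distribution(g_a))
print(phenotype(g_b), exploration_distribution(g_b))
```

Both genotypes express the phenotype (1, 0), but g_a never leaves it under a single mutation while g_b reaches three different phenotypes with equal frequency. Drift along such a neutral set therefore changes the exploration strategy without changing the phenotype, which is the kind of variation the definition of self-adaptability requires.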
Neural-Augmented Static Analysis of Android Communication
We address the problem of discovering communication links between
applications in the popular Android mobile operating system, an important
problem for security and privacy in Android. Any scalable static analysis in
this complex setting is bound to produce an excessive amount of
false-positives, rendering it impractical. To improve precision, we propose to
augment static analysis with a trained neural-network model that estimates the
probability that a communication link truly exists. We describe a
neural-network architecture that encodes abstractions of communicating objects
in two applications and estimates the probability with which a link indeed
exists. At the heart of our architecture are type-directed encoders (TDE), a
general framework for elegantly constructing encoders of a compound data type
by recursively composing encoders for its constituent types. We evaluate our
approach on a large corpus of Android applications, and demonstrate that it
achieves very high accuracy. Further, we conduct thorough interpretability
studies to understand the internals of the learned neural networks.
Comment: Appears in Proceedings of the 2018 ACM Joint European Software
Engineering Conference and Symposium on the Foundations of Software
Engineering (ESEC/FSE).
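The idea of type-directed encoders, building an encoder for a compound type by recursively composing encoders for its constituent types, can be sketched without a neural-network library. In the sketch below every weight is fixed and pseudo-random rather than learned, and the embedding width, the combinators `record_encoder` and `list_encoder`, and the intent-like record type are hypothetical illustrations, not the paper's architecture. Only the recursive composition pattern is meant to carry over.

```python
import math
import random
from typing import Any, Callable, Dict, List

DIM = 4  # hypothetical embedding width
Vector = List[float]
Encoder = Callable[[Any], Vector]


def str_encoder(s: str) -> Vector:
    """Base-type encoder: a deterministic pseudo-random embedding seeded by the string."""
    rng = random.Random(s)
    return [rng.uniform(-1.0, 1.0) for _ in range(DIM)]


def dense_tanh(inputs: List[Vector], seed: int) -> Vector:
    """Fixed (untrained) dense layer over the concatenated inputs, then tanh."""
    flat = [x for v in inputs for x in v]
    rng = random.Random(seed)
    weights = [[rng.uniform(-1.0, 1.0) for _ in flat] for _ in range(DIM)]
    return [math.tanh(sum(w * x for w, x in zip(row, flat))) for row in weights]


def record_encoder(fields: Dict[str, Encoder], seed: int = 0) -> Encoder:
    """Product type: encode each field with its own encoder, then combine."""
    names = sorted(fields)  # fixed field order so the encoding is deterministic

    def encode(value: Dict[str, Any]) -> Vector:
        return dense_tanh([fields[n](value[n]) for n in names], seed)

    return encode


def list_encoder(elem: Encoder) -> Encoder:
    """Collection type: encode each element, then mean-pool (order-insensitive up to rounding)."""

    def encode(values: List[Any]) -> Vector:
        if not values:
            return [0.0] * DIM
        vecs = [elem(v) for v in values]
        return [sum(col) / len(vecs) for col in zip(*vecs)]

    return encode


# Hypothetical compound type: an object with an action string and a list of categories.
intent_encoder = record_encoder(
    {"action": str_encoder, "categories": list_encoder(str_encoder)}
)
vec = intent_encoder({"action": "android.intent.action.VIEW",
                      "categories": ["android.intent.category.DEFAULT"]})
print(vec)
```

Adding a new compound type only requires choosing combinators for its parts, which is the practical appeal of composing encoders by type. In a real model the fixed layers would be trainable and the base encoders learned.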
The use of information theory in evolutionary biology
Information is a key concept in evolutionary biology. Information is stored
in the genomes of biological organisms, and used to generate the organism as well as
to maintain and control it. Information is also "that which evolves". When a
population adapts to a local environment, information about this environment is
fixed in a representative genome. However, when an environment changes,
information can be lost. At the same time, information is processed by animal
brains to survive in complex environments, and the capacity for information
processing also evolves. Here I review applications of information theory to
the evolution of proteins as well as to the evolution of information processing
in simulated agents that adapt to perform a complex task.
Comment: 25 pages, 7 figures. To appear in "The Year in Evolutionary Biology",
Annals of the New York Academy of Sciences
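The statement that adaptation "fixes" information about the environment in a genome can be made concrete with a common per-site entropy measure: for aligned sequences, sum over sites the maximal entropy log2 of the alphabet size minus the observed entropy at that site. The sketch below assumes this plug-in estimator with no finite-sample correction; whether the review uses exactly this form is an assumption here, and the toy alignment is made up for illustration. For proteins one would pass `alphabet_size=20`.

```python
import math
from collections import Counter
from typing import List


def site_entropy(column: str) -> float:
    """Shannon entropy (bits) of the symbol frequencies at one aligned site."""
    n = len(column)
    return -sum(c / n * math.log2(c / n) for c in Counter(column).values())


def information_content(alignment: List[str], alphabet_size: int) -> float:
    """Sum over sites of (log2 |alphabet| - H_site); plug-in, no small-sample correction."""
    length = len(alignment[0])
    assert all(len(s) == length for s in alignment), "sequences must be aligned"
    h_max = math.log2(alphabet_size)
    columns = ("".join(seq[i] for seq in alignment) for i in range(length))
    return sum(h_max - site_entropy(col) for col in columns)


# Toy DNA alignment: three conserved sites and one fully variable site.
print(information_content(["ACGT", "ACGA", "ACGC", "ACGG"], alphabet_size=4))  # 6.0
```

Each conserved site contributes the full 2 bits, while the site where all four bases occur equally contributes nothing, so the estimate counts only what selection has pinned down.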
Multi-Objective GFlowNets
We study the problem of generating diverse candidates in the context of
Multi-Objective Optimization. In many applications of machine learning such as
drug discovery and material design, the goal is to generate candidates which
simultaneously optimize a set of potentially conflicting objectives. Moreover,
these objectives are often imperfect evaluations of some underlying property of
interest, making it important to generate diverse candidates to have multiple
options for expensive downstream evaluations. We propose Multi-Objective
GFlowNets (MOGFNs), a novel method for generating diverse Pareto optimal
solutions, based on GFlowNets. We introduce two variants of MOGFNs: MOGFN-PC,
which models a family of independent sub-problems defined by a scalarization
function, with reward-conditional GFlowNets, and MOGFN-AL, which solves a
sequence of sub-problems defined by an acquisition function in an active
learning loop. Our experiments on a wide variety of synthetic and benchmark tasks
demonstrate advantages of the proposed methods in terms of the Pareto
performance and, importantly, improved candidate diversity, which is the main
contribution of this work.
Comment: 23 pages, 8 figures. ICML 2023. Code at:
https://github.com/GFNOrg/multi-objective-gf
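MOGFN-PC reduces the multi-objective problem to a family of single-objective sub-problems through a scalarization function. The sketch below shows only that reduction and the Pareto-dominance notion the results are judged by, not a GFlowNet. It assumes preference vectors drawn from a symmetric Dirichlet distribution and two standard scalarizations (a weighted sum and a simplified Tchebycheff-style minimum for maximisation); which of these the paper uses, and all names and candidate values, are illustrative assumptions.

```python
import random
from typing import List, Sequence, Tuple


def sample_preference(k: int, rng: random.Random, alpha: float = 1.0) -> List[float]:
    """Draw a preference vector from a symmetric Dirichlet via normalised Gamma samples."""
    g = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(g)
    return [x / total for x in g]


def weighted_sum(rewards: Sequence[float], w: Sequence[float]) -> float:
    return sum(wi * ri for wi, ri in zip(w, rewards))


def tchebycheff(rewards: Sequence[float], w: Sequence[float]) -> float:
    # Simplified Tchebycheff-style scalarization for maximisation: the worst weighted objective.
    return min(wi * ri for wi, ri in zip(w, rewards))


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a Pareto-dominates b (maximisation): no worse everywhere, strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    return [p for p in points if not any(dominates(q, p) for q in points)]


rng = random.Random(0)
w = sample_preference(2, rng)  # one hypothetical sub-problem
candidates = [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5), (0.2, 0.2)]
best = max(candidates, key=lambda r: weighted_sum(r, w))
print(w, best, pareto_front(candidates))
```

Each sampled w defines one scalar reward; in MOGFN-PC, as described above, a single reward-conditional GFlowNet is trained across such sub-problems so that sampling with different w spreads candidates along the Pareto front rather than collapsing onto one optimum.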
Identifying statistical dependence in genomic sequences via mutual information estimates
Questions of understanding and quantifying the representation and amount of
information in organisms have become a central part of biological research, as
they potentially hold the key to fundamental advances. In this paper, we
demonstrate the use of information-theoretic tools for the task of identifying
segments of biomolecules (DNA or RNA) that are statistically correlated. We
develop a precise and reliable methodology, based on the notion of mutual
information, for finding and extracting statistical as well as structural
dependencies. A simple threshold function is defined, and its use in
quantifying the level of significance of dependencies between biological
segments is explored. These tools are used in two specific applications. First,
they are used to identify correlations between different parts of the maize
zmSRp32 gene, where we find significant dependencies between the 5'
untranslated region in zmSRp32 and its alternatively spliced exons. This
observation may indicate the presence of as-yet unknown alternative splicing
mechanisms or structural scaffolds. Second, using data from the FBI's Combined
DNA Index System (CODIS), we demonstrate that our approach is particularly well
suited for the problem of discovering short tandem repeats, an application of
importance in genetic profiling.
Comment: Preliminary version. Final version in EURASIP Journal on
Bioinformatics and Systems Biology. See http://www.hindawi.com/journals/bsb
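A basic version of the mutual-information test can be sketched as a plug-in estimate over position-aligned symbol pairs from two segments, compared against a significance level. The abstract does not specify its "simple threshold function", so the sketch swaps in a different, plainly labeled technique: a shuffle-based null threshold (the MI level exceeded by only 5% of random re-pairings). The pairing scheme, the threshold, and the toy segments are all assumptions for illustration.

```python
import math
import random
from collections import Counter


def mutual_information(x: str, y: str) -> float:
    """Plug-in estimate (bits) of I(X;Y) from position-aligned symbol pairs (x[i], y[i])."""
    assert len(x) == len(y) and x, "segments must be non-empty and equally long"
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        c / n * math.log2((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )


def permutation_threshold(x: str, y: str, trials: int = 200,
                          quantile: float = 0.95, seed: int = 0) -> float:
    """Assumed null model: MI level exceeded by only (1 - quantile) of shuffled pairings."""
    rng = random.Random(seed)
    chars = list(y)
    null = []
    for _ in range(trials):
        rng.shuffle(chars)
        null.append(mutual_information(x, "".join(chars)))
    null.sort()
    return null[min(int(quantile * trials), trials - 1)]


a = "ACGTTGCAACGTTGCA"
b = "TGCAACGTTGCAACGT"  # base-wise complement of a: perfectly dependent
mi, thr = mutual_information(a, b), permutation_threshold(a, b)
print(mi, thr, mi > thr)
```

Because the plug-in estimator is biased upward on short segments, a shuffled null calibrated on the same segment lengths is a simple way to separate genuine dependence, such as the complementary pairing above, from estimation noise.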