5,942 research outputs found

Criticality in Formal Languages and Statistical Physics

Author: Lin Henry W.
Tegmark Max
Publication venue: 'MDPI AG'
Publication date: 23/06/2017

We show that the mutual information between two symbols, as a function of the number of symbols between the two, decays exponentially in any probabilistic regular grammar, but can decay like a power law for a context-free grammar. This result about formal languages is closely related to a well-known result in classical statistical mechanics that there are no phase transitions in dimensions fewer than two. It is also related to the emergence of power-law correlations in turbulence and cosmological inflation through recursive generative processes. We elucidate these physics connections and comment on potential applications of our results to machine learning tasks like training artificial recurrent neural networks. Along the way, we introduce a useful quantity which we dub the rational mutual information and discuss generalizations of our claims involving more complicated Bayesian networks.

Comment: Replaced to match final published version. Discussion improved, references adde

arXiv.org e-Print Archive

Multidisciplinary Digital Publishing Institute
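
The exponential-decay claim in the abstract above can be illustrated with a small numerical sketch. It assumes a symmetric two-state Markov chain as a stand-in for a very simple probabilistic regular grammar; the chain, the flip probability `p`, and the function names are illustrative choices, not taken from the paper. The code computes the exact mutual information between two symbols `d` steps apart and the ratio of successive values, which for this chain should settle near `(1 - 2p)^2`.

```python
import math


def mat_mul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mutual_information_at_distance(T, pi, d):
    """Exact I(X_0; X_d) in nats for a stationary Markov chain.

    T  : row-stochastic transition matrix
    pi : stationary distribution of T (assumed correct by the caller)
    d  : number of steps separating the two symbols
    """
    n = len(T)
    # Start from the identity and multiply by T d times to get T^d.
    Td = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(d):
        Td = mat_mul(Td, T)
    mi = 0.0
    for a in range(n):
        for b in range(n):
            joint = pi[a] * Td[a][b]          # P(X_0 = a, X_d = b)
            if joint > 0.0:
                mi += joint * math.log(joint / (pi[a] * pi[b]))
    return mi


# Hypothetical example chain: each symbol flips with probability p.
p = 0.1
T = [[1 - p, p], [p, 1 - p]]
pi = [0.5, 0.5]

mi_values = [mutual_information_at_distance(T, pi, d) for d in range(1, 21)]
ratios = [mi_values[i + 1] / mi_values[i] for i in range(len(mi_values) - 1)]

for d, mi in enumerate(mi_values, start=1):
    print(d, mi)
```

A roughly constant ratio between successive values is the signature of exponential decay. A power-law decay, which the abstract attributes to context-free grammars, would instead show ratios drifting toward 1 as `d` grows.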

Self-adaptive exploration in evolutionary search

Author: Toussaint Marc
Publication date: 01/01/2001

We address a primary question of computational as well as biological research on evolution: How can an exploration strategy adapt in such a way as to exploit the information gained about the problem at hand? We first introduce an integrated formalism of evolutionary search which provides a unified view on different specific approaches. On this basis we discuss the implications of indirect modeling (via a "genotype-phenotype mapping") on the exploration strategy. Notions such as modularity, pleiotropy and functional phenotypic complex are discussed as implications. Then, rigorously reflecting the notion of self-adaptability, we introduce a new definition that captures self-adaptability of exploration: different genotypes that map to the same phenotype may represent (also topologically) different exploration strategies; self-adaptability requires a variation of exploration strategies along such a "neutral space". By this definition, the concept of neutrality becomes a central concern of this paper. Finally, we present examples of these concepts: For a specific grammar-type encoding, we observe a large variability of exploration strategies for a fixed phenotype, and a self-adaptive drift towards short representations with highly structured exploration strategy that matches the "problem's structure".

Comment: 24 pages, 5 figure

arXiv.org e-Print Archive

CiteSeerX

Neural-Augmented Static Analysis of Android Communication

Author: Abadi Martín
Allamanis Miltiadis
Elish Karim O
Kim Yoon
Kremenek Ted
Octeau Damien
van der Maaten Laurens
Yang Wei
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 11/09/2018

We address the problem of discovering communication links between applications in the popular Android mobile operating system, an important problem for security and privacy in Android. Any scalable static analysis in this complex setting is bound to produce an excessive amount of false positives, rendering it impractical. To improve precision, we propose to augment static analysis with a trained neural-network model that estimates the probability that a communication link truly exists. We describe a neural-network architecture that encodes abstractions of communicating objects in two applications and estimates the probability with which a link indeed exists. At the heart of our architecture are type-directed encoders (TDE), a general framework for elegantly constructing encoders of a compound data type by recursively composing encoders for its constituent types. We evaluate our approach on a large corpus of Android applications, and demonstrate that it achieves very high accuracy. Further, we conduct thorough interpretability studies to understand the internals of the learned neural networks.

Comment: Appears in Proceedings of the 2018 ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE

arXiv.org e-Print Archive

Crossref

The use of information theory in evolutionary biology

Author: Adami
Ash
Atchley
Ay
Balduzzi
Barrick
Basharin
Benner
Bialek
Billeter
Blount
Callahan
Carothers
Clarke
Cooper
Cover
da Silva
Darwin
Eddy
Edlund
Ewens
Federhen
Finn
Fletcher
Futuyma
Garcia-Horsman
Hartl
Iliopoulos
Jühling
Klyubin
Korber
Kryazhimskiy
Landauer
Lenski
Levy
Li
Linsker
Lungarella
Maynard Smith
McGill
Pauling
Polani
Queller
Rivoire
Robinson
Schneidman
Scott
Shannon
Sporns
Taanman
Thornton
Tononi
van der Graaff
Waddington
Wahl
Wang
Wiener
Woods
Zahedi
Publication venue: 'Wiley'
Publication date: 16/12/2011

Information is a key concept in evolutionary biology. Information is stored in biological organisms' genomes, and used to generate the organism as well as to maintain and control it. Information is also "that which evolves". When a population adapts to a local environment, information about this environment is fixed in a representative genome. However, when an environment changes, information can be lost. At the same time, information is processed by animal brains to survive in complex environments, and the capacity for information processing also evolves. Here I review applications of information theory to the evolution of proteins as well as to the evolution of information processing in simulated agents that adapt to perform a complex task.

Comment: 25 pages, 7 figures. To appear in "The Year in Evolutionary Biology", of the Annals of the NY Academy of Science

arXiv.org e-Print Archive

Crossref

Multi-Objective GFlowNets

Author: Bengio Emmanuel
Bengio Yoshua
Hernandez-Garcia Alex
Jain Moksh
Miret Santiago
Raparthy Sharath Chandra
Rector-Brooks Jarrid
Publication date: 17/07/2023

We study the problem of generating diverse candidates in the context of Multi-Objective Optimization. In many applications of machine learning such as drug discovery and material design, the goal is to generate candidates which simultaneously optimize a set of potentially conflicting objectives. Moreover, these objectives are often imperfect evaluations of some underlying property of interest, making it important to generate diverse candidates to have multiple options for expensive downstream evaluations. We propose Multi-Objective GFlowNets (MOGFNs), a novel method for generating diverse Pareto optimal solutions, based on GFlowNets. We introduce two variants of MOGFNs: MOGFN-PC, which models a family of independent sub-problems defined by a scalarization function, with reward-conditional GFlowNets, and MOGFN-AL, which solves a sequence of sub-problems defined by an acquisition function in an active learning loop. Our experiments on a wide variety of synthetic and benchmark tasks demonstrate advantages of the proposed methods in terms of the Pareto performance and, importantly, improved candidate diversity, which is the main contribution of this work.

Comment: 23 pages, 8 figures. ICML 2023. Code at: https://github.com/GFNOrg/multi-objective-gf

arXiv.org e-Print Archive

Identifying statistical dependence in genomic sequences via mutual information estimates

Author: Aktulga HM
Grama AY
Kontoyiannis I
Lyznik LA
Szpankowski L
Szpankowski W
Publication date: 01/01/2007

Questions of understanding and quantifying the representation and amount of information in organisms have become a central part of biological research, as they potentially hold the key to fundamental advances. In this paper, we demonstrate the use of information-theoretic tools for the task of identifying segments of biomolecules (DNA or RNA) that are statistically correlated. We develop a precise and reliable methodology, based on the notion of mutual information, for finding and extracting statistical as well as structural dependencies. A simple threshold function is defined, and its use in quantifying the level of significance of dependencies between biological segments is explored. These tools are used in two specific applications. The first is the identification of correlations between different parts of the maize zmSRp32 gene, where we find significant dependencies between the 5' untranslated region in zmSRp32 and its alternatively spliced exons. This observation may indicate the presence of as-yet unknown alternative splicing mechanisms or structural scaffolds. Second, using data from the FBI's Combined DNA Index System (CODIS), we demonstrate that our approach is particularly well suited for the problem of discovering short tandem repeats, an application of importance in genetic profiling.

Comment: Preliminary version. Final version in EURASIP Journal on Bioinformatics and Systems Biology. See http://www.hindawi.com/journals/bsb

arXiv.org e-Print Archive

CiteSeerX

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

CUED - Cambridge University Engineering Department
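
The last abstract above builds on mutual information estimated between sequence segments. As a rough illustration only, the sketch below uses a plain plug-in (empirical frequency) estimator over two aligned symbol sequences. The estimator choice, the function name `empirical_mutual_information`, and the toy sequences are assumptions of this sketch; the abstract does not specify the authors' estimator or their threshold function, and neither is reproduced here.

```python
import math
from collections import Counter


def empirical_mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits from two aligned sequences.

    xs, ys : equal-length sequences of hashable symbols (e.g. DNA letters).
    Probabilities are replaced by observed relative frequencies.
    """
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    n = len(xs)
    joint = Counter(zip(xs, ys))   # counts of symbol pairs (x_i, y_i)
    px = Counter(xs)               # marginal counts for xs
    py = Counter(ys)               # marginal counts for ys
    mi = 0.0
    for (a, b), c in joint.items():
        # (c/n) * log2( (c/n) / ((px[a]/n) * (py[b]/n)) )
        mi += (c / n) * math.log2(c * n / (px[a] * py[b]))
    return mi


# Toy inputs (hypothetical, not real genomic data).
identical = empirical_mutual_information("ACGT" * 5, "ACGT" * 5)
unrelated = empirical_mutual_information("AACC", "ACAC")
print(identical, unrelated)
```

On short sequences a plug-in estimate is biased upward, which is one reason a significance threshold of the kind the abstract mentions is needed before reading a nonzero value as a real dependency.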