6,553 research outputs found
Integrated information increases with fitness in the evolution of animats
One of the hallmarks of biological organisms is their ability to integrate
disparate information sources to optimize their behavior in complex
environments. How this capability can be quantified and related to the
functional complexity of an organism remains a challenging problem, in
particular since organismal functional complexity is not well-defined. We
present here several candidate measures that quantify information and
integration, and study their dependence on fitness as an artificial agent
("animat") evolves over thousands of generations to solve a navigation task in
a simple, simulated environment. We compare the ability of these measures to
predict high fitness with more conventional information-theoretic processing
measures. As the animat adapts by increasing its "fit" to the world,
information integration and processing increase commensurately along the
evolutionary line of descent. We suggest that the correlation of fitness with
information integration and with processing measures implies that high fitness
requires both information processing as well as integration, but that
information integration may be a better measure when the task requires memory.
A correlation of measures of information integration (but also information
processing) and fitness strongly suggests that these measures reflect the
functional complexity of the animat, and that such measures can be used to
quantify functional complexity even in the absence of fitness data.Comment: 27 pages, 8 figures, one supplementary figure. Three supplementary
video files available on request. Version commensurate with published text in
PLoS Comput. Bio
Recovering complete and draft population genomes from metagenome datasets.
Assembly of metagenomic sequence data into microbial genomes is of fundamental value to improving our understanding of microbial ecology and metabolism by elucidating the functional potential of hard-to-culture microorganisms. Here, we provide a synthesis of available methods to bin metagenomic contigs into species-level groups and highlight how genetic diversity, sequencing depth, and coverage influence binning success. Despite the computational cost on application to deeply sequenced complex metagenomes (e.g., soil), covarying patterns of contig coverage across multiple datasets significantly improves the binning process. We also discuss and compare current genome validation methods and reveal how these methods tackle the problem of chimeric genome bins i.e., sequences from multiple species. Finally, we explore how population genome assembly can be used to uncover biogeographic trends and to characterize the effect of in situ functional constraints on the genome-wide evolution
edge2vec: Representation learning using edge semantics for biomedical knowledge discovery
Representation learning provides new and powerful graph analytical approaches
and tools for the highly valued data science challenge of mining knowledge
graphs. Since previous graph analytical methods have mostly focused on
homogeneous graphs, an important current challenge is extending this
methodology for richly heterogeneous graphs and knowledge domains. The
biomedical sciences are such a domain, reflecting the complexity of biology,
with entities such as genes, proteins, drugs, diseases, and phenotypes, and
relationships such as gene co-expression, biochemical regulation, and
biomolecular inhibition or activation. Therefore, the semantics of edges and
nodes are critical for representation learning and knowledge discovery in real
world biomedical problems. In this paper, we propose the edge2vec model, which
represents graphs considering edge semantics. An edge-type transition matrix is
trained by an Expectation-Maximization approach, and a stochastic gradient
descent model is employed to learn node embedding on a heterogeneous graph via
the trained transition matrix. edge2vec is validated on three biomedical domain
tasks: biomedical entity classification, compound-gene bioactivity prediction,
and biomedical information retrieval. Results show that by considering
edge-types into node embedding learning in heterogeneous graphs,
\textbf{edge2vec}\ significantly outperforms state-of-the-art models on all
three tasks. We propose this method for its added value relative to existing
graph analytical methodology, and in the real world context of biomedical
knowledge discovery applicability.Comment: 10 page
Gene Expression and its Discontents: Developmental disorders as dysfunctions of epigenetic cognition
Systems biology presently suffers the same mereological and sufficiency fallacies that haunt neural network models of high order cognition. Shifting perspective from the massively parallel space of gene matrix interactions to the grammar/syntax of the time series of expressed phenotypes using a cognitive paradigm permits import of techniques from statistical physics via the homology between information source uncertainty and free energy density. This produces a broad spectrum of possible statistical models of development and its pathologies in which epigenetic regulation and the effects of embedding environment are analogous to a tunable enzyme catalyst. A cognitive paradigm naturally incorporates memory, leading directly to models of epigenetic inheritance, as affected by environmental exposures, in the largest sense. Understanding gene expression, development, and their dysfunctions will require data analysis tools considerably more sophisticated than the present crop of simplistic models abducted from neural network studies or stochastic chemical reaction theory
- β¦