139,090 research outputs found

    Training-free Measures Based on Algorithmic Probability Identify High Nucleosome Occupancy in DNA Sequences

    Full text link
    We introduce and study a set of training-free methods of information-theoretic and algorithmic complexity nature applied to DNA sequences to identify their potential capabilities to determine nucleosomal binding sites. We test our measures on well-studied genomic sequences of different sizes drawn from different sources. The measures reveal the known in vivo versus in vitro predictive discrepancies and uncover their potential to pinpoint (high) nucleosome occupancy. We explore different possible signals within and beyond the nucleosome length and find that complexity indices are informative of nucleosome occupancy. We compare against the gold standard (Kaplan model) and find similar and complementary results with the main difference that our sequence complexity approach. For example, for high occupancy, complexity-based scores outperform the Kaplan model for predicting binding representing a significant advancement in predicting the highest nucleosome occupancy following a training-free approach.Comment: 8 pages main text (4 figures), 12 total with Supplementary (1 figure

    Approximations of Algorithmic and Structural Complexity Validate Cognitive-behavioural Experimental Results

    Full text link
    We apply methods for estimating the algorithmic complexity of sequences to behavioural sequences of three landmark studies of animal behavior each of increasing sophistication, including foraging communication by ants, flight patterns of fruit flies, and tactical deception and competition strategies in rodents. In each case, we demonstrate that approximations of Logical Depth and Kolmogorv-Chaitin complexity capture and validate previously reported results, in contrast to other measures such as Shannon Entropy, compression or ad hoc. Our method is practically useful when dealing with short sequences, such as those often encountered in cognitive-behavioural research. Our analysis supports and reveals non-random behavior (LD and K complexity) in flies even in the absence of external stimuli, and confirms the "stochastic" behaviour of transgenic rats when faced that they cannot defeat by counter prediction. The method constitutes a formal approach for testing hypotheses about the mechanisms underlying animal behaviour.Comment: 28 pages, 7 figures and 2 table

    Biology of Applied Digital Ecosystems

    Full text link
    A primary motivation for our research in Digital Ecosystems is the desire to exploit the self-organising properties of biological ecosystems. Ecosystems are thought to be robust, scalable architectures that can automatically solve complex, dynamic problems. However, the biological processes that contribute to these properties have not been made explicit in Digital Ecosystems research. Here, we discuss how biological properties contribute to the self-organising features of biological ecosystems, including population dynamics, evolution, a complex dynamic environment, and spatial distributions for generating local interactions. The potential for exploiting these properties in artificial systems is then considered. We suggest that several key features of biological ecosystems have not been fully explored in existing digital ecosystems, and discuss how mimicking these features may assist in developing robust, scalable self-organising architectures. An example architecture, the Digital Ecosystem, is considered in detail. The Digital Ecosystem is then measured experimentally through simulations, with measures originating from theoretical ecology, to confirm its likeness to a biological ecosystem. Including the responsiveness to requests for applications from the user base, as a measure of the 'ecological succession' (development).Comment: 9 pages, 4 figure, conferenc

    Algorithmic complexity for psychology: A user-friendly implementation of the coding theorem method

    Full text link
    Kolmogorov-Chaitin complexity has long been believed to be impossible to approximate when it comes to short sequences (e.g. of length 5-50). However, with the newly developed \emph{coding theorem method} the complexity of strings of length 2-11 can now be numerically estimated. We present the theoretical basis of algorithmic complexity for short strings (ACSS) and describe an R-package providing functions based on ACSS that will cover psychologists' needs and improve upon previous methods in three ways: (1) ACSS is now available not only for binary strings, but for strings based on up to 9 different symbols, (2) ACSS no longer requires time-consuming computing, and (3) a new approach based on ACSS gives access to an estimation of the complexity of strings of any length. Finally, three illustrative examples show how these tools can be applied to psychology.Comment: to appear in "Behavioral Research Methods", 14 pages in journal format, R package at http://cran.r-project.org/web/packages/acss/index.htm

    SLIDER: Mining correlated motifs in protein-protein interaction networks

    Get PDF
    Abstract—Correlated motif mining (CMM) is the problem to find overrepresented pairs of patterns, called motif pairs, in interacting protein sequences. Algorithmic solutions for CMM thereby provide a computational method for predicting binding sites for protein interaction. In this paper, we adopt a motif-driven approach where the support of candidate motif pairs is evaluated in the network. We experimentally establish the superiority of the Chi-square-based support measure over other support measures. Furthermore, we obtain that CMM is an NP-hard problem for a large class of support measures (including Chi-square) and reformulate the search for correlated motifs as a combinatorial optimization problem. We then present the method SLIDER which uses local search with a neighborhood function based on sliding motifs and employs the Chi-square-based support measure. We show that SLIDER outperforms existing motif-driven CMM methods and scales to large protein-protein interaction networks

    Sequence alignment, mutual information, and dissimilarity measures for constructing phylogenies

    Get PDF
    Existing sequence alignment algorithms use heuristic scoring schemes which cannot be used as objective distance metrics. Therefore one relies on measures like the p- or log-det distances, or makes explicit, and often simplistic, assumptions about sequence evolution. Information theory provides an alternative, in the form of mutual information (MI) which is, in principle, an objective and model independent similarity measure. MI can be estimated by concatenating and zipping sequences, yielding thereby the "normalized compression distance". So far this has produced promising results, but with uncontrolled errors. We describe a simple approach to get robust estimates of MI from global pairwise alignments. Using standard alignment algorithms, this gives for animal mitochondrial DNA estimates that are strikingly close to estimates obtained from the alignment free methods mentioned above. Our main result uses algorithmic (Kolmogorov) information theory, but we show that similar results can also be obtained from Shannon theory. Due to the fact that it is not additive, normalized compression distance is not an optimal metric for phylogenetics, but we propose a simple modification that overcomes the issue of additivity. We test several versions of our MI based distance measures on a large number of randomly chosen quartets and demonstrate that they all perform better than traditional measures like the Kimura or log-det (resp. paralinear) distances. Even a simplified version based on single letter Shannon entropies, which can be easily incorporated in existing software packages, gave superior results throughout the entire animal kingdom. But we see the main virtue of our approach in a more general way. For example, it can also help to judge the relative merits of different alignment algorithms, by estimating the significance of specific alignments.Comment: 19 pages + 16 pages of supplementary materia
    • …
    corecore