294 research outputs found

    On the origin of distribution patterns of motifs in biological networks

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Inventories of small subgraphs in biological networks have identified commonly-recurring patterns, called motifs. The inference that these motifs have been selected for function rests on the idea that their occurrences are significantly more frequent than random.</p> <p>Results</p> <p>Our analysis of several large biological networks suggests, in contrast, that the frequencies of appearance of common subgraphs are similar in natural and corresponding random networks.</p> <p>Conclusion</p> <p>Indeed, certain topological features of biological networks give rise naturally to the common appearance of the motifs. We therefore question whether frequencies of occurrences are reasonable evidence that the structures of motifs have been selected for their functional contribution to the operation of networks.</p

    On Universal Codes for Integers: Wallace Tree, Elias Omega and Variations

    Full text link
    A universal code for the (positive) integers can be used to store or compress a sequence of integers. Every universal code implies a probability distribution on integers. This implied distribution may be a reasonable choice when the true distribution of a source of integers is unknown. Wallace Tree Code (WTC) is a universal code for integers based on binary trees. We give the encoding and decoding routines for WTC and analyse the properties of the code in comparison to two well-known codes, the Fibonacci and Elias omega codes. Some improvements on the Elias omega code are also described and examined.Comment: 8 pages, 8 figures (3 figure image files

    A fast indexing approach for protein structure comparison

    Get PDF
    BACKGROUND: Protein structure comparison is a fundamental task in structural biology. While the number of known protein structures has grown rapidly over the last decade, searching a large database of protein structures is still relatively slow using existing methods. There is a need for new techniques which can rapidly compare protein structures, whilst maintaining high matching accuracy. RESULTS: We have developed IR Tableau, a fast protein comparison algorithm, which leverages the tableau representation to compare protein tertiary structures. IR tableau compares tableaux using information retrieval style feature indexing techniques. Experimental analysis on the ASTRAL SCOP protein structural domain database demonstrates that IR Tableau achieves two orders of magnitude speedup over the search times of existing methods, while producing search results of comparable accuracy. CONCLUSION: We show that it is possible to obtain very significant speedups for the protein structure comparison problem, by employing an information retrieval style approach for indexing proteins. The comparison accuracy achieved is also strong, thus opening the way for large scale processing of very large protein structure databases

    How precise are reported protein coordinate data?

    Get PDF
    Atomic coordinates in the Worldwide Protein Data Bank (wwPDB) are generally reported to greater precision than the experimental structure determinations have actually achieved. By using information theory and data compression to study the compressibility of protein atomic coordinates, it is possible to quantify the amount of randomness in the coordinate data and thereby to determine the realistic precision of the reported coordinates. On average, the value of each Cα coordinate in a set of selected protein structures solved at a variety of resolutions is good to about 0.1 Å

    A Generalization of the Convex Kakeya Problem

    Full text link
    Given a set of line segments in the plane, not necessarily finite, what is a convex region of smallest area that contains a translate of each input segment? This question can be seen as a generalization of Kakeya's problem of finding a convex region of smallest area such that a needle can be rotated through 360 degrees within this region. We show that there is always an optimal region that is a triangle, and we give an optimal \Theta(n log n)-time algorithm to compute such a triangle for a given set of n segments. We also show that, if the goal is to minimize the perimeter of the region instead of its area, then placing the segments with their midpoint at the origin and taking their convex hull results in an optimal solution. Finally, we show that for any compact convex figure G, the smallest enclosing disk of G is a smallest-perimeter region containing a translate of every rotated copy of G.Comment: 14 pages, 9 figure

    The divergence time of protein structures modelled by Markov matrices and its relation to the divergence of sequences

    Full text link
    A complete time-parameterized statistical model quantifying the divergent evolution of protein structures in terms of the patterns of conservation of their secondary structures is inferred from a large collection of protein 3D structure alignments. This provides a better alternative to time-parameterized sequence-based models of protein relatedness, that have clear limitations dealing with twilight and midnight zones of sequence relationships. Since protein structures are far more conserved due to the selection pressure directly placed on their function, divergence time estimates can be more accurate when inferred from structures. We use the Bayesian and information-theoretic framework of Minimum Message Length to infer a time-parameterized stochastic matrix (accounting for perturbed structural states of related residues) and associated Dirichlet models (accounting for insertions and deletions during the evolution of protein domains). These are used in concert to estimate the Markov time of divergence of tertiary structures, a task previously only possible using proxies (like RMSD). By analyzing one million pairs of homologous structures, we yield a relationship between the Markov divergence time of structures and of sequences. Using these inferred models and the relationship between the divergence of sequences and structures, we demonstrate a competitive performance in secondary structure prediction against neural network architectures commonly employed for this task. The source code and supplementary information are downloadable from \url{http://lcb.infotech.monash.edu.au/sstsum}.Comment: 12 pages, 6 figure

    Genome-Wide Survey of MicroRNA - Transcription Factor Feed-Forward Regulatory Circuits in Human

    Full text link
    In this work, we describe a computational framework for the genome-wide identification and characterization of mixed transcriptional/post-transcriptional regulatory circuits in humans. We concentrated in particular on feed-forward loops (FFL), in which a master transcription factor regulates a microRNA, and together with it, a set of joint target protein coding genes. The circuits were assembled with a two step procedure. We first constructed separately the transcriptional and post-transcriptional components of the human regulatory network by looking for conserved over-represented motifs in human and mouse promoters, and 3'-UTRs. Then, we combined the two subnetworks looking for mixed feed-forward regulatory interactions, finding a total of 638 putative (merged) FFLs. In order to investigate their biological relevance, we filtered these circuits using three selection criteria: (I) GeneOntology enrichment among the joint targets of the FFL, (II) independent computational evidence for the regulatory interactions of the FFL, extracted from external databases, and (III) relevance of the FFL in cancer. Most of the selected FFLs seem to be involved in various aspects of organism development and differentiation. We finally discuss a few of the most interesting cases in detail.Comment: 51 pages, 5 figures, 4 tables. Supporting information included. Accepted for publication in Molecular BioSystem
    corecore