
    Bounded prefix-suffix duplication

    We consider a restricted variant of the prefix-suffix duplication operation, called bounded prefix-suffix duplication. It consists of iteratively duplicating a prefix or suffix of a given word, where the length of the duplicated prefix or suffix is bounded by a constant. We give a sufficient condition for a class of languages to be closed under bounded prefix-suffix duplication. Consequently, the class of regular languages is closed under bounded prefix-suffix duplication; furthermore, we propose an algorithm deciding whether a regular language is a finite k-prefix-suffix duplication language. An efficient algorithm solving the membership problem for the k-prefix-suffix duplication of a language is also presented. Finally, we define the k-prefix-suffix duplication distance between two words, extend it to languages and show how it can be computed for regular languages
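As a rough illustration of the operation described in this abstract, the sketch below enumerates the words obtained by one bounded duplication and uses a brute-force breadth-first search to count the fewest duplications leading from one word to another. It is not the efficient algorithm the paper presents. The function names `psd_step` and `psd_distance` are hypothetical, and treating the distance as one-directional (from `source` to `target`) is an assumption made for the example.

```python
from collections import deque


def psd_step(word, k):
    """Words obtained from `word` by duplicating one prefix or suffix of length 1..k."""
    results = set()
    for length in range(1, min(k, len(word)) + 1):
        results.add(word[:length] + word)    # prefix duplication: xy -> xxy
        results.add(word + word[-length:])   # suffix duplication: yx -> yxx
    return results


def psd_distance(source, target, k):
    """Fewest bounded duplications turning `source` into `target`, or None if unreachable.

    Brute-force search (an assumption of this sketch). Every step makes the
    word longer, so candidates longer than `target` can be discarded and the
    search always terminates.
    """
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        word, steps = queue.popleft()
        for nxt in psd_step(word, k):
            if len(nxt) > len(target) or nxt in seen:
                continue
            if nxt == target:
                return steps + 1
            seen.add(nxt)
            queue.append((nxt, steps + 1))
    return None
```

For example, with `k = 1` the word `ab` reaches `aabb` in two steps, while `abab` is reachable from `ab` only once `k` is at least 2.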

    Developing and applying heterogeneous phylogenetic models with XRate

    Modeling sequence evolution on phylogenetic trees is a useful technique in computational biology. Especially powerful are models which take account of the heterogeneous nature of sequence evolution according to the "grammar" of the encoded gene features. However, beyond a modest level of model complexity, manual coding of models becomes prohibitively labor-intensive. We demonstrate, via a set of case studies, the new built-in model-prototyping capabilities of XRate (macros and Scheme extensions). These features allow rapid implementation of phylogenetic models which would have previously been far more labor-intensive. XRate's new capabilities for lineage-specific models, ancestral sequence reconstruction, and improved annotation output are also discussed. XRate's flexible model-specification capabilities and computational efficiency make it well-suited to developing and prototyping phylogenetic grammar models. XRate is available as part of the DART software package: http://biowiki.org/DART. Comment: 34 pages, 3 figures, glossary of XRate model terminology

    Evaluation of the Kinetic Properties of the Sporulation Protein SpoIIE of Bacillus subtilis by Inclusion in a Model Membrane

    Starvation induces Bacillus subtilis to initiate a developmental process (sporulation) that includes asymmetric cell division to form the prespore and the mother cell. The integral membrane protein SpoIIE is essential for the prespore-specific activation of the transcription factor σ(F), and it also has a morphogenic activity required for asymmetric division. An increase in the local concentration of SpoIIE at the polar septum of B. subtilis precedes dephosphorylation of the anti-anti-sigma factor SpoIIAA in the prespore. After closure and invagination of the asymmetric septum, phosphatase activity of SpoIIE increases severalfold, but the reason for this dramatic change in activity has not been determined. The central domain of SpoIIE has been seen to self-associate (I. Lucet et al., EMBO J. 19:1467-1475, 2000), suggesting that activation of the C-terminal PP2C-like phosphatase domain might be due to conformational changes brought about by the increased local concentration of SpoIIE in the sporulating septum. Here we report the inclusion of purified SpoIIE protein into a model membrane as a method for studying the effect of local concentration in a lipid bilayer on activity. In vitro assays indicate that the membrane-bound enzyme maintains dephosphorylation rates similar to the highly active micellar state at all molar ratios of protein to lipid. Atomic force microscopy images indicate that increased local concentration does not lead to self-association

    Smarter Vaccine Design Will Circumvent Regulatory T Cell-Mediated Evasion in Chronic HIV and HCV Infection

    Despite years of research, vaccines against HIV and HCV are not yet available, due largely to effective viral immunoevasive mechanisms. A novel escape mechanism observed in viruses that cause chronic infection is suppression of viral-specific effector CD4(+) and CD8(+) T cells by stimulating regulatory T cells (Tregs) educated on host sequences during tolerance induction. Viral class II MHC epitopes that share a T cell receptor (TCR)-face with host epitopes may activate Tregs capable of suppressing protective responses. We designed an immunoinformatic algorithm, JanusMatrix, to identify such epitopes and discovered that among human-host viruses, chronic viruses appear more human-like than viruses that cause acute infection. Furthermore, an HCV epitope that activates Tregs in chronically infected patients, but not clearers, shares a TCR-face with numerous human sequences. To boost weak CD4(+) T cell responses associated with persistent infection, vaccines for HIV and HCV must circumvent potential Treg activation that can handicap efficacy. Epitope-driven approaches to vaccine design that involve careful consideration of the T cell subsets primed during immunization will advance HIV and HCV vaccine development

    Category Theoretic Analysis of Hierarchical Protein Materials and Social Networks

    Materials in biology span all the scales from Angstroms to meters and typically consist of complex hierarchical assemblies of simple building blocks. Here we describe an application of category theory to describe structural and resulting functional properties of biological protein materials by developing so-called ologs. An olog is like a “concept web” or “semantic network” except that it follows a rigorous mathematical formulation based on category theory. This key difference ensures that an olog is unambiguous, highly adaptable to evolution and change, and suitable for sharing concepts with other ologs. We consider simple cases of beta-helical and amyloid-like protein filaments subjected to axial extension and develop an olog representation of their structural and resulting mechanical properties. We also construct a representation of a social network in which people send text-messages to their nearest neighbors and act as a team to perform a task. We show that the olog for the protein and the olog for the social network feature identical category-theoretic representations, and we proceed to precisely explicate the analogy or isomorphism between them. The examples presented here demonstrate that the intrinsic nature of a complex system, which in particular includes a precise relationship between structure and function at different hierarchical levels, can be effectively represented by an olog. This, in turn, allows for comparative studies between disparate materials or fields of application, and results in novel approaches to derive functionality in the design of de novo hierarchical systems. 
We discuss opportunities and challenges associated with the description of complex biological materials by using ologs as a powerful tool for analysis and design in the context of materiomics, and we present the potential impact of this approach for engineering, life sciences, and medicine. Presidential Early Career Award for Scientists and Engineers (N000141010562); United States. Army Research Office. Multidisciplinary University Research Initiative (W911NF0910541); United States. Office of Naval Research (grant N000141010841); Massachusetts Institute of Technology. Dept. of Mathematics; Studienstiftung des deutschen Volkes; Clark Barwick; Jacob Luri

    Are grammatical representations useful for learning from biological sequence data?— a case study

    This paper investigates whether Chomsky-like grammar representations are useful for learning cost-effective, comprehensible predictors of members of biological sequence families. The Inductive Logic Programming (ILP) Bayesian approach to learning from positive examples is used to generate a grammar for recognising a class of proteins known as human neuropeptide precursors (NPPs). Collectively, five of the co-authors of this paper have extensive expertise on NPPs and general bioinformatics methods. Their motivation for generating an NPP grammar was that none of the existing bioinformatics methods could provide sufficient cost-savings during the search for new NPPs. Prior to this project, experienced specialists at SmithKline Beecham had tried for many months to hand-code such a grammar but without success. Our best predictor makes the search for novel NPPs more than 100 times more efficient than randomly selecting proteins for synthesis and testing them for biological activity. As far as these authors are aware, this is both the first biological grammar learnt using ILP and the first real-world scientific application of the ILP Bayesian approach to learning from positive examples. A group of features is derived from this grammar. Other groups of features of NPPs are derived using other learning strategies. Amalgams of these groups are formed. A recognition model is generated for each amalgam using C4.5 and C4.5rules and its performance is measured using both predictive accuracy and a new cost function, Relative Advantage (RA). The highest RA was achieved by a model which includes grammar-derived features. This RA is significantly higher than the best RA achieved without the use of the grammar-derived features. 
Predictive accuracy is not a good measure of performance for this domain because it does not discriminate well between NPP recognition models: despite covering varying numbers of (the rare) positives, all the models are awarded a similar (high) score by predictive accuracy because they all exclude most of the abundant negatives
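The point about predictive accuracy on rare positives can be illustrated with a small worked example. The confusion-matrix counts below are invented for illustration and do not come from the paper, and the paper's Relative Advantage function is not reproduced here; recall is used only as a familiar stand-in that is sensitive to how many positives a model covers.

```python
def accuracy(tp, fp, tn, fn):
    """Fraction of all examples classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def recall(tp, fn):
    """Fraction of the true positives that the model recovers."""
    return tp / (tp + fn)


# Hypothetical data set: 10 positives among 10,000 examples.
# Both models reject the same negatives but cover different numbers of positives.
model_a = {"tp": 2, "fn": 8, "fp": 5, "tn": 9985}
model_b = {"tp": 8, "fn": 2, "fp": 5, "tn": 9985}

for name, m in (("A", model_a), ("B", model_b)):
    print(name, accuracy(**m), recall(m["tp"], m["fn"]))
```

Under these assumed counts the two accuracies differ by less than 0.001 even though model B recovers four times as many positives, which is the kind of insensitivity the abstract describes.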

    Automating Genomic Data Mining via a Sequence-based Matrix Format and Associative Rule Set

    There is an enormous amount of information encoded in each genome – enough to create living, responsive and adaptive organisms. Raw sequence data alone is not enough to understand function, mechanisms or interactions. Changes in a single base pair can lead to disease, such as sickle-cell anemia, while some large megabase deletions have no apparent phenotypic effect. Genomic features are varied in their data types and annotation of these features is spread across multiple databases. Herein, we develop a method to automate exploration of genomes by iteratively exploring sequence data for correlations and building upon them. First, to integrate and compare different annotation sources, a sequence matrix (SM) is developed to contain position-dependent information. Second, a classification tree is developed for matrix row types, specifying how each data type is to be treated with respect to other data types for analysis purposes. Third, correlative analyses are developed to analyze features of each matrix row in terms of the other rows, guided by the classification tree as to which analyses are appropriate. A prototype was developed and succeeded in detecting coinciding genomic features among genes, exons, repetitive elements and CpG islands
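One way to picture the position-dependent matrix mentioned in this abstract is as a set of rows, one per annotation type, with one cell per sequence position. The sketch below is a guess at the general idea rather than the authors' format: the boolean encoding, the half-open intervals, the row names, and the helpers `build_row` and `coincidence` are all assumptions made for illustration.

```python
def build_row(length, intervals):
    """Boolean row: True at every position covered by a half-open [start, end) interval."""
    row = [False] * length
    for start, end in intervals:
        for pos in range(max(start, 0), min(end, length)):
            row[pos] = True
    return row


def coincidence(row_a, row_b):
    """Fraction of the positions covered by row_a that are also covered by row_b."""
    covered = sum(row_a)
    if covered == 0:
        return 0.0
    both = sum(1 for a, b in zip(row_a, row_b) if a and b)
    return both / covered


# Hypothetical 20-position sequence with two annotation rows.
matrix = {
    "gene": build_row(20, [(2, 12)]),
    "cpg_island": build_row(20, [(0, 4), (15, 18)]),
}

# Share of gene positions that coincide with a CpG island.
print(coincidence(matrix["gene"], matrix["cpg_island"]))
```

Storing every feature type on the same positional axis is what makes pairwise comparisons like this one straightforward, whatever the original annotation source was.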