136 research outputs found

    Developing and applying heterogeneous phylogenetic models with XRate

    Get PDF
    Modeling sequence evolution on phylogenetic trees is a useful technique in computational biology. Especially powerful are models which take account of the heterogeneous nature of sequence evolution according to the "grammar" of the encoded gene features. However, beyond a modest level of model complexity, manual coding of models becomes prohibitively labor-intensive. We demonstrate, via a set of case studies, the new built-in model-prototyping capabilities of XRate (macros and Scheme extensions). These features allow rapid implementation of phylogenetic models which would have previously been far more labor-intensive. XRate's new capabilities for lineage-specific models, ancestral sequence reconstruction, and improved annotation output are also discussed. XRate's flexible model-specification capabilities and computational efficiency make it well-suited to developing and prototyping phylogenetic grammar models. XRate is available as part of the DART software package: http://biowiki.org/DART .Comment: 34 pages, 3 figures, glossary of XRate model terminolog

    IgTM: An algorithm to predict transmembrane domains and topology in proteins

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Due to their role of receptors or transporters, membrane proteins play a key role in many important biological functions. In our work we used Grammatical Inference (GI) to localize transmembrane segments. Our GI process is based specifically on the inference of Even Linear Languages.</p> <p>Results</p> <p>We obtained values close to 80% in both specificity and sensitivity. Six datasets have been used for the experiments, considering different encodings for the input sequences. An encoding that includes the topology changes in the sequence (from inside and outside the membrane to it and vice versa) allowed us to obtain the best results. This software is publicly available at: <url>http://www.dsic.upv.es/users/tlcc/bio/bio.html</url></p> <p>Conclusion</p> <p>We compared our results with other well-known methods, that obtain a slightly better precision. However, this work shows that it is possible to apply Grammatical Inference techniques in an effective way to bioinformatics problems.</p

    Directed acyclic graph kernels for structural RNA analysis

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Recent discoveries of a large variety of important roles for non-coding RNAs (ncRNAs) have been reported by numerous researchers. In order to analyze ncRNAs by kernel methods including support vector machines, we propose stem kernels as an extension of string kernels for measuring the similarities between two RNA sequences from the viewpoint of secondary structures. However, applying stem kernels directly to large data sets of ncRNAs is impractical due to their computational complexity.</p> <p>Results</p> <p>We have developed a new technique based on directed acyclic graphs (DAGs) derived from base-pairing probability matrices of RNA sequences that significantly increases the computation speed of stem kernels. Furthermore, we propose profile-profile stem kernels for multiple alignments of RNA sequences which utilize base-pairing probability matrices for multiple alignments instead of those for individual sequences. Our kernels outperformed the existing methods with respect to the detection of known ncRNAs and kernel hierarchical clustering.</p> <p>Conclusion</p> <p>Stem kernels can be utilized as a reliable similarity measure of structural RNAs, and can be used in various kernel-based applications.</p

    Large-Scale Phylogenetic Analysis of Emerging Infectious Diseases

    Get PDF
    Microorganisms that cause infectious diseases present critical issues of national security, public health, and economic welfare.  For example, in recent years, highly pathogenic strains of avian influenza have emerged in Asia, spread through Eastern Europe and threaten to become pandemic. As demonstrated by the coordinated response to Severe Acute Respiratory Syndrome (SARS) and influenza, agents of infectious disease are being addressed via large-scale genomic sequencing.  The goal of genomic sequencing projects are to rapidly put large amounts of data in the public domain to accelerate research on disease surveillance, treatment, and prevention. However, our ability to derive information from large comparative genomic datasets lags far behind acquisition.  Here we review the computational challenges of comparative genomic analyses, specifically sequence alignment and reconstruction of phylogenetic trees.  We present novel analytical results on from two important infectious diseases, Severe Acute Respiratory Syndrome (SARS) and influenza.SARS and influenza have similarities and important differences both as biological and comparative genomic analysis problems.  Influenza viruses (Orthymxyoviridae) are RNA based.  Current evidence indicates that influenza viruses originate in aquatic birds from wild populations. Influenza has been studied for decades via well-coordinated international efforts.  These efforts center on surveillance via antibody characterization of the hemagglutinin (HA) and neuraminidase (N) proteins of the circulating strains to inform vaccine design. However we still do not have a clear understanding of: 1) various transmission pathways such as the role of intermediate hosts such as swine and domestic birds and 2) the key mutation and genomic recombination events that underlie periodic pandemics of influenza.  In the past 30 years, sequence data from HA and N loci has become an important data type. In the past year, full genomic data has become prominent.  These data present exciting opportunities to address unanswered questions in influenza pandemics.SARS is caused by a previously unrecognized lineage of coronavirus, SARS-CoV, which like influenza has an RNA based genome.  Although SARS-CoV is widely believed to have originated in animals there remains disagreement over the candidate animal source that lead to the original outbreak of SARS.  In contrast to the long history of the study of influenza, SARS was only recognized in late 2002 and the virus that causes SARS has been documented primarily by genomic sequencing.In the past, most studies of influenza were performed on a limited number of isolates and genes suited to a particular problem.  Major goals in science today are to understand emerging diseases in broad geographic, environmental, societal, biological, and genomic contexts. Synthesizing diverse information brought together by various researchers is important to find out what can be done to prevent future outbreaks {JON03}.  Thus comprehensive means to organize and analyze large amounts of diverse information are critical.  For example, the relationships of isolates and patterns of genomic change observed in large datasets might not be consistent with hypotheses formed on partial data.  Moreover when researchers rely on partial datasets, they restrict the range of possible discoveries.Phylogenetics is well suited to the complex task of understanding emerging infectious disease. Phylogenetic analyses can test many hypotheses by comparing diverse isolates collected from various hosts, environments, and points in time and organizing these data into various evolutionary scenarios.  The products of a phylogenetic analysis are a graphical tree of ancestor-descendent relationships and an inferred summary of mutations, recombination events, host shifts, geographic, and temporal spread of the viruses.  However, this synthesis comes at a price.  The cost of computation of phylogenetic analysis expands combinatorially as the number of isolates considered increases. Thus, large datasets like those currently produced are commonly considered intractable.  We address this problem with synergistic development of heuristics tree search strategies and parallel computing.Fil: Janies, D.. Ohio State University; Estados UnidosFil: Pol, Diego. Ohio State University; Estados Unidos. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentin

    Category Theoretic Analysis of Hierarchical Protein Materials and Social Networks

    Get PDF
    Materials in biology span all the scales from Angstroms to meters and typically consist of complex hierarchical assemblies of simple building blocks. Here we describe an application of category theory to describe structural and resulting functional properties of biological protein materials by developing so-called ologs. An olog is like a “concept web” or “semantic network” except that it follows a rigorous mathematical formulation based on category theory. This key difference ensures that an olog is unambiguous, highly adaptable to evolution and change, and suitable for sharing concepts with other olog. We consider simple cases of beta-helical and amyloid-like protein filaments subjected to axial extension and develop an olog representation of their structural and resulting mechanical properties. We also construct a representation of a social network in which people send text-messages to their nearest neighbors and act as a team to perform a task. We show that the olog for the protein and the olog for the social network feature identical category-theoretic representations, and we proceed to precisely explicate the analogy or isomorphism between them. The examples presented here demonstrate that the intrinsic nature of a complex system, which in particular includes a precise relationship between structure and function at different hierarchical levels, can be effectively represented by an olog. This, in turn, allows for comparative studies between disparate materials or fields of application, and results in novel approaches to derive functionality in the design of de novo hierarchical systems. We discuss opportunities and challenges associated with the description of complex biological materials by using ologs as a powerful tool for analysis and design in the context of materiomics, and we present the potential impact of this approach for engineering, life sciences, and medicine.Presidential Early Career Award for Scientists and Engineers (N000141010562)United States. Army Research Office. Multidisciplinary University Research Initiative (W911NF0910541)United States. Office of Naval Research (grant N000141010841)Massachusetts Institute of Technology. Dept. of MathematicsStudienstiftung des deutschen VolkesClark BarwickJacob Luri

    Diabetes Alters Intracellular Calcium Transients in Cardiac Endothelial Cells

    Get PDF
    Diabetic cardiomyopathy (DCM) is a diabetic complication, which results in myocardial dysfunction independent of other etiological factors. Abnormal intracellular calcium ([Ca2+]i) homeostasis has been implicated in DCM and may precede clinical manifestation. Studies in cardiomyocytes have shown that diabetes results in impaired [Ca2+]i homeostasis due to altered sarcoplasmic reticulum Ca2+ ATPase (SERCA) and sodium-calcium exchanger (NCX) activity. Importantly, altered calcium homeostasis may also be involved in diabetes-associated endothelial dysfunction, including impaired endothelium-dependent relaxation and a diminished capacity to generate nitric oxide (NO), elevated cell adhesion molecules, and decreased angiogenic growth factors. However, the effect of diabetes on Ca2+ regulatory mechanisms in cardiac endothelial cells (CECs) remains unknown. The objective of this study was to determine the effect of diabetes on [Ca2+]i homeostasis in CECs in the rat model (streptozotocin-induced) of DCM. DCM-associated cardiac fibrosis was confirmed using picrosirius red staining of the myocardium. CECs isolated from the myocardium of diabetic and wild-type rats were loaded with Fura-2, and UTP-evoked [Ca2+]i transients were compared under various combinations of SERCA, sarcoplasmic reticulum Ca2+ ATPase (PMCA) and NCX inhibitors. Diabetes resulted in significant alterations in SERCA and NCX activities in CECs during [Ca2+]i sequestration and efflux, respectively, while no difference in PMCA activity between diabetic and wild-type cells was observed. These results improve our understanding of how diabetes affects calcium regulation in CECs, and may contribute to the development of new therapies for DCM treatment

    Graph Transformation in Molecular Biology

    Full text link
    In the beginning, one of the main fields of application of graph transformation was biology, and more specifically morphology. Later, however, it was like if the biological applications had been left aside by the graph transformation community, just to be moved back into the mainstream these very last years with a new interest in molecular biology. In this paper, we review several fields of application of graph grammars in molecular biology, including: the modeling higherdimensional structures of biomolecules, the description of biochemical reactions, the analysis of metabolic pathways, and their potential use in computational systems biology

    Disease-Aging Network Reveals Significant Roles of Aging Genes in Connecting Genetic Diseases

    Get PDF
    One of the challenging problems in biology and medicine is exploring the underlying mechanisms of genetic diseases. Recent studies suggest that the relationship between genetic diseases and the aging process is important in understanding the molecular mechanisms of complex diseases. Although some intricate associations have been investigated for a long time, the studies are still in their early stages. In this paper, we construct a human disease-aging network to study the relationship among aging genes and genetic disease genes. Specifically, we integrate human protein-protein interactions (PPIs), disease-gene associations, aging-gene associations, and physiological system–based genetic disease classification information in a single graph-theoretic framework and find that (1) human disease genes are much closer to aging genes than expected by chance; and (2) diseases can be categorized into two types according to their relationships with aging. Type I diseases have their genes significantly close to aging genes, while type II diseases do not. Furthermore, we examine the topological characters of the disease-aging network from a systems perspective. Theoretical results reveal that the genes of type I diseases are in a central position of a PPI network while type II are not; (3) more importantly, we define an asymmetric closeness based on the PPI network to describe relationships between diseases, and find that aging genes make a significant contribution to associations among diseases, especially among type I diseases. In conclusion, the network-based study provides not only evidence for the intricate relationship between the aging process and genetic diseases, but also biological implications for prying into the nature of human diseases

    Linking genes to literature: text mining, information extraction, and retrieval applications for biology

    Get PDF
    Efficient access to information contained in online scientific literature collections is essential for life science research, playing a crucial role from the initial stage of experiment planning to the final interpretation and communication of the results. The biological literature also constitutes the main information source for manual literature curation used by expert-curated databases. Following the increasing popularity of web-based applications for analyzing biological data, new text-mining and information extraction strategies are being implemented. These systems exploit existing regularities in natural language to extract biologically relevant information from electronic texts automatically. The aim of the BioCreative challenge is to promote the development of such tools and to provide insight into their performance. This review presents a general introduction to the main characteristics and applications of currently available text-mining systems for life sciences in terms of the following: the type of biological information demands being addressed; the level of information granularity of both user queries and results; and the features and methods commonly exploited by these applications. The current trend in biomedical text mining points toward an increasing diversification in terms of application types and techniques, together with integration of domain-specific resources such as ontologies. Additional descriptions of some of the systems discussed here are available on the internet
    corecore