6,296 research outputs found

    Simple and Fast Alignment of Metabolic Pathways by Exploiting Local Diversity


    The Natural Product Domain Seeker NaPDoS: A Phylogeny Based Bioinformatic Tool to Classify Secondary Metabolite Gene Diversity

    New bioinformatic tools are needed to analyze the growing volume of DNA sequence data. This is especially true in the case of secondary metabolite biosynthesis, where the highly repetitive nature of the associated genes creates major challenges for accurate sequence assembly and analysis. Here we introduce the web tool Natural Product Domain Seeker (NaPDoS), which provides an automated method to assess the secondary metabolite biosynthetic gene diversity and novelty of strains or environments. NaPDoS analyses are based on the phylogenetic relationships of sequence tags derived from polyketide synthase (PKS) and non-ribosomal peptide synthetase (NRPS) genes. The sequence tags correspond to PKS-derived ketosynthase domains and NRPS-derived condensation domains and are compared to an internal database of experimentally characterized biosynthetic genes. NaPDoS provides a rapid mechanism to extract and classify ketosynthase and condensation domains from PCR products, genomes, and metagenomic datasets. Close database matches provide a mechanism to infer the generalized structures of secondary metabolites, while new phylogenetic lineages provide targets for the discovery of new enzyme architectures or mechanisms of secondary metabolite assembly. Here we outline the main features of NaPDoS and test it on four draft genome sequences and two metagenomic datasets. The results provide a rapid method to assess secondary metabolite biosynthetic gene diversity and richness in organisms or environments and a mechanism to identify genes that may be associated with uncharacterized biochemistry.

    Positive selection in glycolysis among Australasian stick insects

    Background: The glycolytic pathway is central to cellular energy production. Selection on individual enzymes within glycolysis, particularly phosphoglucose isomerase (Pgi), has been associated with metabolic performance in numerous organisms. Nonetheless, how whole energy-producing pathways evolve to allow organisms to thrive in different environments and adopt new lifestyles remains little explored. The Lanceocercata radiation of Australasian stick insects includes transitions from tropical to temperate climates, lowland to alpine habitats, and winged to wingless forms. This permits a broad investigation to determine which steps within glycolysis and what sites within enzymes are the targets of positive selection. To address these questions we obtained transcript sequences from seven core glycolysis enzymes, including two Pgi paralogues, from 29 Lanceocercata species. Results: Using maximum likelihood methods, a signature of positive selection was inferred in two core glycolysis enzymes. Pgi and glyceraldehyde 3-phosphate dehydrogenase (Gapdh) genes both encode enzymes linking glycolysis to the pentose phosphate pathway. Positive selection among Pgi paralogues and orthologues predominantly targets amino acids with residues exposed to the protein’s surface, where changes in physical properties may alter enzyme performance. Conclusion: Our results suggest that, for Lanceocercata stick insects, adaptation to new stressful lifestyles requires a balance between maintaining cellular energy production, efficiently exploiting different energy storage pools and compensating for stress-induced oxidative damage.

    Data-driven modelling of biological multi-scale processes

    Biological processes involve a variety of spatial and temporal scales. A holistic understanding of many biological processes therefore requires multi-scale models which capture the relevant properties on all these scales. In this manuscript we review mathematical modelling approaches used to describe the individual spatial scales and how they are integrated into holistic models. We discuss the relation between spatial and temporal scales and its implications for multi-scale modelling. Based upon this overview of state-of-the-art modelling approaches, we formulate key challenges in mathematical and computational modelling of biological multi-scale and multi-physics processes. In particular, we consider the availability of analysis tools for multi-scale models and model-based multi-scale data integration. We provide a compact review of methods for model-based data integration and model-based hypothesis testing. Furthermore, novel approaches and recent trends are discussed, including computation time reduction using reduced order and surrogate models, which contribute to the solution of inference problems. We conclude the manuscript by providing a few ideas for the development of tailored multi-scale inference methods. Comment: This manuscript will appear in the Journal of Coupled Systems and Multiscale Dynamics (American Scientific Publishers).

    Analytical Tools and Databases for Metagenomics in the Next-Generation Sequencing Era

    Metagenomics has become one of the indispensable tools in microbial ecology over the last few decades, and a new revolution in metagenomic studies is now about to begin, with the help of recent advances in sequencing techniques. The massive data production and substantial cost reduction in next-generation sequencing have led to the rapid growth of metagenomic research both quantitatively and qualitatively. It is evident that metagenomics will be a standard tool for studying the diversity and function of microbes in the near future, as fingerprinting methods were previously. As the speed of data accumulation is accelerating, the need for bioinformatic tools and associated databases for handling those datasets has become more urgent. To facilitate the bioinformatics analysis of metagenomic data, we review some recent tools and databases that are used widely in this field and give insights into the current challenges and future of metagenomics from a bioinformatics perspective.

    Comparing biological networks via graph compression

    Background: Comparison of various kinds of biological data is one of the main problems in bioinformatics and systems biology. Data compression methods have been applied to comparison of large sequence data and protein structure data. Since it is still difficult to compare global structures of large biological networks, it is reasonable to try to apply data compression methods to comparison of biological networks. In existing compression methods, the uniqueness of compression results is not guaranteed because there is some ambiguity in selection of overlapping edges. Results: This paper proposes novel efficient methods, CompressEdge and CompressVertices, for comparing large biological networks. In the proposed methods, an original network structure is compressed by iteratively contracting identical edges and sets of connected edges. Then, the similarity of two networks is measured by a compression ratio of the concatenated networks. The proposed methods are applied to comparison of metabolic networks of several organisms, H. sapiens, M. musculus, A. thaliana, D. melanogaster, C. elegans, E. coli, S. cerevisiae, and B. subtilis, and are compared with an existing method. These results suggest that our methods can efficiently measure the similarities between metabolic networks. Conclusions: Our proposed algorithms, which compress node-labeled networks, are useful for measuring the similarity of large biological networks.

    Marine viruses discovered via metagenomics shed light on viral strategies throughout the oceans

    Marine viruses are key drivers of host diversity, population dynamics and biogeochemical cycling and contribute to the daily flux of billions of tons of organic matter. Despite recent advancements in metagenomics, much of their biodiversity remains uncharacterized. Here we report a data set of 27,346 marine virome contigs that includes 44 complete genomes. These outnumber all currently known phage genomes in marine habitats and include members of previously uncharacterized lineages. We designed a new method for host prediction based on co-occurrence associations that reveals these viruses infect dominant members of the marine microbiome such as Prochlorococcus and Pelagibacter. A negative association between host abundance and the virus-to-host ratio supports the recently proposed Piggyback-the-Winner model of reduced phage lysis at higher host densities. An analysis of the abundance patterns of viruses throughout the oceans revealed how marine viral communities adapt to various seasonal, temperature and photic regimes according to targeted hosts and the diversity of auxiliary metabolic genes.
    Funding: CAPES; CNPq; FAPERJ; Ciencia sem fronteiras program: 864.14.004.
    Affiliations: Univ Fed Rio de Janeiro, IB, BR-21944970 Rio de Janeiro, Brazil; Radboud Univ Nijmegen, Radboud Inst Mol Life Sci, CMBI, Med Ctr, NL-6500 HB Nijmegen, Netherlands; Univ Utrecht, Theoret Biol & Bioinformat, NL-3584 CH Utrecht, Netherlands; San Diego State Univ, Dept Biol, San Diego, CA 92182 USA; Univ Fed Sao Paulo UNIFESP, Dept Ciencias Mar, BR-11070100 Baixada Santista, Brazil; NIOZ Royal Netherlands Inst Sea Res, Dept Marine Microbiol & Biogeochem, POB 59, NL-1790 AB Den Burg, Netherlands; Univ Utrecht, POB 59, NL-1790 AB Den Burg, Netherlands; Univ Amsterdam, Dept Aquat Microbiol, IBED, NL-1090 GE Amsterdam, Netherlands; Univ Fed Rio de Janeiro, COPPE, SAGE, BR-21941950 Rio de Janeiro, Brazil.
    Source: Web of Science.

    The leaf transcriptome of fennel (Foeniculum vulgare Mill.) enables characterization of the t-anethole pathway and the discovery of microsatellites and single-nucleotide variants

    Fennel is a plant species of both agronomic and pharmaceutical interest that is characterized by a shortage of genetic and molecular data. Taking advantage of NGS technology, we sequenced and annotated the first fennel leaf transcriptome using material from four different lines and two different bioinformatic approaches: de novo and genome-guided transcriptome assembly. A reference transcriptome for assembly was produced by combining these two approaches. Among the 79,263 transcripts obtained, 47,775 were annotated using BLASTX analysis performed against the NR protein database subset, with 11,853 transcripts representing putative full-length CDS. Bioinformatic analyses revealed 1,011 transcripts encoding transcription factors, mainly from the bHLH, MYB-related, C2H2, MYB, and ERF families, and 6,411 EST-SSR regions. SNPs and indels were identified among the 8 samples at frequencies of 0.5 and 0.04 variants per kb, respectively. Finally, the assembled transcripts were screened to identify genes related to the biosynthesis of t-anethole, a compound well known for its nutraceutical and medical properties. For each of the 11 genes encoding structural enzymes in the t-anethole biosynthetic pathway, we identified at least one transcript showing a significant match. Overall, our work represents a treasure trove of information exploitable both for marker-assisted breeding and for in-depth studies on thousands of genes, including those involved in t-anethole biosynthesis.