6,954 research outputs found

    mockrobiota: a Public Resource for Microbiome Bioinformatics Benchmarking.

    Get PDF
    Mock communities are an important tool for validating, optimizing, and comparing bioinformatics methods for microbial community analysis. We present mockrobiota, a public resource for sharing, validating, and documenting mock community data resources, available at http://caporaso-lab.github.io/mockrobiota/. The materials contained in mockrobiota include data set and sample metadata, expected composition data (taxonomy or gene annotations or reference sequences for mock community members), and links to raw data (e.g., raw sequence data) for each mock community data set. mockrobiota does not supply physical sample materials directly, but the data set metadata included for each mock community indicate whether physical sample materials are available. At the time of this writing, mockrobiota contains 11 mock community data sets with known species compositions, including bacterial, archaeal, and eukaryotic mock communities, analyzed by high-throughput marker gene sequencing. IMPORTANCE The availability of standard and public mock community data will facilitate ongoing method optimizations, comparisons across studies that share source data, and greater transparency and access and eliminate redundancy. These are also valuable resources for bioinformatics teaching and training. This dynamic resource is intended to expand and evolve to meet the changing needs of the omics community

    Recovering complete and draft population genomes from metagenome datasets.

    Get PDF
    Assembly of metagenomic sequence data into microbial genomes is of fundamental value to improving our understanding of microbial ecology and metabolism by elucidating the functional potential of hard-to-culture microorganisms. Here, we provide a synthesis of available methods to bin metagenomic contigs into species-level groups and highlight how genetic diversity, sequencing depth, and coverage influence binning success. Despite the computational cost on application to deeply sequenced complex metagenomes (e.g., soil), covarying patterns of contig coverage across multiple datasets significantly improves the binning process. We also discuss and compare current genome validation methods and reveal how these methods tackle the problem of chimeric genome bins i.e., sequences from multiple species. Finally, we explore how population genome assembly can be used to uncover biogeographic trends and to characterize the effect of in situ functional constraints on the genome-wide evolution

    Updates in metabolomics tools and resources: 2014-2015

    Get PDF
    Data processing and interpretation represent the most challenging and time-consuming steps in high-throughput metabolomic experiments, regardless of the analytical platforms (MS or NMR spectroscopy based) used for data acquisition. Improved machinery in metabolomics generates increasingly complex datasets that create the need for more and better processing and analysis software and in silico approaches to understand the resulting data. However, a comprehensive source of information describing the utility of the most recently developed and released metabolomics resources—in the form of tools, software, and databases—is currently lacking. Thus, here we provide an overview of freely-available, and open-source, tools, algorithms, and frameworks to make both upcoming and established metabolomics researchers aware of the recent developments in an attempt to advance and facilitate data processing workflows in their metabolomics research. The major topics include tools and researches for data processing, data annotation, and data visualization in MS and NMR-based metabolomics. Most in this review described tools are dedicated to untargeted metabolomics workflows; however, some more specialist tools are described as well. All tools and resources described including their analytical and computational platform dependencies are summarized in an overview Table

    Host-linked soil viral ecology along a permafrost thaw gradient

    Get PDF
    Climate change threatens to release abundant carbon that is sequestered at high latitudes, but the constraints on microbial metabolisms that mediate the release of methane and carbon dioxide are poorly understood1,2,3,4,5,6,7. The role of viruses, which are known to affect microbial dynamics, metabolism and biogeochemistry in the oceans8,9,10, remains largely unexplored in soil. Here, we aimed to investigate how viruses influence microbial ecology and carbon metabolism in peatland soils along a permafrost thaw gradient in Sweden. We recovered 1,907 viral populations (genomes and large genome fragments) from 197 bulk soil and size-fractionated metagenomes, 58% of which were detected in metatranscriptomes and presumed to be active. In silico predictions linked 35% of the viruses to microbial host populations, highlighting likely viral predators of key carbon-cycling microorganisms, including methanogens and methanotrophs. Lineage-specific virus/host ratios varied, suggesting that viral infection dynamics may differentially impact microbial responses to a changing climate. Virus-encoded glycoside hydrolases, including an endomannanase with confirmed functional activity, indicated that viruses influence complex carbon degradation and that viral abundances were significant predictors of methane dynamics. These findings suggest that viruses may impact ecosystem function in climate-critical, terrestrial habitats and identify multiple potential viral contributions to soil carbon cycling

    Inference, Orthology, and Inundation: Addressing Current Challenges in the Field of Metagenomics

    Get PDF
    The vast increase in the number of sequenced genomes has irreversibly changed the landscape of the biological sciences and has spawned the current post-genomic era of research. Genomic data have illuminated many adaptation and survival strategies between species and their habitats. Moreover, the analysis of prokaryotic genomic sequences is indispensible for understanding the mechanisms of bacterial pathogens and for subsequently developing effective diagnostics, drugs, and vaccines. Computational strategies for the annotation of genomic sequences are driven by the inference of function from reference genomes. However, the effectiveness of such methods is bounded by the fractional diversity of known genomes. Although metagenomes can reconcile this limitation by offering access to previously intangible organisms, harnessing metagenomic data comes with its own collection of challenges. Since the sequenced environmental fragments of metagenomes do not equate to discrete and fully intact genomes, this prevents the conventional establishment of orthologous relationships that are required for functional inference. Furthermore, the current surge in metagenomic data sets requires the development of compression strategies that can effectively accommodate large data sets that are comprised of multiple sequences and a greater proportion of auxiliary data, such as sequence headers. While modern hardware can provide vast amounts of inexpensive storage for biological databases, the compression of nucleotide sequence data is still of paramount importance in order to facilitate fast search and retrieval operations through a reduction in disk traffic. To address the issues of inference and orthology a novel protocol was developed for the prediction of functional interactions that supports data sources that lack information about orthologous relationships. To address the issue of database inundation, a compression protocol was designed that can differentiate between sequence data and auxiliary data, thereby offering reconciliation between sequence specific and general-purpose compression strategies. By resolving these and other challenges, it becomes possible to extend the potential utility of the emerging field of metagenomics

    MEMOSys: Bioinformatics platform for genome-scale metabolic models

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Recent advances in genomic sequencing have enabled the use of genome sequencing in standard biological and biotechnological research projects. The challenge is how to integrate the large amount of data in order to gain novel biological insights. One way to leverage sequence data is to use genome-scale metabolic models. We have therefore designed and implemented a bioinformatics platform which supports the development of such metabolic models.</p> <p>Results</p> <p>MEMOSys (MEtabolic MOdel research and development System) is a versatile platform for the management, storage, and development of genome-scale metabolic models. It supports the development of new models by providing a built-in version control system which offers access to the complete developmental history. Moreover, the integrated web board, the authorization system, and the definition of user roles allow collaborations across departments and institutions. Research on existing models is facilitated by a search system, references to external databases, and a feature-rich comparison mechanism. MEMOSys provides customizable data exchange mechanisms using the SBML format to enable analysis in external tools. The web application is based on the Java EE framework and offers an intuitive user interface. It currently contains six annotated microbial metabolic models.</p> <p>Conclusions</p> <p>We have developed a web-based system designed to provide researchers a novel application facilitating the management and development of metabolic models. The system is freely available at <url>http://www.icbi.at/MEMOSys</url>.</p

    The Hierarchic treatment of marine ecological information from spatial networks of benthic platforms

    Get PDF
    Measuring biodiversity simultaneously in different locations, at different temporal scales, and over wide spatial scales is of strategic importance for the improvement of our understanding of the functioning of marine ecosystems and for the conservation of their biodiversity. Monitoring networks of cabled observatories, along with other docked autonomous systems (e.g., Remotely Operated Vehicles [ROVs], Autonomous Underwater Vehicles [AUVs], and crawlers), are being conceived and established at a spatial scale capable of tracking energy fluxes across benthic and pelagic compartments, as well as across geographic ecotones. At the same time, optoacoustic imaging is sustaining an unprecedented expansion in marine ecological monitoring, enabling the acquisition of new biological and environmental data at an appropriate spatiotemporal scale. At this stage, one of the main problems for an effective application of these technologies is the processing, storage, and treatment of the acquired complex ecological information. Here, we provide a conceptual overview on the technological developments in the multiparametric generation, storage, and automated hierarchic treatment of biological and environmental information required to capture the spatiotemporal complexity of a marine ecosystem. In doing so, we present a pipeline of ecological data acquisition and processing in different steps and prone to automation. We also give an example of population biomass, community richness and biodiversity data computation (as indicators for ecosystem functionality) with an Internet Operated Vehicle (a mobile crawler). Finally, we discuss the software requirements for that automated data processing at the level of cyber-infrastructures with sensor calibration and control, data banking, and ingestion into large data portals.Peer ReviewedPostprint (published version
    corecore