8,009 research outputs found

    Comparison of statistical methods for metagenome data analysis

    Thesis (Master's) -- Seoul National University Graduate School: College of Natural Sciences, Interdisciplinary Program in Bioinformatics, February 2018. Taesung Park (박태성). A comparison study of statistical methods for the analysis of metagenome data. Chanyoung Lee, Interdisciplinary Program in Bioinformatics, The Graduate School, Seoul National University. With the advent of next-generation sequencing (NGS) technology, sequencing microorganisms from varied samples facilitates association analysis between features and environments. Several statistical methods have been proposed for analyzing metagenome data, such as Metastats, metagenomeSeq, ZIBSeq, ANCOM, edgeR, and DESeq2. Each method makes its own distributional and model assumptions. While there have been some comparative studies of these methods, the comparisons are rather limited and the results vary depending on how the simulation datasets were generated. In this study, we systematically investigate the properties of these statistical methods for finding differentially abundant features (DAF). In addition, a centered log-ratio transformation combined with a permutation logistic regression model (CLR Perm) was applied to metagenome data. We compare the performance of the methods using simulation data generated from the Human Microbiome Project (HMP). We first assessed the type I error rate of each method over different levels of sparsity. The CLR Perm, metagenomeSeq, and ANCOM methods yielded well-preserved type I error rates regardless of sparsity. In the power comparison study, CLR Perm showed the highest power among the methods that preserved the type I error rate. Furthermore, we applied the methods to real data on colorectal cancer (CRC) to compare our results with existing taxonomic markers of CRC. In conclusion, we recommend using a combination of CLR Perm and metagenomeSeq for the analysis of metagenome data, because the lists of significant taxa discovered by CLR Perm and metagenomeSeq differ.
    Contents: 1 Introduction. 2 Material and Methods: 2.1 Simulation materials (HMP); 2.2 Colorectal cancer data; 2.3 Existing methods; 2.4 Permutation logistic regression with centered log-ratio transformation (CLR Perm). 3 Simulation: 3.1 Simulation model; 3.2 Power and type I error rate. 4 Results: 4.1 Simulation results; 4.2 Colorectal cancer data results. 5 Discussion. Bibliography.
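    As a rough illustration of the two ingredients named in the abstract above, the sketch below applies a centered log-ratio (CLR) transform to one sample's taxon counts and runs a simple permutation test on one transformed feature. It is not the thesis's implementation: the pseudocount of 0.5, the function names, and the toy values are assumptions, and a difference-in-means statistic stands in for the permutation logistic regression the thesis describes.

```python
import math
import random


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's taxon counts.

    The pseudocount (an assumption here) avoids log(0) for absent taxa.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def permutation_pvalue(values, labels, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the difference in group means.

    values: one CLR-transformed feature across samples.
    labels: 0/1 group membership for each sample.
    Note: this mean-difference statistic is a simplified stand-in,
    not the permutation logistic regression used in the thesis.
    """
    rng = random.Random(seed)

    def mean_diff(lbls):
        g1 = [v for v, l in zip(values, lbls) if l == 1]
        g0 = [v for v, l in zip(values, lbls) if l == 0]
        return sum(g1) / len(g1) - sum(g0) / len(g0)

    observed = abs(mean_diff(labels))
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between feature and group
        if abs(mean_diff(shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical counts for four taxa in one sample.
    print(clr([120, 30, 0, 50]))
    # Hypothetical CLR values of one taxon in 4 controls and 4 cases.
    feature = [0.10, 0.20, 0.15, 0.12, 2.10, 2.20, 2.30, 2.00]
    groups = [0, 0, 0, 0, 1, 1, 1, 1]
    print(permutation_pvalue(feature, groups))
```

    The CLR values of a sample sum to zero by construction, which is why the transform is commonly used for compositional count data; the permutation step makes no distributional assumption about the transformed values.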

    Analysis and comparison of very large metagenomes with fast clustering and functional annotation

    Abstract. Background: The remarkable advance of metagenomics presents significant new challenges in data analysis. Metagenomic datasets (metagenomes) are large collections of sequencing reads from anonymous species within particular environments. Computational analyses for very large metagenomes are extremely time-consuming, and there are often many novel sequences in these metagenomes that are not fully utilized. The number of available metagenomes is rapidly increasing, so fast and efficient metagenome comparison methods are in great demand. Results: The new metagenomic data analysis method Rapid Analysis of Multiple Metagenomes with a Clustering and Annotation Pipeline (RAMMCAP) was developed using an ultra-fast sequence clustering algorithm, fast protein family annotation tools, and a novel statistical metagenome comparison method that employs a unique graphic interface. RAMMCAP processes extremely large datasets with only moderate computational effort. It identifies raw read clusters and protein clusters that may include novel gene families, and compares metagenomes using clusters or functional annotations calculated by RAMMCAP. In this study, RAMMCAP was applied to the two largest available metagenomic collections, the "Global Ocean Sampling" and the "Metagenomic Profiling of Nine Biomes". Conclusion: RAMMCAP is a very fast method that can cluster and annotate one million metagenomic reads in only hundreds of CPU hours. It is available from http://tools.camera.calit2.net/camera/rammcap/.

    Environmental shaping of codon usage and functional adaptation across microbial communities.

    Microbial communities represent the largest portion of the Earth's biomass. Metagenomics projects use high-throughput sequencing to survey these communities and shed light on genetic capabilities that enable microbes to inhabit every corner of the biosphere. Metagenome studies are generally based on (i) classifying and ranking functions of identified genes; and (ii) estimating the phyletic distribution of constituent microbial species. To understand microbial communities at the systems level, it is necessary to extend these studies beyond the species' boundaries and capture higher levels of metabolic complexity. We evaluated 11 metagenome samples and demonstrated that microbes inhabiting the same ecological niche share common preferences for synonymous codons, regardless of their phylogeny. By exploring concepts of translational optimization through codon usage adaptation, we demonstrated that community-wide bias in codon usage can be used as a prediction tool for lifestyle-specific genes across the entire microbial community, effectively considering microbial communities as meta-genomes. These findings set up a 'functional metagenomics' platform for the identification of genes relevant for adaptations of entire microbial communities to environments. Our results provide valuable arguments in defining the concept of microbial species through the context of their interactions within the community

    Comparative metagenomic analysis reveals mechanisms for stress response in hypoliths from extreme hyperarid deserts

    Understanding microbial adaptation to environmental stressors is crucial for interpreting broader ecological patterns. In the most extreme hot and cold deserts, cryptic niche communities are thought to play key roles in ecosystem processes and represent excellent model systems for investigating microbial responses to environmental stressors. However, relatively little is known about the genetic diversity underlying such functional processes in climatically extreme desert systems. This study presents the first comparative metagenome analysis of cyanobacteria-dominated hypolithic communities in hot (Namib Desert, Namibia) and cold (Miers Valley, Antarctica) hyperarid deserts. The most abundant phyla in both hypolith metagenomes were Actinobacteria, Proteobacteria, Cyanobacteria and Bacteroidetes, with Cyanobacteria dominating in Antarctic hypoliths. However, no significant differences between the two metagenomes were identified. The Antarctic hypolithic metagenome displayed a high number of sequences assigned to sigma factors, replication, recombination and repair, translation, ribosomal structure, and biogenesis. In contrast, the Namib Desert metagenome showed a high abundance of sequences assigned to carbohydrate transport and metabolism. Metagenome data analysis also revealed significant divergence in the genetic determinants of amino acid and nucleotide metabolism between these two metagenomes and those of soil from other polar deserts, hot deserts, and non-desert soils. Our results suggest extensive niche differentiation in hypolithic microbial communities from these two extreme environments and a high genetic capacity for survival under environmental extremes. Affiliation: Le, Phuong Thi. University of Pretoria; South Africa. Vlaams Instituut voor Biotechnologie; Belgium. University of Ghent; Belgium. Affiliation: Makhalanyane, Thulani P. University of Pretoria; South Africa. Affiliation: Guerrero, Leandro Demián. University of Pretoria; South Africa. Consejo Nacional de Investigaciones Científicas y Técnicas. Instituto de Investigaciones en Ingeniería Genética y Biología Molecular "Dr. Héctor N. Torres"; Argentina. Affiliation: Vikram, Surendra. University of Pretoria; South Africa. Affiliation: Van De Peer, Yves. University of Pretoria; South Africa. Vlaams Instituut voor Biotechnologie; Belgium. University of Ghent; Belgium. Affiliation: Cowan, Don A. University of Pretoria; Sudáfric

    Methodology and ontology in microbiome research

    Research on the human microbiome has generated a staggering amount of sequence data, revealing variation in microbial diversity at the community, species (or phylotype), and genomic levels. In order to make this complexity more manageable and easier to interpret, new units (the metagenome, core microbiome, and enterotype) have been introduced in the scientific literature. Here, I argue that analytical tools and exploratory statistical methods, coupled with a translational imperative, are the primary drivers of this new ontology. By reducing the dimensionality of variation in the human microbiome, these new units render it more tractable and easier to interpret, and hence serve an important heuristic role. Nonetheless, there are several reasons to be cautious about these new categories prematurely "hardening" into natural units: a lack of constraints on what can be sequenced metagenomically, freedom of choice in taxonomic level in defining a "core microbiome," typological framing of some of the concepts, and possible reification of statistical constructs. Finally, lessons from the Human Genome Project have led to a translational imperative: a drive to derive results from the exploration of microbiome variation that can help to articulate the emerging paradigm of personalized genomic medicine (PGM). There is a tension between the typologizing inherent in much of this research and the personal in PGM

    Recovering complete and draft population genomes from metagenome datasets.

    Assembly of metagenomic sequence data into microbial genomes is of fundamental value to improving our understanding of microbial ecology and metabolism by elucidating the functional potential of hard-to-culture microorganisms. Here, we provide a synthesis of available methods to bin metagenomic contigs into species-level groups and highlight how genetic diversity, sequencing depth, and coverage influence binning success. Despite the computational cost on application to deeply sequenced complex metagenomes (e.g., soil), covarying patterns of contig coverage across multiple datasets significantly improves the binning process. We also discuss and compare current genome validation methods and reveal how these methods tackle the problem of chimeric genome bins i.e., sequences from multiple species. Finally, we explore how population genome assembly can be used to uncover biogeographic trends and to characterize the effect of in situ functional constraints on the genome-wide evolution

    Mobile resistome of human gut and pathogen drives anthropogenic bloom of antibiotic resistance

    BACKGROUND: The impact of human activities on the environmental resistome has been documented in many studies, but there remains the controversial question of whether the increased antibiotic resistance observed in anthropogenically impacted environments is just a result of contamination by resistant fecal microbes or is mediated by indigenous environmental organisms. Here, to determine exactly how anthropogenic influences shape the environmental resistome, we resolved the microbiome, resistome, and mobilome of the planktonic microbial communities along a single river, the Han, which spans a gradient of human activities. RESULTS: The bloom of antibiotic resistance genes (ARGs) was evident in the downstream regions, and distinct successional dynamics of the river resistome occurred across the spatial continuum. We identified a number of widespread ARG sequences shared between the river, human gut, and pathogenic bacteria. These human-related ARGs were largely associated with mobile genetic elements rather than particular gut taxa and were mainly responsible for the anthropogenically driven bloom of the downstream river resistome. Furthermore, both sequence- and phenotype-based analyses revealed environmental relatives of clinically important proteobacteria as major carriers of these ARGs. CONCLUSIONS: Our results demonstrate a more nuanced view of the impact of anthropogenic activities on the river resistome: fecal contamination is present and allows the transmission of ARGs to the environmental resistome, but these mobile genes, rather than resistant fecal bacteria, proliferate in environmental relatives of their original hosts.

    A Robust and Universal Metaproteomics Workflow for Research Studies and Routine Diagnostics Within 24 h Using Phenol Extraction, FASP Digest, and the MetaProteomeAnalyzer

    The investigation of microbial proteins by mass spectrometry (metaproteomics) is a key technology for simultaneously assessing the taxonomic composition and the functionality of microbial communities in medical, environmental, and biotechnological applications. We present an improved metaproteomics workflow using an updated sample preparation and a new version of the MetaProteomeAnalyzer software for data analysis. High resolution by multidimensional separation (GeLC, MudPIT) was sacrificed to aim at fast analysis of a broad range of different samples in less than 24 h. The improved workflow generated at least twice as many protein identifications as our previous workflow, and a drastic increase in taxonomic and functional annotations. Improvements in all aspects of the workflow, particularly the speed, are first steps toward potential routine clinical diagnostics (i.e., fecal samples) and analysis of technical and environmental samples. The MetaProteomeAnalyzer is provided to the scientific community as a central remote server solution at www.mpa.ovgu.de. Peer Reviewe