
    MetaMine – A tool to detect and analyse gene patterns in their environmental context

    Background: Modern sequencing technologies allow rapid sequencing and bioinformatic analysis of genomes and metagenomes. Every new sequencing project makes a vast number of new proteins available, and many of their genes remain functionally unclassified when only evidence from sequence similarity is used. Extending similarity searches with gene pattern approaches, where a gene pattern is defined as a set of genes sharing a distinct genomic neighbourhood, has been shown to significantly increase the number of functional assignments. Further functional evidence can be gained by correlating these gene patterns with prevailing environmental parameters. MetaMine was developed to tackle the large pool of unclassified proteins by searching for recurrent gene patterns across habitats based on key genes.
Results: MetaMine is an interactive data mining tool that enables the detection of gene patterns in an environmental context. The gene pattern search starts with a user-defined key gene of environmental interest. This gene is used as the query for a BLAST search against the Microbial Ecological Genomics DataBase (MEGDB), which contains marine genomic and metagenomic sequences. Next, all neighbouring genes within a given distance are determined, and a search for functionally equivalent genes is carried out. In the final step, the set of common genes present in a defined number of distinct genomes is determined. Each gene pattern found is associated with its individual pattern instances, which describe gene order and direction, and is presented together with information about the sample and the habitat. MetaMine is implemented in Java and provided as a client/server application with a user-friendly graphical user interface. The system was evaluated with environmentally relevant genes related to the methane cycle and carbon monoxide oxidation.
Conclusion: MetaMine offers a targeted, semi-automatic search for gene patterns based on expert input.
The graphical user interface of MetaMine provides a user-friendly overview of the computed gene patterns for further inspection in an ecological context. Prevailing biological processes associated with a key gene can be used to infer new annotations and to shape hypotheses that guide further analyses. The use cases demonstrate that meaningful gene patterns can be detected quickly using MetaMine.
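The search steps described in the MetaMine abstract (key-gene hits, neighbourhood extraction, functional equivalence, and a distinct-genome threshold) can be illustrated with a minimal Python sketch. It is not MetaMine's implementation, which is a Java client/server application, and it does not query MEGDB. It assumes the BLAST hits for the key gene are already available, and it replaces the search for functionally equivalent genes with a hypothetical precomputed `family` label on each gene. The names `Gene`, `neighbourhood`, `find_gene_patterns`, `max_distance`, and `min_genomes` are illustrative only.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    gene_id: str
    genome: str   # genome or metagenomic sample identifier
    contig: str
    start: int    # start coordinate on the contig
    end: int      # end coordinate on the contig
    strand: str   # "+" or "-"
    family: str   # hypothetical precomputed functional-equivalence label


def neighbourhood(key, genes, max_distance):
    """Genes on the key gene's contig within max_distance of it (key gene included)."""
    hood = []
    for g in genes:
        if g.genome != key.genome or g.contig != key.contig:
            continue
        # distance between the two coordinate intervals; 0 if they overlap
        gap = max(g.start - key.end, key.start - g.end, 0)
        if gap <= max_distance:
            hood.append(g)
    return sorted(hood, key=lambda g: g.start)


def find_gene_patterns(key_hits, genes, max_distance, min_genomes):
    """Families seen around the key gene in >= min_genomes distinct genomes,
    plus one pattern instance (ordered family/strand list) per key-gene hit."""
    genomes_per_family = defaultdict(set)
    instances = {}
    for key in key_hits:
        hood = neighbourhood(key, genes, max_distance)
        instances[key.gene_id] = [(g.family, g.strand) for g in hood]
        for g in hood:
            genomes_per_family[g.family].add(key.genome)
    common = {f for f, seen in genomes_per_family.items() if len(seen) >= min_genomes}
    # restrict each instance to the common families, preserving gene order
    patterns = {k: [x for x in inst if x[0] in common] for k, inst in instances.items()}
    return common, patterns


# Toy data (not from MEGDB): two genomes carrying the same key gene family.
genes = [
    Gene("a1", "G1", "c1", 100, 1000, "+", "famX"),
    Gene("a2", "G1", "c1", 1100, 1900, "+", "keyFam"),
    Gene("a3", "G1", "c1", 2000, 2800, "+", "famY"),
    Gene("b1", "G2", "c7", 500, 1300, "-", "famX"),
    Gene("b2", "G2", "c7", 1400, 2200, "-", "keyFam"),
    Gene("b3", "G2", "c7", 9000, 9800, "+", "famZ"),
]
# Stand-in for the BLAST step: treat every keyFam gene as a hit.
key_hits = [g for g in genes if g.family == "keyFam"]
common, patterns = find_gene_patterns(key_hits, genes, max_distance=1000, min_genomes=2)
print(sorted(common))  # ['famX', 'keyFam']
print(patterns["b2"])  # [('famX', '-'), ('keyFam', '-')]
```

Counting genomes rather than hits means that several key-gene copies in one genome contribute only once to the `min_genomes` threshold, in line with the abstract's "defined number of distinct genomes".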

    Data integration for marine ecological genomics


    A Primer on Metagenomics

    Metagenomics is a discipline that enables the genomic study of uncultured microorganisms. Faster, cheaper sequencing technologies and the ability to sequence uncultured microbes sampled directly from their habitats are expanding and transforming our view of the microbial world. Distilling meaningful information from the millions of new genomic sequences presents a serious challenge to bioinformaticians. In cultured microbes, the genomic data come from a single clone, making sequence assembly and annotation tractable. In metagenomics, the data come from heterogeneous microbial communities, sometimes containing more than 10,000 species, and the sequence data are noisy and partial. From sampling to assembly to gene calling and function prediction, bioinformatics faces new demands in interpreting voluminous, noisy, and often partial sequence data. Although metagenomics is a relative newcomer to science, the past few years have seen an explosion of computational methods applied to metagenomic research, so an exhaustive review is beyond the scope of this article. Rather, we provide a concise yet comprehensive introduction to the current computational requirements of metagenomics and review the recent progress made. We also note whether software implements each of the methods presented here and briefly review its utility. We encourage readers to use the comment section provided by this journal to relate their own experiences. Finally, the last section of this article presents a few representative studies illustrating different facets of recent scientific discoveries made using metagenomics.

    Development of Optimal Bioinformatics Systems for the Analysis of Metagenome Data Generated by Next-Generation Sequencing Instruments

    Thesis (Ph.D.), Seoul National University Graduate School: Interdisciplinary Program in Bioinformatics, February 2014. Advisor: Jongsik Chun (천종식).
A metagenome is the total DNA extracted directly from an environment, and the purpose of metagenomics is to reveal both the functions of the metagenome and its taxonomic structure. There are two analysis approaches in metagenomics: the amplicon-based approach and the random shotgun approach. Both require large numbers of sequencing reads, a demand that Sanger sequencing could not meet. High-throughput sequencing at relatively low cost with Next Generation Sequencing (NGS) technologies, however, meets this requirement. In addition, the advent of NGS technologies drove the development of the bioinformatic algorithms needed to process these large and complex sequencing data. Consequently, the large amount of NGS data, together with suitable bioinformatic algorithms, has made metagenomics an essential tool in microbiology. However, NGS sequencing errors, short read lengths, and the lack of an analysis system still hinder accurate metagenome analysis. Therefore, currently used NGS error-handling algorithms need to be evaluated, and a systematic pipeline with more efficient algorithms needs to be developed to improve the accuracy of analysis. In this study, bioinformatic pipelines were constructed for both metagenomic approaches. The pipelines were designed to improve the accuracy of the final result by minimizing the effects of sequencing errors and short read length. For amplicon-based metagenomics, two analysis pipelines were developed, one for 454 pyrosequencing and one for Illumina MiSeq. While constructing the 454 pyrosequencing pipeline, a new error-handling algorithm was developed to treat homopolymer and PCR errors. Once the pipeline was complete, a household microbial community was analyzed with 454 pyrosequencing data as a case study.
For Illumina MiSeq data, the most appropriate sequencing conditions and sequencing target region were determined. Paired-end merging programs were evaluated, and the correlation between sequencing errors and quality scores was studied in order to correct errors within 3 overlap regions. A novel iterative consensus clustering method was developed to correct errors that can occur anywhere in a single read. For the shotgun metagenomics approach, an analysis system for Illumina MiSeq paired-end data was constructed. Unlike targeted amplicon reads, most shotgun sequencing reads are not merged, so short reads are used for both functional and taxonomic profiling. However, a short read carries less information than a longer contig, so relying on short reads is likely to bias the characterization of the metagenome. The development of the analysis system therefore focused on creating longer contigs by means of mapping and de novo assembly. For raw read mapping, a method for constructing a dynamic mapping genome set was developed: a list of mapping genomes was selected from the taxonomic profile inferred from the ribosomal RNA profiles, and the sequences of the selected genomes were downloaded from EzBioCloud. By mapping raw reads to these genome sequences, longer contigs can be obtained for relatively simple metagenomes such as fecal samples. For complex metagenomes such as soil samples, however, neither mapping nor de novo assembly performed well, owing to insufficient sequencing coverage and the large number of uncultured microorganisms in the metagenome. In addition to the pipelines, visualization tools were developed to display the resulting taxonomic and functional profiles at the same time. A newly developed Java-based standalone sequence alignment editor was named EzEditor.
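Paired-end merging, one of the steps evaluated for the MiSeq amplicon pipeline, can be sketched as below. This is a generic overlap-and-consensus approach, not one of the programs evaluated in the thesis: read 2 is reverse-complemented, the longest overlap with the 3′ end of read 1 that stays within a mismatch-rate threshold is chosen, and at disagreeing positions the base with the higher Phred score is kept. The function names, default thresholds, and quality-update rule are assumptions for illustration, and the sketch assumes each fragment is longer than its reads, so read-through into adapter sequence is not handled.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_overlap(r1, r2rc, min_overlap, max_mismatch_rate):
    """Longest overlap in which the 3' end of r1 matches the 5' end of the
    reverse-complemented mate, allowing up to max_mismatch_rate mismatches."""
    for olen in range(min(len(r1), len(r2rc)), max(min_overlap, 1) - 1, -1):
        mismatches = sum(a != b for a, b in zip(r1[-olen:], r2rc[:olen]))
        if mismatches <= max_mismatch_rate * olen:
            return olen
    return None


def merge_pair(r1, q1, r2, q2, min_overlap=10, max_mismatch_rate=0.1):
    """Merge a read pair; q1/q2 are lists of per-base Phred scores.
    Returns (sequence, qualities) or None when no acceptable overlap exists."""
    r2rc, q2rc = reverse_complement(r2), q2[::-1]
    olen = find_overlap(r1, r2rc, min_overlap, max_mismatch_rate)
    if olen is None:
        return None
    start = len(r1) - olen
    seq, qual = list(r1[:start]), list(q1[:start])
    for i in range(olen):
        b1, s1 = r1[start + i], q1[start + i]
        b2, s2 = r2rc[i], q2rc[i]
        if b1 == b2:
            seq.append(b1)
            qual.append(max(s1, s2))
        else:
            # simple heuristic: keep the higher-quality base, lower its confidence
            seq.append(b1 if s1 >= s2 else b2)
            qual.append(abs(s1 - s2))
    seq.extend(r2rc[olen:])
    qual.extend(q2rc[olen:])
    return "".join(seq), qual


merged = merge_pair("AAACCCGGGTTT", [30] * 12, "GTACGTAAACCC", [20] * 12, min_overlap=4)
print(merged[0])  # AAACCCGGGTTTACGTAC
```

Using quality scores at mismatches means a low-quality base in one read can be replaced by a confident base from its mate, which is why the relationship between errors and quality scores matters when merging overlapping reads.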
Because both conserved functional coding sequences and the 16S rRNA gene have been used extensively in bacterial molecular phylogenetics, codon-based alignment editing functions are required for the coding genes. EzEditor provides an interface for editing DNA and protein sequence alignments simultaneously, which enables robust alignment of both protein-coding and rRNA sequences. EzEditor can be applied to a variety of analyses involving molecular sequences, not only as a basic sequence editor but also in phylogenetic applications.
Contents:
Abstract
Table of Contents
Abbreviations
Figure List
Table List
Chapter 1 General Introduction
  1.1 Bioinformatics
  1.2 Next Generation Sequencing
  1.3 Metagenomics
  1.4 Objectives of This Study
Chapter 2 Amplicon-based Metagenome Analysis Systems
  2.1 Introduction
  2.2 Analysis System for 454 Pyrosequencing
    2.2.1 Methods
    2.2.2 Results
  2.3 Analysis System for Illumina MiSeq
    2.3.1 Methods
    2.3.2 Results
  2.4 Summary and Discussion
Chapter 3 Shotgun-based Metagenome Analysis System
  3.1 Introduction
    3.1.1 Tools for Metagenomics
  3.2 Methods
  3.3 Results
  3.4 Summary and Discussion
Chapter 4 EzEditor: A Versatile Molecular Sequence Editor for Both Ribosomal RNA and Protein Coding Genes
  4.1 Overview
  4.2 Features of EzEditor
    4.2.1 Algorithms and Models Implemented in EzEditor
    4.2.2 Miscellaneous Functions
  4.3 Summary and Discussion
Conclusions
References
Appendix I. Estimated Diversity Index of Household Microbiome
Abstract in Korean
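The codon-based alignment editing that EzEditor offers for protein-coding genes rests on a simple idea: align or edit the protein sequences, then thread each ungapped DNA sequence back through its protein row so that every residue maps to one codon and every gap to `---`. The sketch below shows only this back-translation step under that assumption. It is not EzEditor's code, the function names `back_translate` and `codon_alignment` are hypothetical, and it does not check that each codon actually translates to its residue.

```python
def back_translate(aligned_protein, cds):
    """Thread an ungapped coding sequence through one gapped protein alignment row,
    emitting one codon per residue and '---' per gap."""
    codons = []
    pos = 0
    for residue in aligned_protein:
        if residue == "-":
            codons.append("---")
            continue
        codon = cds[pos:pos + 3]
        if len(codon) != 3:
            raise ValueError("coding sequence is shorter than the protein implies")
        codons.append(codon)
        pos += 3
    leftover = len(cds) - pos
    if leftover not in (0, 3):  # allow one extra trailing codon, e.g. a stop codon
        raise ValueError("coding sequence length does not match the protein")
    return "".join(codons)


def codon_alignment(protein_alignment, coding_sequences):
    """Apply back_translate to every row; both dicts are keyed by sequence name."""
    return {name: back_translate(row, coding_sequences[name])
            for name, row in protein_alignment.items()}


aln = codon_alignment(
    {"seq1": "MK-L", "seq2": "MKVL"},
    {"seq1": "ATGAAACTG", "seq2": "ATGAAAGTTCTGTAA"},
)
print(aln["seq1"])  # ATGAAA---CTG
print(aln["seq2"])  # ATGAAAGTTCTG
```

Because gaps are inserted in multiples of three, every DNA row stays in frame, which is what makes codon-aware editing useful for phylogenetic analysis of coding genes.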