127 research outputs found

    Analytical Tools and Databases for Metagenomics in the Next-Generation Sequencing Era

    Metagenomics has become an indispensable tool in microbial ecology over the last few decades, and a new revolution in metagenomic studies is about to begin, driven by recent advances in sequencing techniques. The massive data production and substantial cost reduction of next-generation sequencing have led to rapid growth in metagenomic research, both quantitatively and qualitatively. It is evident that metagenomics will become a standard tool for studying the diversity and function of microbes in the near future, as fingerprinting methods were previously. As data accumulation accelerates, the need for bioinformatic tools and associated databases to handle those datasets has become more urgent. To facilitate the bioinformatic analysis of metagenomic data, we review some recent tools and databases that are widely used in this field and give insights into the current challenges and future of metagenomics from a bioinformatics perspective.

    ๋น„๊ต์œ ์ „์ฒดํ•™์„ ์ด์šฉํ•œ ์„ ์ถฉ์˜ ์„œ๋ธŒํ…”๋กœ๋ฏธ์–ด ์ง„ํ™”์™€ ํ‘œํ˜„ํ˜• ๋ณ€์ด ์—ฐ๊ตฌ

    Thesis (Ph.D.), Seoul National University Graduate School: College of Natural Sciences, School of Biological Sciences, February 2020. Junho Lee.
    Long-read sequencing technologies have contributed greatly to comparative genomics among species and can also be applied to study genomics within a species. In this study, to determine how substantial genomic changes are generated and tolerated within a species, I sequenced CB4856, a C. elegans strain that is one of the most genetically divergent from the N2 reference strain. For this comparison, the Pacific Biosciences (PacBio) RSII platform (80×, N50 read length 11.8 kb) was used, and a de novo genome assembly was generated to the level of pseudochromosomes containing 76 contigs (N50 contig = 2.8 Mb). I identified structural variations that affected as many as 2,694 genes, most of which are at chromosome arms. Subtelomeric regions contained the most extensive genomic rearrangements, which even created new subtelomeres in some cases. The subtelomere structure of Chromosome VR implies that ancestral telomere damage was repaired by alternative lengthening of telomeres (ALT), even in the presence of a functional telomerase gene, and that a new subtelomere was formed by break-induced replication (BIR). My study demonstrates that substantial genomic changes, including structural variations and new subtelomeres, can be tolerated within a species, and that these changes may increase genetic diversity within a species. Secondly, I assembled draft genomes of two species related to C. elegans, Auanema freiburgensis and Auanema sp. APS14, which have a distinct reproductive (three genders: male, female, and hermaphrodite) and behavioral (tube-nictation) repertoire. A. freiburgensis and Auanema sp. APS14 were sequenced using the PacBio RSII (270×, N50 read length 12.5 kb) and the Oxford Nanopore Technologies (ONT) MinION (113×, N50 read length 3.6 kb) platforms, respectively, and their reads were assembled into genomes (55 and 69 Mb, respectively) that are smaller than that of C. elegans (~100 Mb). Comparative genomic studies of these genomes will help us understand how genomic changes in closely related species affect the evolution of novel traits.
    Contents:
    Chapter I. Introduction: Long-read sequencing and de novo genome assembly; Caenorhabditis and Caenorhabditis elegans as a model system for comparative genomics; Repetitive nature of subtelomere and the trace of alternative lengthening of telomeres (ALT) in subtelomeric regions; Phenotypic diversity in the genus Auanema; Purposes of the study.
    Materials and Methods.
    Chapter II. De novo genome assembly of the CB4856 genome and subtelomere evolution via past ALT events in C. elegans. Part I. De novo genome assembly of the CB4856 genome and structural variants compared to the reference strain, N2: Long-read sequencing and de novo assembly of the CB4856 genome; Long-read sequencing identified new structural variations. Part II. Subtelomere evolution via past ALT events in C. elegans: Long-read sequencing revealed the hypervariable nature of subtelomeres; The structure of Chr VR subtelomere is unique, in consequence of past ALT and BIR events; New genes in the subtelomeric region.
    Chapter III. Phenotypic characterization of Korean nematodes and draft genome assembly of two Auanema species: Korean nematode collection; Phenotypic diversification in the genus Auanema; Highly contiguous genome assembly using two long-read sequencing technologies.
    Chapter IV. Discussion: Enrichment of genetic variations in chromosome arms and subtelomeres by background selection and error-prone recombination; New subtelomere formation by ALT and BIR.
    References. Abstract in Korean. Acknowledgement.

    Biased Gene Fractionation and Dominant Gene Expression among the Subgenomes of Brassica rapa

    Polyploidization, both ancient and recent, is frequent among plants. A "two-step theory" was proposed to explain the meso-triplication of the Brassica "A" genome: Brassica rapa. By accurately partitioning this genome, we observed that genes in the less fractioned subgenome (LF) were dominantly expressed over the genes in the more fractioned subgenomes (MFs: MF1 and MF2), while the genes in MF1 were slightly dominantly expressed over the genes in MF2. The results indicated that the dominantly expressed genes tended to be resistant to gene fractionation. By re-sequencing two B. rapa accessions, a vegetable turnip (VT117) and a Rapid Cycling line (L144), we found that genes in LF had fewer non-synonymous or frameshift mutations than genes in the MFs; however, mutation rates were not significantly different between MF1 and MF2. The differences in gene expression patterns and ongoing gene death among the three subgenomes suggest that "two-step" genome triplication and differential subgenome methylation played important roles in the genome evolution of B. rapa.

    Annotation of marine eukaryotic genomes


    One Tile to Rule Them All: Simulating Any Tile Assembly System with a Single Universal Tile

    In the classical model of tile self-assembly, unit square tiles translate in the plane and attach edgewise to form large crystalline structures. This model of self-assembly has been shown to be capable of asymptotically optimal assembly of arbitrary shapes and, via information-theoretic arguments, increasingly complex shapes necessarily require increasing numbers of distinct types of tiles. We explore the possibility of complex and efficient assembly using systems consisting of a single tile. Our main result shows that any system of square tiles can be simulated using a system with a single tile that is permitted to flip and rotate. We also show that systems of single tiles restricted to translation only can simulate cellular automata for a limited number of steps given an appropriate seed assembly, and that any longer-running simulation must induce infinite assembly.

    Patterns and Signals of Biology: An Emphasis On The Role of Post Translational Modifications in Proteomes for Function and Evolutionary Progression

    After synthesis, a protein is still immature until it has been customized for a specific task. Post-translational modifications (PTMs) are the steps in biosynthesis that perform this customization, giving proteins their unique functionalities. PTMs are also important to protein survival because they rapidly enable protein adaptation to environmental stress factors through conformational change. The overarching contribution of this thesis is the construction of a computational profiling framework for the study of biological signals stemming from PTMs associated with stressed proteins. In particular, this work has been developed to predict and detect the biological mechanisms involved in types of stress response with PTMs in mitochondrial (Mt) and non-Mt proteins. Before any mechanism can be studied, there must first be some evidence of its existence. This evidence takes the form of signals such as biases of biological actors and types of protein interaction. Our framework has been developed to locate these signals, distilled from "Big Data" resources such as public databases and the entire PubMed literature corpus. We apply this framework to study the signals and learn about protein stress responses involving PTMs and modification sites (MSs). We developed this framework, and its approach to analysis, according to three main facets: (1) statistical evaluation to determine patterns of signal dominance throughout large volumes of data, (2) signal location to track down the regions where the mechanisms must be found according to the types and numbers of associated actors at relevant regions in proteins, and (3) text mining to determine how these signals have been previously investigated by researchers. The results gained from our framework enable us to uncover the PTM actors, MSs, and protein domains that are the major components of particular stress response mechanisms and may play roles in protein malfunction and disease.

    Bioinformatic Tools for Next Generation DNA Sequencing: Development and Analysis of Model Systems


    Structural variation detection in the human genome

    Thesis advisor: Gabor T. Marth
    Structural variations (SVs), like single nucleotide polymorphisms (SNPs) and short insertion-deletion polymorphisms (INDELs), are a ubiquitous feature of genomic sequences and are major contributors to human genetic diversity and disease. Due to technical difficulties, i.e. the high data-acquisition cost and/or low detection resolution of previous genome-scanning technologies, this source of genetic variation was not well studied until the completion of the Human Genome Project and the emergence of next-generation sequencing (NGS) technologies. The assembly of the human genome and economical high-throughput sequencing technologies have enabled the development of numerous new SV detection algorithms with unprecedented accuracy, sensitivity and precision. Although a number of SV detection programs have been developed for various SV types, such as copy number variations, deletions, tandem duplications, inversions and translocations, some types of SVs, e.g. copy number variations (CNVs) in capture sequencing data and mobile element insertions (MEIs), have undergone limited study. This is a result of the lack of suitable statistical models and computational approaches, e.g. an efficient mapping method to handle multiple aligned reads from mobile element (ME) sequences. The focus of my dissertation was to identify and characterize CNVs in capture sequencing data and MEIs in large-scale whole-genome sequencing data. This was achieved by building sophisticated statistical models and developing efficient algorithms and analysis methods for NGS data. In Chapter 2, I present a novel algorithm that uses the read depth (RD) signal to detect CNVs in deep-coverage exon capture sequencing data that were originally designed for SNP discovery. We were among the first to tackle this problem. In Chapter 3, I present a fast, convenient and memory-efficient program, Tangram, that integrates read-pair (RP) and split-read (SR) signals to detect and genotype MEI events. Based on the results from both simulated and experimental data, Tangram has superior sensitivity, specificity, breakpoint resolution and genotyping accuracy when compared to other recently published MEI detection methods. Lastly, Chapter 4 summarizes my work on SV detection in human genomes during my PhD study and describes future directions for genetic variant research.
    Thesis (PhD), Boston College, 2013. Submitted to: Boston College, Graduate School of Arts and Sciences. Discipline: Biology.