Do Read Errors Matter for Genome Assembly?

Author: Courtade Thomas, Shomorony Ilan, Tse David
Publication date: 31/08/2016

While most current high-throughput DNA sequencing technologies generate short reads with low error rates, emerging sequencing technologies generate long reads with high error rates. A basic question of interest is the tradeoff between read length and error rate in terms of the information needed for perfect assembly of the genome. Using an adversarial erasure error model, we make progress on this problem by establishing a critical read length, as a function of the genome and the error rate, above which perfect assembly is guaranteed. For several real genomes, including those from the GAGE dataset, we verify that this critical read length is not significantly greater than the read length required for perfect assembly from reads without errors.

Comment: Submitted to ISIT 201

arXiv.org e-Print Archive

Crossref

A chromosome-level sequence assembly reveals the structure of the Arabidopsis thaliana Nd-1 genome and its gene set

Author: Frey K., Holtgraewe D., Huettel B., Pucker B., Reinhardt R., Stadermann K., Weisshaar B.
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2019

MPG.PuRe

Chromosome-level sequence assembly reveals the structure of the Arabidopsis thaliana Nd-1 genome and its gene set

Author: Frey Katharina, Holtgräwe Daniela, Huettel Bruno, Pucker Boas, Reinhardt Richard, Stadermann Kai Bernd, Weisshaar Bernd
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2019

Pucker B, Holtgräwe D, Stadermann KB, et al. Chromosome-level sequence assembly reveals the structure of the Arabidopsis thaliana Nd-1 genome and its gene set. PLoS One. 2019;14(5): e0216233.

In addition to the BAC-based reference sequence of the accession Columbia-0 from the year 2000, several short-read assemblies of the plant model organism Arabidopsis thaliana have been published in recent years. A SMRT-based assembly of Landsberg erecta has also been generated, which identified translocation and inversion polymorphisms between two genotypes of the species. Here we provide a chromosome-arm level assembly of the A. thaliana accession Niederzenz-1 (AthNd-1_v2c) based on SMRT sequencing data. The best assembly comprises 69 nucleome sequences and displays a contig length of up to 16 Mbp. Compared to an earlier Illumina short-read-based NGS assembly (AthNd-1_v1), a 75-fold increase in contiguity was observed for AthNd-1_v2c. To assign contig locations independently of the Col-0 gold standard reference sequence, we used genetic anchoring to generate a de novo assembly. In addition, we assembled the chondrome and plastome sequences. Detailed analyses of AthNd-1_v2c allowed reliable identification of large genomic rearrangements between A. thaliana accessions, which contribute to differences in the gene sets that distinguish the genotypes. One of the detected differences revealed a gene that is lacking from the Col-0 gold standard sequence. This de novo assembly extends the known proportion of the A. thaliana pan-genome.

Publications at Bielefeld University

Information Theory in Computational Biology: Where We Stand Today

Author: Chanda Pritam, Costa Eduardo, Hu Jie, Sukumar Shravan, Van Hemert John, Walia Rasna
Publication venue: 'MDPI AG'
Publication date: 01/06/2020

"A Mathematical Theory of Communication" was published in 1948 by Claude Shannon to address problems in data compression and communication over (noisy) channels. Since then, the concepts and ideas developed in Shannon's work have formed the basis of information theory, a cornerstone of statistical learning and inference, and have played a key role in disciplines such as physics and thermodynamics, probability and statistics, computational sciences and biological sciences. In this article we review the basic information-theoretic concepts and describe their key applications in multiple major areas of research in computational biology: gene expression and transcriptomics, alignment-free sequence comparison, sequencing and error correction, genome-wide disease-gene association mapping, metabolic networks and metabolomics, and protein sequence, structure and interaction analysis.

IUPUIScholarWorks

Do Read Errors Matter for Genome Assembly?

Publication venue: 'Cold Spring Harbor Laboratory'
Publication date: 31/08/2016

Crossref