Sensitive Long-Indel-Aware Alignment of Sequencing Reads

Author: Marschall Tobias
Schönhuth Alexander
Publication date: 01/01/2013

The tremendous advances in high-throughput sequencing technologies have made population-scale sequencing, as performed in the 1000 Genomes project and the Genome of the Netherlands project, possible. Next-generation sequencing has allowed genome-wide discovery of variations beyond single-nucleotide polymorphisms (SNPs), in particular of structural variations (SVs) like deletions, insertions, duplications, translocations, inversions, and even more complex rearrangements. Here, we design a read aligner with special emphasis on the following properties: (1) high sensitivity, i.e. find all (reasonable) alignments; (2) ability to find (long) indels; (3) statistically sound alignment scores; and (4) runtime fast enough to be applied to whole genome data. We compare performance to BWA, bowtie2, and stampy and find that our method is especially advantageous on reads containing larger indels.

arXiv.org e-Print Archive

Publications at Bielefeld University

Organellar inheritance in the green lineage: insights from Ostreococcus tauri

Author: Adam Eyre-Walker
Baur
Birky
Bonen
Boynton
Bruen
Correns
De Clerck
Derelle
Duret
Grimsley
Guindon
Gwenael Piganeau
Hasegawa
Hill
Hill
Houliston
Hua
Huang
Hurst
Hutson
Jancek
Kurtz
Larkin
Lewis
Lewontin
Li
MacAlpine
Marin
Marshall
Maréchal
Maynard Smith
McVean
Miyamura
Muller
Nei
Ness
Olson
Piganeau
Posada
Posada
R Development Core Team
Robbens
Rodríguez-Ezpeleta
Romain Blanc-Mathieu
Sager
Sager
Sager
Simpson
Sophie Sanchez-Ferandin
Städler
Sun
Sung
Swofford
Tamura
Tamura
Tsai
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2013

Along the green lineage (Chlorophyta and Streptophyta), mitochondria and chloroplasts are mainly uniparentally transmitted and their evolution is thus clonal. The mode of organellar inheritance in their ancestor is less certain. The inability to make clear phylogenetic inference is partly due to a lack of information for deep branching organisms in this lineage. Here, we investigate organellar evolution in the early branching green alga Ostreococcus tauri using population genomics data from the complete mitochondrial and chloroplast genomes. The haplotype structure is consistent with clonal evolution in mitochondria, while we find evidence for recombination in the chloroplast genome. The number of recombination events in the genealogy of the chloroplast suggests that recombination, and thus biparental inheritance, is not rare. Consistent with the evidence of recombination, we find that the ratio of the number of nonsynonymous to synonymous polymorphisms per site is lower in the chloroplast than in the mitochondrial genome. We also find evidence for the segregation of two selfish genetic elements in the chloroplast. These results shed light on the role of recombination and the evolutionary history of organellar inheritance in the green lineage.

Crossref

PubMed Central

Sussex Research Online

PolyTB: a genomic variation map for Mycobacterium tuberculosis

Author: Clark T.
Coll F.
Drobniewski F.
Gagneux S.
Glynn J.
Guerra-Assuncao J.
Harris D.
Hill-Cawthorne G.
Martin Nigel
McNerney R.
Pain A.
Parkhill J.
Perdigao J.
Portugal I.
Preston M.
Viveiros M.
Publication venue: 'Elsevier BV'
Publication date: 01/01/2014

Tuberculosis (TB), caused by Mycobacterium tuberculosis (Mtb), is the second major cause of death from an infectious disease worldwide. Recent advances in DNA sequencing are leading to the ability to generate whole genome information in clinical isolates of M. tuberculosis complex (MTBC). The identification of informative genetic variants such as phylogenetic markers and those associated with drug resistance or virulence will help barcode Mtb in the context of epidemiological, diagnostic and clinical studies. Mtb genomic datasets are increasingly available as raw sequences, which are potentially difficult and computationally intensive to process and to compare across studies. Here we have processed the raw sequence data (>1500 isolates, eight studies) to compile a catalogue of SNPs (n = 74,039, 63% non-synonymous, 51.1% in more than one isolate, i.e. non-private), small indels (n = 4810) and larger structural variants (n = 800). We have developed the PolyTB web-based tool (http://pathogenseq.lshtm.ac.uk/polytb) to visualise the resulting variation and important meta-data (e.g. in silico inferred strain-types, location) within geographical map and phylogenetic views. This resource will allow researchers to identify polymorphisms within candidate genes of interest, as well as examine the genomic diversity and distribution of strains. PolyTB source code is freely available to researchers wishing to develop similar tools for their pathogen of interest.

Elsevier - Publisher Connector

LSHTM Research Online

edoc

PubMed Central

Birkbeck Institutional Research Online

SNPredict: A Machine Learning Approach for Detecting Low Frequency Variants in Cancer

Author: Mehra Vatsal
Publication venue: e-Publications@Marquette
Publication date: 01/07/2016

Cancer is a genetic disease caused by the accumulation of DNA variants such as single nucleotide changes or insertions/deletions in DNA. DNA variants can cause silencing of tumor suppressor genes or increase the activity of oncogenes. In order to come up with successful therapies for cancer patients, these DNA variants need to be identified accurately. DNA variants can be identified by comparing the DNA sequence of tumor tissue to that of non-tumor tissue using Next Generation Sequencing (NGS) technology. But the problem of detecting variants in cancer is hard because many of these variants occur only in a small subpopulation of the tumor tissue. It becomes a challenge to distinguish these low frequency variants from sequencing errors, which are common in today's NGS methods. Several algorithms have been developed and implemented as tools to identify such variants in cancer. However, it has been previously shown that there is low concordance in the results produced by these tools. Moreover, the number of false positives tends to increase significantly when these tools are faced with low frequency variants. This study presents SNPredict, a single nucleotide polymorphism (SNP) detection pipeline that aims to utilize the results of multiple variant callers to produce a consensus output with higher accuracy than any of the individual tools with the help of machine learning techniques. By extracting features from the consensus output that describe traits associated with an individual variant call, it creates binary classifiers that predict a SNP’s true state and therefore help in distinguishing a sequencing error from a true variant.
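The idea of combining several variant callers can be illustrated with a minimal sketch. It is not SNPredict's pipeline: the caller names, the variant tuples, and the helper functions `caller_vote_features` and `majority_consensus` are hypothetical, and a plain majority vote stands in for the trained binary classifiers the abstract describes.

```python
def caller_vote_features(calls_by_caller):
    """Build one 0/1 vote vector per variant, one entry per caller.

    calls_by_caller: dict mapping a caller name to the set of variants it
    reported, each variant being a (chrom, pos, ref, alt) tuple.
    """
    callers = sorted(calls_by_caller)
    all_variants = set().union(*calls_by_caller.values())
    return {
        variant: [int(variant in calls_by_caller[c]) for c in callers]
        for variant in sorted(all_variants)
    }


def majority_consensus(features):
    """Keep variants reported by more than half of the callers.

    This is only a baseline; a learned classifier could take the same
    vote vectors (plus other per-call traits) as input features.
    """
    return [v for v, votes in features.items() if 2 * sum(votes) > len(votes)]


# Hypothetical calls from three hypothetical callers.
v1 = ("chr1", 100, "A", "G")
v2 = ("chr1", 250, "C", "T")
v3 = ("chr2", 40, "G", "A")
calls = {
    "callerA": {v1, v2},
    "callerB": {v1, v3},
    "callerC": {v1, v2},
}
features = caller_vote_features(calls)
consensus = majority_consensus(features)
```

Here `v3` is reported by a single caller and is dropped from the consensus, while its vote vector remains available as a feature.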

epublications@Marquette

A comprehensive evaluation of alignment algorithms in the context of RNA-seq.

Author: Friedel Caroline C.
Lindner Robert
Publication venue: Ludwig-Maximilians-Universität München
Publication date: 01/01/2012

Transcriptome sequencing (RNA-Seq) overcomes limitations of previously used RNA quantification methods and provides one experimental framework for both high-throughput characterization and quantification of transcripts at the nucleotide level. The first step and a major challenge in the analysis of such experiments is the mapping of sequencing reads to a transcriptomic origin, including the identification of splicing events. In recent years, a large number of such mapping algorithms have been developed, all of which have in common that they require algorithms for aligning a vast number of reads to genomic or transcriptomic sequences. Although the FM-index based aligner Bowtie has become a de facto standard within mapping pipelines, a much larger number of possible alignment algorithms have been developed, including other variants of FM-index based aligners. Accordingly, developers and users of RNA-seq mapping pipelines have the choice among a large number of available alignment algorithms. To provide guidance in the choice of alignment algorithms for these purposes, we evaluated the performance of 14 widely used alignment programs from three different algorithmic classes: algorithms using either hashing of the reference transcriptome, hashing of reads, or a compressed FM-index representation of the genome. Here, special emphasis was placed on both precision and recall and on the performance for different read lengths and numbers of mismatches and indels in a read. Our results clearly showed the significant reduction in memory footprint and runtime provided by FM-index based aligners at a precision and recall comparable to the best hash table based aligners. Furthermore, the recently developed Bowtie 2 alignment algorithm shows a remarkable tolerance to both sequencing errors and indels, thus essentially making hash-based aligners obsolete.
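Precision and recall for read mapping can be computed as in the following minimal sketch. The abstract does not state the study's exact criteria, so the definitions here are assumptions: a read counts as correct when it maps to the true reference sequence within a position tolerance, precision is taken over mapped reads, and recall over all reads with a known origin. The function name, read identifiers, and coordinates are hypothetical.

```python
def precision_recall(predicted, truth, tolerance=0):
    """Compute mapping precision and recall.

    predicted: dict read_id -> (reference_name, position) for mapped reads;
               reads the aligner left unmapped are simply absent.
    truth:     dict read_id -> (reference_name, position) for every read.
    tolerance: maximum allowed distance between mapped and true position.
    """
    correct = 0
    for read_id, (ref, pos) in predicted.items():
        true = truth.get(read_id)
        if true is not None and true[0] == ref and abs(true[1] - pos) <= tolerance:
            correct += 1
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical simulated reads: r3 maps to the wrong reference, r4 is unmapped.
truth = {"r1": ("chr1", 100), "r2": ("chr1", 250), "r3": ("chr2", 40), "r4": ("chr2", 900)}
predicted = {"r1": ("chr1", 100), "r2": ("chr1", 252), "r3": ("chr1", 40)}
p, r = precision_recall(predicted, truth, tolerance=5)
```

With these inputs, two of the three mapped reads are correct and two of the four reads are recovered.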

CiteSeerX

Public Library of Science (PLOS)

Directory of Open Access Journals

Open Access LMU

PubMed Central

CLEVER: Clique-Enumerating Variant Finder

Author: Bauer Markus
Canzar Stefan
Costa Ivan
Klau Gunnar
Marschall Tobias
Schliep Alexander
Schönhuth Alexander
Publication date: 01/01/2012

Next-generation sequencing techniques have facilitated a large scale analysis of human genetic variation. Despite the advances in sequencing speeds, the computational discovery of structural variants is not yet standard. It is likely that many variants have remained undiscovered in most sequenced individuals. Here we present a novel internal segment size based approach, which organizes all reads, including concordant ones, into a read alignment graph where max-cliques represent maximal contradiction-free groups of alignments. A specifically engineered algorithm then enumerates all max-cliques and statistically evaluates them for their potential to reflect insertions or deletions (indels). For the first time in the literature, we compare a large range of state-of-the-art approaches using simulated Illumina reads from a fully annotated genome and present various relevant performance statistics. We achieve superior performance rates in particular on indels of sizes 20-100, which have been exposed as a current major challenge in the SV discovery literature and where prior insert size based approaches have limitations. In that size range, we outperform even split read aligners. We also achieve good results on real data, where we make a substantial number of correct predictions as the only tool, which complement the predictions of split-read aligners. CLEVER is open source (GPL) and available from http://clever-sv.googlecode.com. Comment: 30 pages, 8 figures
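Enumerating max-cliques in a graph of mutually compatible alignments can be sketched with the textbook Bron-Kerbosch algorithm with pivoting. This is a generic illustration, not the specifically engineered algorithm the abstract refers to; the function name, the node labels, and the toy compatibility graph are assumptions.

```python
def maximal_cliques(adj):
    """Yield every maximal clique of an undirected graph as a sorted list.

    adj: dict mapping each node to the set of its neighbours
         (here: alignments that do not contradict each other).
    """
    def expand(r, p, x):
        # r: current clique, p: candidates to extend it, x: already processed.
        if not p and not x:
            yield sorted(r)
            return
        # Pivot on the node with the most neighbours among the candidates,
        # so only its non-neighbours need to be branched on.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    yield from expand(set(), set(adj), set())


# Toy graph: alignments a, b, c are mutually compatible, d is compatible
# only with c, and e is compatible with nothing else.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
    "e": set(),
}
cliques = sorted(maximal_cliques(adj))
```

Each resulting clique is a maximal group of alignments with no internal contradiction; a downstream step could then score each group, for example by its internal segment sizes.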

arXiv.org e-Print Archive

CiteSeerX

VU Research Portal

CWI's Institutional Repository

Publikationsserver der RWTH Aachen University

Publications at Bielefeld University