
16 research outputs found

Local alignment of two-base encoded DNA sequence

Author: A Izmailov
B Ewing
B Ma
Barry Merriman
DR Powell
DR Smith
DS Hirschberg
EW Myers
H Li
N Jones
Nils Homer
O Gotoh
R Hamming
R Li
S Levy
SB Needleman
SF Altschul
SM Rumble
ST Sherry
Stanley F Nelson
TF Smith
VI Levenshtein
W Ewans
WJ Kent
X Huang
Z Ning
Publication venue: BioMed Central
Publication date: 01/01/2009

Abstract. Background: DNA sequence comparison is based on optimal local alignment of two sequences using a similarity score. However, some new DNA sequencing technologies do not directly measure the base sequence, but rather an encoded form, such as the two-base encoding considered here. In order to compare such data to a reference sequence, the data must be decoded into sequence. The decoding is deterministic, but the possibility of measurement errors requires searching among all possible error modes and resulting alignments to achieve an optimal balance of fewer errors versus greater sequence similarity. Results: We present an extension of the standard dynamic programming method for local alignment, which simultaneously decodes the data and performs the alignment, maximizing a similarity score based on a weighted combination of errors and edits, and allowing an affine gap penalty. We also present simulations that demonstrate the performance characteristics of our two-base encoded alignment method and contrast those with standard DNA sequence alignment under the same conditions. Conclusion: The new local alignment algorithm for two-base encoded data has substantial power to properly detect and correct measurement errors while identifying underlying sequence variants, thereby facilitating genome re-sequencing efforts based on this form of sequence data.

Crossref

Springer

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

ParMap, an Algorithm for the Identification of Complex Genomic Variations in Nextgen Sequencing Data

Author: Adolfo A. Ferrando
Hossein Khiabanian
Pieter Van Vlierberghe
Raul Rabadan
Teresa Palomero
Publication venue
Publication date: 08/01/2010

Next-generation sequencing produces high-throughput data, albeit with greater error and shorter reads than traditional Sanger sequencing methods. This complicates the detection of genomic variations, especially small insertions and deletions. Here we describe ParMap, a statistical algorithm for the identification of complex genetic variants using partially mapped reads in nextgen sequencing data. We also report ParMap’s successful application to the mutation analysis of chromosome X exome-captured leukemia DNA samples.

Nature Precedings

Detection of microRNAs in color space

Author: Altschul
Antonio Marco
Applied
Bartel
Berezikov
Cai
Chen
Cloonan
David
Flicek
Friedlander
Goff
Hofacker
Homer
Kozomara
Langmead
Li
Li
Li
Lowe
Marco
Mardis
Morin
Moxon
Robin
Ruby
Rumble
Sam Griffiths-Jones
Sasson
Publication venue: Oxford University Press (OUP)
Publication date: 09/12/2011

Motivation: Deep sequencing provides inexpensive opportunities to characterize the transcriptional diversity of known genomes. The AB SOLiD technology generates millions of short sequencing reads in color-space; that is, the raw data is a sequence of colors, where each color represents 2 nt and each nucleotide is represented by two consecutive colors. This strategy is purported to have several advantages, including increased ability to distinguish sequencing errors from polymorphisms. Several programs have been developed to map short reads to genomes in color space. However, a number of previously unexplored technical issues arise when using SOLiD technology to characterize microRNAs. Results: Here we explore these technical difficulties. First, since the sequenced reads are longer than the biological sequences, every read is expected to contain linker fragments. The color-calling error rate increases toward the 3′ end of the read such that recognizing the linker sequence for removal becomes problematic. Second, mapping in color space may lead to the loss of the first nucleotide of each read. We propose a sequential trimming and mapping approach to map small RNAs. Using our strategy, we reanalyze three published insect small RNA deep sequencing datasets and characterize 22 new microRNAs. Availability and implementation: A bash shell script to perform the sequential trimming and mapping procedure, called SeqTrimMap, is available at: http://www.mirbase.org/tools/seqtrimmap/ Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.

University of Essex Research Repository

Crossref

PubMed Central

The University of Manchester - Institutional Repository
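
The color-space idea described in the abstract above (each color covers two adjacent nucleotides, so each interior nucleotide is covered by two consecutive colors) can be illustrated with a minimal sketch. It assumes the commonly described two-base scheme in which each base gets a 2-bit code and a color is the XOR of the codes of adjacent bases. The function names `encode_colors` and `decode_colors` and the example sequences are hypothetical and are not taken from any of the listed papers or tools.

```python
# Assumed 2-bit base codes; a color is the XOR of two adjacent base codes.
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"


def encode_colors(seq):
    """Encode a DNA string as a list of colors (0-3), one per adjacent base pair."""
    return [BASE_TO_BITS[a] ^ BASE_TO_BITS[b] for a, b in zip(seq, seq[1:])]


def decode_colors(first_base, colors):
    """Decode colors back to DNA, given the known first base of the read."""
    bases = [first_base]
    for color in colors:
        # XOR is its own inverse, so the previous base plus the color gives the next base.
        bases.append(BITS_TO_BASE[BASE_TO_BITS[bases[-1]] ^ color])
    return "".join(bases)


reference = "ATGGCA"
reference_colors = encode_colors(reference)   # [3, 1, 0, 3, 1]

# A true single-base variant (G -> T at index 3) changes two adjacent colors.
variant_colors = encode_colors("ATGTCA")      # [3, 1, 1, 2, 1]

# A single miscalled color changes every decoded base downstream of it.
miscalled = list(reference_colors)
miscalled[2] = 1
decoded_with_error = decode_colors("A", miscalled)  # "ATGTAC"
```

Under this assumed scheme, an isolated color mismatch is more likely a measurement error than a polymorphism, because a real single-base change alters two neighboring colors. That is the property the abstracts refer to when they mention distinguishing sequencing errors from variants.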

ParMap, an algorithm for the identification of small genomic insertions and deletions in nextgen sequencing data

Author: A Gnirke
Adolfo A Ferrando
AV Dalca
Hossein Khiabanian
J Shendure
JD McPherson
KJ McKernan
N Homer
P Medvedev
P Van Vlierberghe
Pieter Van Vlierberghe
Raul Rabadan
RM Kuhn
SM Rumble
Teresa Palomero
Publication venue: BioMed Central
Publication date: 01/05/2010

Abstract. Background: Next-generation sequencing produces high-throughput data, albeit with greater error and shorter reads than traditional Sanger sequencing methods. This complicates the detection of genomic variations, especially small insertions and deletions. Findings: Here we describe ParMap, a statistical algorithm for the identification of complex genetic variants, such as small insertions and deletions, using partially mapped reads in nextgen sequencing data. Conclusions: We report ParMap's successful application to the mutation analysis of chromosome X exome-captured leukemia DNA samples.

Crossref

Directory of Open Access Journals

PubMed Central

Whole Methylome Analysis by Ultra-Deep Sequencing Using Two-Base Encoding

Author: Barker Melissa
Bormann Chung Christina A.
Boyd Victoria L.
Fu Yutao
McKernan Kevin J.
Monighetti Cinna
Peckham Heather E.
Publication venue: Public Library of Science
Publication date: 01/01/2010

Methylation, the addition of methyl groups to cytosine (C), plays an important role in the regulation of gene expression in both normal and dysfunctional cells. During bisulfite conversion and subsequent PCR amplification, unmethylated Cs are converted into thymine (T), while methylated Cs are not converted. Sequencing of this bisulfite-treated DNA permits the detection of methylation at specific sites. With the introduction of next-generation sequencing (NGS) technologies, simultaneous analysis of methylation motifs in multiple regions provides the opportunity for hypothesis-free study of the entire methylome. Here we present a whole methylome sequencing study that compares two different bisulfite conversion methods (in solution versus in gel), utilizing the high throughput of the SOLiD™ System. Advantages and disadvantages of the two different bisulfite conversion methods for constructing sequencing libraries are discussed. Furthermore, the application of SOLiD™ bisulfite sequencing to larger and more complex genomes is shown with preliminary bisulfite-converted reads created in silico.

CiteSeerX

Public Library of Science (PLOS)

Directory of Open Access Journals

PubMed Central
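
The conversion rule stated in the abstract above (unmethylated C reads as T after bisulfite treatment and PCR, while methylated C stays C) can be mimicked in silico with a short sketch. This is only an illustration under simplifying assumptions: one strand, complete conversion, and no sequencing errors. The names `bisulfite_convert` and `call_methylation` are hypothetical and do not come from the study.

```python
def bisulfite_convert(seq, methylated_positions=frozenset()):
    """Simulate bisulfite conversion on one strand.

    Every C is read as T unless its 0-based index is in `methylated_positions`.
    """
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq)
    )


def call_methylation(reference, read):
    """Compare an aligned converted read to the reference.

    Returns {position: True/False} for each reference C, where True means the
    C survived conversion (methylated) and False means it was read as T.
    """
    calls = {}
    for i, (ref_base, read_base) in enumerate(zip(reference, read)):
        if ref_base == "C" and read_base in "CT":
            calls[i] = read_base == "C"
    return calls


reference = "ACGTCC"
read = bisulfite_convert(reference, methylated_positions={1})
calls = call_methylation(reference, read)
```

In this toy case the read is `ACGTTT`, and only the C at index 1 is called methylated. A real pipeline would also need to handle the reverse strand, incomplete conversion, and genuine C-to-T variants, none of which this sketch attempts.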

Transcriptomics of an extended phenotype: Parasite manipulation of wasp social behaviour shifts expression of caste-related genes

Author: Beani Laura
Geffre Amy C
Grozinger Christina M.
Kathirithamby Jeyaraney
Liu Ruolin
Manfredini Fabio
Toth Amy L.
Publication venue: The Royal Society
Publication date: 01/01/2017

Parasites can manipulate host behaviour to increase their own transmission and fitness, but the genomic mechanisms by which parasites manipulate hosts are not well understood. We investigated the relationship between the social paper wasp, Polistes dominula, and its parasite, Xenos vesparum (Insecta: Strepsiptera) to understand the effects of an obligate endoparasitoid on its host’s brain transcriptome. Previous research suggests that X. vesparum shifts aspects of host social caste-related behaviour and physiology in ways that benefit the parasitoid. We hypothesized that X. vesparum-infested (stylopized) females would show a shift in caste-related brain gene expression. Specifically, we predicted stylopized females, who would normally be workers, would show gene expression patterns resembling pre-overwintering queens (gynes), reflecting gyne-like changes in behaviour. We used RNA-sequencing data to characterize patterns of brain gene expression in stylopized females, and compared these to those of unstylopized workers and gynes. In support of our hypothesis, we found that stylopized females, despite sharing numerous physiological and life history characteristics with members of the worker caste, show gyne-shifted brain expression patterns. These data suggest the parasitoid affects its host by exploiting phenotypic plasticity related to social caste, thus shifting naturally occurring social behaviour in a way that is beneficial to the parasitoid

Florence Research

Oxford University Research Archive

Improved variant discovery through local re-alignment of short-read next-generation sequencing data using SRMA

Author: Homer Nils
Nelson Stanley F
Publication venue: BioMed Central
Publication date: 01/01/2010

A primary component of next-generation sequencing analysis is to align short reads to a reference genome, with each read aligned independently. However, reads that observe the same non-reference DNA sequence are highly correlated and can be used to better model the true variation in the target genome. A novel short-read micro re-aligner, SRMA, that leverages this correlation to better resolve a consensus of the underlying DNA sequence of the targeted genome is described here

Crossref

Springer - Publisher Connector

PubMed Central

eScholarship - University of California

Local alignment of generalized k-base encoded DNA sequence

Author: ABI
Barry Merriman
D Smith
DJ Lipman
H Li
MJ Clark
N Homer
Nils Homer
O Gotoh
R Hamming
S Needleman
SM Rumble
Stanley F Nelson
T Smith
W Kent
WR Pearson
Publication venue: BioMed Central
Publication date: 01/01/2010

Abstract. Background: DNA sequence comparison is a well-studied problem, in which two DNA sequences are compared using a weighted edit distance. Recent DNA sequencing technologies, however, observe an encoded form of the sequence, rather than each DNA base individually. The encoded DNA sequence may contain technical errors, and therefore encoded sequencing errors must be incorporated when comparing an encoded DNA sequence to a reference DNA sequence. Results: Although two-base encoding is currently used in practice, many other encoding schemes are possible, whereby two or more bases are encoded at a time. A generalized k-base encoding scheme is presented, whereby feasible higher order encodings are better able to differentiate errors in the encoded sequence from true DNA sequence variants. A generalized version of the previous two-base encoding DNA sequence comparison algorithm is used to compare a k-base encoded sequence to a DNA reference sequence. Finally, simulations are performed to evaluate the power, the false positive and false negative SNP discovery rates, and the performance time of k-base encoding compared to previous methods as well as to the standard DNA sequence comparison algorithm. Conclusions: The novel generalized k-base encoding scheme and resulting local alignment algorithm permit the development of higher fidelity ligation-based next generation sequencing technology. This bioinformatic solution affords greater robustness to errors, as well as lower false SNP discovery rates, only at the cost of computational time.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

BFAST: An Alignment Tool for Large Scale Genome Resequencing

Author: A Cox
B Langmead
B Ma
Barry Merriman
CA Hutchison 3rd
Chad Creighton
DR Bentley
DR Smith
F Sanger
H Li
L Ilie
M Margulies
N Homer
Nils Homer
R Li
RA Holt
SF Altschul
SM Rumble
Stanley F. Nelson
TF Smith
WJ Kent
Y Sun
Z Ning
Publication venue: Public Library of Science
Publication date: 01/11/2009

BACKGROUND: The new generation of massively parallel DNA sequencers, combined with the challenge of whole human genome resequencing, results in the need for rapid and accurate alignment of billions of short DNA sequence reads to a large reference genome. Speed is obviously of great importance, but equally important is maintaining alignment accuracy of short reads, in the 25-100 base range, in the presence of errors and true biological variation. METHODOLOGY: We introduce a new algorithm specifically optimized for this task, as well as a freely available implementation, BFAST, which can align data produced by any of the current sequencing platforms, allows for user-customizable levels of speed and accuracy, supports paired end data, and provides for efficient parallel and multi-threaded computation on a computer cluster. The new method is based on creating flexible, efficient whole genome indexes to rapidly map reads to candidate alignment locations, with arbitrary multiple independent indexes allowed to achieve robustness against read errors and sequence variants. The final local alignment uses a Smith-Waterman method, with gaps to support the detection of small indels. CONCLUSIONS: We compare BFAST to a selection of large-scale alignment tools (BLAT, MAQ, SHRiMP, and SOAP) in terms of both speed and accuracy, using simulated and real-world datasets. We show BFAST can achieve substantially greater sensitivity of alignment in the context of errors and true variants, especially insertions and deletions, and minimize false mappings, while maintaining adequate speed compared to other current methods. We show BFAST can align the amount of data needed to fully resequence a human genome, one billion reads, with high sensitivity and accuracy, on a modest computer cluster in less than 24 hours. BFAST is available at http://bfast.sourceforge.net.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central
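
The final step mentioned in the BFAST abstract, a gapped Smith-Waterman local alignment of a read against a candidate reference window, can be sketched as follows. This is a textbook version with a linear gap penalty, not BFAST's implementation. The function name `smith_waterman` and the scoring values (`match=2`, `mismatch=-1`, `gap=-2`) are assumptions chosen for illustration.

```python
def smith_waterman(read, ref, match=2, mismatch=-1, gap=-2):
    """Return (best_score, (i, j)) for the best local alignment.

    (i, j) are 1-based end positions in `read` and `ref`. A linear gap
    penalty is used; scores are floored at zero so alignments can start anywhere.
    """
    rows, cols = len(read) + 1, len(ref) + 1
    # score[i][j] = best score of a local alignment ending at read[i-1], ref[j-1]
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if read[i - 1] == ref[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap in the reference (insertion in the read)
            left = score[i][j - 1] + gap    # gap in the read (deletion from the read)
            score[i][j] = max(0, diag, up, left)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)
    return best, best_pos


best_score, end_position = smith_waterman("ACGT", "TTACGTTT")
```

For this toy input the read matches the reference window exactly at positions 3 to 6, so the best score is 8 and the alignment ends at `(4, 6)`. Recovering the aligned strings would require a traceback from that cell, which is omitted here to keep the sketch short.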