
85 research outputs found

Slider—maximum use of probability information for alignment of short sequence reads and SNP detection

Author: Altschul
Delcher
Holt
Korf
Kurtz
M. Ester
N. Malhis
S. J. M. Jones
Schatz
Slater
Smith
Y. S. N. Butterfield
Publication venue: Oxford University Press
Publication date: 01/01/2008

Motivation: A plethora of alignment tools have been created, each designed to best fit a different type of alignment condition. While some of these are made for aligning Illumina Sequence Analyzer reads, none fully utilizes its probability (prb) output. In this article, we introduce a new alignment approach (Slider) that reduces the alignment problem space by utilizing each read base's probabilities given in the prb files.
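To make the idea of using per-base probabilities concrete, the following is a minimal sketch and not Slider's actual implementation. It assumes each prb row holds four Solexa-scaled log-odds scores in the order A, C, G, T; the function names and the `min_prob` threshold are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

BASES = "ACGT"  # assumed column order of a prb row


def solexa_to_prob(q: float) -> float:
    """Convert a log-odds score Q = 10*log10(p / (1 - p)) back to p."""
    odds = 10 ** (q / 10.0)
    return odds / (1.0 + odds)


def rank_bases(prb_row: Sequence[float]) -> List[Tuple[str, float]]:
    """Return (base, normalized probability) pairs, most probable first."""
    probs = [solexa_to_prob(q) for q in prb_row]
    total = sum(probs)
    pairs = zip(BASES, (p / total for p in probs))
    return sorted(pairs, key=lambda bp: bp[1], reverse=True)


def candidate_bases(prb_row: Sequence[float], min_prob: float = 0.1) -> str:
    """Keep only bases plausible enough to be tried during alignment.

    min_prob is a hypothetical cutoff, not a value taken from the paper.
    """
    return "".join(b for b, p in rank_bases(prb_row) if p >= min_prob)
```

A confident position such as `[40, -40, -40, -40]` leaves a single candidate base, while an ambiguous one such as `[0, 0, -40, -40]` leaves two, which illustrates how probabilities could shrink the set of sequences an aligner has to consider.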

CiteSeerX

Crossref

PubMed Central

Evaluation of next-generation sequencing software in mapping and assembly

Author: A Bashir
A Bateman
AC McHardy
AD Smith
B Langmead
BinBin Wang
C Trapnell
CA Tilford
D Campagna
D Hernandez
D Weese
DR Bentley
DR Zerbino
DS Horner
DW Bryant Jr
ER Mardis
ES Lander
EW Myers
F Sanger
H Jiang
H Li
H Lin
HL Eaves
J Butler
JC Dohm
JC Venter
JO Korbel
JR Miller
JT Simpson
K Chen
KE Holt
L Engstrand
L Noe
M Margulies
M Pop
MC Schatz
MJ Chaisson
ML Metzker
MS Hossain
N Homer
N Malhis
NL Clement
O Morozova
P Flicek
P Medvedev
PA Pevzner
PJ Campbell
PJ Hurd
R Staden
RF Service
RL Warren
RQ Li
Rui Jiang
SC Schuster
SM Rumble
Suying Bao
WingKeung Kwan
WJ Ansorge
WR Jeck
Xu Ma
Y Chen
YJ Kim
You-Qiang Song
Z Ning
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2011

Next-generation high-throughput DNA sequencing technologies have advanced progressively in sequence-based genomic research and novel biological applications with the promise of sequencing DNA at unprecedented speed. These new non-Sanger-based technologies feature several advantages when compared with traditional sequencing methods in terms of higher sequencing speed, lower per-run cost and higher accuracy. However, reads from next-generation sequencing (NGS) platforms, such as 454/Roche, ABI/SOLiD and Illumina/Solexa, are usually short, thereby restricting the applications of NGS platforms in genome assembly and annotation. We presented an overview of the challenges that these novel technologies meet and illustrated various bioinformatics approaches to mapping and assembly. We then compared the performance of several programs in these two fields, and further provided advice on selecting suitable tools for specific biological applications.

Crossref

HKU Scholars Hub

Fast and accurate short read alignment with Burrows–Wheeler transform

Author: H. Li
Langmead
Lippert
R. Durbin
Smith
Publication venue: Oxford University Press
Publication date: 01/01/2009

Motivation: The enormous amount of short reads generated by the new DNA sequencing technologies calls for the development of fast and accurate read alignment programs. A first generation of hash table-based methods has been developed, including MAQ, which is accurate, feature rich and fast enough to align short reads from a single individual. However, MAQ does not support gapped alignment for single-end reads, which makes it unsuitable for alignment of longer reads where indels may occur frequently. The speed of MAQ is also a concern when the alignment is scaled up to the resequencing of hundreds of individuals.
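The title of this entry refers to the Burrows-Wheeler transform (BWT). As a rough illustration of why a BWT index suits read lookup, here is a minimal sketch of exact-match backward search. It is a textbook version under simplifying assumptions (naive suffix sorting, full occurrence tables, no mismatches or gaps) and is not the implementation used by the published tool; `build_index` and `find` are hypothetical names.

```python
from typing import Dict, List, Tuple

Index = Tuple[List[int], Dict[str, int], Dict[str, List[int]]]


def build_index(text: str) -> Index:
    """Build a suffix array, C table and occurrence table for text + '$'."""
    text += "$"  # sentinel, smaller than A/C/G/T in ASCII order
    sa = sorted(range(len(text)), key=lambda i: text[i:])  # naive, O(n^2 log n)
    bwt = "".join(text[i - 1] for i in sa)  # i == 0 wraps to the sentinel
    alphabet = sorted(set(text))
    # c_table[c] = number of characters in text strictly smaller than c
    c_table: Dict[str, int] = {}
    total = 0
    for c in alphabet:
        c_table[c] = total
        total += text.count(c)
    # occ[c][i] = number of occurrences of c in bwt[:i]
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return sa, c_table, occ


def find(pattern: str, index: Index) -> List[int]:
    """Return sorted start positions of exact matches of pattern."""
    sa, c_table, occ = index
    lo, hi = 0, len(sa)  # half-open suffix-array interval [lo, hi)
    for ch in reversed(pattern):  # extend the match one character leftwards
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])
```

For example, `find("ATTA", build_index("GATTACAGATTACA"))` narrows the interval one character at a time and ends with the two positions where the read occurs. The number of steps depends on the read length rather than the reference length, which is the property that makes this family of indexes attractive for large numbers of short reads.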

CiteSeerX

Crossref

PubMed Central

Next generation sequencing has lower sequence coverage and poorer SNP-detection capability in the regulatory regions

Author: A McKenna
A von Bubnoff
AD Smith
AR Quinlan
B Langmead
D Weese
DC Koboldt
ER Mardis
ER Martin
F Antequera
F Sanger
G Basti
GT Marth
H Jiang
H Li
H Lin
HL Eaves
JW Wang
L Bonetta
M David
N Homer
N Malhis
O Harismendy
P Flicek
PJA Cock
R Goya
R McLendon
RQ Li
S Graf
SC Schuster
SF Altschul
SM Rumble
SP Shah
V Bansal
WJ Kent
YF Shen
Publication venue: Nature Publishing Group
Publication date: 01/01/2011

The rapid development of next generation sequencing (NGS) technology provides a new chance to extend the scale and resolution of genomic research. How to efficiently map millions of short reads to the reference genome and how to make accurate SNP calls are two major challenges in taking full advantage of NGS. In this article, we reviewed the current software tools for mapping and SNP calling, and evaluated their performance on samples from The Cancer Genome Atlas (TCGA) project. We found that BWA and Bowtie are better than the other alignment tools in comprehensive performance for the Illumina platform, while NovoalignCS showed the best overall performance for SOLiD. Furthermore, we showed that next-generation sequencing platforms have significantly lower coverage and poorer SNP-calling performance in the CpG islands, promoter and 5′-UTR regions of the genome. NGS experiments targeting these regions should have higher sequencing depth than the normal genomic region.

Crossref

PubMed Central

HKU Scholars Hub

Highly Sensitive and Specific Detection of Rare Variants in Mixed Viral Populations from Massively Parallel Sequence Data

Author: Alexander R. Macalalad
Bruce W. Birren
C Hedskog
C Hoffmann
C Quince
C Wang
Christian L. Boutwell
Christine M. Malboeuf
D Altshuler
Doug E. Brackney
DR Bentley
Elizabeth M. Ryan
G Rozera
Gregory D. Ebel
H Li
HF Gunthard
I Astrovskaya
J Archer
JF Salazar-Gonzalez
Joshua Z. Levin
Karen A. Power
Kendra N. Pesko
LQ Zhang
M Margulies
MA DePristo
Matthew R. Henn
MCF Prosperi
Michael C. Zody
MR Henn
N Eriksson
N Malhis
Niall J. Lennon
O Zagordi
Patrick Charlebois
R Goya
R Li
Ruchi M. Newman
S Palmer
Sergei L. Kosakovsky Pond
T Zhu
Todd M. Allen
VF Boltz
W Brockman
Publication venue: Public Library of Science
Publication date: 15/03/2012

Viruses diversify over time within hosts, often undercutting the effectiveness of host defenses and therapeutic interventions. To design successful vaccines and therapeutics, it is critical to better understand viral diversification, including comprehensively characterizing the genetic variants in viral intra-host populations and modeling changes from transmission through the course of infection. Massively parallel sequencing technologies can overcome the cost constraints of older sequencing methods and obtain the high sequence coverage needed to detect rare genetic variants (<1%) within an infected host, and to assay variants without prior knowledge. Critical to interpreting deep sequence data sets is the ability to distinguish biological variants from process errors with high sensitivity and specificity. To address this challenge, we describe V-Phaser, an algorithm able to recognize rare biological variants in mixed populations. V-Phaser uses covariation (i.e. phasing) between observed variants to increase sensitivity and an expectation maximization algorithm that iteratively recalibrates base quality scores to increase specificity. Overall, V-Phaser achieved >97% sensitivity and >97% specificity on control read sets. On data derived from a patient after four years of HIV-1 infection, V-Phaser detected 2,015 variants across the ∼10 kb genome, including 603 rare variants (<1% frequency) detected only using phase information. V-Phaser identified variants at frequencies down to 0.2%, comparable to the detection threshold of allele-specific PCR, a method that requires prior knowledge of the variants. The high sensitivity and specificity of V-Phaser enable identifying and tracking changes in low frequency variants in mixed populations such as RNA viruses.

Public Library of Science (PLOS)

CiteSeerX

Crossref

Harvard University - DASH

Directory of Open Access Journals

PubMed Central

Simultaneous alignment of short reads against multiple genomes

Author: Gesing Sandra
Hagmann Jörg
Kohlbacher Oliver
Ossowski Stephan
Schneeberger Korbinian
Warthmann Norman
Weigel Detlef
Publication venue: BioMed Central
Publication date: 01/01/2009

New software for the alignment of short-read sequence data to multiple genomes allows identification of polymorphisms that cannot be identified by alignment to a single reference genome.

Crossref

Springer - Publisher Connector

PubMed Central

The Australian National University

MPG.PuRe

Technology dictates algorithms: Recent developments in read alignment

Author: Alkan Can
Alser Mohammed
Balliu Brunilda
Deshpande Dhrithi
Icer Baykal Pelin
Knyazev Sergey
Koslicki David
Mangul Serghei
Mutlu Onur
Rotman Jeremy
Shi Huwenbo
Singer Benjamin D.
Skums Pavel
Taraszka Kodi
Xue Victor
Yang Harry T.
Zelikovsky Alex
Publication date: 09/07/2020

Massively parallel sequencing techniques have revolutionized biological and medical sciences by providing unprecedented insight into the genomes of humans, animals, and microbes. Modern sequencing platforms generate enormous amounts of genomic data in the form of nucleotide sequences or reads. Aligning reads onto reference genomes enables the identification of individual-specific genetic variants and is an essential step of the majority of genomic analysis pipelines. Aligned reads are essential for answering important biological questions, such as detecting mutations driving various human diseases and complex traits as well as identifying species present in metagenomic samples. The read alignment problem is extremely challenging due to the large size of analyzed datasets and numerous technological limitations of sequencing platforms, and researchers have developed novel bioinformatics algorithms to tackle these difficulties. Importantly, computational algorithms have evolved and diversified in accordance with technological advances, leading to today's diverse array of bioinformatics tools. Our review provides a survey of algorithmic foundations and methodologies across 107 alignment methods published between 1988 and 2020, for both short and long reads. We provide rigorous experimental evaluation of 11 read aligners to demonstrate the effect of these underlying algorithms on speed and efficiency of read aligners. We separately discuss how longer read lengths produce unique advantages and limitations to read alignment techniques. We also discuss how general alignment algorithms have been tailored to the specific needs of various domains in biology, including whole transcriptome, adaptive immune repertoire, and human microbiome studies.

arXiv.org e-Print Archive

Repository for Publications and Research Data

Directory of Open Access Journals

Novel computational techniques for mapping and classifying Next-Generation Sequencing data

Author: Břinda Karel

Since their emergence around 2006, Next-Generation Sequencing technologies have been revolutionizing biological and medical research. Quickly obtaining an extensive amount of short or long reads of DNA sequence from almost any biological sample enables detecting genomic variants, revealing the composition of species in a metagenome, deciphering cancer biology, decoding the evolution of living or extinct species, or understanding human migration patterns and human history in general. The pace at which the throughput of sequencing technologies is increasing surpasses the growth of storage and computer capacities, which creates new computational challenges in NGS data processing. In this thesis, we present novel computational techniques for read mapping and taxonomic classification. With more than a hundred published mappers, read mapping might be considered fully solved. However, the vast majority of mappers follow the same paradigm and only little attention has been paid to non-standard mapping approaches. Here, we propound the so-called dynamic mapping that we show to significantly improve the resulting alignments compared to traditional mapping approaches. Dynamic mapping is based on exploiting the information from previously computed alignments, helping to improve the mapping of subsequent reads. We provide the first comprehensive overview of this method and demonstrate its qualities using Dynamic Mapping Simulator, a pipeline that compares various dynamic mapping scenarios to static mapping and iterative referencing. An important component of a dynamic mapper is an online consensus caller, i.e., a program collecting alignment statistics and guiding updates of the reference in an online fashion. We provide Ococo, the first online consensus caller, which implements smart statistics for individual genomic positions using compact bit counters. Beyond its application to dynamic mapping, Ococo can be employed as an online SNP caller in various analysis pipelines, enabling SNP calling from a stream without saving the alignments on disk. Metagenomic classification of NGS reads is another major topic studied in the thesis. Having a database with thousands of reference genomes placed on a taxonomic tree, the task is to rapidly assign a huge amount of NGS reads to tree nodes, and possibly estimate the relative abundance of involved species. In this thesis, we propose improved computational techniques for this task. In a series of experiments, we show that spaced seeds consistently improve the classification accuracy. We provide Seed-Kraken, a spaced seed extension of Kraken, the most popular classifier at present. Furthermore, we suggest ProPhyle, a new indexing strategy based on a BWT-index, obtaining a much smaller and more informative index compared to Kraken. We provide a modified version of BWA that improves the BWT-index for a quick k-mer look-up.
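The spaced seeds mentioned in this abstract can be illustrated with a short sketch. This is a generic illustration of the concept, not code from Seed-Kraken; the function name `spaced_kmers` and the example masks are assumptions made for the example. A mask of `1`s and `0`s slides along the sequence, and only the characters under a `1` are kept, so a mismatch that falls under a `0` does not change the extracted seed.

```python
from typing import List


def spaced_kmers(seq: str, mask: str) -> List[str]:
    """Extract spaced seeds from seq.

    mask is a string of '1' (position kept) and '0' (position ignored);
    one seed is produced for every window of len(mask) characters.
    """
    keep = [i for i, m in enumerate(mask) if m == "1"]
    span = len(mask)
    return [
        "".join(seq[start + i] for i in keep)
        for start in range(len(seq) - span + 1)
    ]
```

With the mask `101`, the sequences `ACG` and `ATG` yield the same seed `AG`, whereas a contiguous 3-mer would differ. This tolerance to isolated substitutions is the usual motivation for spaced seeds.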

ZENODO

A study on the effect of stroop test on the formation of students discipline by using the Heart Rate Variability (HRV) technique

Author: Abdul Wahab Muhammad Nubli
Ani Fauziah
Damin Zahrul Akmal
Halim Harliana
Hamzah Shahidah
Jaes Lutfan
Johar Siti Sarawati
Saad Shamsaadal Sholeh
Publication venue: Science Publishing Corporation
Publication date: 01/01/2018

Discipline refers to self-control and individual behaviour. It is also an important element in the formation of integrity. The objective of the study is to assess the effects of using the Stroop test within a biofeedback protocol to evaluate an individual's level of discipline. A clinical study was conducted on 50 participants, all undergraduate students from Universiti Malaysia Pahang, who were divided into two groups: high academic achievers and low academic achievers. The Heart Rate Variability (HRV) technique was used in the assessment of this protocol. The findings show a positive relationship between the Stroop test and student discipline: those who excelled obtained higher LF spectrum scores than HF and VLF scores, while students with lower achievement showed higher VLF and HF spectrum scores than LF. In conclusion, this test is one of the tests that can be used to increase an individual's level of discipline.

UTHM Institutional Repository

Read alignment using deep neural networks

Author: Shrestha Akash
Publication venue: Colorado State University. Libraries
Publication date: 01/01/2019

2019 Spring. Includes bibliographical references. Read alignment is the process of mapping short DNA sequences to the reference genome. With the advent of continually evolving "next generation" sequencing technologies, the need for sequence alignment tools appeared. Many scientific communities and the companies marketing the sequencing technologies developed a whole spectrum of read aligners/mappers for different error profiles and read length characteristics. Among the most recent successfully marketed sequencing technologies are Oxford Nanopore and PacBio SMRT sequencing, which are considered top players because of their extremely long reads and low cost. However, the reads may contain errors at rates of up to 20%, and these errors are generally not uniformly distributed. To deal with that level of error rate and read length, proximity-preserving hashing techniques, such as Minhash and Minimizers, were utilized to quickly map a read to the target region of the reference sequence. Subsequently, a variant of global or local alignment dynamic programming is used to give the final alignment. In this research work, we train a Deep Neural Network (DNN) to yield a hashing scheme for the highly erroneous long reads, which is deemed superior to Minhash for mapping the reads. We implemented that idea to build a read alignment tool: DNNAligner. We evaluated the performance of our aligner against the read aligners currently popular in the bioinformatics community: minimap2, bwa-mem and graphmap. Our results show that the performance of DNNAligner is comparable to other tools without any code optimization or integration of other advanced features. Moreover, DNN exhibits superior performance in comparison with Minhash on neighborhood classification.
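The minimizer technique named in this abstract can be sketched in a few lines. This is a simplified illustration, not the scheme used by DNNAligner or by any specific mapper: it orders k-mers lexicographically, whereas practical tools typically order them by a hash value, and the function name `minimizers` and the parameters `k` and `w` are chosen here for the example.

```python
from typing import List, Tuple


def minimizers(seq: str, k: int, w: int) -> List[Tuple[int, str]]:
    """Return the distinct (position, k-mer) minimizers of seq.

    Every window of w consecutive k-mers contributes its smallest k-mer
    (leftmost on ties). Lexicographic order is a simplifying assumption.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    picked = set()
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        offset = min(range(w), key=lambda j: window[j])  # first minimum wins
        picked.add((start + offset, window[offset]))
    return sorted(picked)
```

Because neighbouring windows usually share their smallest k-mer, far fewer k-mers are stored than the sequence contains, and two sequences that share a long enough exact stretch are guaranteed to share at least one minimizer. That is the sense in which such a sampling scheme preserves proximity.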

Mountain Scholar (Digital Collections of Colorado and Wyoming)