
46 research outputs found

Updates to the RMAP short-read mapping software

Author: Chung Wen-Yu
Hannon Greg
Hicks James
Hodges Emily
Kendall Jude
Smith Andrew D.
Xuan Zhenyu
Zhang Michael Q.
Publication venue: Oxford University Press
Publication date: 01/01/2009

Summary: We report on a major new version of the RMAP software for mapping reads from short-read sequencing technology. General improvements to accuracy and space requirements are included, along with novel functionality. Included in the RMAP software package are tools for mapping paired-end reads, mapping with more sophisticated use of quality scores, collecting ambiguous mapping locations, and mapping bisulfite-treated reads.

CiteSeerX

Cold Spring Harbor Laboratory Institutional Repository

PubMed Central

BS Seeker: precise mapping for bisulfite sequencing

Author: Chen Pao-Yang
Cokus Shawn J
Pellegrini Matteo
Publication venue: BioMed Central
Publication date: 01/01/2010

Abstract. Background: Bisulfite sequencing using next generation sequencers yields genome-wide measurements of DNA methylation at single nucleotide resolution. Traditional aligners are not designed for mapping bisulfite-treated reads, where the unmethylated Cs are converted to Ts. We have developed BS Seeker, an approach that converts the genome to a three-letter alphabet and uses Bowtie to align bisulfite-treated reads to a reference genome. It uses sequence tags to reduce mapping ambiguity. Post-processing of the alignments removes non-unique and low-quality mappings. Results: We tested our aligner on synthetic data, a bisulfite-converted Arabidopsis library, and human libraries generated from two different experimental protocols. We evaluated the performance of our approach and compared it to other bisulfite aligners. The results demonstrate that among the aligners tested, BS Seeker is more versatile and faster. When mapping to the human genome, BS Seeker generates alignments significantly faster than RMAP and BSMAP. Furthermore, BS Seeker is the only alignment tool that can explicitly account for tags which are generated by certain library construction protocols. Conclusions: BS Seeker provides fast and accurate mapping of bisulfite-converted reads. It can work with BS reads generated from the two different experimental protocols, and is able to efficiently map reads to large mammalian genomes. The Python program is freely available at http://pellegrini.mcdb.ucla.edu/BS_Seeker/BS_Seeker.html.
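The three-letter idea in this abstract can be illustrated with a minimal sketch. This is not BS Seeker's code: the function names are hypothetical, only the forward strand is handled, and a naive exact string search stands in for Bowtie. Both the read and the reference have every C collapsed to T before matching, and methylation is then inferred by comparing the original read with the original reference at reference cytosines.

```python
def c_to_t(seq: str) -> str:
    """Collapse the alphabet to three letters by turning every C into T."""
    return seq.upper().replace("C", "T")


def map_bisulfite_read(read: str, reference: str) -> list[int]:
    """Return every forward-strand position where the converted read
    matches the converted reference exactly (naive search, no index)."""
    conv_read = c_to_t(read)
    conv_ref = c_to_t(reference)
    hits = []
    start = conv_ref.find(conv_read)
    while start != -1:
        hits.append(start)
        start = conv_ref.find(conv_read, start + 1)
    return hits


def call_methylation(read: str, reference: str, pos: int) -> dict[int, bool]:
    """For each reference C covered by the read, report True if the read
    still shows C (methylated) and False otherwise (converted to T)."""
    window = reference.upper()[pos:pos + len(read)]
    calls = {}
    for offset, (read_base, ref_base) in enumerate(zip(read.upper(), window)):
        if ref_base == "C":
            calls[pos + offset] = (read_base == "C")
    return calls


reference = "TTACGGACTCAG"
read = "ACGGATTT"  # first C kept (methylated), the other two read as T
hits = map_bisulfite_read(read, reference)
calls = call_methylation(read, reference, hits[0])
```

A read with more than one entry in `hits` would count as a non-unique mapping, which the abstract says is removed in post-processing.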

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

G-SNPM - A GPU-based SNP mapping tool

Author: Armano Giuliano
Manca Emanuele
Manconi Andrea
Milanesi Luciano
Orro Alessandro
Publication date: 13/02/2014

Open Access Repository

A comprehensive evaluation of alignment algorithms in the context of RNA-seq.

Author: Friedel Caroline C.
Lindner Robert
Publication venue: Ludwig-Maximilians-Universität München
Publication date: 01/01/2012

Transcriptome sequencing (RNA-Seq) overcomes limitations of previously used RNA quantification methods and provides one experimental framework for both high-throughput characterization and quantification of transcripts at the nucleotide level. The first step and a major challenge in the analysis of such experiments is the mapping of sequencing reads to a transcriptomic origin, including the identification of splicing events. In recent years, a large number of such mapping algorithms have been developed, all of which have in common that they require algorithms for aligning a vast number of reads to genomic or transcriptomic sequences. Although the FM-index based aligner Bowtie has become a de facto standard within mapping pipelines, a much larger number of possible alignment algorithms have been developed, including other variants of FM-index based aligners. Accordingly, developers and users of RNA-seq mapping pipelines have the choice among a large number of available alignment algorithms. To provide guidance in the choice of alignment algorithms for these purposes, we evaluated the performance of 14 widely used alignment programs from three different algorithmic classes: algorithms using either hashing of the reference transcriptome, hashing of reads, or a compressed FM-index representation of the genome. Here, special emphasis was placed on both precision and recall and the performance for different read lengths and numbers of mismatches and indels in a read. Our results clearly showed the significant reduction in memory footprint and runtime provided by FM-index based aligners at a precision and recall comparable to the best hash table based aligners. Furthermore, the recently developed Bowtie 2 alignment algorithm shows a remarkable tolerance to both sequencing errors and indels, thus essentially making hash-based aligners obsolete.
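As a rough illustration of the two metrics emphasized here, the sketch below scores a set of simulated reads against their known origins. The definitions are an assumption, since the abstract does not spell them out: precision is taken as correctly placed reads over all reads the aligner reported, and recall as correctly placed reads over all simulated reads. The function and variable names are hypothetical.

```python
def precision_recall(truth: dict[str, int], mapped: dict[str, int]) -> tuple[float, float]:
    """Compare reported positions with known origins of simulated reads.

    truth  maps read id -> true position (every simulated read)
    mapped maps read id -> reported position (only reads the aligner placed)
    """
    correct = sum(1 for read_id, pos in mapped.items() if truth.get(read_id) == pos)
    precision = correct / len(mapped) if mapped else 0.0
    recall = correct / len(truth) if truth else 0.0
    return precision, recall


truth = {"r1": 10, "r2": 20, "r3": 30, "r4": 40}
mapped = {"r1": 10, "r2": 25, "r3": 30}  # r2 is misplaced, r4 is unmapped
precision, recall = precision_recall(truth, mapped)
```

Under these assumed definitions, an aligner that reports fewer but more reliable placements trades recall for precision, which is why the two are reported together.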

CiteSeerX

Public Library of Science (PLOS)

Directory of Open Access Journals

Open Access LMU

PubMed Central

CloudAligner: A fast and full-featured MapReduce based tool for sequence mapping

Author: A Smith
B Langmead
Douglas Ruden
H Li
J Dean
J Dudley
M Schatz
N Homer
Tung Nguyen
Weisong Shi
Y Li
Publication venue: BioMed Central
Publication date: 01/01/2011

Abstract. Background: Research in genetics has developed rapidly in recent years with the aid of next generation sequencing (NGS). However, massively parallel NGS produces enormous amounts of data, which leads to storage, compatibility, scalability, and performance issues. Cloud computing with the MapReduce framework, which uses hundreds or thousands of shared computers to map sequencing reads quickly and efficiently to reference genome sequences, appears to be a very promising solution for these issues. Consequently, it has recently been adopted by many organizations, and the initial results are very promising. However, since these are only initial steps in this direction, the developed software does not yet provide primary functions, such as bisulfite and paired-end mapping, that are available in on-site software such as RMAP or BS Seeker. In addition, existing MapReduce-based applications were not designed to process the long reads produced by the most recent second-generation and third-generation NGS instruments and are therefore inefficient. Last, it is difficult for the majority of biologists, who are untrained in programming, to use these tools because most were developed on Linux with a command line interface. Results: To encourage the use of Cloud technologies in genomics and to prepare for advances in second- and third-generation DNA sequencing, we have built a Hadoop MapReduce-based application, CloudAligner, which achieves higher performance, covers most primary features, is more accurate, and has a user-friendly interface. It was also designed to be able to deal with long sequences. The performance gain of CloudAligner over Cloud-based counterparts (35 to 80%) mainly comes from the omission of the reduce phase. In comparison to local-based approaches, the performance gain of CloudAligner comes from the partitioning and parallel processing of the huge reference genome as well as the reads. The source code of CloudAligner is available at http://cloudaligner.sourceforge.net/ and its web version is at http://mine.cs.wayne.edu:8080/CloudAligner/. Conclusions: Our results show that CloudAligner is faster than CloudBurst, provides more accurate results than RMAP, and supports various input as well as output formats. In addition, with the web-based interface, it is easier to use than its counterparts.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Digital Commons@Wayne State University

Next generation sequencing has lower sequence coverage and poorer SNP-detection capability in the regulatory regions

Author: A McKenna
A von Bubnoff
AD Smith
AR Quinlan
B Langmead
D Weese
DC Koboldt
ER Mardis
ER Martin
F Antequera
F Sanger
G Basti
GT Marth
H Jiang
H Li
H Lin
HL Eaves
JW Wang
L Bonetta
M David
N Homer
N Malhis
O Harismendy
P Flicek
PJA Cock
R Goya
R McLendon
RQ Li
S Graf
SC Schuster
SF Altschul
SM Rumble
SP Shah
V Bansal
WJ Kent
YF Shen
Publication venue: Nature Publishing Group
Publication date: 01/01/2011

The rapid development of next generation sequencing (NGS) technology provides a new opportunity to extend the scale and resolution of genomic research. How to efficiently map millions of short reads to the reference genome and how to make accurate SNP calls are two major challenges in taking full advantage of NGS. In this article, we reviewed the current software tools for mapping and SNP calling, and evaluated their performance on samples from The Cancer Genome Atlas (TCGA) project. We found that BWA and Bowtie are better than the other alignment tools in comprehensive performance for the Illumina platform, while NovoalignCS showed the best overall performance for SOLiD. Furthermore, we showed that next-generation sequencing platforms have significantly lower coverage and poorer SNP-calling performance in the CpG islands, promoter and 5′-UTR regions of the genome. NGS experiments targeting these regions should have higher sequencing depth than those targeting normal genomic regions.

Crossref

PubMed Central

HKU Scholars Hub

A new hash function and its use in read mapping on genome

Author: Farzaneh Salari
Fatemeh Zare Mirakabad
Mehdi Sadeghi
Publication venue: Amirkabir University of Technology
Publication date: 01/09/2020

Mapping reads onto genomes is an indispensable step in sequencing data analysis. A widely used method to speed up mapping is to index a genome by a hash table, in which genomic positions of k-mers are stored in the table. The hash table size increases exponentially with the k-mer length, and thus the traditional hash function is not appropriate for a k-mer as long as a read. We present a hashing mechanism based on two functions, named score1 and score2, which can hash sequences with the length of reads. The size of the hash table is directly proportional to the genome size, which is smaller than that of the hash table built by the conventional hash function. We evaluate our hashing system by developing a read mapper and running the mapper on the E. coli genome with some simulated data sets. The results show that a high percentage of simulated reads can be mapped to correct locations on the genome.
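For context, the conventional k-mer index that this abstract contrasts itself with can be sketched as follows. This illustrates only the traditional seed-and-verify approach, not the paper's score1 and score2 functions, which the abstract does not define. All names are hypothetical, and a Python dict stands in for the hash table; a direct-address table over all possible k-mers would need 4^k slots, which is the exponential growth the abstract refers to.

```python
from collections import defaultdict


def direct_table_size(k: int) -> int:
    """Number of slots a direct-address table needs for all DNA k-mers."""
    return 4 ** k


def build_kmer_index(genome: str, k: int) -> dict[str, list[int]]:
    """Store, for every k-mer in the genome, the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(genome) - k + 1):
        index[genome[i:i + k]].append(i)
    return index


def map_read(read: str, genome: str, index: dict[str, list[int]],
             k: int, max_mismatches: int = 0) -> list[int]:
    """Use the read's first k-mer as a seed, then verify each candidate
    position by counting mismatches over the full read length."""
    hits = []
    for pos in index.get(read[:k], []):
        window = genome[pos:pos + len(read)]
        if len(window) < len(read):
            continue  # read would run off the end of the genome
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches:
            hits.append(pos)
    return hits


genome = "ACGTACGTTAGC"
index = build_kmer_index(genome, k=4)
hits = map_read("ACGTTAGC", genome, index, k=4)
```

In this toy case the seed `ACGT` occurs at two positions, and the verification step keeps only the one where the whole read matches.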

Directory of Open Access Journals

Review of state-of-the-art algorithms for genomics data analysis pipelines

Author: Canal-Alonso Ángel
Corchado Rodríguez Juan Manuel
Egido Noelia
Jiménez Pedro
Prieto Tejedor Javier
Publication date: 01/01/2022

The advent of big data and advanced genomic sequencing technologies has presented challenges in terms of data processing for clinical use. The complexity of detecting and interpreting genetic variants, coupled with the vast array of tools and algorithms and the heavy computational workload, has made the development of comprehensive genomic analysis platforms crucial to enabling clinicians to quickly provide patients with genetic results. This chapter reviews and describes the pipeline for analyzing massive genomic data using both short-read and long-read technologies, discussing the current state of the main tools used at each stage and the role of artificial intelligence in their development. It also introduces DeepNGS (deepngs.eu), an end-to-end genomic analysis web platform, including its key features and applications.

Gestion del Repositorio Documental de la Universidad de Salamanca