
    ReadDepth: A Parallel R Package for Detecting Copy Number Alterations from Short Sequencing Reads

    Copy number alterations are important contributors to many genetic diseases, including cancer. We present the readDepth package for R, which can detect these aberrations by measuring the depth of coverage obtained by massively parallel sequencing of the genome. In addition to achieving higher accuracy than existing packages, our tool runs much faster by utilizing multi-core architectures to parallelize the processing of these large data sets. In contrast to other published methods, readDepth does not require the sequencing of a reference sample, and it uses a robust statistical model that accounts for overdispersed data. It includes a method for effectively increasing the resolution obtained from low-coverage experiments by utilizing breakpoint information from paired-end sequencing to do positional refinement. We also demonstrate a method for inferring copy number using reads generated by whole-genome bisulfite sequencing, thus enabling integrative study of epigenomic and copy number alterations. Finally, we apply this tool to two genomes, showing that it performs well on genomes sequenced to both low and high coverage. The readDepth package runs on Linux and Mac OS X, is released under the Apache 2.0 license, and is available at http://code.google.com/p/readdepth/
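
    The core idea the abstract describes is to bin aligned reads into fixed windows and test each window's count against an overdispersed null model. Below is a minimal Python sketch of that depth-of-coverage approach using a method-of-moments negative binomial fit; the window size, significance threshold, and function names are illustrative assumptions, not readDepth's actual interface or parameters.

        # Toy depth-of-coverage CNA caller: count reads in fixed windows,
        # fit an overdispersed (negative binomial) null, and flag windows
        # outside a confidence band. Illustrative only; not readDepth's code.
        import numpy as np
        from scipy import stats

        def call_copy_number(read_starts, genome_length, window=10_000, alpha=1e-3):
            n_windows = genome_length // window
            counts = np.bincount(np.minimum(read_starts // window, n_windows - 1),
                                 minlength=n_windows)
            mu, var = counts.mean(), counts.var()
            # Method-of-moments NB fit; var > mu indicates overdispersion.
            p = mu / var if var > mu else 0.99   # near-Poisson fallback
            r = mu * p / (1 - p)
            lo = stats.nbinom.ppf(alpha / 2, r, p)
            hi = stats.nbinom.ppf(1 - alpha / 2, r, p)
            return np.where(counts < lo, "loss",
                            np.where(counts > hi, "gain", "neutral"))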

    OPENMENDEL: A Cooperative Programming Project for Statistical Genetics

    Statistical methods for genome-wide association studies (GWAS) continue to improve. However, the increasing volume and variety of genetic and genomic data make computational speed and ease of data manipulation mandatory in future software. In our view, a collaborative effort of statistical geneticists is required to develop open source software targeted to genetic epidemiology. Our attempt to meet this need is called the OPENMENDEL project (https://openmendel.github.io). It aims to (1) enable interactive and reproducible analyses with informative intermediate results, (2) scale to big data analytics, (3) embrace parallel and distributed computing, (4) adapt to rapid hardware evolution, (5) allow cloud computing, (6) allow integration of varied genetic data types, and (7) foster easy communication between clinicians, geneticists, statisticians, and computer scientists. This article reviews and makes recommendations to the genetic epidemiology community in the context of the OPENMENDEL project.

    Technology dictates algorithms: Recent developments in read alignment

    Massively parallel sequencing techniques have revolutionized biological and medical sciences by providing unprecedented insight into the genomes of humans, animals, and microbes. Modern sequencing platforms generate enormous amounts of genomic data in the form of nucleotide sequences, or reads. Aligning reads onto reference genomes enables the identification of individual-specific genetic variants and is an essential step of the majority of genomic analysis pipelines. Aligned reads are essential for answering important biological questions, such as detecting mutations driving various human diseases and complex traits, as well as identifying species present in metagenomic samples. The read alignment problem is extremely challenging due to the large size of analyzed datasets and numerous technological limitations of sequencing platforms, and researchers have developed novel bioinformatics algorithms to tackle these difficulties. Importantly, computational algorithms have evolved and diversified in accordance with technological advances, leading to today's diverse array of bioinformatics tools. Our review provides a survey of algorithmic foundations and methodologies across 107 alignment methods published between 1988 and 2020, for both short and long reads. We provide a rigorous experimental evaluation of 11 read aligners to demonstrate the effect of these underlying algorithms on the speed and efficiency of read alignment. We separately discuss how longer read lengths produce unique advantages and limitations for read alignment techniques. We also discuss how general alignment algorithms have been tailored to the specific needs of various domains in biology, including whole transcriptome, adaptive immune repertoire, and human microbiome studies.
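
    Most short-read mappers covered by such surveys share a seed-and-extend skeleton: exact substring matches against an index of the reference propose candidate loci, and a verification step then scores each candidate. The Python sketch below illustrates that skeleton with a plain k-mer hash index and Hamming-distance verification; the k-mer size, mismatch budget, and function names are illustrative assumptions, and production aligners use far more compact indexes (FM-index, minimizers) and gapped extension.

        # Toy seed-and-extend aligner: index reference k-mers, look up seeds
        # from the read, verify each candidate locus by Hamming distance.
        from collections import defaultdict

        def build_kmer_index(reference, k=11):
            index = defaultdict(list)
            for i in range(len(reference) - k + 1):
                index[reference[i:i + k]].append(i)
            return index

        def align(read, reference, index, k=11, max_mismatches=2):
            hits = set()
            for offset in range(0, len(read) - k + 1, k):   # non-overlapping seeds
                for pos in index.get(read[offset:offset + k], ()):
                    start = pos - offset
                    if start < 0 or start + len(read) > len(reference):
                        continue   # candidate falls off the reference
                    mm = sum(a != b
                             for a, b in zip(read, reference[start:start + len(read)]))
                    if mm <= max_mismatches:
                        hits.add((start, mm))
            return sorted(hits)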

    High-Performance Modelling and Simulation for Big Data Applications

    This open access book was prepared as a final publication of the COST Action IC1406 “High-Performance Modelling and Simulation for Big Data Applications (cHiPSet)” project. Long considered important pillars of the scientific method, Modelling and Simulation have evolved from traditional discrete numerical methods to complex data-intensive continuous analytical optimisations. Resolution, scale, and accuracy have become essential to predict and analyse natural and complex systems in science and engineering. As their level of abstraction rises to afford a better discernment of the domain at hand, their representation becomes increasingly demanding of computational and data resources. High Performance Computing, on the other hand, typically entails the effective use of parallel and distributed processing units coupled with efficient storage, communication, and visualisation systems to underpin complex data-intensive applications in distinct scientific and technical domains. A seamless interaction of High Performance Computing with Modelling and Simulation is therefore arguably required to store, compute, analyse, and visualise large data sets in science and engineering. Funded by the European Commission, cHiPSet has provided a dynamic trans-European forum for its members and distinguished guests to openly discuss novel perspectives and topics of interest for these two communities. This cHiPSet compendium presents a set of selected case studies related to healthcare, biological data, computational advertising, multimedia, finance, bioinformatics, and telecommunications.

    Algorithms and High Performance Computing Approaches for Sequencing-Based Comparative Genomics

    As cost and throughput of second-generation sequencers continue to improve, even modestly resourced research laboratories can now perform DNA sequencing experiments that generate hundreds of billions of nucleotides of data, enough to cover the human genome dozens of times over, in about a week for a few thousand dollars. Such data are now being generated rapidly by research groups across the world, and large-scale analyses of these data appear often in high-profile publications such as Nature, Science, and The New England Journal of Medicine. But with these advances comes a serious problem: growth in per-sequencer throughput (currently about 4x per year) is drastically outpacing growth in computer speed (about 2x every 2 years). As the throughput gap widens over time, sequence analysis software is becoming a performance bottleneck, and the costs associated with building and maintaining the needed computing resources are burdensome for research laboratories. This thesis proposes two methods and describes four open source software tools that help to address these issues using novel algorithms and high-performance computing techniques. The proposed approaches build primarily on two insights. First, that the Burrows-Wheeler Transform and the FM Index, previously used for data compression and exact string matching, can be extended to facilitate fast and memory-efficient alignment of DNA sequences to long reference genomes such as the human genome. Second, that these algorithmic advances can be combined with MapReduce and cloud computing to solve comparative genomics problems in a manner that is scalable, fault tolerant, and usable even by small research groups.
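
    The first insight is concrete enough to sketch: the Burrows-Wheeler Transform rearranges a reference so that the FM-index can count exact pattern occurrences in time proportional to the pattern length, via backward search. The Python below is a minimal version under stated assumptions (naive suffix sorting, uncompressed occurrence tables), fine for illustration but nothing like the engineered data structures a whole-genome aligner requires.

        # Burrows-Wheeler Transform + FM-index backward search, naive version.
        def bwt(text):
            text += "$"    # unique, lexicographically smallest sentinel
            sa = sorted(range(len(text)), key=lambda i: text[i:])
            return "".join(text[i - 1] for i in sa)

        def fm_index(b):
            chars = sorted(set(b))
            C, run = {}, 0                 # C[c]: count of characters < c
            for c in chars:
                C[c] = run
                run += b.count(c)
            occ = {c: [0] * (len(b) + 1) for c in chars}   # occ[c][i]: c's in b[:i]
            for i, ch in enumerate(b):
                for c in chars:
                    occ[c][i + 1] = occ[c][i] + (ch == c)
            return C, occ

        def count_matches(pattern, C, occ, n):
            lo, hi = 0, n                  # half-open suffix-array interval
            for c in reversed(pattern):    # backward search, one char at a time
                if c not in C:
                    return 0
                lo, hi = C[c] + occ[c][lo], C[c] + occ[c][hi]
                if lo >= hi:
                    return 0
            return hi - lo

        b = bwt("GATTACAGATTACA")
        C, occ = fm_index(b)
        print(count_matches("ATTA", C, occ, len(b)))   # -> 2 occurrences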