289 research outputs found

PAIR: polymorphic Alu insertion recognition

Author: AH Salem
B Halldórsson
Bjarni V Halldórsson
C Papadimitriou
C Stewart
D Hartl
DF Gudbjartsson
E Ullu
F Hormozdiari
F Rivadeneira
H Holm
H Li
H Stefansson
I Hajirasouliha
J Jurka
J Korbel
J Wang
Jón Ingi Sveinbjörnsson
M Batzer
N Siva
P Medvedev
P Sulem
PL Deininger
R Durbin
RE Mills
U Styrkarsdottir
Publication venue: BioMed Central
Publication date: 01/01/2012

Crossref

Springer - Publisher Connector

PubMed Central

Accelerating read mapping with FastHASH

Author: Alkan C.
Hormozdiari F.
Lee D.
Mutlu O.
Xin H.
Yedkar S.
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2013

With the introduction of next-generation sequencing (NGS) technologies, we are facing an exponential increase in the amount of genomic sequence data. The success of all medical and genetic applications of next-generation sequencing critically depends on the existence of computational techniques that can process and analyze the enormous amount of sequence data quickly and accurately. Unfortunately, the current read mapping algorithms have difficulties in coping with the massive amounts of data generated by NGS. We propose a new algorithm, FastHASH, which drastically improves the performance of the seed-and-extend type hash table based read mapping algorithms, while maintaining the high sensitivity and comprehensiveness of such methods. FastHASH is a generic algorithm compatible with all seed-and-extend class read mapping algorithms. It introduces two main techniques, namely Adjacency Filtering, and Cheap K-mer Selection. We implemented FastHASH and merged it into the codebase of the popular read mapping program, mrFAST. Depending on the edit distance cutoffs, we observed up to 19-fold speedup while still maintaining 100% sensitivity and high comprehensiveness. © 2013 Xin et al
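The abstract above names the seed-and-extend, hash-table-based family of read mappers but does not describe the algorithm itself. As a rough illustration of that general family (not FastHASH, and not mrFAST's actual code), a minimal Python sketch might look like the following. The function names, the non-overlapping seed layout, and the use of Hamming distance in place of an edit-distance verifier are all assumptions made for brevity.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Hash table mapping every k-mer of the reference to its start positions."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def map_read(read, reference, index, k, max_mismatches):
    """Seed: look up non-overlapping k-mers of the read in the index.
    Extend: verify each implied read start against the reference.
    Hamming distance is a simplification; real mappers verify with edit distance."""
    candidates = set()
    for offset in range(0, len(read) - k + 1, k):
        for pos in index.get(read[offset:offset + k], ()):
            start = pos - offset  # where the whole read would begin
            if 0 <= start <= len(reference) - len(read):
                candidates.add(start)
    return [
        start
        for start in sorted(candidates)
        if hamming(read, reference[start:start + len(read)]) <= max_mismatches
    ]


reference = "ACGTACGTTTGACCA"
index = build_kmer_index(reference, 3)
print(map_read("TTGACC", reference, index, 3, 0))
print(map_read("TTGACG", reference, index, 3, 1))
```

In this toy layout the verification step dominates the cost when seeds occur at many reference positions, which is the kind of work that filtering and seed-selection techniques aim to reduce.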

Bilkent University Institutional Repository

PubMed Central

eScholarship - University of California

Fast and accurate mapping of Complete Genomics reads

Author: Alkan C.
Hach F.
Hormozdiari F.
Lee D.
Mutlu O.
Xin H.
Publication venue: Elsevier BV
Publication date: 01/01/2015

Many recent advances in genomics and the expectations of personalized medicine are made possible thanks to the power of high-throughput sequencing (HTS) in sequencing large collections of human genomes. There are tens of different sequencing technologies currently available, and each HTS platform has different strengths and biases. This diversity makes it possible to use different technologies to correct for shortcomings, but it also requires developing different algorithms for each platform due to the differences in data types and error models. The first problem to tackle in analyzing HTS data for resequencing applications is the read mapping stage, where many tools have been developed for the most popular HTS methods, but publicly available and open source aligners are still lacking for the Complete Genomics (CG) platform. Unfortunately, Burrows-Wheeler based methods are not practical for CG data due to the gapped nature of the reads generated by this method. Here we provide a sensitive read mapper (sirFAST) for the CG technology based on the seed-and-extend paradigm that can quickly map CG reads to a reference genome. We evaluate the performance and accuracy of sirFAST using both simulated and publicly available real data sets, showing high precision and recall rates. © 2014 Elsevier Inc.

Crossref

Bilkent University Institutional Repository

PubMed Central

eScholarship - University of California

Consistency-based detection of potential tumor-specific deletions in matched normal/tumor genomes

Author: A Altmann
A Dalca
C Alkan
C Chauve
Cedric Chauve
D Koboldt
D Lasko
E Mardis
E Pleasance
E Tuzun
F Hormozdiari
G Dalgliesh
H Li
J Korbel
J Stoye
K Chen
K Paszkiewicz
K Robinson
K Ye
L Ding
M Edmonson
M Meyerson
M Snyder
M Wendl
MR Garey
P Medvedev
Roland Wittler
S Jones
S Lee
S Sindi
S Volik
T Ley
Publication venue: BioMed Central
Publication date: 01/01/2011

Wittler R, Chauve C. Consistency-based detection of potential tumor-specific deletions in matched normal/tumor genomes. BMC Bioinformatics. 2011;12(Suppl. 9):S21

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Simon Fraser University Institutional Repository

Publications at Bielefeld University

On the power and the systematic biases of the detection of chromosomal inversions by paired-end genome sequencing

Author: A Bashir
AA Hoffmann
AJ Iafrate
AM Hillmer
AW Pang
B Zeitouni
C Alkan
CB Krimbas
DC Richter
E Tuzun
F Hormozdiari
H Li
H Stefansson
J Cao
J Sebat
J Wang
JC Roach
JM Kidd
JO Korbel
José Ignacio Lucas Lledó
K Chen
KF Manly
KJ McKernan
L Feuk
M Onishi-Seebacher
Mario Cáceres
P Medvedev
PJ Campbell
PJ Stephens
R Xi
S Suzuki
SM Ahn
SS Sindi
T Rausch
Y Jiang
ZD Zhang
Zhanjiang Liu
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2013

One of the most widely used techniques to study structural variation at a genome level is paired-end mapping (PEM). PEM has the advantage of being able to detect balanced events, such as inversions and translocations. However, inversions are still quite difficult to predict reliably, especially from high-throughput sequencing data. We simulated realistic PEM experiments with different combinations of read and library fragment lengths, including sequencing errors and meaningful base qualities, to quantify and track down the origin of false positives and negatives along sequencing, mapping, and downstream analysis. We show that PEM is very appropriate to detect a wide range of inversions, even with low coverage data. However, % of inversions located between segmental duplications are expected to go undetected by the most common sequencing strategies. In general, longer DNA libraries improve the detectability of inversions far better than increments of the coverage depth or the read length. Finally, we review the performance of three algorithms to detect inversions (SVDetect, GRIAL, and VariationHunter), identify common pitfalls, and reveal important differences in their breakpoint precisions. These results stress the importance of the sequencing strategy for the detection of structural variants, especially inversions, and offer guidelines for the design of future genome sequencing projects.
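To make the paired-end mapping idea concrete, here is a minimal Python sketch of how mapped read pairs are commonly classified by orientation and span. It is not taken from the paper or from any of the tools it reviews. It assumes a forward-reverse library (the leftmost read maps to the `+` strand and the rightmost to the `-` strand in a concordant pair), and the function name, labels, and thresholds are hypothetical.

```python
def classify_pair(left_strand, right_strand, span, min_span, max_span):
    """Classify one mapped read pair (hypothetical helper).

    left_strand / right_strand: '+' or '-' for the leftmost and rightmost read.
    span: distance on the reference between the outer ends of the two reads.
    min_span / max_span: expected range for the library fragment length.
    """
    if left_strand == right_strand:
        # Both reads on the same strand: the classic signature of a pair
        # that straddles an inversion breakpoint.
        return "inversion_candidate"
    if left_strand == "-" and right_strand == "+":
        # Reads point away from each other (often seen with tandem duplications).
        return "everted"
    if span > max_span:
        return "deletion_candidate"
    if span < min_span:
        return "insertion_candidate"
    return "concordant"


print(classify_pair("+", "-", 350, 200, 500))
print(classify_pair("+", "+", 350, 200, 500))
print(classify_pair("+", "-", 5000, 200, 500))
```

Under these assumptions, a longer library widens the window in which a single pair can straddle a breakpoint, which is one intuition for the abstract's remark that longer libraries improve inversion detectability.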

Public Library of Science (PLOS)

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositori d'Objectes Digitals per a l'Ensenyament la Recerca i la Cultura

Directory of Open Access Journals

PubMed Central

Diposit Digital de Documents de la UAB

Resolving the complexity of the human genome using single-molecule sequencing

Author: Antonacci F.
Boitano M.
Chaisson M. J. P.
Dennis M. Y.
Eichler E. E.
Hormozdiari F.
Huddleston J.
Hunkapiller M. W.
Korlach J.
Landolin J. M.
Malig M.
Sandstrom R.
Stamatoyannopoulos J. A.
Sudmant P. H.
Surti U.
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2015

The human genome is arguably the most complete mammalian reference assembly, yet more than 160 euchromatic gaps remain and aspects of its structural variation remain poorly understood ten years after its completion. To identify missing sequence and genetic variation, here we sequence and analyse a haploid human genome (CHM1) using single-molecule, real-time DNA sequencing. We close or extend 55% of the remaining interstitial gaps in the human GRCh37 reference genome - 78% of which carried long runs of degenerate short tandem repeats, often several kilobases in length, embedded within (G+C)-rich genomic regions. We resolve the complete sequence of 26,079 euchromatic structural variants at the base-pair level, including inversions, complex insertions and long tracts of tandem repeats. Most have not been previously reported, with the greatest increases in sensitivity occurring for events less than 5 kilobases in size. Compared to the human reference, we find a significant insertional bias (3:1) in regions corresponding to complex insertions and long short tandem repeats. Our results suggest a greater complexity of the human genome in the form of variation of longer and more complex repetitive DNA that can now be largely resolved with the application of this longer-read sequencing technology

Archivio istituzionale della ricerca - Università di Bari

Network Archaeology: Uncovering Ancient Networks from Present-day Interactions

Author: A Ahmed
A Kreimer
A Mithani
A Vazquez
A Vázquez
A Wagner
AC Gavin
AL Barabási
B Manna
BP Kelley
C Tantipathananandh
C Wiuf
Carl Kingsford
DJ de Solla Price
DJ Watts
DS Callaway
E Sprinzak
ED Levy
F Guo
F Hormozdiari
G Palla
H Ebel
H Huang
HA Simon
HB Fraser
I Bezáková
I Ispolatov
J Bar-Ilan
J Dutkowski
J Felsenstein
J Flannick
J Golbeck
J Hopcroft
J Leskovec
JB Pereira-Leal
Joel S. Bader
JW Pinney
JW Thornton
L Hakes
LA Goodman
M Middendorf
P Shannon
R Kumar
R Milo
R Singh
RL Tatusov
S Hanneke
S Kerrien
S Li
S Navlakha
S Redner
Saket Navlakha
T Makino
TA Gibson
U Güldener
WK Kim
Publication venue: Public Library of Science (PLoS)
Publication date: 30/08/2010

Often questions arise about old or extinct networks. What proteins interacted in a long-extinct ancestor species of yeast? Who were the central players in the Last.fm social network 3 years ago? Our ability to answer such questions has been limited by the unavailability of past versions of networks. To overcome these limitations, we propose several algorithms for reconstructing a network's history of growth given only the network as it exists today and a generative model by which the network is believed to have evolved. Our likelihood-based method finds a probable previous state of the network by reversing the forward growth model. This approach retains node identities so that the history of individual nodes can be tracked. We apply these algorithms to uncover older, non-extant biological and social networks believed to have grown via several models, including duplication-mutation with complementarity, forest fire, and preferential attachment. Through experiments on both synthetic and real-world data, we find that our algorithms can estimate node arrival times, identify anchor nodes from which new nodes copy links, and can reveal significant features of networks that have long since disappeared.

Comment: 16 pages, 10 figures
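The idea of running a growth model backwards can be illustrated with a deliberately naive Python sketch. This is not the likelihood-based method the abstract describes. It is a simple degree-peeling heuristic, chosen here on the assumption that under preferential attachment older nodes tend to accumulate more links, so the most recent arrival is guessed to be the node with the fewest remaining links. The function name and the tie-breaking rule are hypothetical.

```python
def estimate_arrival_order(adjacency):
    """Guess node arrival order for an undirected graph by peeling.

    adjacency: dict mapping each node to the set of its neighbours.
    Repeatedly removes the node with the lowest current degree (ties broken
    by lowest original degree, then by label) and treats the removal order,
    reversed, as an estimate of arrival order. A naive stand-in, not a
    likelihood-based reconstruction.
    """
    graph = {u: set(vs) for u, vs in adjacency.items()}  # work on a copy
    original_degree = {u: len(vs) for u, vs in graph.items()}
    removed = []
    while graph:
        node = min(graph, key=lambda u: (len(graph[u]), original_degree[u], str(u)))
        for neighbour in graph[node]:
            graph[neighbour].discard(node)
        del graph[node]
        removed.append(node)
    return removed[::-1]  # earliest estimated arrival first


network = {
    "a": {"b", "c", "d", "e"},
    "b": {"a", "c", "d"},
    "c": {"a", "b"},
    "d": {"a", "b"},
    "e": {"a"},
}
print(estimate_arrival_order(network))  # ['a', 'b', 'd', 'c', 'e']
```

Each removal step plays the role of one reversed growth step, and node labels are kept throughout, so the history of an individual node can be read off the returned order.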

arXiv.org e-Print Archive

Public Library of Science (PLOS)

Crossref

Cold Spring Harbor Laboratory Institutional Repository

Directory of Open Access Journals

PubMed Central

deFuse: An Algorithm for Gene Fusion Discovery in Tumor RNA-Seq Data

Gene fusions created by somatic genomic rearrangements are known to play an important role in the onset and development of some cancers, such as lymphomas and sarcomas. RNA-Seq (whole transcriptome shotgun sequencing) is proving to be a useful tool for the discovery of novel gene fusions in cancer transcriptomes. However, algorithmic methods for the discovery of gene fusions using RNA-Seq data remain underdeveloped. We have developed deFuse, a novel computational method for fusion discovery in tumor RNA-Seq data. Unlike existing methods that use only unique best-hit alignments and consider only fusion boundaries at the ends of known exons, deFuse considers all alignments and all possible locations for fusion boundaries. As a result, deFuse is able to identify fusion sequences with demonstrably better sensitivity than previous approaches. To increase the specificity of our approach, we curated a list of 60 true positive and 61 true negative fusion sequences (as confirmed by RT-PCR), and trained an AdaBoost classifier on 11 novel features of the sequence data. The resulting classifier has an estimated value of 0.91 for the area under the ROC curve. We have used deFuse to discover gene fusions in 40 ovarian tumor samples, one ovarian cancer cell line, and three sarcoma samples. We report herein the first gene fusions discovered in ovarian cancer. We conclude that gene fusions are not infrequent events in ovarian cancer and that these events have the potential to substantially alter the expression patterns of the genes involved; gene fusions should therefore be considered in efforts to comprehensively characterize the mutational profiles of ovarian cancer transcriptomes.
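For readers unfamiliar with the area under the ROC curve quoted above, the following Python sketch shows one standard way to compute it from classifier scores: as the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half. The function name and the toy labels and scores are illustrative assumptions and are unrelated to the deFuse data.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 1 (true positive example) or 0 (true negative example).
    scores: classifier scores, higher meaning "more likely positive".
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))


# Toy data: one positive (0.4) is ranked below one negative (0.6).
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 0.75
```

A value of 1.0 means every positive outranks every negative, and 0.5 corresponds to random ranking.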

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central