
200 research outputs found

A model of higher accuracy for the individual haplotyping problem based on weighted SNP fragments and genotype with errors

Author: Adkins
Akey
Altshuler
Greenberg
J. Chen
J. Wang
Kang
Lander
Levy
M. Xie
Venter
Xie
Zhao
Publication venue: Oxford University Press

Motivation: In genetic studies of complex diseases, haplotypes provide more information than genotypes. However, haplotyping with biological techniques is much more difficult than genotyping, so effective computational techniques have been in demand. The individual haplotyping problem is the computational problem of inferring a pair of haplotypes from an individual's aligned SNP fragments. Many models for the problem have been proposed, based on various optimality criteria and incorporating different kinds of extra information. Improving the accuracy of these models has been an important issue in the study of haplotype reconstruction.
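To make the problem statement concrete, here is a minimal Python sketch of one widely used optimality criterion, minimum error correction (MEC). It is an illustration only and not the weighted model with genotype errors that the paper proposes. The fragment encoding (`'0'`/`'1'` alleles, `'-'` for uncovered sites), the function names, and the assumption that every site is heterozygous are all choices made for this example.

```python
from itertools import product

def mec_score(fragments, h1, h2):
    """Number of allele flips needed so that every fragment matches h1 or h2."""
    total = 0
    for frag in fragments:
        # Count mismatches against each haplotype, skipping uncovered sites ('-').
        d1 = sum(a != b for a, b in zip(frag, h1) if a != '-')
        d2 = sum(a != b for a, b in zip(frag, h2) if a != '-')
        total += min(d1, d2)
    return total

def best_haplotype_pair(fragments):
    """Brute-force search over complementary pairs (assumes all sites heterozygous)."""
    n = len(fragments[0])
    best = None
    for bits in product('01', repeat=n):
        h1 = ''.join(bits)
        h2 = ''.join('1' if b == '0' else '0' for b in bits)
        score = mec_score(fragments, h1, h2)
        if best is None or score < best[0]:
            best = (score, h1, h2)
    return best

# Five aligned fragments over four SNP sites; the last one carries one error.
fragments = ["01--", "-101", "10--", "--10", "0111"]
result = best_haplotype_pair(fragments)
```

The exhaustive search is exponential in the number of sites, so it only serves to show what is being optimized; practical methods replace it with heuristics or dynamic programming.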

Crossref

PubMed Central

Joint Haplotype Assembly and Genotype Calling via Sequential Monte Carlo Algorithm

Author: Ahn Soyeon
Vikalo Haris
Publication date: 01/07/2015

Genetic variations predispose individuals to hereditary diseases, play an important role in the development of complex diseases, and impact drug metabolism. The full information about the DNA variations in the genome of an individual is given by haplotypes, the ordered lists of single nucleotide polymorphisms (SNPs) located on chromosomes. Affordable high-throughput DNA sequencing technologies enable routine acquisition of data needed for the assembly of single individual haplotypes. However, state-of-the-art high-throughput sequencing platforms generate erroneous data, which induces uncertainty in the SNP and genotype calling procedures and, ultimately, adversely affects the accuracy of haplotyping. When inferring haplotype phase information, the vast majority of the existing techniques for haplotype assembly assume that the genotype information is correct. This motivates the development of methods capable of joint genotype calling and haplotype assembly. Results: We present a haplotype assembly algorithm, ParticleHap, that relies on a probabilistic description of the sequencing data to jointly infer genotypes and assemble the most likely haplotypes. Our method employs a deterministic sequential Monte Carlo algorithm that associates single nucleotide polymorphisms with haplotypes by exhaustively exploring all possible extensions of the partial haplotypes. The algorithm relies on genotype likelihoods rather than on often erroneously called genotypes, thus ensuring a more accurate assembly of the haplotypes. Results on both the 1000 Genomes Project experimental data and simulation studies demonstrate that the proposed approach enables highly accurate solutions to the haplotype assembly problem while being computationally efficient and scalable, generally outperforming existing methods in terms of both accuracy and speed.
Conclusions: The developed probabilistic framework and sequential Monte Carlo algorithm enable joint haplotype assembly and genotyping in a computationally efficient manner. Our results demonstrate fast and highly accurate haplotype assembly aided by the re-examination of erroneously called genotypes.
National Science Foundation CCF-1320273. Electrical and Computer Engineering
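The idea of extending partial haplotypes one SNP at a time by scoring every possible extension can be sketched in a few lines of Python. This is a heavily simplified, single-candidate greedy illustration and not the ParticleHap algorithm itself. The read format (a dict from site index to allele), the uniform error rate `eps`, the equal prior on the two haplotypes, and all function names are assumptions made for this example.

```python
import math
from itertools import product

def read_likelihood(read, hap, eps):
    """P(read | haplotype prefix) with an independent per-base error rate eps."""
    p = 1.0
    for site, allele in read.items():
        if site < len(hap):  # ignore sites beyond the current partial haplotype
            p *= (1 - eps) if hap[site] == allele else eps
    return p

def log_likelihood(reads, h1, h2, eps):
    """Each read is assumed to come from h1 or h2 with equal probability."""
    return sum(
        math.log(0.5 * read_likelihood(r, h1, eps) + 0.5 * read_likelihood(r, h2, eps))
        for r in reads
    )

def greedy_assemble(reads, n_sites, eps=0.05):
    """Extend the haplotype pair site by site, keeping the most likely extension."""
    h1, h2 = [], []
    for _ in range(n_sites):
        # Exhaustively score all four allele pairs for the next site, so that
        # homozygous calls (0,0) and (1,1) compete with heterozygous ones.
        best = max(
            product((0, 1), repeat=2),
            key=lambda ab: log_likelihood(reads, h1 + [ab[0]], h2 + [ab[1]], eps),
        )
        h1.append(best[0])
        h2.append(best[1])
    return h1, h2

reads = [{0: 0, 1: 1}, {0: 1, 1: 0}, {1: 1, 2: 1}, {1: 0, 2: 0}, {0: 0, 1: 1, 2: 1}]
h1, h2 = greedy_assemble(reads, n_sites=3)
```

Because the genotype at each site is chosen from the read likelihoods rather than fixed in advance, the sketch calls genotypes and phases them in the same pass, which is the property the abstract emphasizes. Keeping several candidate extensions per step instead of one would move it closer to a sequential Monte Carlo scheme.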

Springer - Publisher Connector

PubMed Central

Texas ScholarWorks

Haplotype estimation in polyploids using DNA sequence data

Author: Motazedi Ehsan
Publication venue: Wageningen University
Publication date: 01/01/2019

Polyploid organisms possess more than two copies of their core genome and therefore contain k>2 haplotypes for each set of ordered genomic variants. Polyploidy occurs often within the plant kingdom, including in important crops such as potato (k=4) and wheat (k=6). Current sequencing technologies enable us to read the DNA and detect genomic variants, but cannot distinguish between the copies of the genome, each inherited from one of the parents. To detect inheritance patterns in populations, it is necessary to know the haplotypes, as alleles that are in linkage over the same chromosome tend to be inherited together. In this work, we develop mathematical optimisation algorithms to indirectly estimate haplotypes by looking into overlaps between the sequence reads of an individual, as well as into the expected inheritance of the alleles in a population. These algorithms deal with sequencing errors and random variations in the counts of reads observed from each haplotype. These methods are therefore of high importance for studying the genetics of polyploid crops.

Wageningen University & Research Publications

Joint haplotype assembly and genotype calling via sequential Monte Carlo algorithm

Author: A Panconesi
D Aguiar
D He
F Deng
F Geraci
G Lancia
GR Abecasis
H Matsumoto
Haris Vikalo
J Duitama
JH Kim
K-C Liang
KC Liang
LM Li
M Xie
MR Hoehe
MS Arulampalam
MS Bayzid
R Cilibrasi
R Lippert
R Nielsen
R Schwartz
RA Gibbs
RS Wang
S Levy
Soyeon Ahn
V Bansal
Y Wang
YY Zhao
Z Chen
ZZ Chen
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Read-based Phasing of Related Individuals

Author: Garg S.
Marschall T.
Martin M.
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2016

Motivation: Read-based phasing deduces the haplotypes of an individual from sequencing reads that cover multiple variants, while genetic phasing takes only genotypes as input and applies the rules of Mendelian inheritance to infer haplotypes within a pedigree of individuals. Combining both into an approach that uses these two independent sources of information (reads and pedigree) has the potential to deliver results better than each individually. Results: We provide a theoretical framework combining read-based phasing with genetic haplotyping, and describe a fixed-parameter algorithm and its implementation for finding an optimal solution. We show that leveraging reads of related individuals jointly in this way yields more phased variants and at a higher accuracy than when phased separately, both in simulated and real data. Coverages as low as 2× for each member of a trio yield haplotypes that are as accurate as when analyzed separately at 15× coverage per individual. Availability and Implementation: https://bitbucket.org/whatshap/whatshap
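The genetic-phasing half of this combination, applying Mendelian inheritance to genotypes alone, can be illustrated for a single biallelic site in a mother-father-child trio. The Python sketch below is an assumption-laden toy and not the fixed-parameter algorithm or the WhatsHap implementation described in the abstract. The genotype encoding (unordered pairs of 0/1 alleles) and the function name are hypothetical.

```python
def phase_child_site(child, mother, father):
    """Return (maternal_allele, paternal_allele) for the child, or None.

    Each genotype is an unordered pair of 0/1 alleles, e.g. (0, 1).
    None means the site is ambiguous or violates Mendelian inheritance.
    """
    consistent = set()
    for m in set(mother):        # allele the mother could transmit
        for f in set(father):    # allele the father could transmit
            if sorted((m, f)) == sorted(child):
                consistent.add((m, f))
    if len(consistent) == 1:
        return consistent.pop()
    return None

# Mother homozygous 0/0, so the child's 0 allele must be maternal.
phased = phase_child_site(child=(0, 1), mother=(0, 0), father=(0, 1))
# All three heterozygous: genotypes alone cannot resolve the phase.
ambiguous = phase_child_site(child=(0, 1), mother=(0, 1), father=(0, 1))
```

Sites where all three individuals are heterozygous stay unresolved from genotypes alone, and those are the cases where reads spanning several variants can supply the missing phase information.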

PubMed Central

MPG.PuRe

Algorithmic approaches for the single individual haplotyping problem

Author: Lancia Giuseppe
Publication venue: 'EDP Sciences'
Publication date: 01/01/2016

Since its introduction in 2001, the Single Individual Haplotyping problem has received ever-increasing attention from the scientific community. In this paper we survey, in the form of an annotated bibliography, the developments in the study of the problem from its origin to the present day.

Archivio istituzionale della ricerca - Università degli Studi di Udine

EDP Sciences OAI-PMH repository (1.2.0)

Numérisation de Documents Anciens Mathématiques

Fosmid-based whole genome haplotyping of a HapMap trio child: evaluation of Single Individual Haplotyping techniques

Author: Duitama Jorge
Hoehe Margret R.
Huebsch Thomas
McEwen Gayle K.
Palczewski Stefanie
Schulz Sabrina
Suk Eun-Kyung
Verstrepen Kevin
Publication venue: Oxford University Press
Publication date: 01/01/2011

Determining the underlying haplotypes of individual human genomes is an essential, but currently difficult, step toward a complete understanding of genome function. Fosmid pool-based next-generation sequencing allows genome-wide generation of 40-kb haploid DNA segments, which can be phased into contiguous molecular haplotypes computationally by Single Individual Haplotyping (SIH). Many SIH algorithms have been proposed, but the accuracy of such methods has been difficult to assess due to the lack of real benchmark data. To address this problem, we generated whole genome fosmid sequence data from a HapMap trio child, NA12878, for which reliable haplotypes have already been produced. We assembled haplotypes using eight algorithms for SIH and carried out direct comparisons of their accuracy, completeness and efficiency. Our comparisons indicate that fosmid-based haplotyping can deliver highly accurate results even at low coverage and that our SIH algorithm, ReFHap, is able to efficiently produce high-quality haplotypes. We expanded the haplotypes for NA12878 by combining the current haplotypes with our fosmid-based haplotypes, producing near-to-complete new gold-standard haplotypes containing almost 98% of heterozygous SNPs. This improvement includes notable fractions of disease-related and GWA SNPs. Integrated with other molecular biological data sets, this phase information will advance the emerging field of diploid genomics

pWhatsHap: efficient haplotyping for future generation sequencing

Author: Aldinucci Marco
Bracciali Andrea
Marschall Tobias
Merelli Ivan
Patterson Murray
Pisanti Nadia
Torquati Massimo
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

Background: Haplotype phasing is an important problem in the analysis of genomics information. Given a set of DNA fragments of an individual, it consists of determining which one of the possible alleles (alternative forms of a gene) each fragment comes from. Haplotype information is relevant to gene regulation, epigenetics, genome-wide association studies, evolutionary and population studies, and the study of mutations. Haplotyping is currently addressed as an optimisation problem aiming at solutions that minimise, for instance, error correction costs, where costs are a measure of the confidence in the accuracy of the information acquired from DNA sequencing. Solutions typically have exponential computational complexity. WhatsHap is a recent optimal approach which moves computational complexity from DNA fragment length to fragment overlap, i.e. coverage, and is hence of particular interest when considering sequencing technology's current trends that are producing longer fragments. Results: Given the potential relevance of efficient haplotyping in several analysis pipelines, we have designed and engineered pWhatsHap, a parallel, high-performance version of WhatsHap. pWhatsHap is embedded in a toolkit developed in Python and supports genomics datasets in standard file formats. Building on WhatsHap, pWhatsHap exhibits the same complexity, exploring a number of possible solutions which is exponential in the coverage of the dataset. The parallel implementation on multi-core architectures allows for a relevant reduction of the execution time for haplotyping, while the provided results enjoy the same high accuracy as that provided by WhatsHap, which increases with coverage. Conclusions: Due to its structure and management of the large datasets, the parallelisation of WhatsHap posed demanding technical challenges, which have been addressed exploiting a high-level parallel programming framework. The result, pWhatsHap, is a freely available toolkit that improves the efficiency of the analysis of genomics information.
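The statement that the cost depends on coverage rather than on fragment length can be made tangible with a small Python sketch. It computes the maximum number of fragments overlapping any single site and the corresponding count of ways to split those fragments between two haplotypes. This is only an illustration of the scaling argument and uses none of the WhatsHap or pWhatsHap code. The interval representation of fragments and both function names are assumptions made here.

```python
def max_coverage(fragments):
    """Largest number of fragments covering one site.

    fragments: list of (first_site, last_site) inclusive intervals.
    """
    events = []
    for start, end in fragments:
        events.append((start, 1))     # fragment begins covering at `start`
        events.append((end + 1, -1))  # and stops covering after `end`
    depth = best = 0
    # Sorting puts closing events before opening events at the same position.
    for _, delta in sorted(events):
        depth += delta
        best = max(best, depth)
    return best

def bipartitions_per_column(coverage):
    """Ways to assign `coverage` fragments to two haplotypes at one site."""
    return 2 ** coverage

fragments = [(0, 3), (2, 5), (4, 8), (3, 3)]
cov = max_coverage(fragments)
states = bipartitions_per_column(cov)
```

Doubling the length of every fragment leaves the per-site count unchanged as long as the overlap depth stays the same, whereas each extra unit of coverage doubles it (halved if the two haplotype labels are treated as interchangeable).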

Stirling Online Research Repository (RIOXX)

Springer - Publisher Connector

INRIA a CCSD electronic archive server

Archivio della Ricerca - Università di Pisa

PubMed Central

Stirling Online Research Repository

MPG.PuRe

Hal-Diderot

Institutional Research Information System University of Turin

pWhatsHap: efficient haplotyping for future generation sequencing

Author: A Menelaou
A Panconesi
Andrea Bracciali
AS Mikheyev
BN Howie
CS Chin
D He
D Leung
F Deng
G Glusman
G Lancia
GM Amdahl
Ivan Merelli
J Duitama
J Huang
J Marchini
M Aldinucci
M Carneiro
M Patterson
M Slatkin
MA DePristo
Marco Aldinucci
Massimo Torquati
Murray Patterson
Nadia Pisanti
P Fouilhoux
P Scheet
R Cilibrasi
R Roberts
RG Downey
S Levy
SR Browning
SR Mousavi
The 1000 Genomes Project Consortium
The Genome of the Netherlands Consortium
The International HapMap Consortium
Tobias Marschall
V Bansal
V Kuleshov
Y Li
Y Pirola
YT Zhao
ZZ Chen
Publication venue: 'Springer Science and Business Media LLC'

Crossref

A comparison of several algorithms for the single individual SNP haplotyping reconstruction problem

Author: Altshuler
Bansal
Chen
Daly
F. Geraci
Frazer
Metzker
Via
Publication venue: Oxford University Press
Publication date: 01/01/2010

Motivation: Single nucleotide polymorphisms are the most common form of variation in human DNA, and are involved in many research fields, from molecular biology to medical therapy. The technological opportunity to deal with long DNA sequences using shotgun sequencing has raised the problem of fragment recombination. In this regard, the Single Individual Haplotyping (SIH) problem has received considerable attention over the past few years.

Crossref

PubMed Central

PUblication MAnagement