NGS Based Haplotype Assembly Using Matrix Completion
We apply matrix completion methods to haplotype assembly from NGS reads,
developing three new algorithms: HapSVT, HapNuc, and HapOPT. A mathematical
model converts the reads into an incomplete matrix whose unknown entries are
estimated; the completed matrix is then quantized and decoded to recover the
haplotypes. These algorithms
are compared with state-of-the-art algorithms on simulated data as well as
real fosmid data. It is shown that the SNP missing rate and the haplotype
block length of the proposed HapOPT are better than those of HapCUT2 with
comparable accuracy in terms of reconstruction rate and switch error rate. A
program implementing the proposed algorithms in MATLAB is freely available at
https://github.com/smajidian/HapMC
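The pipeline described above (read matrix construction, completion, quantization and decoding) can be sketched in a few lines. This is a minimal illustrative sketch of the general matrix-completion idea, using iterative rank-1 SVD imputation rather than the authors' HapSVT/HapNuc/HapOPT routines; all names here are hypothetical:

```python
# Illustrative sketch, not the HapMC code: reads become rows of a partially
# observed {-1, +1} matrix; completing it and taking signs recovers the
# haplotype (up to the global complement ambiguity).
import numpy as np

def complete_and_phase(R, mask, iters=50):
    """R: read-by-SNP matrix with +/-1 where observed, 0 elsewhere.
    mask: boolean matrix marking observed entries.
    Returns the estimated haplotype as a +/-1 vector."""
    X = R.astype(float).copy()
    for _ in range(iters):
        # Rank-1 truncated SVD: for a diploid with error-free reads the
        # ideal matrix is rank one (each row is +/- the haplotype vector).
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        X = s[0] * np.outer(U[:, 0], Vt[0])
        X[mask] = R[mask]          # keep observed entries fixed
    # Quantize: the sign pattern of the leading right singular vector
    # gives one haplotype; its elementwise negation gives the other.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return np.sign(Vt[0])

# Toy example: true haplotype over 4 SNPs, three noiseless partial reads.
h_true = np.array([1, -1, 1, 1])
R = np.array([[1, -1, 0, 0],
              [0, -1, 1, 0],
              [0, 0, 1, 1]])
mask = R != 0
h = complete_and_phase(R, mask)
```

Because the overlapping reads connect all four SNP sites, the rank-1 completion is unique up to scale here, so the recovered sign pattern matches the true haplotype or its complement.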
Accurate estimation of homologue-specific DNA concentration-ratios in cancer samples allows long-range haplotyping
Interpretation of allelic copy measurements at polymorphic markers in cancer samples presents distinctive challenges and opportunities. Due to frequent gross chromosomal alterations occurring in cancer (aneuploidy), many genomic regions are present at homologous-allele imbalance. Within such regions, the unequal contribution of alleles at heterozygous markers allows for direct phasing of the haplotype derived from each individual parent. In addition, genome-wide estimates of homologue-specific copy-ratios (HSCRs) are important for interpretation of the cancer genome in terms of fixed integral copy-numbers. We describe HAPSEG, a probabilistic method to interpret bi-allelic marker data in cancer samples. HAPSEG operates by partitioning the genome into segments of distinct copy number and modeling the four distinct genotypes in each segment. We describe general methods for fitting these models to data which are suitable for both SNP microarrays and massively parallel sequencing data. In addition, we demonstrate a specially tailored error model for interpretation of systematic variations arising in microarray platforms. The ability to directly determine haplotypes from cancer samples represents an opportunity to expand reference panels of phased chromosomes, which may be of general interest in various population genetic applications. In addition, this property may be exploited to interrogate the relationship between germline risk and cancer phenotype with greater sensitivity than is possible using unphased genotypes. Finally, we exploit the statistical dependency of phased genotypes to enable the fitting of more elaborate sample-level error-model parameters, allowing more accurate estimation of HSCRs in cancer samples.
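The phasing principle this abstract exploits can be illustrated with a toy sketch (hypothetical code, not the HAPSEG implementation): within a segment at homologous-allele imbalance, the allele with the larger read support at each heterozygous marker is assigned to the over-represented parental homologue:

```python
# Hypothetical sketch of imbalance-based phasing, not HAPSEG itself.
def phase_by_imbalance(markers):
    """markers: list of (allele_a, allele_b, count_a, count_b) tuples at
    heterozygous sites inside one imbalanced segment.
    Returns (over-represented homologue, under-represented homologue)."""
    high, low = [], []
    for a, b, count_a, count_b in markers:
        # The better-supported allele is placed on the amplified homologue.
        if count_a >= count_b:
            high.append(a); low.append(b)
        else:
            high.append(b); low.append(a)
    return "".join(high), "".join(low)

# Toy segment with a clear ~3:1 allelic imbalance at every marker.
segment = [("A", "G", 30, 10), ("C", "T", 12, 31), ("T", "G", 28, 9)]
hi, lo = phase_by_imbalance(segment)  # hi == "ATT", lo == "GCG"
```

In the real setting the counts are noisy, which is why HAPSEG models the four genotypes probabilistically per segment instead of making hard per-marker calls as this sketch does.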
Joint Haplotype Assembly and Genotype Calling via Sequential Monte Carlo Algorithm
Genetic variations predispose individuals to hereditary diseases, play an important role in the development of complex diseases, and impact drug metabolism. The full information about the DNA variations in the genome of an individual is given by haplotypes, the ordered lists of single nucleotide polymorphisms (SNPs) located on chromosomes. Affordable high-throughput DNA sequencing technologies enable routine acquisition of the data needed for the assembly of single-individual haplotypes. However, state-of-the-art high-throughput sequencing platforms generate erroneous data, which induces uncertainty in the SNP and genotype calling procedures and, ultimately, adversely affects the accuracy of haplotyping. When inferring haplotype phase information, the vast majority of existing techniques for haplotype assembly assume that the genotype information is correct. This motivates the development of methods capable of joint genotype calling and haplotype assembly. Results: We present a haplotype assembly algorithm, ParticleHap, that relies on a probabilistic description of the sequencing data to jointly infer genotypes and assemble the most likely haplotypes. Our method employs a deterministic sequential Monte Carlo algorithm that associates single nucleotide polymorphisms with haplotypes by exhaustively exploring all possible extensions of the partial haplotypes. The algorithm relies on genotype likelihoods rather than on often erroneously called genotypes, thus ensuring a more accurate assembly of the haplotypes. Results on both 1000 Genomes Project experimental data and simulation studies demonstrate that the proposed approach enables highly accurate solutions to the haplotype assembly problem while being computationally efficient and scalable, generally outperforming existing methods in terms of both accuracy and speed.
Conclusions: The developed probabilistic framework and sequential Monte Carlo algorithm enable joint haplotype assembly and genotyping in a computationally efficient manner. Our results demonstrate fast and highly accurate haplotype assembly aided by the re-examination of erroneously called genotypes. Funding: National Science Foundation CCF-1320273.
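The sequential extension idea can be illustrated with a much-simplified sketch: a deterministic beam search that grows partial haplotypes one SNP at a time and keeps only the best few "particles". The real ParticleHap scores extensions with genotype likelihoods; the simple read-agreement score and all names below are hypothetical:

```python
# Simplified beam-search sketch of sequential haplotype extension
# (illustrative only, not the ParticleHap algorithm).
def extend_haplotypes(reads, n_snps, beam=4):
    """reads: list of {snp_index: allele in {0, 1}} fragments.
    Returns one haplotype as a 0/1 list (its complement is the other)."""
    def score(hap):
        s = 0
        for r in reads:
            agree = sum(hap[i] == a for i, a in r.items() if i < len(hap))
            total = sum(1 for i in r if i < len(hap))
            s += max(agree, total - agree)  # a read may come from either copy
        return s

    particles = [[0]]  # pin the first SNP to break the complement symmetry
    for _ in range(1, n_snps):
        # Exhaustively extend every surviving particle with both alleles,
        # then keep only the highest-scoring partial haplotypes.
        candidates = [p + [b] for p in particles for b in (0, 1)]
        candidates.sort(key=score, reverse=True)
        particles = candidates[:beam]
    return particles[0]

# Toy fragments consistent with haplotype 0-1-0-0 (complement 1-0-1-1).
reads = [{0: 0, 1: 1}, {1: 1, 2: 0}, {1: 0, 2: 1, 3: 1}]
h = extend_haplotypes(reads, 4)  # h == [0, 1, 0, 0]
```

Keeping a small beam rather than a single greedy choice is what lets the sequential procedure recover from locally ambiguous extensions.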
Algorithmic approaches for the single individual haplotyping problem
Since its introduction in 2001, the Single Individual Haplotyping problem has received ever-increasing attention from the scientific community. In this paper we survey, in the form of an annotated bibliography, the developments in the study of the problem from its origins to the present day.
Haplotype estimation in polyploids using DNA sequence data
Polyploid organisms possess more than two copies of their core genome and therefore contain k>2 haplotypes for each set of ordered genomic variants. Polyploidy occurs often within the plant kingdom, notably in important crops such as potato (k=4) and wheat (k=6). Current sequencing technologies enable us to read the DNA and detect genomic variants, but cannot distinguish between the copies of the genome, each inherited from one of the parents. To detect inheritance patterns in populations, it is necessary to know the haplotypes, as alleles that are in linkage on the same chromosome tend to be inherited together. In this work, we develop mathematical optimisation algorithms to indirectly estimate haplotypes by looking into overlaps between the sequence reads of an individual, as well as into the expected inheritance of the alleles in a population. These algorithms deal with sequencing errors and random variations in the counts of reads observed from each haplotype. These methods are therefore of high importance for studying the genetics of polyploid crops.
Minimum error correction-based haplotype assembly: considerations for long read data
The single nucleotide polymorphism (SNP) is the most widely studied type of
genetic variation. A haplotype is defined as the sequence of alleles at SNP
sites on each haploid chromosome. Haplotype information is essential in
unravelling the genome-phenotype association. Haplotype assembly is a
well-known approach for reconstructing haplotypes, exploiting reads generated
by DNA sequencing devices. The Minimum Error Correction (MEC) metric is often
used for reconstruction of haplotypes from reads. However, problems with the
MEC metric have been reported. Here, we investigate the MEC approach to
demonstrate that it may result in incorrectly reconstructed haplotypes for
devices that produce error-prone long reads. Specifically, we evaluate this
approach for devices developed by Illumina, Pacific BioSciences and Oxford
Nanopore Technologies. We show that imprecise haplotypes may be reconstructed
with a lower MEC than that of the exact haplotype. The performance of MEC is
explored for different coverage levels and error rates of data. Our simulation
results reveal that in order to avoid incorrect MEC-based haplotypes, a
coverage of 25 is needed for reads generated by Pacific BioSciences RS systems.
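The MEC criterion discussed in this abstract is easy to state in code. The sketch below (illustrative only, not the paper's software) computes the MEC score of a candidate haplotype over all-heterozygous sites and shows how recurrent sequencing errors can give an incorrect phasing a lower MEC than the true one:

```python
# Illustrative MEC scorer; all names and the toy data are hypothetical.
def mec(hap, reads):
    """hap: list of 0/1 alleles at heterozygous sites; reads: list of
    {snp_index: allele} fragments. The second haplotype is the
    elementwise complement of `hap` (all sites heterozygous)."""
    cost = 0
    for r in reads:
        mismatch = sum(hap[i] != a for i, a in r.items())
        # Assign the read to whichever haplotype needs fewer corrections.
        cost += min(mismatch, len(r) - mismatch)
    return cost

true_hap = [0, 1, 0]
# Three reads; two carry the same sequencing error at SNP 1.
reads = [{0: 0, 1: 0, 2: 0},   # error at SNP 1
         {0: 0, 1: 0, 2: 0},   # the same error again
         {0: 0, 1: 1, 2: 0}]   # error-free
wrong_hap = [0, 0, 0]
# mec(true_hap, reads) == 2 but mec(wrong_hap, reads) == 1:
# the incorrect haplotype attains the lower MEC score.
```

This is the failure mode the paper investigates: once correlated errors outnumber correct observations at a site, the MEC optimum no longer coincides with the true haplotype, and higher coverage is needed to restore the correspondence.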
High performance computing for haplotyping: Models and platforms
The reconstruction of the haplotype pair for each chromosome is a hot topic in Bioinformatics and Genome Analysis. In Haplotype Assembly (HA), all heterozygous Single Nucleotide Polymorphisms (SNPs) have to be assigned to exactly one of the two chromosomes. In this work, we outline the state of the art of HA approaches and present an in-depth analysis of the computational performance of GenHap, a recent method based on Genetic Algorithms. GenHap was designed to tackle the computational complexity of the HA problem by means of a divide-et-impera strategy that effectively leverages multi-core architectures. In order to evaluate GenHap’s performance, we generated different instances of synthetic (yet realistic) data exploiting empirical error models of four different sequencing platforms (namely, Illumina NovaSeq, Roche/454, PacBio RS II and Oxford Nanopore Technologies MinION). Our results show that the processing time generally decreases as the read length increases, since longer reads yield a lower number of sub-problems to be distributed across multiple cores.
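The divide-et-impera layout described above can be sketched as follows. This is a hypothetical simplification: the per-block solver is a trivial per-column majority call standing in for GenHap's genetic algorithm, and worker threads stand in for its multi-core distribution:

```python
# Hypothetical divide-et-impera sketch, not the GenHap implementation.
from concurrent.futures import ThreadPoolExecutor

def solve_block(block):
    # Placeholder sub-solver: a per-column majority call stands in for the
    # genetic algorithm GenHap runs on each sub-instance.
    return [max(set(col), key=col.count) for col in block]

def solve_parallel(columns, block_size=2, workers=2):
    # Divide: split the SNP columns into fixed-size sub-problems.
    blocks = [columns[i:i + block_size]
              for i in range(0, len(columns), block_size)]
    # Conquer: solve sub-problems concurrently, then concatenate the parts.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(solve_block, blocks))
    return [allele for part in parts for allele in part]

# Each column holds the allele calls of the reads covering that SNP.
columns = [[0, 0, 1], [1, 1, 1], [0, 1, 1], [0, 0, 0]]
hap = solve_parallel(columns)  # hap == [0, 1, 1, 0]
```

The sketch also makes the paper's timing observation concrete: longer reads produce fewer, larger blocks, so fewer sub-problems have to be scheduled across the workers.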