
    On the Minimum Error Correction Problem for Haplotype Assembly in Diploid and Polyploid Genomes

    Finding the global minimum energy conformation (GMEC) in a huge combinatorial search space is the key challenge in computational protein design (CPD) problems. Traditional algorithms lack a scalable and efficient distributed design scheme, preventing researchers from taking full advantage of current cloud infrastructures. We design cloud OSPREY (cOSPREY), an extension to the widely used protein design software OSPREY, to allow the original design framework to scale to commercial cloud infrastructures. We propose several novel designs that integrate both algorithmic and system optimizations, such as GMEC-specific pruning, state-search partitioning, asynchronous algorithm-state sharing, and fault tolerance. We evaluate cOSPREY on three cloud platforms built on different technologies and show that it can solve a number of large-scale protein design problems that have not been possible with previous approaches.

    HapTree: A Novel Bayesian Framework for Single Individual Polyplotyping Using NGS Data

    As the more recent next-generation sequencing (NGS) technologies provide longer read sequences, the use of sequencing datasets for complete haplotype phasing is fast becoming a reality, allowing haplotype reconstruction of a single sequenced genome. Nearly all previous haplotype reconstruction studies have focused on diploid genomes and are rarely scalable to genomes with higher ploidy. Yet computational investigations into polyploid genomes carry great importance, impacting plant, yeast and fish genomics, as well as studies of the evolution of modern-day eukaryotes and (epi)genetic interactions between copies of genes. In this paper, we describe a novel maximum-likelihood estimation framework, HapTree, for polyploid haplotype assembly of an individual genome using NGS read datasets. We evaluate the performance of HapTree on simulated polyploid sequencing read data modeled after Illumina sequencing technologies. For triploid and higher ploidy genomes, we demonstrate that HapTree substantially improves haplotype assembly accuracy and efficiency over the state-of-the-art; moreover, HapTree is the first scalable polyplotyping method for higher ploidy. As a proof of concept, we also test our method on real sequencing data from NA12878 (1000 Genomes Project) and evaluate the quality of assembled haplotypes with respect to trio-based diplotype annotation as the ground truth. The results indicate that HapTree significantly improves the switch accuracy within phased haplotype blocks as compared to existing haplotype assembly methods, while producing comparable minimum error correction (MEC) values. A summary of this paper appears in the proceedings of the RECOMB 2014 conference, April 2–5. Funding: National Science Foundation (U.S.) (NSF/NIH BIGDATA Grant R01GM108348-01); National Science Foundation (U.S.) Graduate Research Fellowship; Simons Foundation.
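The switch-accuracy metric mentioned in the abstract above can be illustrated with a minimal sketch: within a phased block, a switch error is counted wherever the agreement between the assembled haplotype and the ground-truth phasing flips between consecutive heterozygous sites. The function names and 0/1 allele encoding here are illustrative assumptions, not HapTree's actual evaluation code.

```python
# Toy sketch of switch errors within one phased haplotype block.
# Alleles are encoded as '0'/'1'; both strings cover the same SNP sites.

def switch_errors(assembled, truth):
    """Count positions where phase agreement flips between adjacent sites."""
    agree = [a == t for a, t in zip(assembled, truth)]
    return sum(1 for i in range(1, len(agree)) if agree[i] != agree[i - 1])

def switch_accuracy(assembled, truth):
    """Fraction of adjacent site pairs without a switch error."""
    n_pairs = len(assembled) - 1
    return 1.0 - switch_errors(assembled, truth) / n_pairs

print(switch_errors("001101", "001011"))  # → 2 (phase flips at sites 3 and 5)
```

Note that a single switch error can misplace a long run of alleles while changing the MEC value only slightly, which is why the two metrics can disagree.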

    Minimum error correction-based haplotype assembly: considerations for long read data

    The single nucleotide polymorphism (SNP) is the most widely studied type of genetic variation. A haplotype is defined as the sequence of alleles at SNP sites on each haploid chromosome. Haplotype information is essential in unravelling the genome–phenotype association. Haplotype assembly is a well-known approach for reconstructing haplotypes, exploiting reads generated by DNA sequencing devices. The Minimum Error Correction (MEC) metric is often used for reconstruction of haplotypes from reads. However, problems with the MEC metric have been reported. Here, we investigate the MEC approach to demonstrate that it may result in incorrectly reconstructed haplotypes for devices that produce error-prone long reads. Specifically, we evaluate this approach for devices developed by Illumina, Pacific Biosciences and Oxford Nanopore Technologies. We show that imprecise haplotypes may be reconstructed with a lower MEC than that of the exact haplotype. The performance of MEC is explored for different coverage levels and error rates of data. Our simulation results reveal that in order to avoid incorrect MEC-based haplotypes, a coverage of 25 is needed for reads generated by Pacific Biosciences RS systems. (17 pages, 6 figures)

    Haplotype estimation in polyploids using DNA sequence data

    Polyploid organisms possess more than two copies of their core genome and therefore contain k>2 haplotypes for each set of ordered genomic variants. Polyploidy occurs often within the plant kingdom, among others in important crops such as potato (k=4) and wheat (k=6). Current sequencing technologies enable us to read the DNA and detect genomic variants, but cannot distinguish between the copies of the genome, each inherited from one of the parents. To detect inheritance patterns in populations, it is necessary to know the haplotypes, as alleles that are in linkage over the same chromosome tend to be inherited together. In this work, we develop mathematical optimisation algorithms to indirectly estimate haplotypes by looking into overlaps between the sequence reads of an individual, as well as into the expected inheritance of the alleles in a population. These algorithms deal with sequencing errors and random variations in the counts of reads observed from each haplotype. These methods are therefore of high importance for studying the genetics of polyploid crops.
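The read-overlap idea described above can be sketched minimally: two reads are candidates for the same haplotype only if their alleles agree at the SNP sites both cover, possibly within a small error budget. Function names, the gap encoding, and the error budget are illustrative assumptions, not the authors' actual optimisation algorithm.

```python
# Toy sketch: deciding whether two reads could come from the same haplotype
# based on their overlap. '-' marks SNP sites a read does not cover.

def overlap_agreement(r1, r2):
    """Return (#sites covered by both reads, #mismatches at those sites)."""
    shared = [(a, b) for a, b in zip(r1, r2) if a != '-' and b != '-']
    mismatches = sum(1 for a, b in shared if a != b)
    return len(shared), mismatches

def same_haplotype(r1, r2, max_errors=0):
    """Reads with a non-empty, (near-)consistent overlap may be grouped."""
    shared, mismatches = overlap_agreement(r1, r2)
    return shared > 0 and mismatches <= max_errors

print(same_haplotype("01--", "-10-"))  # → True (agree at the one shared site)
```

In a polyploid setting with k>2 haplotypes, such pairwise evidence must be combined across many reads, which is what motivates the global optimisation formulation.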

    HapPart: partitioning algorithm for multiple haplotyping from haplotype conflict graph

    Each chromosome in the human genome has two copies. The haplotype assembly challenge entails reconstructing the two haplotypes (chromosomes) using aligned fragments of genomic sequence. Plants such as wheat, paddy and banana have more than two copies of each chromosome, and reconstructing multiple haplotypes has accordingly become a major research topic. Several approaches have been designed for reconstructing multiple haplotypes for a polyploid organism, and the problem remains a compelling computational challenge. This article introduces a partitioning algorithm, HapPart, for dividing the fragments into k groups while reducing computational time. HapPart uses a minimum error correction curve to determine the value of k at which the growth in the gain measure between two consecutive values of k, multiplied by its diversity, is maximal. A haplotype conflict graph is used for constructing all possible numbers of groups. The dissimilarity between two haplotypes represents the distance between two nodes in the graph. By merging the two nodes with the minimum distance between them, the algorithm ensures minimum error among fragments in the same group. Experimental results on real and simulated data show that HapPart can partition fragments efficiently and with less computational time.
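The merging step described above resembles greedy agglomerative clustering: starting from one node per fragment, the pair of groups at minimum distance is repeatedly merged until k groups remain. The sketch below uses a Hamming-style dissimilarity and each group's first member as its representative; these choices, and all names, are illustrative assumptions, not HapPart's published algorithm.

```python
# Toy sketch: greedy merging of conflict-graph nodes into k groups.
# '-' marks SNP sites a fragment does not cover.

def hamming(a, b):
    """Dissimilarity over sites covered by both fragments."""
    return sum(x != y for x, y in zip(a, b) if x != '-' and y != '-')

def partition_fragments(fragments, k):
    groups = [[f] for f in fragments]  # one node per fragment
    while len(groups) > k:
        # Find the pair of groups whose representatives are closest.
        i, j = min(((i, j) for i in range(len(groups))
                    for j in range(i + 1, len(groups))),
                   key=lambda ij: hamming(groups[ij[0]][0], groups[ij[1]][0]))
        groups[i].extend(groups.pop(j))  # merge the closer pair
    return groups

frags = ["0011", "0010", "1100", "1101"]
print(partition_fragments(frags, 2))  # → [['0011', '0010'], ['1100', '1101']]
```

Merging the minimum-distance pair first keeps highly conflicting fragments in separate groups, which is the intuition behind using the conflict graph.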

    Scalable telomere-to-telomere assembly for diploid and polyploid genomes with double graph

    Despite recent advances in the length and the accuracy of long-read data, building haplotype-resolved genome assemblies from telomere to telomere still requires considerable computational resources. In this study, we present an efficient de novo assembly algorithm that combines multiple sequencing technologies to scale up population-wide telomere-to-telomere assemblies. By utilizing twenty-two human and two plant genomes, we demonstrate that our algorithm is around an order of magnitude cheaper than existing methods, while producing better diploid and haploid assemblies. Notably, our algorithm is the only feasible solution to the haplotype-resolved assembly of polyploid genomes. (14 pages, 4 figures)

    High performance computing for haplotyping: Models and platforms

    The reconstruction of the haplotype pair for each chromosome is a hot topic in Bioinformatics and Genome Analysis. In Haplotype Assembly (HA), all heterozygous Single Nucleotide Polymorphisms (SNPs) have to be assigned to exactly one of the two chromosomes. In this work, we outline the state-of-the-art on HA approaches and present an in-depth analysis of the computational performance of GenHap, a recent method based on Genetic Algorithms. GenHap was designed to tackle the computational complexity of the HA problem by means of a divide-et-impera strategy that effectively leverages multi-core architectures. In order to evaluate GenHap's performance, we generated different instances of synthetic (yet realistic) data exploiting empirical error models of four different sequencing platforms (namely, Illumina NovaSeq, Roche/454, PacBio RS II and Oxford Nanopore Technologies MinION). Our results show that the processing time generally decreases along with the read length, involving a lower number of sub-problems to be distributed on multiple cores.
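The divide-et-impera strategy described above can be sketched as follows: the SNP sites are split into contiguous sub-blocks, each sub-problem is solved independently (one per worker), and the partial solutions are concatenated. The per-block "solver" here is a trivial per-site majority vote purely for illustration; GenHap itself runs a genetic algorithm on each sub-problem, and all names below are assumptions.

```python
# Toy sketch: split-solve-concatenate over SNP sub-blocks, with the
# sub-problems dispatched to a worker pool as in a multi-core setup.
from concurrent.futures import ThreadPoolExecutor

def split_blocks(n_sites, block_len):
    """Contiguous [start, end) ranges covering all SNP sites."""
    return [(s, min(s + block_len, n_sites))
            for s in range(0, n_sites, block_len)]

def solve_block(fragments, start, end):
    """Stand-in solver: majority allele at each site, ignoring gaps ('-')."""
    hap = []
    for i in range(start, end):
        col = [f[i] for f in fragments if f[i] != '-']
        hap.append(max(set(col), key=col.count) if col else '-')
    return ''.join(hap)

def assemble(fragments, block_len=2):
    blocks = split_blocks(len(fragments[0]), block_len)
    with ThreadPoolExecutor() as pool:
        parts = pool.map(lambda b: solve_block(fragments, *b), blocks)
    return ''.join(parts)

print(assemble(["0110", "01-0", "-110"]))  # → "0110"
```

Longer reads span more SNPs each, so fewer blocks (and block-boundary joins) are needed, which matches the abstract's observation that processing time decreases with read length.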