High performance computing for haplotyping: Models and platforms

A Bracciali; A Rhoads; C Luo; D Maisto; D Sims; ES Lander; F Rodriguez; HJ Greenberg; J Hermisson; JC Na; K Zhang; KE McElroy; L Bianchi; L Rundo; M Jain; M Jain; M Patterson; MA Quail; MJ Daly; MW Nachman; O Delaneau; P Edge; PR Loh; R Wang; RJ Roberts; S Benedettini; S Das; S Levy; S Sheehan; SB Gabriel; SP Otto; SR Browning; TC Wang; V Bansal; V Kuleshov; V Kuleshov; Y Choi; Y Pirola; ZZ Chen

Publication date: 1 January 2019
Publisher: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Abstract

The reconstruction of the haplotype pair for each chromosome is an active research topic in Bioinformatics and Genome Analysis. In Haplotype Assembly (HA), the alleles of all heterozygous Single Nucleotide Polymorphisms (SNPs) have to be assigned to exactly one of the two chromosomes. In this work, we outline the state of the art in HA approaches and present an in-depth analysis of the computational performance of GenHap, a recent method based on Genetic Algorithms. GenHap was designed to tackle the computational complexity of the HA problem by means of a divide-et-impera (divide-and-conquer) strategy that effectively leverages multi-core architectures. To evaluate GenHap's performance, we generated several instances of synthetic (yet realistic) data by exploiting empirical error models of four sequencing platforms (namely, Illumina NovaSeq, Roche/454, PacBio RS II and Oxford Nanopore Technologies MinION). Our results show that processing time generally decreases as read length increases, since longer reads involve fewer sub-problems to be distributed across multiple cores.
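To make the HA task concrete, the sketch below scores a candidate haplotype pair against a small set of reads. It assumes the Minimum Error Correction (MEC) objective, a common formulation of HA that the abstract does not name; GenHap's exact objective and data structures are not described here. The fragment matrix `FRAGMENTS` and the functions `mec_cost` and `exhaustive_haplotype` are hypothetical illustrations. Since the exhaustive search grows as 2^n in the number of SNPs, it also shows why heuristics such as Genetic Algorithms and splitting the problem into sub-problems are attractive.

```python
from itertools import product
from typing import Optional

# Hypothetical fragment matrix: one row per read, one column per heterozygous
# SNP. 0/1 = observed allele, None = SNP not covered by that read.
FRAGMENTS: list[list[Optional[int]]] = [
    [0, 1, None, None],
    [None, 1, 0, None],
    [None, None, 0, 1],
    [1, 0, None, None],
    [None, 0, 1, 1],  # read from the second haplotype, with one error at the last SNP
    [None, None, 1, 0],
]


def mismatches(fragment: list[Optional[int]], haplotype: list[int]) -> int:
    """Count covered SNPs where the read disagrees with the haplotype."""
    return sum(1 for a, h in zip(fragment, haplotype) if a is not None and a != h)


def mec_cost(fragments: list[list[Optional[int]]], haplotype: list[int]) -> int:
    """MEC cost (an assumed objective) of the pair (haplotype, its complement).

    Each read is assigned to whichever of the two haplotypes it disagrees with
    least; the cost is the total number of allele flips this requires.
    """
    complement = [1 - h for h in haplotype]
    return sum(
        min(mismatches(frag, haplotype), mismatches(frag, complement))
        for frag in fragments
    )


def exhaustive_haplotype(fragments: list[list[Optional[int]]]) -> tuple[list[int], int]:
    """Brute-force baseline over all 2^n haplotypes; only feasible for tiny n."""
    n = len(fragments[0])
    best = min(product((0, 1), repeat=n), key=lambda h: mec_cost(fragments, list(h)))
    return list(best), mec_cost(fragments, list(best))


if __name__ == "__main__":
    print(exhaustive_haplotype(FRAGMENTS))  # ([0, 1, 0, 1], 1)
```

Here the optimal pair is 0101 / 1010, and the single unavoidable correction is the erroneous allele in the fifth read. A haplotype and its complement always have the same cost, so the search returns the first one in enumeration order.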