Haplotype Assembly: An Information Theoretic View

Si, Hongbo; Vikalo, Haris; Vishwanath, Sriram

Publication date: 11 May 2014

Abstract

This paper studies the haplotype assembly problem from an information theoretic perspective. A haplotype is the sequence of nucleotide bases on one chromosome of a homologous pair at the positions where its bases differ from those on the other chromosome; it is often conveniently represented by a binary string. Information about the order of bases in a genome is readily inferred from short reads provided by high-throughput DNA sequencing technologies. In this paper, the recovery of the target pair of haplotype sequences from short reads is rephrased as a joint source-channel coding problem. Two messages, representing the haplotypes and the chromosome memberships of the reads, are encoded and transmitted over a channel with erasures and errors, where the channel model reflects salient features of high-throughput sequencing. The focus of this paper is on the number of reads required for reliable haplotype reconstruction; both necessary and sufficient conditions are presented, with order-wise optimal bounds.

Comment: 30 pages, 5 figures, 1 table, journa
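To make the read model described above concrete, here is a minimal Python sketch under stated assumptions. It assumes the binary representation in which the second chromosome is the bitwise complement of the haplotype h at the variant positions, that each read comes from either chromosome with equal probability (its unknown chromosome membership), that each position is independently observed with probability obs_prob (otherwise erased), and that each observed bit is flipped with probability flip_prob. The names simulate_reads, assemble, obs_prob and flip_prob are hypothetical, and the decoder (majority vote on pairwise parities followed by a breadth-first spanning-tree decode) is an illustrative choice, not the encoder, channel, or algorithm analyzed in the paper. It shows why membership does not matter for parities: h[i] ^ h[j] is identical on both chromosomes, so h can be recovered up to a global complement.

```python
import random
from collections import defaultdict, deque


def simulate_reads(h, num_reads, obs_prob, flip_prob, rng):
    """Pass haplotype h through a hypothetical erasure-and-error read channel.

    Assumption: membership 0 means the read comes from h, membership 1 from
    its complement; erased positions are None, observed bits may be flipped.
    """
    reads = []
    for _ in range(num_reads):
        membership = rng.randint(0, 1)  # unknown chromosome membership
        read = []
        for bit in h:
            if rng.random() >= obs_prob:
                read.append(None)  # erasure: position not covered by the read
                continue
            value = bit ^ membership  # complement chromosome when membership == 1
            if rng.random() < flip_prob:
                value ^= 1  # sequencing error
            read.append(value)
        reads.append(read)
    return reads


def assemble(reads, n):
    """Estimate h up to a global complement from pairwise parities.

    Each read votes on h[i] ^ h[j] for consecutive covered positions i < j;
    the vote does not depend on the read's membership.
    """
    votes = defaultdict(int)  # (i, j) -> (#votes for parity 0) - (#votes for parity 1)
    for read in reads:
        covered = [i for i, b in enumerate(read) if b is not None]
        for i, j in zip(covered, covered[1:]):
            votes[(i, j)] += 1 if read[i] == read[j] else -1

    adjacency = defaultdict(list)
    for (i, j), margin in votes.items():
        if margin != 0:  # skip tied pairs
            parity = 0 if margin > 0 else 1
            adjacency[i].append((j, parity))
            adjacency[j].append((i, parity))

    # Fix position 0 to 0 (resolving the complement ambiguity) and propagate
    # parities along a breadth-first spanning tree.
    estimate = [None] * n
    estimate[0] = 0
    queue = deque([0])
    while queue:
        i = queue.popleft()
        for j, parity in adjacency[i]:
            if estimate[j] is None:
                estimate[j] = estimate[i] ^ parity
                queue.append(j)
    return estimate  # None marks positions not connected to position 0


if __name__ == "__main__":
    rng = random.Random(0)
    h = [rng.randint(0, 1) for _ in range(20)]
    reads = simulate_reads(h, num_reads=200, obs_prob=0.3, flip_prob=0.0, rng=rng)
    estimate = assemble(reads, len(h))
    print(estimate == h or estimate == [1 - b for b in h])
```

With flip_prob > 0 the per-pair majority vote tolerates occasional errors, but a single spanning-tree decode can still propagate a wrong parity from a weakly supported pair; the sketch only illustrates the model and does not reproduce the paper's bounds on the number of reads.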

DOI: 10.1109/itw.2014.69...