On the power and the systematic biases of the detection of chromosomal inversions by paired-end genome sequencing

A Bashir; AA Hoffmann; AJ Iafrate; AM Hillmer; AW Pang; B Zeitouni; C Alkan; CB Krimbas; DC Richter; E Tuzun; F Hormozdiari; F Hormozdiari; H Li; H Stefansson; J Cao; J Sebat; J Wang; JC Roach; JM Kidd; JM Kidd; JO Korbel; JO Korbel; José Ignacio Lucas Lledó; K Chen; KF Manly; KJ McKernan; L Feuk; M Onishi-Seebacher; Mario Cáceres; P Medvedev; PJ Campbell; PJ Stephens; R Xi; S Suzuki; SM Ahn; SS Sindi; T Rausch; Y Jiang; ZD Zhang; Zhanjiang Liu

research

Publication date: 1 January 2013
Publisher: Public Library of Science (PLoS)

Abstract

One of the most widely used techniques for studying structural variation at the genome level is paired-end mapping (PEM). PEM has the advantage of being able to detect balanced events, such as inversions and translocations. However, inversions are still quite difficult to predict reliably, especially from high-throughput sequencing data. We simulated realistic PEM experiments with different combinations of read and library fragment lengths, including sequencing errors and meaningful base qualities, to quantify false positives and false negatives and to trace their origin through sequencing, mapping, and downstream analysis. We show that PEM is well suited to detecting a wide range of inversions, even with low-coverage data. However, % of inversions located between segmental duplications are expected to go undetected by the most common sequencing strategies. In general, longer DNA libraries improve the detectability of inversions far more than increases in coverage depth or read length. Finally, we review the performance of three algorithms for detecting inversions (SVDetect, GRIAL, and VariationHunter), identify common pitfalls, and reveal important differences in their breakpoint precision. These results stress the importance of the sequencing strategy for the detection of structural variants, especially inversions, and offer guidelines for the design of future genome sequencing projects.
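
The claim that library fragment length matters more than coverage depth or read length can be illustrated with a back-of-the-envelope calculation. The sketch below is not the simulation used in the paper; it assumes an idealized model in which fragments of a single fixed length start uniformly along the genome, every read maps uniquely, and a pair supports a breakpoint only when the breakpoint falls in the unsequenced insert between the two reads. The function name `expected_spanning_pairs` and its parameters are hypothetical, introduced only for this illustration.

```python
def expected_spanning_pairs(coverage, read_len, fragment_len):
    """Expected number of read pairs whose insert spans one breakpoint.

    Simplified model (an assumption, not the paper's method):
    - sequence coverage c = 2 * N * L / G for N pairs of read length L
      on a genome of size G, so fragment starts occur at c / (2 * L) per base;
    - a fragment of length F spans the breakpoint with both reads clear of it
      for F - 2 * L of its possible start positions.
    """
    if read_len <= 0 or coverage < 0:
        raise ValueError("read_len must be positive and coverage non-negative")
    if fragment_len < 2 * read_len:
        raise ValueError("fragment_len must be at least twice read_len")
    insert_len = fragment_len - 2 * read_len
    return coverage * insert_len / (2 * read_len)


if __name__ == "__main__":
    # Compare a short library, doubled coverage, and a long library.
    for c, L, F in [(10, 100, 500), (20, 100, 500), (10, 100, 3000)]:
        n = expected_spanning_pairs(c, L, F)
        print(f"coverage={c}x read={L}bp fragment={F}bp -> {n:.1f} spanning pairs")
```

Under these assumptions, doubling the coverage only doubles the number of supporting pairs, whereas moving from a 500 bp to a 3000 bp library at the same coverage increases it more than ninefold. Longer reads at fixed coverage actually reduce the count, because fewer fragments are sequenced. The model ignores mapping ambiguity in repeats such as segmental duplications, which is precisely where the abstract reports that detection fails.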