A Multi-Sample Based Method for Identifying Common CNVs in Normal Human Genomic Structure Using High-Resolution aCGH Data

AB Olshen; AJ Iafrate; BE Stranger; C Alkan; C Xie; Chihyun Park; DF Conrad; DP Locke; E Ben-Yaacov; F Hormozdiari; F Picard; GH Perry; H Lee; H Park; H Willenbrock; J Huang; JA Berger; Jaegyoon Ahn; JC Marioni; JI Kim; K Bleakley; K Wang; KK Wong; M Wigler; NP Carter; NR Zhang; Olivier Lespinet; OM Rueda; P Hupe; PHC Eilers; QY Zhang; R Pique-Regi; R Redon; R Tibshirani; RM Durbin; S Levy; Sanghyun Park; SJ Diskin; SP Shah; T LaFramboise; TS Price; TW Yu; WR Lai; Youngmi Yoon

Publisher: Public Library of Science

Abstract

BACKGROUND: It is difficult to identify copy number variations (CNVs) in normal human genomic data because of noise and the non-linear relationship between genomic regions and signal intensity. A high-resolution array comparative genomic hybridization (aCGH) data set containing 42 million probes, far more than previous arrays, was recently published. Most existing CNV detection algorithms do not work well on such data, both because of the noise associated with the large amount of input and because most were not designed to analyze normal human samples. Normal human genome analysis often requires a joint approach across multiple samples, yet the majority of existing methods can only identify CNVs from a single sample.

METHODOLOGY AND PRINCIPAL FINDINGS: We developed a multi-sample-based genomic variations detector (MGVD) that uses segmentation to identify common breakpoints across multiple samples, followed by a k-means-based clustering strategy. Unlike previous methods, MGVD simultaneously considers multiple samples with different genomic intensities and identifies both CNVs and CNV zones (CNVZs); a CNVZ is a more precise measure of the location of a genomic variant than a CNV region (CNVR).

CONCLUSIONS AND SIGNIFICANCE: We designed a specialized algorithm to detect common CNVs from extremely high-resolution multi-sample aCGH data. MGVD showed high sensitivity and a low false discovery rate on a simulated data set, and outperformed most current methods on real, high-resolution HapMap data sets. MGVD also had the fastest runtime of the algorithms evaluated on real, high-resolution aCGH data. The CNVZs identified by MGVD can be used in association studies to reveal relationships between phenotypes and genomic aberrations. The algorithm is implemented in standard C++ using the STL library and is available for Linux and MS Windows. It is freely available at: http://embio.yonsei.ac.kr/~Park/mgvd.php
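To make the clustering idea concrete, the following is a minimal sketch of how per-sample intensities inside one shared segment could be grouped into copy-number states with one-dimensional k-means. It is an illustration only, not the MGVD implementation: the function names `kmeans_1d` and `call_copy_state`, the choice of three clusters, the fixed initial centers, and the example log2-ratio values are all assumptions that do not come from the paper.

```python
from statistics import mean


def kmeans_1d(values, centers, max_iter=100):
    """Lloyd's algorithm in one dimension with caller-supplied initial centers."""
    centers = list(centers)
    labels = [0] * len(values)
    for _ in range(max_iter):
        # Assignment step: each value goes to its nearest center.
        labels = [
            min(range(len(centers)), key=lambda j: abs(v - centers[j]))
            for v in values
        ]
        # Update step: each center moves to the mean of its members;
        # a cluster with no members keeps its previous center.
        new_centers = []
        for j, old in enumerate(centers):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centers.append(mean(members) if members else old)
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


def call_copy_state(segment_means, init=(-0.5, 0.0, 0.5)):
    """Label each sample's mean log2 ratio in one segment as loss/normal/gain.

    The initial centers are hypothetical values chosen for illustration;
    they are given in ascending order so that cluster index 0, 1, 2 maps
    to loss, normal, gain.
    """
    labels, _ = kmeans_1d(segment_means, init)
    names = ("loss", "normal", "gain")
    return [names[lab] for lab in labels]


if __name__ == "__main__":
    # Hypothetical mean log2 ratios of seven samples over one shared segment.
    samples = [-0.62, -0.55, 0.02, -0.03, 0.01, 0.48, 0.05]
    print(call_copy_state(samples))
```

In this sketch the first two samples fall into the loss cluster, the sixth into the gain cluster, and the rest into the normal cluster. Fixing the initial centers keeps the result deterministic; a real multi-sample caller would also need to handle segments in which no sample varies and to choose the number of clusters from the data.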
