Comparative analysis of algorithms for identifying copy number variation in array CGH data
Abstract
The aim of the study is a comparative analysis of algorithms for identifying copy number variation in array CGH data. Circular Binary Segmentation (CBS), CGH segmentation and the Price-Smith-Waterman algorithm (all three are part of the ADaCGH tool), together with Cluster Along Chromosome (CLAC), which is not publicly available, were studied and used for the analysis of array CGH data. A simulated data set was generated by adding signals of various intensities (I, II and III units), representing gains in copy number (Copy Number Variation, CNV), at random positions corresponding to real probes covering the human genome, and by adding white Gaussian noise. The white Gaussian noise was generated with the help of a program provided by Dr. Nadia Chuzhanova. Two studies of simulated data were carried out in this project. (1) The aim of the first study was to investigate the minimum width of CNV (in probes/BAC clones) that can be detected by all the algorithms for signal intensities of I, II and III units. (2) The aim of the second study was to determine the minimum gap (in probes/BAC clones) between two CNVs of width 2-12 probes/BAC clones that each algorithm can detect for signal intensities of I, II and III units. The results showed that CGH segmentation and CBS can detect a CNV with a minimum width of 2 BAC clones/probes in the simulated data set. However, the Price-Smith-Waterman algorithm failed to detect any generated CNV. The studies showed that CNV detection by both CBS and CGH segmentation is influenced by noise as well as by the width and the intensity of the CNV. The studies also showed that, although CBS was able to detect most of the generated CNVs, it is more efficient at detecting wider CNVs, whereas CGH segmentation was more efficient at detecting narrower CNVs. Thus CBS and CGH segmentation can be used efficiently for the analysis of array CGH data.
The real data set was obtained from 24 malignant peripheral nerve sheath tumour samples and 3 neurofibroma samples (from 27 patients). These data were then analysed with CBS and CGH segmentation on the basis of the results obtained from the simulated data set. In general, ADaCGH is a very good tool for the analysis of array CGH data.
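
The simulation design described in the abstract (gains of a fixed intensity placed over a run of probes, plus white Gaussian noise) can be illustrated with a minimal sketch. This is not the program used in the study: the function names, the noise standard deviation, the probe count and the interpretation of one "unit" as 1.0 on the log-ratio scale are all assumptions made for illustration, and Python's standard `random.gauss` stands in for the noise generator.

```python
import random


def simulate_profile(n_probes, cnvs, noise_sd=1.0, seed=0):
    """Simulate one array CGH profile (hypothetical helper, not from the study).

    cnvs is a list of (start, width, intensity) tuples, each describing a gain
    that covers `width` consecutive probes beginning at index `start`.
    Returns (signal, observed): the noise-free signal and the noisy profile.
    """
    rng = random.Random(seed)
    signal = [0.0] * n_probes
    for start, width, intensity in cnvs:
        if start < 0 or start + width > n_probes:
            raise ValueError("CNV falls outside the probe range")
        for i in range(start, start + width):
            signal[i] += intensity
    # White Gaussian noise: independent draws with zero mean per probe.
    observed = [s + rng.gauss(0.0, noise_sd) for s in signal]
    return signal, observed


def two_cnvs_with_gap(start, width, gap, intensity):
    """Two equal gains separated by `gap` unaltered probes (second study design)."""
    return [(start, width, intensity), (start + width + gap, width, intensity)]


# First study design: a single gain of width 2 and intensity II (assumed 2.0).
signal, observed = simulate_profile(100, [(10, 2, 2.0)], noise_sd=0.5, seed=1)

# Second study design: two gains of width 3 separated by a gap of 4 probes.
pair = two_cnvs_with_gap(start=10, width=3, gap=4, intensity=1.0)
signal2, observed2 = simulate_profile(100, pair, noise_sd=0.5, seed=1)
```

Profiles generated this way could then be passed to a segmentation method while the width or gap is varied, which is the kind of sweep the two studies describe. Fixing the seed keeps each simulated profile reproducible.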