39 research outputs found

CNV-CH: A Convex Hull Based Segmentation Approach to Detect Copy Number Variations (CNV) Using Next-Generation Sequencing Data

Author: Rajat K. De (242513)
Rituparna Sinha (785383)
Sandip Samaddar (785384)
Publication date: 20/08/2015

Copy number variation (CNV) is a form of structural alteration in the mammalian DNA sequence that is associated with many complex neurological diseases as well as cancer. The development of next-generation sequencing (NGS) technology provides a new dimension for detecting genomic locations with copy number variations. Here we develop an algorithm for detecting CNVs based on depth-of-coverage data generated by NGS technology. In this work, we use a novel way to represent the read count data as two-dimensional geometrical points. A key aspect of detecting regions with CNVs is to devise a proper segmentation algorithm that distinguishes the genomic locations having a significant difference in read count data. We have designed a new segmentation approach in this context, applying a convex hull algorithm to the geometrical representation of read count data. To our knowledge, most algorithms use a single distribution model of read count data; in our approach, we consider the read count data to follow two different distribution models independently, which adds to the robustness of CNV detection. In addition, our algorithm calls CNVs based on a multiple-sample analysis approach, resulting in a low false discovery rate with high precision.
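The abstract does not say how read counts are mapped to points or how the hull is turned into segments, so the following is only a minimal sketch of the underlying geometric primitive: computing the convex hull of two-dimensional points with Andrew's monotone chain algorithm. The mapping of each bin to a `(bin_index, read_count)` point and the sample values are assumptions made for illustration, not the authors' representation.

```python
def cross(o, a, b):
    """Z-component of the cross product of vectors OA and OB."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop the last vertex while it does not make a left turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


# Hypothetical per-bin read counts; the (index, count) mapping is assumed.
read_counts = [10, 11, 9, 10, 25, 26, 24, 10, 11]
points = list(enumerate(read_counts))
hull = convex_hull(points)
```

In this toy input the bins with elevated counts end up on the upper part of the hull, which hints at why a hull could help separate regions with different read depth. How CNV-CH actually derives breakpoints from the hull is not described in this listing.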

Public Library of Science (PLOS)

Directory of Open Access Journals

FigShare

The overlapped percentage of base pairs of CNV regions detected by the aforesaid algorithms.

Author: Rajat K. De (242513)
Rituparna Sinha (785383)
Sandip Samaddar (785384)

The first column represents the CNVs detected by CNV-CH in the region 26200000bp–30000000bp of chromosome 20. The second column represents the corresponding percentage of overlap with the regions reported in the Database of Genomic Variants (DGV) in the same chromosomal section. For a particular row, each entry shows the percentage of overlap of CNV-CH with DGV, EWT, cnMOPS, CNV-TV and CMDs, respectively.
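One plausible way to compute such a percentage of overlapping base pairs is sketched below. It assumes half-open `(start, end)` intervals on a single chromosome and defines the percentage relative to the length of the detected region; both conventions, the function names, and the example coordinates are illustrative assumptions rather than details taken from the paper.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def overlap_percentage(region, reference_regions):
    """Percentage of base pairs in `region` covered by any reference region."""
    start, end = region
    length = end - start
    if length <= 0:
        raise ValueError("region must have positive length")
    covered = 0
    # Merging first avoids counting a base pair twice when references overlap.
    for ref_start, ref_end in merge_intervals(reference_regions):
        covered += max(0, min(end, ref_end) - max(start, ref_start))
    return 100.0 * covered / length


# Made-up coordinates, not values from the table.
detected = (100, 200)
reference = [(90, 120), (110, 130)]
pct = overlap_percentage(detected, reference)
```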

FigShare

Results obtained by CNV-CH for the execution instance 1 of real data.

Author: Rajat K. De (242513)
Rituparna Sinha (785383)
Sandip Samaddar (785384)

The first column represents the samples taken into consideration for execution instance 1. The second column of the table shows the total number of variants detected by CNV-CH, and the third column shows the number of detected variants that have been validated.

FigShare

An instance of GC-corrected read count data, corresponding to the genomic region 1240000bp–1280000bp, of 2 test samples, as represented in (a) and (b).

Author: Rajat K. De (242513)
Rituparna Sinha (785383)
Sandip Samaddar (785384)

The copy number variation, a duplication, was introduced in the genomic segment 1257695bp–1263695bp, as represented in (a). The other sample has no variation in this region, as represented in (b).

FigShare

The overall precision and sensitivity of CNV-CH, EWT, cnMOPS, CMDs and CNV-TV on the basis of their performance on simulated data set.

Author: Rajat K. De (242513)
Rituparna Sinha (785383)
Sandip Samaddar (785384)

The black bar shows the precision value in [0, 1] and the gray bar shows the sensitivity value in [0, 1].
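The caption does not define the two metrics, so the sketch below assumes the standard definitions: precision is the fraction of called CNVs that are true, and sensitivity is the fraction of true CNVs that are called. The function names and counts are hypothetical and are not results from the paper.

```python
def precision(true_positives, false_positives):
    """Fraction of called CNVs that are correct; lies in [0, 1]."""
    called = true_positives + false_positives
    if called == 0:
        raise ValueError("no calls were made")
    return true_positives / called


def sensitivity(true_positives, false_negatives):
    """Fraction of true CNVs that were called; lies in [0, 1]."""
    actual = true_positives + false_negatives
    if actual == 0:
        raise ValueError("no true CNVs to detect")
    return true_positives / actual


# Hypothetical counts: 8 correct calls, 2 false calls, 8 missed CNVs.
p = precision(true_positives=8, false_positives=2)
s = sensitivity(true_positives=8, false_negatives=8)
```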

FigShare

The box plot based analysis of the detected length of variations, obtained from simulation results on the implanted CNV length of 1kbp, 3kbp and 6kbp with 10 samples.

Author: Rajat K. De (242513)
Rituparna Sinha (785383)
Sandip Samaddar (785384)

(a) represents the length (kbp) of the regions with variations detected by CNV-CH in a set of 100 trials, for all combinations of coverage (C) and copy number (Z), at a fixed implanted variant length of 1kbp. The horizontal solid line within a box indicates the median variant length detected across the set of 100 trials, and the +’s indicate the outliers. (b) represents the same quantity at a fixed implanted variant length of 3kbp, whereas in (c) the implanted variant length is 6kbp.

FigShare

The mappability score of the reference sequence and its correction, of a trial instance.

Author: Rajat K. De (242513)
Rituparna Sinha (785383)
Sandip Samaddar (785384)

(a) The mappability score varies widely in [0, 1], corresponding to high to low repeat-rich regions in the genome, respectively. (b) and (c) depict the smoothed read count signal obtained after removing the mappability bias from the GC-corrected read count data.

FigShare

The breakpoints achieved across the genomic region 1240000bp–1280000bp.

Author: Rajat K. De (242513)
Rituparna Sinha (785383)
Sandip Samaddar (785384)

The start bin and end bin of the genomic region having copy number changes have a high value of △.

FigShare

The overall precision and sensitivity of CNV-CH, EWT, cnMOPS, CMDs and CNV-TV, on the basis of their performance on the real data set considered in our work.

Author: Rajat K. De (242513)
Rituparna Sinha (785383)
Sandip Samaddar (785384)

The black bar shows the precision value in [0, 1] and the gray bar shows the sensitivity value in [0, 1].

FigShare