16 research outputs found

    A novel SNP analysis method to detect copy number alterations with an unbiased reference signal directly from tumor samples

    BACKGROUND: Genomic instability in cancer leads to abnormal genome copy number alterations (CNA) as a mechanism underlying tumorigenesis. Using microarrays and other technologies, tumor CNA are detected by comparing tumor sample CN to normal reference sample CN. While advances in microarray technology have improved detection of copy number alterations, the increase in the number of measured signals, noise from array probes, variations in signal-to-noise ratio across batches, and disparity across laboratories lead to significant limitations for the accurate identification of CNA regions when comparing tumor and normal samples. METHODS: To address these limitations, we designed a novel "Virtual Normal" algorithm (VN), which allowed for construction of an unbiased reference signal directly from test samples within an experiment, using any publicly available normal reference set as a baseline and thus eliminating the need for an in-lab normal reference set. RESULTS: The algorithm was tested using an optimal, paired tumor/normal data set as well as previously uncharacterized pediatric malignant gliomas for which a normal reference set was not available. Using Affymetrix 250K Sty microarrays, we demonstrated improved signal-to-noise ratio and detected significant copy number alterations using the VN algorithm that were validated by independent PCR analysis of the target CNA regions. CONCLUSIONS: We developed and validated an algorithm to provide a virtual normal reference signal directly from tumor samples and minimize noise in the derivation of the raw CN signal. The algorithm reduces the variability of assays performed across different reagent and array batches, methods of sample preservation, multiple personnel, and among different laboratories. This approach may be valuable when matched normal samples are unavailable or the paired normal specimens have been subjected to variations in methods of preservation.
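The comparison of tumor CN to reference CN described in the background is commonly expressed as a per-probe log2 ratio. The following is a minimal sketch of that generic comparison only; it is not the VN algorithm, whose details the abstract does not give. The function names and the choice of a per-probe median as the reference signal are assumptions made for illustration.

```python
import math
from statistics import median


def reference_signal(reference_samples):
    """Per-probe median intensity across a set of reference samples.

    Assumption for illustration: each sample is a list of positive probe
    intensities, and all samples share the same probe order.
    """
    return [median(probe_values) for probe_values in zip(*reference_samples)]


def log2_ratios(sample, reference):
    """Per-probe log2 ratio of a test sample against the reference signal."""
    return [math.log2(s / r) for s, r in zip(sample, reference)]


if __name__ == "__main__":
    normals = [[100, 200, 400], [110, 190, 400], [90, 210, 400]]
    tumor = [200, 200, 200]
    print(log2_ratios(tumor, reference_signal(normals)))
```

Positive ratios suggest a gain relative to the reference and negative ratios a loss; real pipelines add normalization and segmentation steps that are omitted here.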

    Off-line detection of multiple change points with the Filtered Derivative with p-Value method

    This paper deals with off-line detection of change points for time series of independent observations when the number of change points is unknown. We propose a sequential-analysis-like method with linear time and memory complexity. In its first step, our method relies on the Filtered Derivative method, which detects the right change points but also false ones. We improve the Filtered Derivative method by adding a second step in which we compute the p-value associated with each potential change point. Then we eliminate as false alarms the points which have p-value smaller than a given critical level. Next, our method is compared with the Penalized Least Square Criterion procedure on simulated data sets. Finally, we apply the Filtered Derivative with p-Value method to the segmentation of heartbeat time series and to the detection of change points in the average daily volume of financial time series.
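The two-step idea in this abstract can be sketched as follows, for changes in the mean only. This is a hedged illustration and not the authors' implementation: the function names, the window/threshold parameters, and the normal-approximation test are all assumptions. It also assumes the usual convention that a small p-value is evidence of a real change, so candidates whose p-value is not below the critical level are discarded; the paper's own p-value definition may differ. The candidate-selection loop is written for clarity and does not preserve the linear-time guarantee.

```python
import math
from statistics import mean, variance


def filtered_derivative(x, window):
    """D[k] = mean(x[k:k+window]) - mean(x[k-window:k]); zero near the edges."""
    n = len(x)
    d = [0.0] * n
    if n < 2 * window:
        return d
    left = sum(x[:window])
    right = sum(x[window:2 * window])
    for k in range(window, n - window + 1):
        d[k] = (right - left) / window
        if k + window < n:
            # Slide both windows one position in O(1).
            left += x[k] - x[k - window]
            right += x[k + window] - x[k]
    return d


def candidate_change_points(x, window, threshold):
    """Step 1: repeatedly take the largest |D| above the threshold."""
    d = [abs(v) for v in filtered_derivative(x, window)]
    candidates = []
    while d:
        k = max(range(len(d)), key=d.__getitem__)
        if d[k] <= threshold:
            break
        candidates.append(k)
        # Suppress the neighbourhood so one change yields one candidate.
        for j in range(max(0, k - window), min(len(d), k + window + 1)):
            d[j] = 0.0
    return sorted(candidates)


def two_sample_p_value(a, b):
    """Two-sided p-value for equal means (normal approximation)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0.0:
        return 1.0 if mean(a) == mean(b) else 0.0
    z = abs(mean(a) - mean(b)) / se
    return math.erfc(z / math.sqrt(2.0))


def fdpv(x, window, threshold, level):
    """Step 2: keep candidates whose adjacent segments differ significantly."""
    if window < 2:
        raise ValueError("window must be at least 2")
    cands = candidate_change_points(x, window, threshold)
    bounds = [0] + cands + [len(x)]
    kept = []
    for i, k in enumerate(cands, start=1):
        p = two_sample_p_value(x[bounds[i - 1]:k], x[k:bounds[i + 1]])
        if p < level:
            kept.append(k)
    return kept


if __name__ == "__main__":
    # Deterministic pseudo-noise around a mean that jumps at 200 and 400.
    signal = [((i * 37) % 11 - 5) / 10.0 + (3.0 if 200 <= i < 400 else 0.0)
              for i in range(600)]
    print(fdpv(signal, window=30, threshold=1.5, level=1e-3))
```

A low step-1 threshold deliberately lets in extra candidates; the p-value step is then what removes the spurious ones.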

    Methylation profiling and evaluation of demethylating therapy in renal cell carcinoma.

    BACKGROUND: Despite therapeutic advances in targeted therapy, metastatic renal cell carcinoma (RCC) remains incurable for the vast majority of patients. Key molecular events in the pathogenesis of RCC include inactivation of the VHL tumour suppressor gene (TSG), inactivation of chromosome 3p TSGs implicated in chromatin modification and remodelling, and de novo tumour-specific promoter methylation of renal TSGs. In the light of these observations it can be proposed that, as in some haematological malignancies, demethylating agents such as azacitidine might be beneficial for the treatment of advanced RCC. RESULTS: Here we report that the treatment of RCC cell lines with azacitidine suppressed cell proliferation in all 15 lines tested. A marked response to azacitidine therapy (>50% reduction in colony formation assay) was detected in the three cell lines with VHL promoter methylation, but some RCC cell lines without VHL TSG methylation also demonstrated a similar response, suggesting that multiple methylated TSGs might determine the response to demethylating therapies. To identify novel candidate methylated TSGs implicated in RCC we undertook a combined analysis of copy number and CpG methylation array data. Candidate novel epigenetically inactivated TSGs were further prioritised by expression analysis of RCC cell lines pre- and post-azacitidine therapy and comparative expression analysis of tumour/normal pairs. Thus, with subsequent investigation, two candidate genes were found to be methylated in more than 25% of our series and in the TCGA methylation dataset for 199 RCC samples: RGS7 (25.6% and 35.2% of tumours respectively) and NEFM (25.6% and 30.2%). In addition, three candidate genes were methylated in >10% of both datasets: TMEM74 (15.4% and 14.6%), GCM2 (41.0% and 14.6%) and AEBP1 (30.8% and 13.1%). Methylation of GCM2 (P = 0.0324), NEFM (P = 0.0024) and RGS7 (P = 0.0067) was associated with prognosis. CONCLUSIONS: These findings provide preclinical evidence that treatment with demethylating agents such as azacitidine might be useful for the treatment of advanced RCC and further insights into the role of epigenetic changes in the pathogenesis of RCC.

    Detection of copy number variation from array intensity and sequencing read depth using a stepwise Bayesian model

    BACKGROUND: Copy number variants (CNVs) have been demonstrated to occur at a high frequency and are now widely believed to make a significant contribution to the phenotypic variation in human populations. Array-based comparative genomic hybridization (array-CGH) and the newly developed read-depth approach through ultrahigh throughput genomic sequencing both provide rapid, robust, and comprehensive methods to identify CNVs on a whole-genome scale. RESULTS: We developed a Bayesian statistical analysis algorithm for the detection of CNVs from both types of genomic data. The algorithm can analyze such data obtained from PCR-based bacterial artificial chromosome arrays, high-density oligonucleotide arrays, and more recently developed high-throughput DNA sequencing. Treating the parameters that define the underlying data-generating process (e.g., the number of CNVs, the position of each CNV, and the data noise level) as random variables, our approach derives the posterior distribution of the genomic CNV structure given the observed data. Sampling from the posterior distribution using a Markov chain Monte Carlo method, we get not only best estimates for these unknown parameters but also Bayesian credible intervals for the estimates. We illustrate the characteristics of our algorithm by applying it to both synthetic and experimental data sets in comparison to other segmentation algorithms. CONCLUSIONS: In particular, the synthetic data comparison shows that our method is more sensitive than other approaches at low false positive rates. Furthermore, given its Bayesian origin, our method can also be seen as a technique to refine CNVs identified by fast point-estimate methods and also as a framework to integrate array-CGH and sequencing data with other CNV-related biological knowledge, all through informative priors.

    Single-cell copy number variation detection

    Detection of chromosomal aberrations from a single cell by array comparative genomic hybridization (single-cell array CGH), instead of from a population of cells, is an emerging technique. However, such detection is challenging because of the genome artifacts and the DNA amplification process inherent to the single-cell approach. Current normalization algorithms result in inaccurate aberration detection for single-cell data. We propose a normalization method based on channel, genome composition and recurrent genome artifact corrections. We demonstrate that the proposed channel clone normalization significantly improves copy number variation detection in both simulated and real single-cell array CGH data.

    Wavelet-based identification of DNA focal genomic aberrations from single nucleotide polymorphism arrays

    BACKGROUND: Copy number aberrations (CNAs) are an important molecular signature in cancer initiation, development, and progression. However, these aberrations span a wide range of chromosomes, making it hard to distinguish cancer-related genes from other genes that are not closely related to cancer but are located in broadly aberrant regions. With the current availability of high-resolution data sets such as single nucleotide polymorphism (SNP) microarrays, it has become important to develop a computational method to detect driver genes related to cancer development located in the focal regions of CNAs. RESULTS: In this study, we introduce a novel method referred to as the wavelet-based identification of focal genomic aberrations (WIFA). Because wavelet analysis is a multi-resolution approach, it makes it possible to effectively identify focal genomic aberrations in broadly aberrant regions. The proposed method integrates multiple cancer samples so that it enables the detection of consistent aberrations across multiple samples. We then apply this method to glioblastoma multiforme and lung cancer data sets from the SNP microarray platform. Through this process, we confirm the ability to detect previously known cancer-related genes from both cancer types with high accuracy. Also, the application of this approach to a lung cancer data set identifies focal amplification regions that contain known oncogenes, though these regions are not reported by a recent CNA detection algorithm, GISTIC: SMAD7 (chr18q21.1) and FGF10 (chr5p12). CONCLUSIONS: Our results suggest that WIFA can be used to reveal cancer-related genes in various cancer data sets.

    A Multi-Sample Based Method for Identifying Common CNVs in Normal Human Genomic Structure Using High-Resolution aCGH Data

    BACKGROUND: It is difficult to identify copy number variations (CNV) in normal human genomic data due to noise and non-linear relationships between different genomic regions and signal intensity. A high-resolution array comparative genomic hybridization (aCGH) containing 42 million probes, which is very large compared to previous arrays, was recently published. Most existing CNV detection algorithms do not work well because of noise associated with the large amount of input data and because most of the current methods were not designed to analyze normal human samples. Normal human genome analysis often requires a joint approach across multiple samples. However, the majority of existing methods can only identify CNVs from a single sample. METHODOLOGY AND PRINCIPAL FINDINGS: We developed a multi-sample-based genomic variations detector (MGVD) that uses segmentation to identify common breakpoints across multiple samples and a k-means-based clustering strategy. Unlike previous methods, MGVD simultaneously considers multiple samples with different genomic intensities and identifies CNVs and CNV zones (CNVZs); CNVZ is a more precise measure of the location of a genomic variant than the CNV region (CNVR). CONCLUSIONS AND SIGNIFICANCE: We designed a specialized algorithm to detect common CNVs from extremely high-resolution multi-sample aCGH data. MGVD showed high sensitivity and a low false discovery rate for a simulated data set, and outperformed most current methods when real, high-resolution HapMap datasets were analyzed. MGVD also had the fastest runtime compared to the other algorithms evaluated when actual, high-resolution aCGH data were analyzed. The CNVZs identified by MGVD can be used in association studies for revealing relationships between phenotypes and genomic aberrations. Our algorithm was developed with standard C++ and is available in Linux and MS Windows format in the STL library. It is freely available at: http://embio.yonsei.ac.kr/~Park/mgvd.php

    Parsimonious Higher-Order Hidden Markov Models for Improved Array-CGH Analysis with Applications to Arabidopsis thaliana

    Array-based comparative genomic hybridization (Array-CGH) is an important technology in molecular biology for the detection of DNA copy number polymorphisms between closely related genomes. Hidden Markov Models (HMMs) are popular tools for the analysis of Array-CGH data, but current methods are only based on first-order HMMs having constrained abilities to model spatial dependencies between measurements of closely adjacent chromosomal regions. Here, we develop parsimonious higher-order HMMs enabling the interpolation between a mixture model ignoring spatial dependencies and a higher-order HMM exhaustively modeling spatial dependencies. We apply parsimonious higher-order HMMs to the analysis of Array-CGH data of the accessions C24 and Col-0 of the model plant Arabidopsis thaliana. We compare these models against first-order HMMs and other existing methods using a reference of known deletions and sequence deviations. We find that parsimonious higher-order HMMs clearly improve the identification of these polymorphisms. Moreover, we perform a functional analysis of identified polymorphisms revealing novel details of genomic differences between C24 and Col-0. Additional model evaluations are done on widely considered Array-CGH data of human cell lines indicating that parsimonious HMMs are also well-suited for the analysis of non-plant specific data. All these results indicate that parsimonious higher-order HMMs are useful for Array-CGH analyses. An implementation of parsimonious higher-order HMMs is available as part of the open source Java library Jstacs (www.jstacs.de/index.php/PHHMM)