1,267 research outputs found

    Change-point model on nonhomogeneous Poisson processes with application in copy number profiling by next-generation DNA sequencing

    Get PDF
    We propose a flexible change-point model for inhomogeneous Poisson Processes, which arise naturally from next-generation DNA sequencing, and derive score and generalized likelihood statistics for shifts in intensity functions. We construct a modified Bayesian information criterion (mBIC) to guide model selection, and point-wise approximate Bayesian confidence intervals for assessing the confidence in the segmentation. The model is applied to DNA Copy Number profiling with sequencing data and evaluated on simulated spike-in and real data sets.Comment: Published in at http://dx.doi.org/10.1214/11-AOAS517 the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org

    A robust penalized method for the analysis of noisy DNA copy number data

    Get PDF
    BackgroundDeletions and amplifications of the human genomic DNA copy number are the causes ofnumerous diseases, such as, various forms of cancer. Therefore, the detection of DNAcopy number variations (CNV) is important in understanding the genetic basis of manydiseases. Various techniques and platforms have been developed for genome-wideanalysis of DNA copy number, such as, array-based comparative genomic hybridization(aCGH) and high-resolution mapping with high-density tiling oligonucleotide arrays.Since complicated biological and experimental processes are often associated with theseplatforms, data can be potentially contaminated by outliers.ResultsWe propose a penalized LAD regression model with the adaptive fused lasso penalty fordetecting CNV. This method contains robust properties and incorporates both the spatialdependence and sparsity of CNV into the analysis. Our simulation studies and real dataanalysis indicate that the proposed method can correctly detect the numbers and locationsof the true breakpoints while appropriately controlling the false positives.ConclusionsThe proposed method has three advantages for detecting CNV change points: it containsrobustness properties; incorporates both spatial dependence and sparsity; and estimatesthe true values at each marker accurately

    A robust penalized method for the analysis of noisy DNA copy number data

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Deletions and amplifications of the human genomic DNA copy number are the causes of numerous diseases, such as, various forms of cancer. Therefore, the detection of DNA copy number variations (CNV) is important in understanding the genetic basis of many diseases. Various techniques and platforms have been developed for genome-wide analysis of DNA copy number, such as, array-based comparative genomic hybridization (aCGH) and high-resolution mapping with high-density tiling oligonucleotide arrays. Since complicated biological and experimental processes are often associated with these platforms, data can be potentially contaminated by outliers.</p> <p>Results</p> <p>We propose a penalized LAD regression model with the adaptive fused lasso penalty for detecting CNV. This method contains robust properties and incorporates both the spatial dependence and sparsity of CNV into the analysis. Our simulation studies and real data analysis indicate that the proposed method can correctly detect the numbers and locations of the true breakpoints while appropriately controlling the false positives.</p> <p>Conclusions</p> <p>The proposed method has three advantages for detecting CNV change points: it contains robustness properties; incorporates both spatial dependence and sparsity; and estimates the true values at each marker accurately.</p

    bcp: An R Package for Performing a Bayesian Analysis of Change Point Problems

    Get PDF
    Barry and Hartigan (1993) propose a Bayesian analysis for change point problems. We provide a brief summary of selected work on change point problems, both preceding and following Barry and Hartigan. We outline Barry and Hartigan's approach and offer a new R package, pkgbcp (Erdman and Emerson 2007), implementing their analysis. We discuss two frequentist alternatives to the Bayesian analysis, the recursive circular binary segmentation algorithm (Olshen and Venkatraman 2004) and the dynamic programming algorithm of (Bai and Perron 2003). We illustrate the application of bcp with economic and microarray data from the literature.

    Bayesian DNA copy number analysis

    Get PDF
    BACKGROUND: Some diseases, like tumors, can be related to chromosomal aberrations, leading to changes of DNA copy number. The copy number of an aberrant genome can be represented as a piecewise constant function, since it can exhibit regions of deletions or gains. Instead, in a healthy cell the copy number is two because we inherit one copy of each chromosome from each our parents. Bayesian Piecewise Constant Regression (BPCR) is a Bayesian regression method for data that are noisy observations of a piecewise constant function. The method estimates the unknown segment number, the endpoints of the segments and the value of the segment levels of the underlying piecewise constant function. The Bayesian Regression Curve (BRC) estimates the same data with a smoothing curve. However, in the original formulation, some estimators failed to properly determine the corresponding parameters. For example, the boundary estimator did not take into account the dependency among the boundaries and succeeded in estimating more than one breakpoint at the same position, losing segments. RESULTS: We derived an improved version of the BPCR (called mBPCR) and BRC, changing the segment number estimator and the boundary estimator to enhance the fitting procedure. We also proposed an alternative estimator of the variance of the segment levels, which is useful in case of data with high noise. Using artificial data, we compared the original and the modified version of BPCR and BRC with other regression methods, showing that our improved version of BPCR generally outperformed all the others. Similar results were also observed on real data. CONCLUSION: We propose an improved method for DNA copy number estimation, mBPCR, which performed very well compared to previously published algorithms. In particular, mBPCR was more powerful in the detection of the true position of the breakpoints and of small aberrations in very noisy data. Hence, from a biological point of view, our method can be very useful, for example, to find targets of genomic aberrations in clinical cancer samples

    Exploiting noise in array CGH data to improve detection of DNA copy number change

    Get PDF
    Developing effective methods for analyzing array-CGH data to detect chromosomal aberrations is very important for the diagnosis of pathogenesis of cancer and other diseases. Current analysis methods, being largely based on smoothing and/or segmentation, are not quite capable of detecting both the aberration regions and the boundary break points very accurately. Furthermore, when evaluating the accuracy of an algorithm for analyzing array-CGH data, it is commonly assumed that noise in the data follows normal distribution. A fundamental question is whether noise in array-CGH is indeed Gaussian, and if not, can one exploit the characteristics of noise to develop novel analysis methods that are capable of detecting accurately the aberration regions as well as the boundary break points simultaneously? By analyzing bacterial artificial chromosomes (BACs) arrays with an average 1 mb resolution, 19 k oligo arrays with the average probe spacing <100 kb and 385 k oligo arrays with the average probe spacing of about 6 kb, we show that when there are aberrations, noise in all three types of arrays is highly non-Gaussian and possesses long-range spatial correlations, and that such noise leads to worse performance of existing methods for detecting aberrations in array-CGH than the Gaussian noise case. We further develop a novel method, which has optimally exploited the character of the noise, and is capable of identifying both aberration regions as well as the boundary break points very accurately. Finally, we propose a new concept, posteriori signal-to-noise ratio (p-SNR), to assign certain confidence level to an aberration region and boundaries detected

    WaveCNV: allele-specific copy number alterations in primary tumors and xenograft models from next-generation sequencing.

    Get PDF
    MotivationCopy number variations (CNVs) are a major source of genomic variability and are especially significant in cancer. Until recently microarray technologies have been used to characterize CNVs in genomes. However, advances in next-generation sequencing technology offer significant opportunities to deduce copy number directly from genome sequencing data. Unfortunately cancer genomes differ from normal genomes in several aspects that make them far less amenable to copy number detection. For example, cancer genomes are often aneuploid and an admixture of diploid/non-tumor cell fractions. Also patient-derived xenograft models can be laden with mouse contamination that strongly affects accurate assignment of copy number. Hence, there is a need to develop analytical tools that can take into account cancer-specific parameters for detecting CNVs directly from genome sequencing data.ResultsWe have developed WaveCNV, a software package to identify copy number alterations by detecting breakpoints of CNVs using translation-invariant discrete wavelet transforms and assign digitized copy numbers to each event using next-generation sequencing data. We also assign alleles specifying the chromosomal ratio following duplication/loss. We verified copy number calls using both microarray (correlation coefficient 0.97) and quantitative polymerase chain reaction (correlation coefficient 0.94) and found them to be highly concordant. We demonstrate its utility in pancreatic primary and xenograft sequencing data.Availability and implementationSource code and executables are available at https://github.com/WaveCNV. The segmentation algorithm is implemented in MATLAB, and copy number assignment is implemented [email protected] informationSupplementary data are available at Bioinformatics online

    Learning smoothing models of copy number profiles using breakpoint annotations

    Get PDF
    Many models have been proposed to detect breakpoints in chromosomal copy number profiles, but it is usually not obvious to decide which is most effective for a given data set. Furthermore, most methods have a smoothing parameter that determines the number of breakpoints and must be chosen using various heuristics. We present three contributions toward automatic training of smoothing models. First, we propose to select the model and degree of smoothness that maximizes agreement with visual breakpoint region annotations. Second, we develop cross-validation procedures to estimate the error of the trained models. Third, we apply these methods to a new database of annotated neuroblastoma copy number profiles, which we make available as a public benchmark for testing new algorithms. Whereas previous studies have been qualitative or limited to simulated data, our approach is quantitative and suggests which algorithms are fastest and most accurate in practice on real data
    corecore