
    cn.MOPS: mixture of Poissons for discovering copy number variations in next-generation sequencing data with a low false discovery rate

    Quantitative analyses of next-generation sequencing (NGS) data, such as the detection of copy number variations (CNVs), remain challenging. Current methods detect CNVs as changes in the depth of coverage along chromosomes. Technological or genomic variations in the depth of coverage thus lead to a high false discovery rate (FDR), even upon correction for GC content. In the context of association studies between CNVs and disease, a high FDR means many false CNVs, thereby decreasing the discovery power of the study after correction for multiple testing. We propose ‘Copy Number estimation by a Mixture Of PoissonS’ (cn.MOPS), a data processing pipeline for CNV detection in NGS data. In contrast to previous approaches, cn.MOPS incorporates modeling of depths of coverage across samples at each genomic position. Therefore, cn.MOPS is not affected by read count variations along chromosomes. Using a Bayesian approach, cn.MOPS decomposes variations in the depth of coverage across samples into integer copy numbers and noise by means of its mixture components and Poisson distributions, respectively. The noise estimate makes it possible to reduce the FDR by filtering out high-noise detections, which are likely to be false. We compared cn.MOPS with the five most popular methods for CNV detection in NGS data using four benchmark datasets: (i) simulated data, (ii) NGS data from a male HapMap individual with implanted CNVs from the X chromosome, (iii) data from HapMap individuals with known CNVs, and (iv) high coverage data from the 1000 Genomes Project. cn.MOPS outperformed its five competitors in terms of precision (1–FDR) and recall for both gains and losses in all benchmark datasets. The software cn.MOPS is publicly available as an R package at http://www.bioinf.jku.at/software/cnmops/ and at Bioconductor.
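
The mixture idea in the cn.MOPS abstract can be illustrated with a minimal Python sketch. It assumes a simplified form of the model in which the read count of a sample at one genomic position follows a Poisson distribution with mean (copy number / 2) × λ, where λ is the expected count at copy number 2. The function names, the mixture weights, the value of λ, and the small `eps` standing in for copy number 0 are illustrative assumptions, not the R package's API. The sketch takes λ and the weights as given rather than fitting them, and it omits the noise-based filtering described above.

```python
import math


def poisson_log_pmf(x, mu):
    """Log of the Poisson probability mass function for count x and mean mu > 0."""
    return x * math.log(mu) - mu - math.lgamma(x + 1)


def copy_number_posteriors(counts, lam, weights, eps=0.05):
    """Posterior over copy numbers 0..len(weights)-1 for each sample's read count.

    Assumption: the mean read count for copy number cn is (cn / 2) * lam;
    copy number 0 uses eps instead of 0 so that the Poisson mean stays positive.
    """
    posteriors = []
    for x in counts:
        log_joint = []
        for cn, w in enumerate(weights):
            mu = (cn if cn > 0 else eps) / 2.0 * lam
            log_joint.append(math.log(w) + poisson_log_pmf(x, mu))
        # Normalise in log space for numerical stability.
        top = max(log_joint)
        unnorm = [math.exp(v - top) for v in log_joint]
        total = sum(unnorm)
        posteriors.append([u / total for u in unnorm])
    return posteriors


def most_likely_copy_numbers(counts, lam, weights, eps=0.05):
    """Copy number with the highest posterior probability for each sample."""
    posteriors = copy_number_posteriors(counts, lam, weights, eps)
    return [max(range(len(p)), key=p.__getitem__) for p in posteriors]


# Hypothetical read counts of four samples at one genomic position.
counts = [100, 52, 148, 98]
weights = [0.01, 0.05, 0.88, 0.05, 0.01]  # assumed prior weights for copy numbers 0..4
print(most_likely_copy_numbers(counts, lam=100.0, weights=weights))
```

Because the comparison is made across samples at a fixed position, a position with uniformly high or low coverage changes λ but not the relative evidence for a gain or loss, which is the property the abstract emphasises.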

    FABIA: factor analysis for bicluster acquisition

    Motivation: Biclustering of transcriptomic data groups genes and samples simultaneously. It is emerging as a standard tool for extracting knowledge from gene expression measurements. We propose a novel generative approach for biclustering called ‘FABIA: Factor Analysis for Bicluster Acquisition’. FABIA is based on a multiplicative model, which accounts for linear dependencies between gene expression and conditions, and also captures heavy-tailed distributions as observed in real-world transcriptomic data. The generative framework makes it possible to use well-founded model selection methods and to apply Bayesian techniques.
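
A small Python sketch may help make the "multiplicative model" concrete. It assumes one common reading of that phrase: a single bicluster is the outer product of a sparse gene-loading vector and a sparse sample-factor vector, so that all genes in the bicluster vary proportionally across its samples. The function names and the numbers are hypothetical, and the sketch is noise-free; it does not reproduce FABIA's priors, model selection, or fitting procedure.

```python
def outer(loadings, factors):
    """Outer product: entry [i][j] is loadings[i] * factors[j]."""
    return [[l * f for f in factors] for l in loadings]


def bicluster_members(loadings, factors, threshold=0.0):
    """Genes and samples whose loading or factor magnitude exceeds the threshold."""
    genes = [i for i, l in enumerate(loadings) if abs(l) > threshold]
    samples = [j for j, f in enumerate(factors) if abs(f) > threshold]
    return genes, samples


# Hypothetical bicluster: genes 0 and 2 are active in samples 1 and 3.
loadings = [2.0, 0.0, -1.0, 0.0]     # one value per gene (sparse)
factors = [0.0, 1.5, 0.0, 3.0, 0.0]  # one value per sample (sparse)

matrix = outer(loadings, factors)    # 4 genes x 5 samples, zero outside the bicluster
print(bicluster_members(loadings, factors))
for row in matrix:
    print(row)
```

Within the bicluster every gene row is a scalar multiple of the same sample profile, which is the linear dependency between expression and conditions that the abstract refers to.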

    cn.FARMS: a probabilistic model to detect DNA copy numbers

    Motivation: Existing pre-processing methods for DNA microarrays designed to detect copy number variations (CNVs) lead to high false discovery rates (FDRs). High FDRs mislead researchers, especially in the medical context, where CNVs are wrongly associated with diseases. We propose a probabilistic latent variable model, cn.FARMS, for array-based CNV analysis which controls the FDR without loss of sensitivity. At a DNA region, cn.FARMS constructs a model by a Bayesian maximum a posteriori estimation where the unobserved, latent variable represents the copy number that is measured by observed genetic markers (probes). The latent variable’s prior prefers parameters which represent the null hypothesis (the same copy number for all samples), from which the posterior can deviate only if the data carry high information content. The more probes agree on the region’s copy number, the lower the uncertainty about the latent variable’s value and the higher the information content.
Results: We compared cn.FARMS with CRMAv2 and dChip on HapMap Mapping250K_Nsp and SNP6.0 benchmark data sets. The comparison is based on sex determination from X chromosome data, where males possess one copy and females two. The ROC curve serves to compare the FDR for different true positive rates. In both experiments cn.FARMS yielded the best classification results.
Availability: This approach is publicly available in R at http://www.bioinf.jku.at/softwar
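
The sex-determination benchmark described in the Results can be sketched as a standard ROC computation. The Python example below assumes each sample has a single X chromosome copy-number score and a known label (1 for female, two copies; 0 for male, one copy). The function names and all scores are hypothetical illustrations, not data or code from the cn.FARMS evaluation.

```python
def roc_points(scores, labels):
    """(false positive rate, true positive rate) pairs, sweeping the threshold downwards.

    Samples with tied scores are added together so each tie yields one point.
    """
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    positives = sum(labels)
    negatives = len(labels) - positives
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(pairs):
        current = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == current:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical X chromosome copy-number scores; label 1 = female, 0 = male.
scores = [2.1, 1.9, 1.2, 1.0, 2.0, 0.9]
labels = [1, 1, 0, 0, 1, 0]
print(auc(roc_points(scores, labels)))
```

A method whose scores separate one-copy from two-copy samples cleanly gives a curve that hugs the top-left corner; overlapping scores pull the curve towards the diagonal.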