4 research outputs found

    Detecting Differential Expression from RNA-seq Data with Expression Measurement Uncertainty

    Full text link
    High-throughput RNA sequencing (RNA-seq) has emerged as a revolutionary and powerful technology for expression profiling. Most proposed methods for detecting differentially expressed (DE) genes from RNA-seq are based on statistics that compare normalized read counts between conditions. However, few methods take expression measurement uncertainty into account in DE detection. Moreover, most methods can only detect DE genes; few are available for detecting DE isoforms. In this paper, a Bayesian framework (BDSeq) is proposed to detect DE genes and isoforms while accounting for expression measurement uncertainty. This measurement uncertainty provides useful information that helps improve the performance of DE detection. Three real RNA-seq data sets are used to evaluate BDSeq, and the results show that including expression measurement uncertainty improves the accuracy of DE gene and isoform detection. Finally, we develop a GamSeq-BDSeq RNA-seq analysis pipeline for users, freely available at http://parnec.nuaa.edu.cn/liux/GSBD/GamSeq-BDSeq.html. Comment: 20 pages, 9 figures
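    The abstract does not give the BDSeq model itself, but the core idea, propagating per-replicate expression measurement uncertainty into the between-condition comparison, can be illustrated with a minimal sketch. Everything below (the function name, the simple normal model, the example numbers) is hypothetical and only shows how a quantifier's variance estimates might inflate the variance used in a DE test; it is not the authors' Bayesian framework.

    import numpy as np
    from scipy import stats

    def de_score_with_uncertainty(means_a, vars_a, means_b, vars_b):
        # Toy z-test for differential expression that folds per-replicate
        # measurement variance into the comparison (illustrative only, not BDSeq).
        # means_*: per-replicate expression estimates (e.g., on the log scale)
        # vars_*:  per-replicate measurement variances reported by the quantifier
        means_a, vars_a = np.asarray(means_a, float), np.asarray(vars_a, float)
        means_b, vars_b = np.asarray(means_b, float), np.asarray(vars_b, float)

        # Standard error of each condition mean: between-replicate spread
        # plus the propagated (averaged) measurement variance.
        se_a = np.sqrt((means_a.var(ddof=1) + vars_a.mean()) / len(means_a))
        se_b = np.sqrt((means_b.var(ddof=1) + vars_b.mean()) / len(means_b))

        z = (means_a.mean() - means_b.mean()) / np.sqrt(se_a**2 + se_b**2)
        p = 2 * stats.norm.sf(abs(z))   # two-sided p-value
        return z, p

    # Hypothetical gene: three replicates per condition, each with its own
    # expression estimate and measurement variance.
    z, p = de_score_with_uncertainty([5.1, 5.3, 4.9], [0.20, 0.15, 0.25],
                                     [6.4, 6.1, 6.6], [0.30, 0.25, 0.20])
    print(f"z = {z:.2f}, p = {p:.3g}")

    Ignoring the measurement variances (passing zeros) would overstate the evidence for differential expression, which is the intuition behind the accuracy gains the abstract reports.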

    A Fast Algorithm for Robust Mixtures in the Presence of Measurement Errors

    No full text
    In experimental and observational sciences, detecting atypical, peculiar data points in large sets of measurements can highlight candidate objects of interesting new types that deserve more detailed, domain-specific follow-up study. However, measurement data are almost never free of measurement errors, and these errors can generate false outliers that are not truly interesting. Although many approaches exist for finding outliers, they have no means of telling to what extent a peculiarity is simply due to measurement error. To address this issue, we have developed a model-based approach to infer genuine outliers from multivariate data sets when measurement error information is available. It is based on a probabilistic mixture of hierarchical density models, in which parameter estimation is made feasible by a tree-structured variational expectation-maximization (EM) algorithm. Here, we further develop an algorithmic enhancement to address the scalability of this approach and make it applicable to large data sets, via a K-dimensional-tree (KD-tree) based partitioning of the variational posterior assignments. This creates a non-trivial trade-off between a more detailed noise model, which enhances detection accuracy, and a coarsened posterior representation, which yields computational speedup. Hence, we conduct extensive experimental validation to study the accuracy/speed trade-offs achievable in a variety of data conditions. We find that, at low-to-moderate error levels, a speedup factor that is at least linear in the number of data points can be achieved without significantly sacrificing detection accuracy. The benefit of including measurement error information in the model is evident in all situations, and the gain roughly recovers the loss incurred by the speedup procedure in large-error conditions. We analyze and discuss in detail the characteristics of our algorithm based on results from appropriately designed synthetic data experiments, and we also demonstrate its operation in a real application example.
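    The abstract describes the speedup only at a high level. The sketch below is a hypothetical, simplified illustration of the general idea: a KD-tree partition of the data whose leaves share a single responsibility vector in the E-step of a Gaussian mixture with per-point measurement error. It is not the authors' hierarchical density model or their tree-structured variational EM, and all names and parameters are made up.

    import numpy as np
    from scipy.stats import multivariate_normal

    def kd_partition(X, idx, leaf_size):
        # Recursively split along the widest dimension until each leaf
        # holds at most `leaf_size` points; returns a list of index arrays.
        if len(idx) <= leaf_size:
            return [idx]
        dim = int(np.argmax(X[idx].max(axis=0) - X[idx].min(axis=0)))
        order = idx[np.argsort(X[idx, dim])]
        mid = len(order) // 2
        return kd_partition(X, order[:mid], leaf_size) + kd_partition(X, order[mid:], leaf_size)

    def coarse_e_step(X, meas_var, weights, means, covs, leaf_size=64):
        # Approximate E-step: responsibilities are computed once per leaf
        # (at the leaf centroid, using the leaf-average measurement variance)
        # and shared by every point in that leaf. Larger leaves = faster, coarser.
        n, _ = X.shape
        K = len(weights)
        resp = np.empty((n, K))
        for leaf in kd_partition(X, np.arange(n), leaf_size):
            centroid = X[leaf].mean(axis=0)
            noise = np.diag(meas_var[leaf].mean(axis=0))  # measurement error convolved into each component
            logp = np.array([np.log(weights[k]) +
                             multivariate_normal.logpdf(centroid, means[k], covs[k] + noise)
                             for k in range(K)])
            logp -= logp.max()                # stabilise before exponentiating
            r = np.exp(logp)
            resp[leaf] = r / r.sum()          # same posterior for all points in the leaf
        return resp

    In a full EM loop, this coarse E-step would alternate with the usual M-step updates of the mixture weights, means, and covariances, with leaf_size controlling the accuracy/speed trade-off the abstract refers to.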
