381 research outputs found

    Bayesian Change Point Analysis of Copy Number Variants Using Human Next Generation Sequencing Data

    Title from PDF of title page, viewed on June 8, 2015. Dissertation advisor: Jie Chen. Vita. Includes bibliographic references (pages 127-134). Thesis (Ph.D.)--Department of Mathematics and Statistics and School of Biological Sciences. University of Missouri--Kansas City, 2014.

    Read count analysis is the principal strategy for detecting copy number variants in human next generation sequencing (NGS) data. Read count data from NGS have been shown to follow nonhomogeneous Poisson distributions. Current change point analysis methods for detecting copy number variants assume a normal distribution and use an ordinary normal approximation in their algorithms. To improve sensitivity and reduce the false positive rate in detecting copy number variants, we developed three models: a Bayesian Anscombe normal approximation model for a single genome, a Bayesian Poisson model for a single genome, and a Bayesian Anscombe normal approximation model for paired genomes. The Bayesian statistics were optimized through Monte Carlo simulations for detecting change points and copy numbers at single and multiple change points. Three R packages based on these models were built to simulate Poisson-distributed data and to estimate and display copy number variants in tables and graphics. The high sensitivity and specificity of these models were demonstrated on simulated read count data with known Poisson distributions and on human NGS read count data, in comparison with other popular packages.

    Contents: Background -- Single genome Bayesian approaches in NGS read count analysis -- Normal approximation Bayesian change point model for paired genomes -- Conclusion and future work
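    The role of the Anscombe transform in the abstract above can be illustrated with a minimal sketch. It is not the thesis's Bayesian model or its R packages: it only applies the standard Anscombe transform, 2*sqrt(x + 3/8), to simulated Poisson read counts and then locates a single change point with a plain least-squares split, a simpler stand-in for the Bayesian statistic. The function names, the Poisson rates (20 and 40), and the window count are all assumptions chosen for illustration.

```python
import math
import random


def anscombe(count):
    """Anscombe transform: makes Poisson counts roughly unit-variance normal."""
    return 2.0 * math.sqrt(count + 3.0 / 8.0)


def sample_poisson(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's method (fine for moderate lam)."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def best_single_split(values):
    """Return t (1 <= t < n) minimizing the within-segment sum of squares
    of values[:t] and values[t:]. A least-squares stand-in, not a Bayesian test."""
    n = len(values)
    total = sum(values)
    total_sq = sum(v * v for v in values)
    best_t, best_cost = None, float("inf")
    left_sum = left_sq = 0.0
    for t in range(1, n):
        left_sum += values[t - 1]
        left_sq += values[t - 1] ** 2
        right_sum = total - left_sum
        right_sq = total_sq - left_sq
        cost = (left_sq - left_sum ** 2 / t) + (right_sq - right_sum ** 2 / (n - t))
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t


def simulate_and_detect(seed=0, n_per_segment=100, rate_before=20.0, rate_after=40.0):
    """Simulate read counts with one rate change and estimate where it occurs."""
    rng = random.Random(seed)
    counts = [sample_poisson(rng, rate_before) for _ in range(n_per_segment)]
    counts += [sample_poisson(rng, rate_after) for _ in range(n_per_segment)]
    return best_single_split([anscombe(c) for c in counts])


if __name__ == "__main__":
    print("estimated change point index:", simulate_and_detect())
```

    After the transform the two segments differ mainly in mean rather than variance, which is why a normal-theory split criterion becomes reasonable for count data.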

    Precise Request Tracing and Performance Debugging for Multi-tier Services of Black Boxes

    As more and more multi-tier services are built from commercial components or heterogeneous middleware whose source code is unavailable, both developers and administrators need a precise request tracing tool to help them understand and debug performance problems in large concurrent services of black boxes. Previous work falls short in one of two ways: it either accepts the imprecision of probabilistic correlation methods or relies on knowledge of protocols to isolate requests in pursuit of tracing accuracy. This paper introduces a tool named PreciseTracer to help debug performance problems of multi-tier services of black boxes. Our contributions are two-fold: first, we propose a precise request tracing algorithm for multi-tier services of black boxes that uses only application-independent knowledge; second, we present a component activity graph abstraction to represent the causal paths of requests and facilitate end-to-end performance debugging. Its low overhead and tolerance of noise make PreciseTracer a promising tracing tool for use on production systems.

    Multiple Descent in the Multiple Random Feature Model

    Recent works have demonstrated a double descent phenomenon in over-parameterized learning. Although this phenomenon has been widely investigated, it is not yet fully understood in theory. In this paper, we investigate the multiple descent phenomenon in a class of multi-component prediction models. We first consider a "double random feature model" (DRFM) that concatenates two types of random features, and study the excess risk achieved by the DRFM in ridge regression. We calculate the precise limit of the excess risk under the high-dimensional framework in which the training sample size, the dimension of the data, and the dimension of the random features tend to infinity proportionally. Based on this calculation, we further demonstrate theoretically that the risk curves of DRFMs can exhibit triple descent. We then provide a thorough experimental study to verify our theory. Finally, we extend our study to the "multiple random feature model" (MRFM), and show that MRFMs ensembling K types of random features may exhibit (K+1)-fold descent. Our analysis shows that risk curves with a specific number of descents generally exist in learning multi-component prediction models.

    Comment: 89 pages, 9 figures. Version 3 adds a new description of triple descent in certain double random feature models, deletes the discussion of NTK regimes, and adds more literature references