
    Measuring reproducibility of high-throughput experiments

    Reproducibility is essential to reliable scientific discovery in high-throughput experiments. In this work we propose a unified approach to measure the reproducibility of findings identified from replicate experiments and to identify putative discoveries using reproducibility. Unlike the usual scalar measures of reproducibility, our approach creates a curve, which quantitatively assesses when the findings are no longer consistent across replicates. The curve is fitted by a copula mixture model, from which we derive a quantitative reproducibility score, the "irreproducible discovery rate" (IDR), analogous to the false discovery rate (FDR). This score can be computed at each set of paired replicate ranks and permits the principled setting of thresholds both for assessing reproducibility and for combining replicates. Since our approach permits an arbitrary scale for each replicate, it provides useful descriptive measures in a wide variety of situations. We study the performance of the algorithm using simulations and give a heuristic analysis of its theoretical properties. We demonstrate the effectiveness of our method in a ChIP-seq experiment. Comment: Published at http://dx.doi.org/10.1214/11-AOAS466 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)
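    As an informal illustration of the IDR idea (not the authors' exact copula mixture model), the sketch below rank-transforms two replicates, fits a two-component bivariate Gaussian mixture to the probit-transformed ranks, and treats the posterior weight of the weakly correlated component as a local irreproducibility score; the function name idr_sketch and the use of scikit-learn are assumptions made for the example.

        import numpy as np
        from scipy.stats import norm, rankdata
        from sklearn.mixture import GaussianMixture

        def idr_sketch(x, y):
            # Probit-transformed ranks put both replicates on a common scale,
            # so the procedure is invariant to each replicate's original scale.
            u = norm.ppf(rankdata(x) / (len(x) + 1))
            v = norm.ppf(rankdata(y) / (len(y) + 1))
            z = np.column_stack([u, v])
            gm = GaussianMixture(n_components=2, covariance_type="full",
                                 random_state=0).fit(z)
            post = gm.predict_proba(z)
            # Treat the component with the lower cross-replicate correlation
            # as "irreproducible"; its posterior is the local score.
            corr = [c[0, 1] / np.sqrt(c[0, 0] * c[1, 1]) for c in gm.covariances_]
            local = post[:, int(np.argmin(corr))]
            # IDR-like quantity: average irreproducibility among the pairs
            # ranked at or above each pair (analogous to local fdr -> FDR).
            order = np.argsort(local)
            idr = np.cumsum(local[order]) / np.arange(1, len(local) + 1)
            return local, idr[np.argsort(order)]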

    A Unifying Review of Linear Gaussian Models

    Factor analysis, principal component analysis, mixtures of Gaussian clusters, vector quantization, Kalman filter models, and hidden Markov models can all be unified as variations of unsupervised learning under a single basic generative model. This is achieved by collecting together disparate observations and derivations made by many previous authors and introducing a new way of linking discrete and continuous state models using a simple nonlinearity. Through the use of other nonlinearities, we show how independent component analysis is also a variation of the same basic generative model. We show that factor analysis and mixtures of Gaussians can be implemented in autoencoder neural networks and learned using squared error plus the same regularization term. We introduce a new model for static data, known as sensible principal component analysis, as well as a novel concept of spatially adaptive observation noise. We also review some of the literature involving global and local mixtures of the basic models and provide pseudocode for inference and learning for all the basic models.
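    A minimal sketch of the single generative model the review builds on, a linear Gaussian state-space model; the reductions listed in the comments are informal reminders of how the special cases arise, not quotations from the paper.

        import numpy as np

        def sample_lgssm(A, C, Q, R, x0, T, seed=0):
            """Sample from x_{t+1} = A x_t + w_t, w_t ~ N(0, Q),
                           y_t     = C x_t + v_t, v_t ~ N(0, R)."""
            rng = np.random.default_rng(seed)
            k, p = A.shape[0], C.shape[0]
            xs, ys, x = [], [], np.asarray(x0, dtype=float)
            for _ in range(T):
                xs.append(x)
                ys.append(C @ x + rng.multivariate_normal(np.zeros(p), R))
                x = A @ x + rng.multivariate_normal(np.zeros(k), Q)
            return np.array(xs), np.array(ys)

        # Informal special cases of the same model:
        #   factor analysis:      A = 0, Q = I, R diagonal (static, i.i.d. in t)
        #   probabilistic PCA:    A = 0, Q = I, R = sigma^2 * I
        #   Kalman filter models: general A, C with Gaussian Q, R (dynamic states)
        #   mixtures / HMMs:      discrete states via a winner-take-all nonlinearity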

    Noisy independent component analysis of auto-correlated components

    We present a new method for the separation of superimposed, independent, auto-correlated components from noisy multi-channel measurements. The method simultaneously reconstructs and separates the components, taking all channels into account and thereby increasing the effective signal-to-noise ratio considerably, which allows separations even in the high-noise regime. Characteristics of the measurement instruments can be included, allowing for application in complex measurement situations. Independent posterior samples can be provided, permitting error estimates on all desired quantities. Because the algorithm is formulated in terms of information field theory, it is not restricted to any particular dimensionality of the underlying space or discretization scheme thereof.
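    The sketch below shows the kind of measurement model the method addresses, multi-channel data d = M s + n with the instrument response folded into M, and reconstructs the components with a plain Wiener filter under Gaussian priors; this is a simplified stand-in for the paper's information-field-theory algorithm, and the function and variable names are illustrative.

        import numpy as np

        def wiener_separate(d, M, S_prior, N_cov):
            """Posterior mean and covariance of the stacked components s for
            d = M s + n, with Gaussian prior covariance S_prior on s and
            Gaussian noise covariance N_cov (dense matrices, small problems)."""
            Ninv = np.linalg.inv(N_cov)
            # D = (S^-1 + M^T N^-1 M)^-1,  m = D M^T N^-1 d
            D = np.linalg.inv(np.linalg.inv(S_prior) + M.T @ Ninv @ M)
            return D @ M.T @ Ninv @ d, D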

    Maximum-Likelihood Comparisons of Tully-Fisher and Redshift Data: Constraints on Omega and Biasing

    We compare Tully-Fisher (TF) data for 838 galaxies within cz = 3000 km/sec from the Mark III catalog to the peculiar velocity and density fields predicted from the 1.2 Jy IRAS redshift survey. Our goal is to test the relation between the galaxy density and velocity fields predicted by gravitational instability theory and linear biasing, and thereby to estimate $\beta_I = \Omega^{0.6}/b_I$, where $b_I$ is the linear bias parameter for IRAS galaxies. Adopting the IRAS velocity and density fields as a prior model, we maximize the likelihood of the raw TF observables, taking into account the full range of selection effects and properly treating triple-valued zones in the redshift-distance relation. Extensive tests with realistic simulated galaxy catalogs demonstrate that the method produces unbiased estimates of $\beta_I$ and its error. When we apply the method to the real data, we model the presence of a small but significant velocity quadrupole residual (~3.3% of the Hubble flow), which we argue is due to density fluctuations incompletely sampled by IRAS. The method then yields a maximum likelihood estimate $\beta_I = 0.49 \pm 0.07$ (1-sigma error). We discuss the constraints on $\Omega$ and biasing that follow if we assume a COBE-normalized CDM power spectrum. Our model also yields the 1-D noise in the velocity field, including IRAS prediction errors, which we find to be 125 +/- 20 km/sec. Comment: 53 pages, 20 encapsulated figures, two tables. Submitted to the Astrophysical Journal. Also available at http://astro.stanford.edu/jeff
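    A quick back-of-the-envelope reading of the headline number: since $\beta_I = \Omega^{0.6}/b_I$, the estimate implies $\Omega = (\beta_I b_I)^{1/0.6}$; the bias values looped over below are illustrative assumptions, not values taken from the paper.

        # beta_I = Omega^0.6 / b_I  =>  Omega = (beta_I * b_I)^(1/0.6)
        beta_I = 0.49
        for b_I in (0.8, 1.0, 1.2):   # assumed IRAS bias parameters
            omega = (beta_I * b_I) ** (1 / 0.6)
            print(f"b_I = {b_I:.1f}  ->  Omega = {omega:.2f}")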