
    Probing local non-Gaussianities within a Bayesian framework

    Aims: We outline the Bayesian approach to inferring f_NL, the level of non-Gaussianity of local type. Phrasing f_NL inference in a Bayesian framework takes advantage of existing techniques to account for instrumental effects and foreground contamination in CMB data, and takes into account uncertainties in the cosmological parameters in an unambiguous way. Methods: We derive closed-form expressions for the joint posterior of f_NL and the reconstructed underlying curvature perturbation, Phi, and deduce the conditional probability densities for f_NL and Phi. Completing the inference problem amounts to finding the marginal density for f_NL. For realistic data sets the necessary integrations are intractable. We propose an exact Hamiltonian sampling algorithm to generate correlated samples from the f_NL posterior. For sufficiently high signal-to-noise ratios, we can exploit the assumption of weak non-Gaussianity to find a direct Monte Carlo technique to generate independent samples from the posterior distribution for f_NL. We illustrate our approach using a simplified toy model of CMB data on a one-dimensional sky. Results: When applied to our toy problem, we find that, in the limit of high signal-to-noise, the sampling efficiency of the approximate algorithm outperforms that of Hamiltonian sampling by two orders of magnitude. When f_NL is not significantly constrained by the data, the more efficient, approximate algorithm biases the posterior density towards f_NL = 0.
    Comment: 11 pages, 7 figures. Accepted for publication in Astronomy and Astrophysics
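
    The Hamiltonian sampling step named in the abstract can be illustrated with a generic hybrid Monte Carlo sampler. The Python sketch below draws samples from a hypothetical one-dimensional log-posterior with a small bounded non-Gaussian perturbation; it illustrates the sampling technique only, not the authors' CMB implementation, and log_post and all tuning parameters are assumptions.

```python
# Minimal sketch of Hamiltonian Monte Carlo sampling from a 1-D posterior.
# The log-posterior below is a hypothetical stand-in for a weakly
# non-Gaussian f_NL posterior, not the expression derived in the paper.
import numpy as np

def log_post(x):
    # toy log-density: Gaussian core plus a small bounded perturbation
    return -0.5 * x**2 + 0.2 * np.sin(2.0 * x)

def grad_log_post(x):
    return -x + 0.4 * np.cos(2.0 * x)

def hmc(n_samples, step=0.1, n_leapfrog=20, seed=0):
    rng = np.random.default_rng(seed)
    samples, x = [], 0.0
    for _ in range(n_samples):
        p = rng.normal()                        # fresh momentum each trajectory
        x_new, p_new = x, p + 0.5 * step * grad_log_post(x)
        for _ in range(n_leapfrog):             # leapfrog integration
            x_new += step * p_new
            p_new += step * grad_log_post(x_new)
        p_new -= 0.5 * step * grad_log_post(x_new)  # undo the extra half step
        # Metropolis accept/reject on the total energy (neg. log density + kinetic)
        h_old = -log_post(x) + 0.5 * p**2
        h_new = -log_post(x_new) + 0.5 * p_new**2
        if np.log(rng.random()) < h_old - h_new:
            x = x_new
        samples.append(x)
    return np.array(samples)

print(hmc(5000).mean())                         # posterior mean of the toy density
```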

    Calculation of Densities of States and Spectral Functions by Chebyshev Recursion and Maximum Entropy

    We present an efficient algorithm for calculating spectral properties of large sparse Hamiltonian matrices, such as densities of states and spectral functions. The combination of Chebyshev recursion and maximum entropy achieves high energy resolution without significant limitations from roundoff error, machine precision, or numerical instability. If controlled statistical or systematic errors are acceptable, CPU and memory requirements scale linearly in the number of states. The inference of spectral properties from moments is much better conditioned for Chebyshev moments than for power moments. We adapt concepts from the kernel polynomial approximation, a linear Chebyshev approximation with optimized Gibbs damping, to control the accuracy of Fourier integrals of positive non-analytic functions. We compare the performance of the kernel polynomial and maximum entropy algorithms for an electronic structure example.
    Comment: 8 pages RevTeX, 3 PostScript figures
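
    The Chebyshev-moment half of this approach (the kernel polynomial method with Jackson-style Gibbs damping) can be sketched in a few lines of Python. The example below estimates the density of states of a hypothetical sparse 1-D tight-binding matrix from stochastically evaluated Chebyshev moments; the maximum entropy reconstruction the paper pairs with it is omitted, and the test Hamiltonian and all sizes are assumptions.

```python
# Sketch of a kernel-polynomial density-of-states estimate: stochastic
# Chebyshev moments plus Jackson damping. The sparse 1-D tight-binding
# Hamiltonian is a hypothetical test matrix, not the paper's electronic
# structure example, and the maximum entropy step is not implemented here.
import numpy as np
from scipy.sparse import diags

n, n_mom, n_probe = 512, 128, 20
H = diags([np.ones(n - 1), np.ones(n - 1)], [-1, 1]) / 2.1  # spectrum rescaled into (-1, 1)

rng = np.random.default_rng(1)
mu = np.zeros(n_mom)
for _ in range(n_probe):                    # stochastic trace: average over probes
    r = rng.choice([-1.0, 1.0], n)          # random +/-1 probe vector
    v_prev, v_cur = r, H @ r                # T_0(H) r and T_1(H) r
    mu[0] += r @ v_prev
    mu[1] += r @ v_cur
    for k in range(2, n_mom):
        v_prev, v_cur = v_cur, 2.0 * (H @ v_cur) - v_prev   # Chebyshev recursion
        mu[k] += r @ v_cur
mu /= n_probe * n                           # moments mu_k ~ Tr T_k(H) / n

# Jackson damping factors suppress Gibbs oscillations in the truncated series
N = n_mom
k = np.arange(N)
g = ((N - k + 1) * np.cos(np.pi * k / (N + 1))
     + np.sin(np.pi * k / (N + 1)) / np.tan(np.pi / (N + 1))) / (N + 1)

x = np.linspace(-0.99, 0.99, 400)
T = np.cos(np.outer(k, np.arccos(x)))       # T_k(x) on the evaluation grid
rho = g[0] * mu[0] + 2.0 * ((g[1:] * mu[1:])[:, None] * T[1:]).sum(axis=0)
rho /= np.pi * np.sqrt(1.0 - x**2)          # reconstructed density of states
print(rho[:5])
```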

    BlinkML: Efficient Maximum Likelihood Estimation with Probabilistic Guarantees

    The rising volume of datasets has made training machine learning (ML) models a major computational cost in the enterprise. Given the iterative nature of model and parameter tuning, many analysts use a small sample of their entire data during the initial stage of analysis to make quick decisions (e.g., what features or hyperparameters to use) and use the entire dataset only in later stages (i.e., when they have converged to a specific model). This sampling, however, is performed in an ad hoc fashion. Most practitioners cannot precisely capture the effect of sampling on the quality of their model, and eventually on their decision-making process during the tuning phase. Moreover, without systematic support for sampling operators, many optimizations and reuse opportunities are lost. In this paper, we introduce BlinkML, a system for fast, quality-guaranteed ML training. BlinkML allows users to make error-computation tradeoffs: instead of training a model on their full data (i.e., the full model), BlinkML can quickly train an approximate model with quality guarantees using a sample. The quality guarantees ensure that, with high probability, the approximate model makes the same predictions as the full model. BlinkML currently supports any ML model that relies on maximum likelihood estimation (MLE), which includes Generalized Linear Models (e.g., linear regression, logistic regression, max entropy classifiers, Poisson regression) as well as PPCA (Probabilistic Principal Component Analysis). Our experiments show that BlinkML can speed up the training of large-scale ML tasks by 6.26x-629x while guaranteeing the same predictions, with 95% probability, as the full model.
    Comment: 22 pages, SIGMOD 2019
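
    The trade-off BlinkML exploits can be illustrated with a plain sample-versus-full comparison for an MLE model. The Python sketch below trains logistic regression on a 5% uniform sample and measures how often its predictions agree with the full-data model; it illustrates the idea only, and does not implement BlinkML's actual guarantee machinery (which bounds the prediction-difference probability without training the full model). The synthetic dataset and sample size are assumptions.

```python
# Sketch of sample-based approximate MLE training: fit logistic regression
# on a uniform sample and measure prediction agreement with the full-data
# model. This shows the accuracy/cost trade-off only; BlinkML's probabilistic
# guarantee computation is not implemented here.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=100_000, n_features=20, random_state=0)

full = LogisticRegression(max_iter=1000).fit(X, y)           # the "full model"

rng = np.random.default_rng(0)
idx = rng.choice(len(X), size=5_000, replace=False)          # 5% uniform sample
approx = LogisticRegression(max_iter=1000).fit(X[idx], y[idx])

# fraction of points where the approximate model matches the full model
agreement = np.mean(full.predict(X) == approx.predict(X))
print(f"prediction agreement: {agreement:.4f}")
```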