Probing local non-Gaussianities within a Bayesian framework
Aims: We outline the Bayesian approach to inferring f_NL, the level of
non-Gaussianity of local type. Phrasing f_NL inference in a Bayesian framework
takes advantage of existing techniques to account for instrumental effects and
foreground contamination in CMB data and takes into account uncertainties in
the cosmological parameters in an unambiguous way.
Methods: We derive closed form expressions for the joint posterior of f_NL
and the reconstructed underlying curvature perturbation, Phi, and deduce the
conditional probability densities for f_NL and Phi. Completing the inference
problem amounts to finding the marginal density for f_NL. For realistic data
sets the necessary integrations are intractable. We propose an exact
Hamiltonian sampling algorithm to generate correlated samples from the f_NL
posterior. For sufficiently high signal-to-noise ratios, we can exploit the
assumption of weak non-Gaussianity to find a direct Monte Carlo technique to
generate independent samples from the posterior distribution for f_NL. We
illustrate our approach using a simplified toy model of CMB data for the simple
case of a 1-D sky.
Results: When applied to our toy problem, we find that, in the limit of high
signal-to-noise, the sampling efficiency of the approximate algorithm
outperforms that of Hamiltonian sampling by two orders of magnitude. When f_NL
is not significantly constrained by the data, the more efficient, approximate
algorithm biases the posterior density towards f_NL = 0.

Comment: 11 pages, 7 figures. Accepted for publication in Astronomy and Astrophysics.
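The Hamiltonian sampling step can be illustrated with a minimal sketch. The target below is a standard normal stand-in for the f_NL posterior; the function names, step size, and trajectory length are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

# Toy sketch (not the paper's code): Hamiltonian Monte Carlo drawing
# correlated samples from a 1-D posterior, here a standard normal.

def neg_log_post(x):        # -log p(x) up to a constant
    return 0.5 * x * x

def grad_neg_log_post(x):   # gradient of -log p(x)
    return x

def hmc_sample(n_samples, step=0.2, n_leapfrog=20, seed=0):
    rng = np.random.default_rng(seed)
    x = 0.0
    samples = []
    for _ in range(n_samples):
        p = rng.normal()                 # resample momentum each trajectory
        x_new, p_new = x, p
        # leapfrog integration of the Hamiltonian dynamics
        p_new -= 0.5 * step * grad_neg_log_post(x_new)
        for _ in range(n_leapfrog - 1):
            x_new += step * p_new
            p_new -= step * grad_neg_log_post(x_new)
        x_new += step * p_new
        p_new -= 0.5 * step * grad_neg_log_post(x_new)
        # Metropolis accept/reject corrects the integration error
        h_old = neg_log_post(x) + 0.5 * p * p
        h_new = neg_log_post(x_new) + 0.5 * p_new * p_new
        if rng.uniform() < np.exp(h_old - h_new):
            x = x_new
        samples.append(x)
    return np.array(samples)

samples = hmc_sample(5000)
```

For a real f_NL analysis the negative log-posterior and its gradient would come from the joint density over f_NL and Phi; the accept/reject step is what keeps the chain targeting the exact posterior despite discretized dynamics.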
Calculation of Densities of States and Spectral Functions by Chebyshev Recursion and Maximum Entropy
We present an efficient algorithm for calculating spectral properties of
large sparse Hamiltonian matrices such as densities of states and spectral
functions. The combination of Chebyshev recursion and maximum entropy achieves
high energy resolution without significant roundoff error and without being
limited by machine precision or numerical instability. If controlled
statistical or systematic errors are acceptable, CPU and memory requirements
scale linearly in the number
of states. The inference of spectral properties from moments is much better
conditioned for Chebyshev moments than for power moments. We adapt concepts
from the kernel polynomial approximation, a linear Chebyshev approximation with
optimized Gibbs damping, to control the accuracy of Fourier integrals of
positive non-analytic functions. We compare the performance of kernel
polynomial and maximum entropy algorithms for an electronic structure example.

Comment: 8 pages RevTeX, 3 postscript figures.
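A minimal sketch of the Chebyshev / kernel-polynomial side of the method, assuming a Hamiltonian already rescaled so its spectrum lies in [-1, 1]. The name `kpm_dos` and its parameters are hypothetical; the exact traces below would be replaced by stochastic trace estimation for large sparse matrices, and the maximum entropy step is omitted.

```python
import numpy as np

# Hypothetical sketch: density of states from Chebyshev moments with
# Jackson (Gibbs) damping, i.e. the kernel polynomial approximation.

def kpm_dos(H, n_moments=100, n_energies=400):
    n = H.shape[0]
    # Chebyshev moments mu_m = Tr[T_m(H)], via the three-term recursion
    moments = np.zeros(n_moments)
    t_prev = np.eye(n)           # T_0(H) = I
    t_curr = H.copy()            # T_1(H) = H
    moments[0] = np.trace(t_prev)
    moments[1] = np.trace(t_curr)
    for m in range(2, n_moments):
        t_next = 2.0 * H @ t_curr - t_prev   # T_{m+1} = 2 H T_m - T_{m-1}
        moments[m] = np.trace(t_next)
        t_prev, t_curr = t_curr, t_next
    # Jackson damping coefficients suppress Gibbs oscillations
    m_idx = np.arange(n_moments)
    N = n_moments
    g = ((N - m_idx + 1) * np.cos(np.pi * m_idx / (N + 1))
         + np.sin(np.pi * m_idx / (N + 1)) / np.tan(np.pi / (N + 1))) / (N + 1)
    # Reconstruct rho(E) on an energy grid inside (-1, 1)
    e = np.linspace(-0.99, 0.99, n_energies)
    tk = np.cos(np.outer(np.arccos(e), m_idx))          # T_m(E)
    rho = g[0] * moments[0] + 2.0 * (tk[:, 1:] * (g[1:] * moments[1:])).sum(axis=1)
    rho /= np.pi * np.sqrt(1.0 - e * e)
    return e, rho / n            # normalized so rho integrates to ~1

# small diagonal test Hamiltonian with spectrum well inside [-1, 1]
e, rho = kpm_dos(np.diag(np.linspace(-0.5, 0.5, 8)))
```

The recursion touches the matrix only through products with `H`, which is why cost scales linearly in the number of states for sparse Hamiltonians when traces are estimated stochastically.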
BlinkML: Efficient Maximum Likelihood Estimation with Probabilistic Guarantees
The rising volume of datasets has made training machine learning (ML) models
a major computational cost in the enterprise. Given the iterative nature of
model and parameter tuning, many analysts use a small sample of their entire
data during their initial stage of analysis to make quick decisions (e.g., what
features or hyperparameters to use) and use the entire dataset only in later
stages (i.e., when they have converged to a specific model). This sampling,
however, is performed in an ad-hoc fashion. Most practitioners cannot precisely
capture the effect of sampling on the quality of their model, and eventually on
their decision-making process during the tuning phase. Moreover, without
systematic support for sampling operators, many optimizations and reuse
opportunities are lost.
In this paper, we introduce BlinkML, a system for fast, quality-guaranteed ML
training. BlinkML allows users to make error-computation tradeoffs: instead of
training a model on their full data (i.e., full model), BlinkML can quickly
train an approximate model with quality guarantees using a sample. The quality
guarantees ensure that, with high probability, the approximate model makes the
same predictions as the full model. BlinkML currently supports any ML model
that relies on maximum likelihood estimation (MLE), which includes Generalized
Linear Models (e.g., linear regression, logistic regression, max entropy
classifier, Poisson regression) as well as PPCA (Probabilistic Principal
Component Analysis). Our experiments show that BlinkML can speed up the
training of large-scale ML tasks by 6.26x-629x while guaranteeing the same
predictions, with 95% probability, as the full model.

Comment: 22 pages, SIGMOD 201
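The sample-versus-full-model idea can be sketched as follows. This is a hand-rolled logistic-regression MLE for illustration, not BlinkML's system, and the agreement check here is a simple empirical measurement rather than BlinkML's probabilistic guarantee.

```python
import numpy as np

# Illustrative sketch: fit the same MLE model on a small uniform sample
# and on the full data, then measure how often their predictions agree.

def fit_logistic(X, y, lr=0.5, n_iter=500):
    """Plain gradient-ascent MLE for logistic regression."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w += lr * X.T @ (y - p) / len(y)
    return w

rng = np.random.default_rng(0)
n, d = 20000, 3
X = rng.normal(size=(n, d))
w_true = np.array([1.5, -2.0, 0.5])
y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-X @ w_true))).astype(float)

w_full = fit_logistic(X, y)                     # "full model"
idx = rng.choice(n, size=1000, replace=False)   # small uniform sample
w_samp = fit_logistic(X[idx], y[idx])           # "approximate model"

# fraction of points where the two models predict the same class
agree = float(np.mean((X @ w_full > 0) == (X @ w_samp > 0)))
```

A system like the one described would choose the sample size so that this agreement holds with a user-specified probability, instead of measuring it after the fact as done here.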