
    LimberJack.jl: auto-differentiable methods for angular power spectra analyses

    We present LimberJack.jl, a fully auto-differentiable code, written in Julia, for cosmological analyses of two-point auto- and cross-correlation measurements from galaxy clustering, CMB lensing and weak lensing data. Using Julia's auto-differentiation ecosystem, LimberJack.jl can obtain gradients of its outputs up to an order of magnitude faster than traditional finite-difference methods. This makes LimberJack.jl highly synergistic with gradient-based sampling methods, such as Hamiltonian Monte Carlo, which can efficiently explore parameter spaces with hundreds of dimensions. We first establish LimberJack.jl's reliability by reanalysing the DES Y1 3×2-point data. We then showcase its capabilities by using an O(100)-parameter Gaussian Process to reconstruct the cosmic growth from a combination of DES Y1 galaxy clustering and weak lensing data, eBOSS QSOs, CMB lensing and redshift-space distortions. Our Gaussian Process reconstruction of the growth factor is statistically consistent with the ΛCDM Planck 2018 prediction at all redshifts. Moreover, we show that the addition of RSD data is extremely beneficial to this type of analysis, reducing the uncertainty in the reconstructed growth factor by 20% on average across redshift. LimberJack.jl is a fully open-source project available in Julia's general package registry and on GitHub.
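
The speed advantage of auto-differentiation over finite differences comes from propagating exact derivatives through the computation in a single pass. A minimal sketch of the idea (forward-mode dual numbers, not LimberJack.jl's actual machinery, and a toy stand-in function rather than a real angular power spectrum):

```python
import math

class Dual:
    """Minimal forward-mode dual number: a value plus a derivative part."""
    def __init__(self, val, der=0.0):
        self.val, self.der = val, der
    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val + o.val, self.der + o.der)
    __radd__ = __add__
    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val * o.val, self.der * o.val + self.val * o.der)
    __rmul__ = __mul__
    def __pow__(self, n):
        return Dual(self.val ** n, n * self.val ** (n - 1) * self.der)

def toy_spectrum(a):
    # hypothetical stand-in for a power-spectrum model evaluated at parameter a
    return 3.0 * a ** 2 + 2.0 * a

# exact gradient at a = 1.5 in one evaluation: d/da (3a^2 + 2a) = 6a + 2 = 11
grad_ad = toy_spectrum(Dual(1.5, 1.0)).der

# central finite difference needs two extra evaluations and is only approximate
h = 1e-6
grad_fd = (toy_spectrum(1.5 + h) - toy_spectrum(1.5 - h)) / (2 * h)
```

For a model with hundreds of parameters, reverse-mode auto-differentiation (as used in Julia's ecosystem) obtains the full gradient at roughly constant cost, whereas finite differences require two model evaluations per parameter.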

    Kernel-based emulator for the 3D matter power spectrum from CLASS

    The 3D matter power spectrum is a fundamental quantity in the analysis of cosmological data such as large-scale structure, 21 cm observations, and weak lensing. Existing Boltzmann codes such as CLASS can provide it, but at immoderate computational cost. In this paper, we propose a fast Bayesian method to generate the 3D matter power spectrum for a given set of wavenumbers and redshifts. Our code allows one to calculate the following quantities: the linear matter power spectrum at a given redshift (the default is 0); the non-linear 3D matter power spectrum with or without baryon feedback; and the weak lensing power spectrum. The gradient of the 3D matter power spectrum with respect to the input cosmological parameters is also returned, which is useful for Hamiltonian Monte Carlo samplers and for Fisher matrix calculations. In our application, the emulator is accurate when evaluated at sets of cosmological parameters drawn from the prior, with the fractional uncertainty centred on 0. It is also considerably faster than CLASS, making the emulator amenable to sampling cosmological and nuisance parameters in a Monte Carlo routine. In addition, once the 3D matter power spectrum is calculated, it can be combined with a specific redshift distribution to calculate the weak lensing and intrinsic alignment power spectra, which can then be used to derive constraints on cosmological parameters in a weak lensing data analysis. The software (emuPK) can be trained with any set of points and is distributed on GitHub; it comes with a pre-trained set of Gaussian Process (GP) models, based on 1000 Latin Hypercube (LH) samples, which roughly follow the priors for current weak lensing analyses.
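
The key mechanics of such an emulator — a GP mean prediction plus its analytic gradient with respect to the inputs — can be sketched in one dimension. This is a hypothetical toy (a smooth function standing in for log P(k) as a function of a single cosmological parameter), not emuPK itself, which works in many dimensions:

```python
import numpy as np

def rbf(x1, x2, ls=0.25):
    """Squared-exponential (RBF) kernel between two 1-D point sets."""
    d = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

X = np.linspace(0.1, 1.0, 10)           # training parameter values
y = np.sin(3.0 * X)                     # stand-in for the emulated quantity

K = rbf(X, X) + 1e-6 * np.eye(len(X))   # small jitter for numerical stability
alpha = np.linalg.solve(K, y)           # computed once, at training time

def predict(xs):
    """GP mean prediction at new parameter values."""
    return rbf(xs, X) @ alpha

def predict_grad(xs, ls=0.25):
    """Analytic gradient of the GP mean w.r.t. the input parameter —
    the by-product that makes the emulator useful for HMC and Fisher
    matrix calculations."""
    d = xs[:, None] - X[None, :]
    return (-(d / ls ** 2) * rbf(xs, X)) @ alpha

xs = np.array([0.55])
mean = predict(xs)[0]       # close to sin(3 * 0.55)
grad = predict_grad(xs)[0]  # close to 3 * cos(3 * 0.55)
```

Once `alpha` is precomputed, each prediction is a single matrix-vector product, which is why a trained emulator can outpace a full Boltzmann-code evaluation.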

    Extreme data compression for Bayesian model comparison

    We develop extreme data compression for use in Bayesian model comparison via the MOPED algorithm, as well as more general score compression. We find that Bayes factors from data compressed with the MOPED algorithm are identical to those from the uncompressed datasets when the models are linear and the errors Gaussian. In other, nonlinear cases, whether nested or not, we find negligible differences in the Bayes factors, and show this explicitly for the Pantheon-SH0ES supernova dataset. We also investigate the sampling properties of the Bayesian Evidence as a frequentist statistic, and find that extreme data compression reduces the sampling variance of the Evidence but has no impact on the sampling distribution of Bayes factors. Since model comparison can be a very computationally intensive task, MOPED extreme data compression may present significant advantages in computational time.
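
The linear-Gaussian case where MOPED is lossless can be demonstrated directly. A minimal single-parameter sketch (all numbers invented, not the Pantheon-SH0ES analysis): data d = θ·μ₁ + noise, compressed to one number bᵀd, with the Fisher information and the maximum-likelihood estimate preserved exactly:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
t = np.linspace(0.0, 1.0, n)
mu1 = np.sin(2 * np.pi * t)          # derivative of the mean w.r.t. theta
sigma2 = 0.1
Cinv = np.eye(n) / sigma2            # inverse noise covariance (diagonal)

# MOPED weighting vector, normalised so that Var(b^T d) = 1
b = Cinv @ mu1
b /= np.sqrt(mu1 @ Cinv @ mu1)

theta_true = 1.7
d = theta_true * mu1 + rng.normal(0.0, np.sqrt(sigma2), n)

# Fisher information: full 500-point data vs the single compressed number
F_full = mu1 @ Cinv @ mu1
F_comp = (b @ mu1) ** 2 / (b @ (sigma2 * np.eye(n)) @ b)

# Maximum-likelihood estimates from full and compressed data agree exactly
theta_full = (mu1 @ Cinv @ d) / F_full
theta_comp = (b @ d) / (b @ mu1)
```

With several parameters, MOPED produces one such weighting vector per parameter (Gram-Schmidt orthogonalised), so the compressed dataset has as many numbers as there are parameters, regardless of the size of the original data vector.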

    Parameter Inference for Weak Lensing using Gaussian Processes and MOPED

    In this paper, we propose a Gaussian Process (GP) emulator for the calculation both of tomographic weak lensing band powers and of coefficients of summary data massively compressed with the MOPED algorithm. In the former case, cosmological parameter inference is accelerated by a factor of ∼10–30 compared with the Boltzmann solver CLASS applied to KiDS-450 weak lensing data. Much larger gains, of order 10³, will come with future data, and MOPED with GPs will be fast enough to permit the Limber approximation to be dropped, with acceleration in this case of ∼10⁵. A potential advantage of GPs is that an error on the emulated function can be computed and this uncertainty incorporated into the likelihood. However, it is known that the GP error can be unreliable when applied to deterministic functions, and we find, using the Kullback–Leibler divergence between the emulator and CLASS likelihoods, and from the uncertainties on the parameters, that agreement is better when the GP uncertainty is not used. In future, weak lensing surveys such as Euclid and the Legacy Survey of Space and Time will have up to ∼10⁴ summary statistics, and inference will be correspondingly more challenging. However, since the speed of MOPED is determined not by the number of summary data but by the number of parameters, a MOPED analysis scales almost perfectly, provided that a fast way to compute the theoretical MOPED coefficients is available. The GP provides such a fast mechanism.
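
The comparison at stake — folding the emulator's predictive variance into the data covariance versus ignoring it — amounts to evaluating the same Gaussian likelihood with two different covariances. A hypothetical sketch with invented numbers (not the KiDS-450 analysis):

```python
import numpy as np

def gauss_loglike(data, theory, cov):
    """Log of a multivariate Gaussian likelihood."""
    r = data - theory
    cinv = np.linalg.inv(cov)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (r @ cinv @ r + logdet + len(data) * np.log(2 * np.pi))

data = np.array([1.02, 0.98, 1.05])          # observed band powers (toy)
theory_emulated = np.array([1.00, 1.00, 1.00])  # GP emulator prediction (toy)
data_cov = 0.01 * np.eye(3)                  # measurement covariance
gp_var = 1e-4 * np.ones(3)                   # emulator's predictive variance

# Without the GP uncertainty (the paper's preferred choice) ...
logL_plain = gauss_loglike(data, theory_emulated, data_cov)
# ... and with it broadened into the covariance.
logL_gperr = gauss_loglike(data, theory_emulated, data_cov + np.diag(gp_var))
```

Broadening the covariance both deflates the chi-squared term and inflates the determinant term, so the two likelihoods differ point by point across parameter space; the paper quantifies the resulting disagreement with the Kullback–Leibler divergence against the exact CLASS likelihood.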

    Comparing Multiclass, Binary, and Hierarchical Machine Learning Classification schemes for variable stars

    Upcoming synoptic surveys are set to generate an unprecedented amount of data. This requires an automatic framework that can quickly and efficiently provide classification labels for several new object classification challenges. Using data describing 11 types of variable stars from the Catalina Real-Time Transient Survey (CRTS), we illustrate how to capture the most important information from computed features, and describe detailed methods for robustly using information theory for feature selection and evaluation. We apply three machine learning algorithms and demonstrate how to optimize these classifiers via cross-validation techniques. For the CRTS data set, we find that the random forest classifier performs best in terms of balanced accuracy and geometric mean. We demonstrate substantially improved classification results by converting the multiclass problem into a binary classification task, achieving a balanced-accuracy rate of ∼99 per cent for the classification of δ Scuti and anomalous Cepheids. Additionally, we describe how classification performance can be improved by converting a ‘flat multiclass’ problem into a hierarchical taxonomy. We develop a new hierarchical structure and propose a new set of classification features, enabling the accurate identification of subtypes of Cepheids, RR Lyrae, and eclipsing binary stars in CRTS data.
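
The two evaluation metrics named above are worth writing out, since both punish a classifier that ignores rare classes where plain accuracy does not. Balanced accuracy is the arithmetic mean of per-class recalls; the geometric mean (G-mean) is their geometric average. A toy illustration with invented labels:

```python
import numpy as np

def per_class_recall(y_true, y_pred):
    """Recall (fraction correctly recovered) for each class present in y_true."""
    classes = np.unique(y_true)
    return np.array([np.mean(y_pred[y_true == c] == c) for c in classes])

def balanced_accuracy(y_true, y_pred):
    return per_class_recall(y_true, y_pred).mean()

def g_mean(y_true, y_pred):
    r = per_class_recall(y_true, y_pred)
    return float(np.prod(r) ** (1.0 / len(r)))

# Imbalanced toy set: class 0 dominates, class 1 is rare.
y_true = np.array([0] * 90 + [1] * 10)
y_maj = np.zeros(100, dtype=int)           # classifier that always predicts 0

acc = np.mean(y_maj == y_true)             # 0.90 — looks good
bacc = balanced_accuracy(y_true, y_maj)    # 0.50 — exposes the rare class
gm = g_mean(y_true, y_maj)                 # 0.0 — rare class never recovered
```

A classifier scoring ∼99 per cent *balanced* accuracy, as quoted for the δ Scuti / anomalous Cepheid task, must therefore recover both classes well, not merely the abundant one.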

    Imbalance learning for variable star classification

    The accurate automated classification of variable stars into their respective subtypes is difficult. Machine learning–based solutions often fall foul of the imbalanced learning problem, which causes poor generalization performance in practice, especially on rare variable star subtypes. In previous work, we attempted to overcome such deficiencies via the development of a hierarchical machine learning classifier. This ‘algorithm-level’ approach to tackling imbalance yielded promising results on Catalina Real-Time Survey (CRTS) data, outperforming the binary and multiclass classification schemes previously applied in this area. In this work, we attempt to further improve hierarchical classification performance by applying ‘data-level’ approaches that directly augment the training data so that it better describes underrepresented classes. We apply and report results for three data augmentation methods in particular: Randomly Augmented Sampled Light curves from magnitude Error (RASLE), augmenting light curves with Gaussian Process modelling (GpFit), and the Synthetic Minority Oversampling Technique (SMOTE). When combining the ‘algorithm-level’ (i.e. the hierarchical scheme) with the ‘data-level’ approach, we further improve variable star classification accuracy by 1–4 per cent. We find that a higher classification rate is obtained when using GpFit in the hierarchical model. Further improvement of the metric scores requires a better standard set of correctly identified variable stars, and perhaps enhanced features.
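
Of the three augmentation methods, SMOTE is the most generic and its core idea fits in a few lines: synthesise new minority-class samples by interpolating between a real minority sample and one of its k nearest minority-class neighbours. A minimal sketch (not the authors' pipeline, and toy 2-D features rather than light-curve features):

```python
import numpy as np

def smote(X_min, n_new, k=3, rng=None):
    """Generate n_new synthetic minority samples by linear interpolation
    between each chosen sample and one of its k nearest minority neighbours."""
    if rng is None:
        rng = np.random.default_rng(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        # distances from sample i to all minority samples; drop i itself
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        nn = np.argsort(d)[1:k + 1]
        j = rng.choice(nn)
        lam = rng.random()                   # interpolation fraction in [0, 1)
        synthetic.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(synthetic)

# Toy minority class: 8 samples clustered near (5, 5)
rng = np.random.default_rng(42)
X_min = rng.normal(5.0, 0.5, size=(8, 2))
X_new = smote(X_min, n_new=20, rng=rng)
# synthetic points lie on segments between real minority samples
```

Because each synthetic point lies on a segment between two real samples, SMOTE densifies the minority region of feature space rather than merely duplicating points, which is what lets the downstream classifier generalize better on rare subtypes.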