Mining for cosmological information: Simulation-based methods for Redshift Space Distortions and Galaxy Clustering
The standard model of cosmology describes the complex large-scale structure of the Universe through fewer than 10 free parameters. However, concordance with observations requires that about 95% of the energy content of the Universe is invisible to us. Most of this energy is postulated to be in the form of a cosmological constant, Λ, which drives the observed accelerated expansion of the Universe. Its nature is, however, unknown. This mystery forces cosmologists to look for inconsistencies between theory and data, searching for clues. But finding statistically significant contradictions requires extremely accurate measurements of the composition of the Universe, which are at present limited by our inability to extract all the information contained in the data, rather than by the data themselves. In this thesis, we study how we can overcome these limitations by i) modelling how galaxies cluster on small scales with simulation-based methods, where perturbation theory fails to provide accurate predictions, and ii) developing summary statistics of the density field that are capable of extracting more information than the commonly used two-point functions. In the first half, we show how the real to redshift space mapping can be modelled accurately by going beyond the Gaussian approximation for the pairwise velocity distribution. We then show that simulation-based models can accurately predict the full shape of galaxy clustering in real space, increasing the constraining power on some of the cosmological parameters by a factor of 2 compared to perturbation theory methods. In the second half, we measure the information content of density-dependent clustering. We show that it can improve the constraints on all cosmological parameters by factors between 3 and 8 over the two-point function. In particular, exploiting the environment dependence can constrain the mass of neutrinos a factor of 8 better than the two-point correlation function alone. We hope that the techniques described in this thesis will contribute to extracting all the cosmological information contained in ongoing and upcoming galaxy surveys, and provide insight into the nature of the accelerated expansion of the Universe.
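As an illustration of the real-to-redshift-space mapping mentioned in the first half of the abstract, the following is a minimal sketch of the streaming model, in which the redshift-space correlation function is obtained by integrating the real-space correlation function against a line-of-sight pairwise velocity distribution. The Gaussian velocity PDF used here is exactly the approximation the thesis goes beyond, and the toy inputs (xi_real, mean_v, sigma_v) are illustrative assumptions rather than the calibrated ingredients of the thesis.

```python
import numpy as np

# Toy ingredients (illustrative only): real-space correlation function,
# mean pairwise velocity, and pairwise velocity dispersion, with velocities
# already expressed as comoving displacements in Mpc/h.
def xi_real(r):
    return (r / 5.0) ** -1.8                     # power-law clustering

def mean_v(r):
    return -2.0 * r / (1.0 + (r / 10.0) ** 2)    # infall towards pairs

def sigma_v(r):
    return 4.0 + 2.0 * np.exp(-r / 20.0)         # small-scale dispersion

def gaussian_pdf(v, mu, sigma):
    return np.exp(-0.5 * ((v - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)

def xi_redshift_space(s_perp, s_par, y_grid):
    """Streaming model: 1 + xi_s(s_perp, s_par) =
    int dy [1 + xi_r(r)] P(s_par - y | r), with r = sqrt(s_perp^2 + y^2)."""
    r = np.sqrt(s_perp**2 + y_grid**2)
    # Line-of-sight pairwise velocity PDF; the thesis replaces this Gaussian
    # with a more flexible distribution capturing skewness and heavy tails.
    pdf = gaussian_pdf(s_par - y_grid, mean_v(r) * y_grid / r, sigma_v(r))
    integrand = (1.0 + xi_real(r)) * pdf
    return np.trapz(integrand, y_grid) - 1.0

y = np.linspace(-60, 60, 2001)
print(xi_redshift_space(s_perp=5.0, s_par=5.0, y_grid=y))
```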
A point cloud approach to generative modeling for galaxy surveys at the field level
We introduce a diffusion-based generative model to describe the distribution
of galaxies in our Universe directly as a collection of points in 3-D space
(coordinates) optionally with associated attributes (e.g., velocities and
masses), without resorting to binning or voxelization. The custom diffusion
model can be used both for emulation, reproducing essential summary statistics
of the galaxy distribution, as well as inference, by computing the conditional
likelihood of a galaxy field. We demonstrate a first application to massive
dark matter haloes in the Quijote simulation suite. This approach can be
extended to enable a comprehensive analysis of cosmological data, circumventing
limitations inherent to summary-statistic-based as well as neural simulation-based inference methods. Comment: 15+3 pages, 7+4 figures
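A minimal sketch of the kind of point-cloud diffusion model described above, assuming a DDPM-style forward noising of halo coordinates and a deliberately simple, permutation-equivariant per-point denoiser. The architecture, noise schedule, conditioning on cosmology, and handling of attributes such as velocities and masses in the actual model are not reproduced here.

```python
import torch
import torch.nn as nn

# Toy stand-in for a catalogue of N halos with 3-D positions in a unit box.
points = torch.rand(1, 1024, 3)              # (batch, N, 3)

T = 1000
betas = torch.linspace(1e-4, 0.02, T)        # linear noise schedule (assumption)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

class PerPointDenoiser(nn.Module):
    """Deliberately simple permutation-equivariant denoiser: each point is
    processed by a shared MLP conditioned on the noise level."""
    def __init__(self, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, 3),
        )
    def forward(self, x_t, t_frac):
        t_feat = t_frac.view(-1, 1, 1).expand(-1, x_t.shape[1], 1)
        return self.net(torch.cat([x_t, t_feat], dim=-1))

model = PerPointDenoiser()
optim = torch.optim.Adam(model.parameters(), lr=1e-3)

# One epsilon-prediction training step of denoising diffusion.
t = torch.randint(0, T, (points.shape[0],))
eps = torch.randn_like(points)
a_bar = alphas_bar[t].view(-1, 1, 1)
x_t = a_bar.sqrt() * points + (1.0 - a_bar).sqrt() * eps   # forward noising
loss = ((model(x_t, t.float() / T) - eps) ** 2).mean()
loss.backward()
optim.step()
print(float(loss))
```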
Cosmological Field Emulation and Parameter Inference with Diffusion Models
Cosmological simulations play a crucial role in elucidating the effect of
physical parameters on the statistics of fields and on constraining parameters
given information on density fields. We leverage diffusion generative models to
address two tasks of importance to cosmology -- as an emulator for cold dark
matter density fields conditional on input cosmological parameters Ω_m and σ_8, and as a parameter inference model that can return constraints
on the cosmological parameters of an input field. We show that the model is
able to generate fields with power spectra that are consistent with those of
the simulated target distribution, and capture the subtle effect of each
parameter on modulations in the power spectrum. We additionally explore their
utility as parameter inference models and find that we can obtain tight
constraints on cosmological parameters. Comment: 7 pages, 5 figures, Accepted at the Machine Learning and the Physical Sciences workshop, NeurIPS 202
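The emulation test described above amounts to comparing summary statistics of generated and simulated fields. The sketch below shows what such a check could look like for 2-D density fields, using a simple FFT-based, azimuthally averaged power spectrum; the binning choices and toy Gaussian fields are illustrative assumptions, and in practice `generated` would come from the diffusion emulator and `target` from the simulation suite.

```python
import numpy as np

def power_spectrum_2d(field, box_size=1000.0, n_bins=20):
    """Azimuthally averaged power spectrum of a 2-D overdensity field."""
    n = field.shape[0]
    delta_k = np.fft.fftn(field) / n**2
    power = np.abs(delta_k) ** 2 * box_size**2
    k = 2 * np.pi * np.fft.fftfreq(n, d=box_size / n)
    kx, ky = np.meshgrid(k, k, indexing="ij")
    k_mag = np.sqrt(kx**2 + ky**2).ravel()
    bins = np.linspace(k_mag[k_mag > 0].min(), k_mag.max(), n_bins + 1)
    idx = np.digitize(k_mag, bins)
    pk = np.array([power.ravel()[idx == i].mean() for i in range(1, n_bins + 1)])
    k_cen = 0.5 * (bins[1:] + bins[:-1])
    return k_cen, pk

# Toy comparison between two random fields standing in for emulator output
# and simulation target.
rng = np.random.default_rng(0)
generated = rng.normal(size=(128, 128))
target = rng.normal(size=(128, 128))
for name, f in [("generated", generated), ("target", target)]:
    k_cen, pk = power_spectrum_2d(f)
    print(name, pk[:3])
```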
Probabilistic reconstruction of Dark Matter fields from biased tracers using diffusion models
Galaxies are biased tracers of the underlying cosmic web, which is dominated
by dark matter components that cannot be directly observed. The relationship
between dark matter density fields and galaxy distributions can be sensitive to
assumptions in cosmology and astrophysical processes embedded in the galaxy
formation models, which remain uncertain in many aspects. Based on
state-of-the-art galaxy formation simulation suites with varied cosmological
parameters and sub-grid astrophysics, we develop a diffusion generative model
to predict the unbiased posterior distribution of the underlying dark matter
fields from the given stellar mass fields, while being able to marginalize over
the uncertainties in cosmology and galaxy formation.
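The reconstruction described above is probabilistic: instead of a single dark matter map, the model returns samples from a posterior conditioned on the stellar mass field. The sketch below illustrates how such samples could be summarized into a mean map and a per-pixel uncertainty; `sample_dm_given_stellar_mass` is a hypothetical stand-in for the trained conditional diffusion sampler, not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(42)

def sample_dm_given_stellar_mass(stellar_mass_field, n_samples):
    """Hypothetical stand-in for the trained conditional diffusion sampler.
    Here it just returns the conditioning field plus noise so the script runs;
    the real model would run reverse diffusion conditioned on the input field."""
    return stellar_mass_field[None, ...] + rng.normal(
        scale=0.1, size=(n_samples, *stellar_mass_field.shape)
    )

stellar_mass_field = rng.normal(size=(64, 64))      # toy conditioning field
samples = sample_dm_given_stellar_mass(stellar_mass_field, n_samples=50)

# Posterior mean serves as the point estimate of the dark matter field, while
# the sample standard deviation maps the reconstruction uncertainty, implicitly
# marginalized over the cosmology and galaxy-formation variations spanned by
# the training simulations.
dm_mean = samples.mean(axis=0)
dm_std = samples.std(axis=0)
print(dm_mean.shape, dm_std.mean())
```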
Learning an Effective Evolution Equation for Particle-Mesh Simulations Across Cosmologies
Particle-mesh simulations trade small-scale accuracy for speed compared to
traditional, computationally expensive N-body codes in cosmological
simulations. In this work, we show how a data-driven model could be used to
learn an effective evolution equation for the particles, by correcting the
errors of the particle-mesh potential incurred on small scales during
simulations. We find that our learnt correction yields evolution equations that
generalize well to new, unseen initial conditions and cosmologies. We further
demonstrate that the resulting corrected maps can be used in a simulation-based
inference framework to yield an unbiased inference of cosmological parameters.
The model, a network implemented in Fourier space, is exclusively trained on
the particle positions and velocities. Comment: 7 pages, 4 figures, Machine Learning and the Physical Sciences Workshop, NeurIPS 202
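The correction described above acts on the particle-mesh potential in Fourier space. The sketch below shows the basic mechanics with a toy, isotropic filter standing in for the learned model; the actual network, its training, and the time dependence of the correction are not reproduced, and the mesh and force conventions here are illustrative assumptions.

```python
import numpy as np

n_mesh, box_size = 64, 256.0                    # toy mesh and box (Mpc/h)
rng = np.random.default_rng(1)
delta = rng.normal(size=(n_mesh,) * 3)          # stand-in overdensity field

# Fourier grid for the mesh.
k = 2 * np.pi * np.fft.fftfreq(n_mesh, d=box_size / n_mesh)
kx, ky, kz = np.meshgrid(k, k, k, indexing="ij")
k2 = kx**2 + ky**2 + kz**2
k2[0, 0, 0] = 1.0                               # avoid division by zero

delta_k = np.fft.fftn(delta)
phi_k = -delta_k / k2                           # Poisson equation in Fourier space
phi_k[0, 0, 0] = 0.0

def learned_filter(k_mag):
    """Toy isotropic stand-in for the learned Fourier-space correction; the
    paper instead trains this mapping on particle positions and velocities."""
    return 1.0 + 0.1 * np.exp(-(k_mag * 5.0 - 2.0) ** 2)

phi_k_corrected = phi_k * learned_filter(np.sqrt(k2))

# Corrected force component along x; a full PM step would interpolate this
# back to the particle positions (interpolation omitted here).
force_x = np.real(np.fft.ifftn(-1j * kx * phi_k_corrected))
print(force_x.std())
```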
MGLENS: Modified gravity weak lensing simulations for emulation-based cosmological inference
We present MGLENS, a large series of modified gravity lensing simulations tailored for cosmic shear data analyses and forecasts in which cosmological and modified gravity parameters are varied simultaneously. Based on the FORGE and BRIDGE N-body simulation suites presented in companion papers, we construct 100 × 5000 deg² of mock Stage-IV lensing data from two 4D Latin hypercubes that sample cosmological and gravitational parameters in f(R) and nDGP gravity, respectively. These are then used to validate our inference analysis pipeline based on the lensing power spectrum, exploiting our implementation of these modified gravity models within the COSMOSIS cosmological inference package. Sampling this new likelihood, we find that cosmic shear can achieve 95 per cent CL constraints on the modified gravity parameters of log10[fR0] 0.09, after marginalizing over intrinsic alignments of galaxies and including scales up to ℓ = 5000. We also investigate the impact of photometric uncertainty, scale cuts, and covariance matrices. We finally explore the consequences of analysing MGLENS data with the wrong gravity model, and report catastrophic biases for a number of possible scenarios. The Stage-IV MGLENS simulations, the FORGE and BRIDGE emulators, and the COSMOSIS interface modules will be made publicly available upon journal acceptance.
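The mocks described above are built from Latin hypercube designs over cosmological and modified gravity parameters. A minimal sketch of such a design is shown below using scipy's quasi-Monte Carlo module; the parameter names and ranges are illustrative assumptions, not the FORGE/BRIDGE values.

```python
import numpy as np
from scipy.stats import qmc

# Illustrative 4-D parameter ranges (not the FORGE/BRIDGE values):
# Omega_m, sigma_8, h, and log10|fR0| for an f(R)-like suite.
lower = np.array([0.20, 0.65, 0.60, -7.0])
upper = np.array([0.40, 0.90, 0.80, -4.5])

sampler = qmc.LatinHypercube(d=4, seed=3)
unit_samples = sampler.random(n=50)                 # 50 nodes in [0, 1]^4
nodes = qmc.scale(unit_samples, lower, upper)       # map to parameter ranges
print(nodes[:3])
```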
Constraining νΛCDM with density-split clustering
The dependence of galaxy clustering on local density provides an effective
method for extracting non-Gaussian information from galaxy surveys. The
two-point correlation function (2PCF) provides a complete statistical
description of a Gaussian density field. However, the late-time density field
becomes non-Gaussian due to non-linear gravitational evolution and higher-order
summary statistics are required to capture all of its cosmological information.
Using a Fisher formalism based on halo catalogues from the Quijote simulations,
we explore the possibility of retrieving this information using the
density-split clustering (DS) method, which combines clustering statistics from
regions of different environmental density. We show that DS provides more
precise constraints on the parameters of the νΛCDM model compared to
the 2PCF, and we provide suggestions for where the extra information may come
from. DS improves the constraints on the sum of neutrino masses by a factor of 8, and by factors of 5, 3, 4, 6, and 6 for Ω_m, Ω_b, h, n_s, and σ_8, respectively. We compare DS statistics when the local density
environment is estimated from the real or redshift-space positions of haloes.
The inclusion of DS autocorrelation functions, in addition to the
cross-correlation functions between DS environments and haloes, recovers most
of the information that is lost when using the redshift-space halo positions to
estimate the environment. We discuss the possibility of constructing
simulation-based methods to model DS clustering statistics in different
scenarios. Comment: Submitted to MNRAS. Source code for all figures in the paper is
provided in the caption
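The density-split method described above assigns random query points to environments according to their local tracer density and then correlates each environment with the tracers. The sketch below shows the quantile-assignment step using a count-in-spheres density estimate in a periodic box; the smoothing radius, number of quantiles, and uniform toy catalogue are illustrative assumptions.

```python
import numpy as np
from scipy.spatial import cKDTree

box_size, n_halos, n_query, radius, n_quantiles = 1000.0, 100000, 5000, 30.0, 5
rng = np.random.default_rng(7)
halos = rng.uniform(0, box_size, size=(n_halos, 3))     # stand-in halo catalogue
query = rng.uniform(0, box_size, size=(n_query, 3))     # random query points

# Count-in-spheres density estimate around each query point (periodic box).
tree = cKDTree(halos, boxsize=box_size)
counts = np.array([len(idx) for idx in tree.query_ball_point(query, r=radius)])
mean_count = n_halos * (4.0 / 3.0) * np.pi * radius**3 / box_size**3
delta = counts / mean_count - 1.0

# Split query points into density quantiles DS1 (least dense) ... DS5 (densest).
edges = np.quantile(delta, np.linspace(0, 1, n_quantiles + 1))
labels = np.clip(np.digitize(delta, edges[1:-1]), 0, n_quantiles - 1)
for q in range(n_quantiles):
    print(f"DS{q + 1}: {np.sum(labels == q)} points, "
          f"mean delta = {delta[labels == q].mean():+.2f}")
# Each quantile's points would then be cross-correlated with the halos
# (and auto-correlated) to form the DS clustering statistics.
```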
Simulation-based Inference for Exoplanet Atmospheric Retrieval: Insights from winning the Ariel Data Challenge 2023 using Normalizing Flows
Advancements in space telescopes have opened new avenues for gathering vast
amounts of data on exoplanet atmosphere spectra. However, accurately extracting
chemical and physical properties from these spectra poses significant
challenges due to the non-linear nature of the underlying physics.
This paper presents novel machine learning models developed by the AstroAI
team for the Ariel Data Challenge 2023, where one of the models secured the top
position among 293 competitors. Leveraging Normalizing Flows, our models
predict the posterior probability distribution of atmospheric parameters under
different atmospheric assumptions.
Moreover, we introduce an alternative model that exhibits higher performance
potential than the winning model, despite scoring lower in the challenge. These
findings highlight the need to reevaluate the evaluation metric and prompt
further exploration of more efficient and accurate approaches for exoplanet
atmosphere spectra analysis.
Finally, we present recommendations to enhance the challenge and models,
providing valuable insights for future applications on real observational data.
These advancements pave the way for more effective and timely analysis of
exoplanet atmospheric properties, advancing our understanding of these distant
worlds. Comment: Conference proceeding for the ECML PKDD 202
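A normalizing-flow posterior estimator of the kind described above can be prototyped with off-the-shelf simulation-based inference tooling. The sketch below uses the `sbi` package's SNPE interface on a toy spectrum simulator; the simulator, prior ranges, and flow settings are illustrative assumptions rather than the AstroAI models, and the exact API may differ between sbi versions.

```python
import torch
from sbi.inference import SNPE
from sbi.utils import BoxUniform

# Toy "atmosphere" simulator: maps 3 parameters to a 50-bin spectrum.
def simulate(theta):
    wl = torch.linspace(0.0, 1.0, 50)
    amp, centre, width = theta[:, 0:1], theta[:, 1:2], theta[:, 2:3]
    spectrum = amp * torch.exp(-0.5 * ((wl - centre) / width) ** 2)
    return spectrum + 0.01 * torch.randn_like(spectrum)

prior = BoxUniform(low=torch.tensor([0.5, 0.2, 0.05]),
                   high=torch.tensor([2.0, 0.8, 0.30]))
theta = prior.sample((2000,))
x = simulate(theta)

# Train a neural posterior estimator (a normalizing flow by default).
inference = SNPE(prior=prior)
inference.append_simulations(theta, x)
density_estimator = inference.train()
posterior = inference.build_posterior(density_estimator)

# Amortized posterior samples for a new "observed" spectrum.
x_obs = simulate(prior.sample((1,)))
samples = posterior.sample((1000,), x=x_obs)
print(samples.mean(dim=0), samples.std(dim=0))
```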
A Parameter-Masked Mock Data Challenge for Beyond-Two-Point Galaxy Clustering Statistics
The last few years have seen the emergence of a wide array of novel
techniques for analyzing high-precision data from upcoming galaxy surveys,
which aim to extend the statistical analysis of galaxy clustering data beyond
the linear regime and the canonical two-point (2pt) statistics. We test and
benchmark some of these new techniques in a community data challenge
"Beyond-2pt", initiated during the Aspen 2022 Summer Program "Large-Scale
Structure Cosmology beyond 2-Point Statistics," whose first round of results we
present here. The challenge dataset consists of high-precision mock galaxy
catalogs for clustering in real space, redshift space, and on a light cone.
Participants in the challenge have developed end-to-end pipelines to analyze
mock catalogs and extract unknown ("masked") cosmological parameters of the
underlying ΛCDM models with their methods. The methods represented are
density-split clustering, nearest neighbor statistics, BACCO power spectrum
emulator, void statistics, LEFTfield field-level inference using effective
field theory (EFT), and joint power spectrum and bispectrum analyses using both
EFT and simulation-based inference. In this work, we review the results of the
challenge, focusing on problems solved, lessons learned, and future research
needed to perfect the emerging beyond-2pt approaches. The unbiased parameter
recovery demonstrated in this challenge by multiple statistics and the
associated modeling and inference frameworks supports the credibility of
cosmology constraints from these methods. The challenge data set is publicly
available and we welcome future submissions from methods that are not yet
represented. Comment: New submissions welcome! Challenge data available at
https://github.com/ANSalcedo/Beyond2ptMoc
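The canonical baseline that the challenge extends is the two-point correlation function. The sketch below computes it for a periodic-box mock with a simple pair-counting (natural) estimator and an analytic random term; the box size, binning, and uniform toy catalogue are illustrative assumptions rather than the challenge data.

```python
import numpy as np
from scipy.spatial import cKDTree

box_size, n_gal = 1000.0, 20000
rng = np.random.default_rng(11)
galaxies = rng.uniform(0, box_size, size=(n_gal, 3))     # stand-in mock catalogue

bins = np.linspace(1.0, 150.0, 31)                       # separation bins (Mpc/h)
tree = cKDTree(galaxies, boxsize=box_size)

# Cumulative pair counts within each radius, then differenced into bins.
cum_pairs = np.array([tree.count_neighbors(tree, r) for r in bins], dtype=float)
dd = np.diff(cum_pairs)                                  # counts both pair orderings

# Analytic expectation for an unclustered catalogue in a periodic box.
shell_vol = 4.0 / 3.0 * np.pi * np.diff(bins**3)
rr = n_gal * (n_gal - 1) * shell_vol / box_size**3

xi = dd / rr - 1.0                                       # natural estimator DD/RR - 1
r_cen = 0.5 * (bins[1:] + bins[:-1])
print(np.c_[r_cen, xi][:5])
```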