As the next generation of large galaxy surveys comes online, it is becoming
increasingly important to develop and understand the machine learning tools
that analyze big astronomical data. Neural networks are powerful and capable of
probing deep patterns in data, but they must be trained carefully on large and
representative data sets. We developed and generated a new 'hump' of the
Cosmology and Astrophysics with MachinE Learning Simulations (CAMELS) project:
CAMELS-SAM, encompassing one thousand dark-matter-only simulations of
(100 h⁻¹ cMpc)³ with different cosmological parameters (Ωm and
σ8), run through the Santa Cruz semi-analytic model for galaxy
formation over a broad range of astrophysical parameters. As a proof-of-concept
demonstrating the value of this vast suite of simulated galaxies, which spans a
large volume and a broad parameter space, we test how well simple clustering
summary statistics, combined with neural networks, can marginalize over
astrophysics and constrain cosmology. We use the two-point correlation function, count-in-cells, and
the void probability function, probing non-linear and linear scales across
0.68 < R < 27 h⁻¹ cMpc. Our cosmological constraints typically have errors of
3–8% on Ωm and σ8, and we explore the effects
of different galaxy selections, galaxy sampling, and the choice of clustering
statistics on these constraints. We additionally explore how these clustering
statistics constrain and inform key stellar and galactic feedback parameters in
the Santa Cruz SAM. CAMELS-SAM has been publicly released alongside the rest of
CAMELS and offers great potential for many applications of machine learning in
astrophysics: https://camels-sam.readthedocs.io.
Comment: 40 pages, 22 figures (11 made of subfigures)
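As a rough illustration of the three clustering statistics named above, the following NumPy sketch computes count-in-cells, the void probability function (VPF), and a two-point correlation function for a toy periodic box. It is not the CAMELS-SAM analysis pipeline, whose implementation it does not attempt to reproduce. The function names, box size, sphere radius, and bin edges are hypothetical choices, and the correlation function uses the natural estimator ξ = DD/RR − 1 with an analytic RR term (an assumption suited to a uniform periodic box). Because the toy positions are uniformly random, ξ(r) should be close to zero and the VPF close to the Poisson value exp(−n̄V).

```python
import numpy as np


def periodic_separation(a, b, box_size):
    """Minimum-image separation vectors between points a and b in a periodic cube."""
    d = a - b
    return d - box_size * np.round(d / box_size)


def counts_in_spheres(positions, box_size, radius, n_spheres, rng):
    """Count-in-cells: number of galaxies inside randomly placed spheres."""
    centers = rng.uniform(0.0, box_size, size=(n_spheres, 3))
    counts = np.empty(n_spheres, dtype=int)
    for i, center in enumerate(centers):
        d = periodic_separation(positions, center, box_size)
        # squared distance of every galaxy from this sphere's center
        counts[i] = np.count_nonzero(np.einsum("ij,ij->i", d, d) < radius**2)
    return counts


def void_probability(counts):
    """VPF: fraction of spheres that contain no galaxies."""
    return float(np.mean(counts == 0))


def xi_natural(positions, box_size, r_edges):
    """Two-point correlation function via the natural estimator DD/RR - 1.

    RR is computed analytically for a uniform periodic box, which is only
    valid while the largest bin edge is below box_size / 2. Pair counting is
    brute force (O(N^2) time and memory), so this is for toy catalogs only.
    """
    n = len(positions)
    d = periodic_separation(positions[:, None, :], positions[None, :, :], box_size)
    r = np.sqrt(np.sum(d * d, axis=-1))[np.triu_indices(n, k=1)]  # unique pairs
    dd, _ = np.histogram(r, bins=r_edges)
    shell_volume = 4.0 / 3.0 * np.pi * (r_edges[1:] ** 3 - r_edges[:-1] ** 3)
    rr = 0.5 * n * (n - 1) * shell_volume / box_size**3  # expected random pairs
    return dd / rr - 1.0


# Toy example: an unclustered catalog in a hypothetical 100 h^-1 cMpc box.
rng = np.random.default_rng(42)
box = 100.0
galaxies = rng.uniform(0.0, box, size=(1000, 3))

radius = 5.0
counts = counts_in_spheres(galaxies, box, radius, n_spheres=2000, rng=rng)
expected_mean = len(galaxies) * (4.0 / 3.0 * np.pi * radius**3) / box**3
print("CiC mean, variance:", counts.mean(), counts.var())
print("VPF vs Poisson:", void_probability(counts), np.exp(-expected_mean))

xi = xi_natural(galaxies[:500], box, np.array([5.0, 10.0, 15.0, 20.0]))
print("xi(r):", xi)
```

Repeating the count-in-cells and VPF calls over a list of radii gives these statistics as functions of scale, which is the kind of summary vector a neural network could then map to Ωm and σ8. For realistic catalog sizes, the brute-force pair and sphere counts would need to be replaced by a tree- or grid-based counter.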