Exploring galaxy evolution with generative models
Context. Generative models open up the possibility of interrogating scientific
data in a more data-driven way. Aims: We propose a method that uses generative
models to explore hypotheses in astrophysics and other areas. We use a neural
network to show how physical attributes can be manipulated independently by
encoding objects in latent space. Methods: By learning a latent-space
representation of the data, we can use this network to forward model and
explore hypotheses in a data-driven way, training it to generate artificial
data for testing hypotheses about the underlying physical processes.
Results: We demonstrate the approach on a well-studied process in
astrophysics, the quenching of star formation in galaxies as they move from
low- to high-density environments. This approach can help explore astrophysical
and other phenomena in a way that differs from current methods based on
simulations and observations.

Comment: Published in A&A. For code and further details, see
http://space.ml/proj/explor
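To make the latent-space manipulation concrete, here is a minimal sketch of the general idea: encode objects with an autoencoder, shift the code along an attribute direction, and decode the counterfactual. This is not the paper's actual network; the architecture, the sizes, and the randomly drawn attribute direction are all illustrative assumptions (see http://space.ml/proj/explor for the real code).

```python
# A minimal sketch, NOT the paper's model: an autoencoder whose latent code
# is shifted along a hypothetical physical-attribute direction and decoded.
import torch
import torch.nn as nn

class GalaxyAutoencoder(nn.Module):
    def __init__(self, n_features=64, n_latent=8):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_features, 32), nn.ReLU(), nn.Linear(32, n_latent))
        self.decoder = nn.Sequential(
            nn.Linear(n_latent, 32), nn.ReLU(), nn.Linear(32, n_features))

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = GalaxyAutoencoder()
x = torch.randn(16, 64)              # stand-in for observed galaxy features

with torch.no_grad():
    z = model.encoder(x)             # latent representation of each object
    # Hypothetical "environment density" direction in latent space; in the
    # paper's setting this would be learned from data, not drawn at random.
    direction = torch.randn(8)
    direction = direction / direction.norm()
    x_shifted = model.decoder(z + 2.0 * direction)   # forward-modeled data
```

In practice the attribute direction would be estimated from labeled examples (e.g., galaxies in low- versus high-density environments), so that shifting along it changes one physical property while leaving others fixed.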
A Preferential Attachment Model for the Stellar Initial Mass Function
Accurate specification of a likelihood function is becoming increasingly
difficult in many inference problems in astronomy. As sample sizes resulting
from astronomical surveys continue to grow, deficiencies in the likelihood
function lead to larger biases in key parameter estimates. These deficiencies
result from the oversimplification of the physical processes that generated the
data, and from the failure to account for observational limitations.
Unfortunately, realistic models often do not yield an analytical form for the
likelihood. The estimation of a stellar initial mass function (IMF) is an
important example. The stellar IMF is the mass distribution of stars initially
formed in a given cluster, a population that is not directly observable due to
stellar evolution, other disruptions, and the observational limitations of the
cluster. There are several difficulties with specifying a
likelihood in this setting since the physical processes and observational
challenges result in measurable masses that cannot legitimately be considered
independent draws from an IMF. This work improves inference of the IMF by using
an approximate Bayesian computation approach that both accounts for
observational and astrophysical effects and incorporates a physically-motivated
model for star cluster formation. The methodology is illustrated via a
simulation study, demonstrating that the proposed approach can recover the true
posterior in realistic situations, and is then applied to data from
astrophysical simulations.
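As an illustration of the ABC ingredient only (not the paper's preferential-attachment model), the sketch below runs rejection-sampling ABC for the slope of a simple truncated power-law IMF with a crude detection limit. The forward model, summary statistic, prior, and tolerance are all assumptions made for the example.

```python
# Illustrative rejection-sampling ABC for an IMF slope. The power-law forward
# model, detection limit, summary statistic, and tolerance are assumptions;
# the paper uses a physically-motivated cluster-formation model instead.
import numpy as np

rng = np.random.default_rng(0)

def simulate_cluster(alpha, n=500, m_min=0.5, m_max=100.0):
    # Inverse-CDF sampling from p(m) ~ m^(-alpha) on [m_min, m_max].
    u = rng.uniform(size=n)
    a = 1.0 - alpha
    return (m_min**a + u * (m_max**a - m_min**a)) ** (1.0 / a)

def observe(masses, detection_limit=1.0):
    # Crude stand-in for observational limitations: faint stars are missed,
    # so the measured masses are not i.i.d. draws from the IMF itself.
    return masses[masses > detection_limit]

data = observe(simulate_cluster(alpha=2.35))   # synthetic "observed" masses
s_obs = np.mean(np.log(data))                  # summary statistic

accepted = []
for _ in range(20000):
    alpha = rng.uniform(1.5, 3.5)              # draw a slope from the prior
    sim = observe(simulate_cluster(alpha))
    if sim.size > 0 and abs(np.mean(np.log(sim)) - s_obs) < 0.05:
        accepted.append(alpha)                 # keep slopes that match

print(f"accepted {len(accepted)} draws, posterior mean ~ {np.mean(accepted):.2f}")
```

The point of the construction is that the likelihood of the *observed* masses never has to be written down: only the forward simulator, which can include arbitrarily messy astrophysical and observational effects, is required.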
A high-reproducibility and high-accuracy method for automated topic classification
Much of human knowledge sits in large databases of unstructured text.
Leveraging this knowledge requires algorithms that extract and record metadata
on unstructured text documents. Assigning topics to documents will enable
intelligent search, statistical characterization, and meaningful
classification. Latent Dirichlet allocation (LDA) is the state of the art in
topic classification. Here, we perform a systematic theoretical and numerical
analysis demonstrating that current optimization techniques for LDA often fail
to infer the most suitable model parameters accurately. Adapting approaches for
community detection in networks, we propose a new algorithm that displays high
reproducibility and high accuracy, and is also computationally efficient. We
apply it to a large set of documents in the English Wikipedia and reveal its
hierarchical topic structure. Our algorithm promises to make "big data" text
analysis systems more reliable.

Comment: 23 pages, 24 figures
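The reproducibility issue is easy to observe with off-the-shelf tooling. Below is a small sketch using scikit-learn's standard LDA implementation (not the paper's community-detection algorithm): two fits that differ only in random seed can converge to different local optima and assign documents to different topics. The toy corpus is, of course, an assumption.

```python
# Illustrative only: standard variational LDA, run twice with different
# seeds, to show the run-to-run variability the paper criticizes.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "stars form in dense molecular clouds",
    "galaxy clusters quench star formation in dense environments",
    "neural networks learn latent representations of data",
    "deep generative models compress simulation data",
]
counts = CountVectorizer().fit_transform(docs)

for seed in (0, 1):
    lda = LatentDirichletAllocation(n_components=2, random_state=seed)
    doc_topics = lda.fit_transform(counts)
    # Different seeds can yield different per-document topic assignments.
    print(seed, doc_topics.argmax(axis=1))
```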
Prototype selection for parameter estimation in complex models
Parameter estimation in astrophysics often requires the use of complex
physical models. In this paper we study the problem of estimating the
parameters that describe star formation history (SFH) in galaxies. Here,
high-dimensional spectral data from galaxies are appropriately modeled as
linear combinations of physical components, called simple stellar populations
(SSPs), plus some nonlinear distortions. Theoretical data for each SSP is
produced for a fixed parameter vector via computer modeling. Though the
parameters that define each SSP are continuous, optimizing the signal model
over a large set of SSPs on a fine parameter grid is computationally infeasible
and inefficient. The goal of this study is to estimate the set of parameters
that describes the SFH of each galaxy. These target parameters, such as the
average ages and chemical compositions of the galaxy's stellar populations, are
derived from the SSP parameters and the component weights in the signal model.
Here, we introduce a principled approach to choosing a small basis of SSP
prototypes for SFH parameter estimation. The basic idea is to quantize the
vector space and effective support of the model components. In addition to
greater computational efficiency, we achieve better estimates of the SFH target
parameters. In simulations, our proposed quantization method obtains a
substantial improvement in estimating the target parameters over the common
method of employing a parameter grid. Sparse coding techniques are not
appropriate for this problem without proper constraints, while constrained
sparse coding methods perform poorly for parameter estimation because their
objective is signal reconstruction, not estimation of the target parameters.

Comment: Published at http://dx.doi.org/10.1214/11-AOAS500 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
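A rough sketch of the quantization idea, under stated assumptions: compress a large SSP library into a handful of prototypes (plain k-means here, standing in for the paper's quantization scheme) and fit a spectrum as a nonnegative combination of those prototypes. All data below are synthetic, and the cluster count is arbitrary.

```python
# Illustrative prototype selection: quantize a fake SSP library with k-means,
# then fit a spectrum by nonnegative least squares over the prototypes.
import numpy as np
from scipy.optimize import nnls
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
n_ssp, n_wavelengths = 1000, 200
ssp_library = np.abs(rng.normal(size=(n_ssp, n_wavelengths)))  # fake spectra

# Quantize the library: each cluster center serves as one SSP prototype,
# replacing an infeasibly fine grid over the continuous SSP parameters.
km = KMeans(n_clusters=12, n_init=10, random_state=0).fit(ssp_library)
prototypes = km.cluster_centers_                    # (12, n_wavelengths)

# Fit an "observed" spectrum as a nonnegative mixture of the prototypes.
true_weights = rng.dirichlet(np.ones(12))
spectrum = true_weights @ prototypes + 0.01 * rng.normal(size=n_wavelengths)
weights, _ = nnls(prototypes.T, spectrum)

# SFH targets (e.g., weighted mean age) would be derived from these weights
# together with the parameters attached to each prototype.
print(weights.round(3))
```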
Deep Fluids: A Generative Network for Parameterized Fluid Simulations
This paper presents a novel generative model to synthesize fluid simulations
from a set of reduced parameters. A convolutional neural network is trained on
a collection of discrete, parameterizable fluid simulation velocity fields. Due
to the capability of deep learning architectures to learn representative
features of the data, our generative model is able to accurately approximate
the training data set, while providing plausible interpolated in-betweens. The
proposed generative model is optimized for fluids by a novel loss function that
guarantees divergence-free velocity fields at all times. In addition, we
demonstrate that we can handle complex parameterizations in reduced spaces, and
advance simulations in time by integrating in the latent space with a second
network. Our method models a wide variety of fluid behaviors, thus enabling
applications such as fast construction of simulations, interpolation of fluids
with different parameters, time re-sampling, latent space simulations, and
compression of fluid simulation data. Reconstructed velocity fields are
generated up to 700x faster than re-simulating the data with the underlying CPU
solver, while achieving compression rates of up to 1300x.

Comment: Computer Graphics Forum (Proceedings of EUROGRAPHICS 2019);
additional materials: http://www.byungsoo.me/project/deep-fluids
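A standard way to guarantee divergence-free velocities, consistent with the abstract's claim, is to predict a stream function and take its curl. The numpy sketch below checks that identity in 2-D with an analytic stand-in for the network output; it is not the paper's CNN or its training loss.

```python
# Sketch of the divergence-free construction in 2-D: take the curl of a
# scalar stream function psi, so div(u, v) = 0 by construction. The analytic
# psi here stands in for a network's output; this is only an identity check.
import numpy as np

n = 64
h = 1.0 / (n - 1)                                 # grid spacing on [0, 1]
xx, yy = np.meshgrid(np.linspace(0, 1, n), np.linspace(0, 1, n), indexing="ij")
psi = np.sin(2 * np.pi * xx) * np.cos(2 * np.pi * yy)

u = np.gradient(psi, h, axis=1)                   # u =  d(psi)/dy
v = -np.gradient(psi, h, axis=0)                  # v = -d(psi)/dx

# Mixed finite differences along different axes commute, so the discrete
# divergence vanishes to rounding error.
div = np.gradient(u, h, axis=0) + np.gradient(v, h, axis=1)
print(f"max |divergence| = {np.abs(div).max():.2e}")
```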