Robust Machine Learning Applied to Astronomical Datasets I: Star-Galaxy Classification of the SDSS DR3 Using Decision Trees
We provide classifications for all 143 million non-repeat photometric objects
in the Third Data Release of the Sloan Digital Sky Survey (SDSS) using decision
trees trained on 477,068 objects with SDSS spectroscopic data. We demonstrate
that these star/galaxy classifications are expected to be reliable for
approximately 22 million objects with r < ~20. The general machine learning
environment Data-to-Knowledge and supercomputing resources enabled extensive
investigation of the decision tree parameter space. This work presents the
first public release of objects classified in this way for an entire SDSS data
release. The objects are classified as either galaxy, star or nsng (neither
star nor galaxy), with an associated probability for each class. To demonstrate
how to effectively make use of these classifications, we perform several
important tests. First, we detail selection criteria within the probability
space defined by the three classes to extract samples of stars and galaxies to
a given completeness and efficiency. Second, we investigate the efficacy of the
classifications and the effect of extrapolating from the spectroscopic regime
by performing blind tests on objects in the SDSS, 2dF Galaxy Redshift and 2dF
QSO Redshift (2QZ) surveys. Given the photometric limits of our spectroscopic
training data, we effectively begin to extrapolate past our star-galaxy
training set at r ~ 18. By comparing the number counts of our training sample
with the classified sources, however, we find that our efficiencies appear to
remain robust to r ~ 20. As a result, we expect our classifications to be
accurate for 900,000 galaxies and 6.7 million stars, and remain robust via
extrapolation for a total of 8.0 million galaxies and 13.9 million stars.
[Abridged] Comment: 27 pages, 12 figures, to be published in ApJ, uses emulateapj.cl
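The idea of selecting a sample "to a given completeness and efficiency" with a cut in class-probability space can be illustrated with a minimal sketch. It assumes the usual definitions (completeness: fraction of true galaxies that pass the cut; efficiency: fraction of objects passing the cut that are true galaxies) and a simple threshold on the galaxy probability alone. The function name, the threshold values, and the toy data are hypothetical and are not taken from the paper.

```python
def completeness_and_efficiency(p_galaxy, is_galaxy, threshold):
    """Return (completeness, efficiency) for the cut p_galaxy >= threshold.

    p_galaxy  : per-object galaxy probabilities from a classifier
    is_galaxy : true labels (e.g. from spectroscopy), True for galaxies
    threshold : hypothetical probability cut used to select galaxies
    """
    selected = [p >= threshold for p in p_galaxy]
    true_pos = sum(1 for s, g in zip(selected, is_galaxy) if s and g)
    n_selected = sum(selected)
    n_true = sum(1 for g in is_galaxy if g)
    # Guard against empty denominators rather than raising.
    completeness = true_pos / n_true if n_true else float("nan")
    efficiency = true_pos / n_selected if n_selected else float("nan")
    return completeness, efficiency


# Toy, made-up data purely for illustration.
p_galaxy = [0.95, 0.80, 0.60, 0.40, 0.10]
is_galaxy = [True, True, False, True, False]

for cut in (0.5, 0.7):
    c, e = completeness_and_efficiency(p_galaxy, is_galaxy, cut)
    print(f"cut={cut}: completeness={c:.2f}, efficiency={e:.2f}")
```

Raising the cut generally trades completeness for efficiency, which is why a selection is specified by a target for both quantities rather than a single number.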
Damped Lyman-alpha and Lyman Limit Absorbers in the Cold Dark Matter Model
We study the formation of damped Lyman-alpha and Lyman limit absorbers in a
hierarchical clustering scenario using a gas dynamical simulation of an , cold dark matter universe. In the simulation, these high column density
systems are associated with forming galaxies. Damped Lyman-alpha absorption, N_{HI}
\gtrsim 10^{20.2} cm^{-2}, arises along lines of sight that pass near the
centers of relatively massive, dense protogalaxies. Lyman limit absorption,
10^{17} cm^{-2} \lesssim N_{HI} \lesssim 10^{20.2} cm^{-2}, develops on lines of
sight that pass through the outer parts of such objects or near the centers of
smaller protogalaxies. The number of Lyman limit systems is less than observed,
while the number of damped Lyman-alpha systems is quite close to the observed
abundance. Damped absorbers are typically kpc in radius, but the
population has a large total cross section because the systems are much more
numerous than present day galaxies. Our results demonstrate that high
column density systems like those observed arise naturally in a hierarchical
theory of galaxy formation and that it is now possible to study these absorbers
directly from numerical simulations. Comment: compressed postscript, 12 pages including 2 embedded figures. A
version that also includes embedded Figure 1, a 6 Mbyte color postscript
image (which prints reasonable grey scale on a b/w printer) is available from
ftp://bessel.mps.ohio-state.edu/pub/dhw/Preprints Submitted to ApJ Letter
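The column density ranges quoted in the abstract above can be written as a small classification rule. This is a minimal sketch that treats the approximate boundaries (10^{17} and 10^{20.2} cm^{-2}) as sharp cuts, which is a simplifying assumption; the function and constant names are hypothetical.

```python
# Thresholds in log10(N_HI / cm^-2), taken from the abstract and
# treated here as sharp boundaries (an assumption of this sketch).
DLA_LOG_NHI = 20.2
LLS_LOG_NHI = 17.0


def classify_absorber(log_nhi):
    """Classify an absorber from its log10 neutral hydrogen column density."""
    if log_nhi >= DLA_LOG_NHI:
        return "damped Lyman-alpha"
    if log_nhi >= LLS_LOG_NHI:
        return "Lyman limit"
    return "below Lyman limit threshold"


for value in (21.0, 18.5, 15.0):
    print(value, classify_absorber(value))
```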
Data Mining and Machine Learning in Astronomy
We review the current state of data mining and machine learning in astronomy.
'Data Mining' can have a somewhat mixed connotation from the point of view of a
researcher in this field. If used correctly, it can be a powerful approach,
holding the potential to fully exploit the exponentially increasing amount of
available data, promising great scientific advance. However, if misused, it can
be little more than the black-box application of complex computing algorithms
that may give little physical insight, and provide questionable results. Here,
we give an overview of the entire data mining process, from data collection
through to the interpretation of results. We cover common machine learning
algorithms, such as artificial neural networks and support vector machines;
applications from a broad range of astronomy, emphasizing those where data
mining techniques directly resulted in improved science; and important current
and future directions, including probability density functions, parallel
algorithms, petascale computing, and the time domain. We conclude that, so long
as one carefully selects an appropriate algorithm, and is guided by the
astronomical problem at hand, data mining can be very much the powerful tool,
and not the questionable black box. Comment: Published in IJMPD. 61 pages, uses ws-ijmpd.cls. Several extra
figures, some minor additions to the tex
Leveraging OpenStack and Ceph for a Controlled-Access Data Cloud
While traditional HPC has satisfied and continues to satisfy most workflows, a new
generation of researchers has emerged looking for sophisticated, scalable,
on-demand, and self-service control of compute infrastructure in a cloud-like
environment. Many also seek safe harbors to operate on or store sensitive
and/or controlled-access data in a high capacity environment.
To cater to these modern users, the Minnesota Supercomputing Institute
designed and deployed Stratus, a locally-hosted cloud environment powered by
the OpenStack platform, and backed by Ceph storage. The subscription-based
service complements existing HPC systems by satisfying the following unmet
needs of our users: a) on-demand availability of compute resources, b)
long-running jobs (i.e., days), c) container-based computing with
Docker, and d) adequate security controls to comply with controlled-access data
requirements.
This document provides an in-depth look at the design of Stratus with respect
to security and compliance with the NIH's controlled-access data policy.
Emphasis is placed on lessons learned while integrating OpenStack and Ceph
features into a so-called "walled garden", and how those technologies
influenced the security design. Many features of Stratus, including tiered
secure storage with the introduction of a controlled-access data "cache",
fault-tolerant live-migrations, and fully integrated two-factor authentication,
depend on recent OpenStack and Ceph features. Comment: 7 pages, 5 figures, PEARC '18: Practice and Experience in Advanced
Research Computing, July 22--26, 2018, Pittsburgh, PA, US
Revising the age for the Baptistina asteroid family using WISE/NEOWISE data
We have used numerical routines to model the evolution of a simulated
Baptistina family to constrain its age in light of new measurements of the
diameters and albedos of family members from the Wide-field Infrared Survey
Explorer. We also investigate the effect of varying the assumed physical and
orbital parameters on the best-fitting age. We find that the physically allowed
range of assumed values for the density and thermal conductivity induces a
large uncertainty in the rate of evolution. When realistic uncertainties in the
family members' physical parameters are taken into account we find the
best-fitting age can fall anywhere in the range of 140-320 Myr. Without more
information on the physical properties of the family members it is difficult to
place a firmer constraint on Baptistina's age. Comment: 27 pages, 16 figures, accepted to Ap
Grids and the Virtual Observatory
We consider several projects from astronomy that benefit from the Grid paradigm and
associated technology, many of which involve either massive datasets or the federation
of multiple datasets. We cover image computation (mosaicking, multi-wavelength
images, and synoptic surveys); database computation (representation through XML,
data mining, and visualization); and semantic interoperability (publishing, ontologies,
directories, and service descriptions).
Photoionization, Numerical Resolution, and Galaxy Formation
Using cosmological simulations that incorporate gas dynamics and
gravitational forces, we investigate the influence of photoionization by a UV
radiation background on the formation of galaxies. In our highest resolution
simulations, we find that photoionization has essentially no effect on the
baryonic mass function of galaxies at , down to our resolution limit of
5 \times 10^{9} M_\odot. We do, however, find a strong interplay between the mass
resolution of a simulation and the microphysics included in the computation of
heating and cooling rates. At low resolution, a photoionizing background can
appear to suppress the formation of even relatively massive galaxies. However,
when the same initial conditions are evolved with a factor of eight better mass
resolution, this effect disappears. Our results demonstrate the need for care
in interpreting the results of cosmological simulations that incorporate
hydrodynamics and radiation physics. For example, we conclude that a simulation
with limited resolution may yield more realistic results if it ignores some
relevant physical processes, such as photoionization. At higher resolution, the
simulated population of massive galaxies is insensitive to the treatment of
photoionization and star formation, but it does depend significantly on the
amplitude of the initial density fluctuations. By , an cold
dark matter model normalized to produce the observed masses of present-day
clusters has already formed galaxies with baryon masses exceeding 10^{11}
M_\odot. Comment: 25 pages, w/ embedded figures. Submitted to ApJ. Also available at
http://www-astronomy.mps.ohio-state.edu/~dhw/Docs/preprints.htm