    Robust Machine Learning Applied to Astronomical Datasets I: Star-Galaxy Classification of the SDSS DR3 Using Decision Trees

    We provide classifications for all 143 million non-repeat photometric objects in the Third Data Release of the Sloan Digital Sky Survey (SDSS) using decision trees trained on 477,068 objects with SDSS spectroscopic data. We demonstrate that these star/galaxy classifications are expected to be reliable for approximately 22 million objects with r < ~20. The general machine learning environment Data-to-Knowledge and supercomputing resources enabled extensive investigation of the decision tree parameter space. This work presents the first public release of objects classified in this way for an entire SDSS data release. The objects are classified as either galaxy, star or nsng (neither star nor galaxy), with an associated probability for each class. To demonstrate how to effectively make use of these classifications, we perform several important tests. First, we detail selection criteria within the probability space defined by the three classes to extract samples of stars and galaxies to a given completeness and efficiency. Second, we investigate the efficacy of the classifications and the effect of extrapolating from the spectroscopic regime by performing blind tests on objects in the SDSS, 2dF Galaxy Redshift and 2dF QSO Redshift (2QZ) surveys. Given the photometric limits of our spectroscopic training data, we effectively begin to extrapolate past our star-galaxy training set at r ~ 18. By comparing the number counts of our training sample with the classified sources, however, we find that our efficiencies appear to remain robust to r ~ 20. As a result, we expect our classifications to be accurate for 900,000 galaxies and 6.7 million stars, and remain robust via extrapolation for a total of 8.0 million galaxies and 13.9 million stars. [Abridged] Comment: 27 pages, 12 figures, to be published in ApJ, uses emulateapj.cl

    Damped Lyman-alpha and Lyman Limit Absorbers in the Cold Dark Matter Model

    We study the formation of damped Lyman-alpha and Lyman limit absorbers in a hierarchical clustering scenario using a gas dynamical simulation of an $\Omega = 1$, cold dark matter universe. In the simulation, these high column density systems are associated with forming galaxies. Damped Lyman-alpha absorption, $N_{\rm HI} \gtrsim 10^{20.2}\,{\rm cm}^{-2}$, arises along lines of sight that pass near the centers of relatively massive, dense protogalaxies. Lyman limit absorption, $10^{17}\,{\rm cm}^{-2} \lesssim N_{\rm HI} \lesssim 10^{20.2}\,{\rm cm}^{-2}$, develops on lines of sight that pass through the outer parts of such objects or near the centers of smaller protogalaxies. The number of Lyman limit systems is less than observed, while the number of damped Lyman-alpha systems is quite close to the observed abundance. Damped absorbers are typically $\sim 10$ kpc in radius, but the population has a large total cross section because the systems are much more numerous than present day $L_*$ galaxies. Our results demonstrate that high column density systems like those observed arise naturally in a hierarchical theory of galaxy formation and that it is now possible to study these absorbers directly from numerical simulations. Comment: compressed postscript, 12 pages including 2 embedded figures. A version that also includes embedded Figure 1, a 6 Mbyte color postscript image (which prints reasonable grey scale on a b/w printer) is available from ftp://bessel.mps.ohio-state.edu/pub/dhw/Preprints Submitted to ApJ Letter

    Data Mining and Machine Learning in Astronomy

    We review the current state of data mining and machine learning in astronomy. 'Data Mining' can have a somewhat mixed connotation from the point of view of a researcher in this field. If used correctly, it can be a powerful approach, holding the potential to fully exploit the exponentially increasing amount of available data, promising great scientific advance. However, if misused, it can be little more than the black-box application of complex computing algorithms that may give little physical insight, and provide questionable results. Here, we give an overview of the entire data mining process, from data collection through to the interpretation of results. We cover common machine learning algorithms, such as artificial neural networks and support vector machines, applications from a broad range of astronomy, emphasizing those where data mining techniques directly resulted in improved science, and important current and future directions, including probability density functions, parallel algorithms, petascale computing, and the time domain. We conclude that, so long as one carefully selects an appropriate algorithm, and is guided by the astronomical problem at hand, data mining can be very much the powerful tool, and not the questionable black box. Comment: Published in IJMPD. 61 pages, uses ws-ijmpd.cls. Several extra figures, some minor additions to the tex

    Leveraging OpenStack and Ceph for a Controlled-Access Data Cloud

    While traditional HPC has satisfied and continues to satisfy most workflows, a new generation of researchers has emerged looking for sophisticated, scalable, on-demand, and self-service control of compute infrastructure in a cloud-like environment. Many also seek safe harbors to operate on or store sensitive and/or controlled-access data in a high capacity environment. To cater to these modern users, the Minnesota Supercomputing Institute designed and deployed Stratus, a locally-hosted cloud environment powered by the OpenStack platform, and backed by Ceph storage. The subscription-based service complements existing HPC systems by satisfying the following unmet needs of our users: a) on-demand availability of compute resources, b) long-running jobs (i.e., > 30 days), c) container-based computing with Docker, and d) adequate security controls to comply with controlled-access data requirements. This document provides an in-depth look at the design of Stratus with respect to security and compliance with the NIH's controlled-access data policy. Emphasis is placed on lessons learned while integrating OpenStack and Ceph features into a so-called "walled garden", and how those technologies influenced the security design. Many features of Stratus, including tiered secure storage with the introduction of a controlled-access data "cache", fault-tolerant live-migrations, and fully integrated two-factor authentication, depend on recent OpenStack and Ceph features. Comment: 7 pages, 5 figures, PEARC '18: Practice and Experience in Advanced Research Computing, July 22--26, 2018, Pittsburgh, PA, US

    Revising the age for the Baptistina asteroid family using WISE/NEOWISE data

    We have used numerical routines to model the evolution of a simulated Baptistina family to constrain its age in light of new measurements of the diameters and albedos of family members from the Wide-field Infrared Survey Explorer. We also investigate the effect of varying the assumed physical and orbital parameters on the best-fitting age. We find that the physically allowed range of assumed values for the density and thermal conductivity induces a large uncertainty in the rate of evolution. When realistic uncertainties in the family members' physical parameters are taken into account we find the best-fitting age can fall anywhere in the range of 140-320 Myr. Without more information on the physical properties of the family members it is difficult to place a more firm constraint on Baptistina's age. Comment: 27 pages, 16 figures, accepted to Ap

    Grids and the Virtual Observatory

    We consider several projects from astronomy that benefit from the Grid paradigm and associated technology, many of which involve either massive datasets or the federation of multiple datasets. We cover image computation (mosaicking, multi-wavelength images, and synoptic surveys); database computation (representation through XML, data mining, and visualization); and semantic interoperability (publishing, ontologies, directories, and service descriptions).

    Photoionization, Numerical Resolution, and Galaxy Formation

    Using cosmological simulations that incorporate gas dynamics and gravitational forces, we investigate the influence of photoionization by a UV radiation background on the formation of galaxies. In our highest resolution simulations, we find that photoionization has essentially no effect on the baryonic mass function of galaxies at $z=2$, down to our resolution limit of $5 \times 10^{9} M_\odot$. We do, however, find a strong interplay between the mass resolution of a simulation and the microphysics included in the computation of heating and cooling rates. At low resolution, a photoionizing background can appear to suppress the formation of even relatively massive galaxies. However, when the same initial conditions are evolved with a factor of eight better mass resolution, this effect disappears. Our results demonstrate the need for care in interpreting the results of cosmological simulations that incorporate hydrodynamics and radiation physics. For example, we conclude that a simulation with limited resolution may yield more realistic results if it ignores some relevant physical processes, such as photoionization. At higher resolution, the simulated population of massive galaxies is insensitive to the treatment of photoionization and star formation, but it does depend significantly on the amplitude of the initial density fluctuations. By $z=2$, an $\Omega=1$ cold dark matter model normalized to produce the observed masses of present-day clusters has already formed galaxies with baryon masses exceeding $10^{11} M_\odot$. Comment: 25 pages, w/ embedded figures. Submitted to ApJ. Also available at http://www-astronomy.mps.ohio-state.edu/~dhw/Docs/preprints.htm