Search CORE

4,018 research outputs found

Robust Machine Learning Applied to Astronomical Datasets I: Star-Galaxy Classification of the SDSS DR3 Using Decision Trees

Author: Adam D. Myers
David Tcheng
Djorgovski S. G.
Nicholas M. Ball
Owens E. A.
Quinlan J. R.
Robert J. Brunner
Storrie-Lombardi M. C.
Voges W.
Publication venue: 'University of Chicago Press'
Publication date: 21/06/2006
Field of study

We provide classifications for all 143 million non-repeat photometric objects in the Third Data Release of the Sloan Digital Sky Survey (SDSS) using decision trees trained on 477,068 objects with SDSS spectroscopic data. We demonstrate that these star/galaxy classifications are expected to be reliable for approximately 22 million objects with r < ~20. The general machine learning environment Data-to-Knowledge and supercomputing resources enabled extensive investigation of the decision tree parameter space. This work presents the first public release of objects classified in this way for an entire SDSS data release. The objects are classified as either galaxy, star or nsng (neither star nor galaxy), with an associated probability for each class. To demonstrate how to effectively make use of these classifications, we perform several important tests. First, we detail selection criteria within the probability space defined by the three classes to extract samples of stars and galaxies to a given completeness and efficiency. Second, we investigate the efficacy of the classifications and the effect of extrapolating from the spectroscopic regime by performing blind tests on objects in the SDSS, 2dF Galaxy Redshift and 2dF QSO Redshift (2QZ) surveys. Given the photometric limits of our spectroscopic training data, we effectively begin to extrapolate past our star-galaxy training set at r ~ 18. By comparing the number counts of our training sample with the classified sources, however, we find that our efficiencies appear to remain robust to r ~ 20. As a result, we expect our classifications to be accurate for 900,000 galaxies and 6.7 million stars, and remain robust via extrapolation for a total of 8.0 million galaxies and 13.9 million stars. [Abridged]Comment: 27 pages, 12 figures, to be published in ApJ, uses emulateapj.cl

arXiv.org e-Print Archive

Crossref

CERN Document Server

Data Management and Mining in Astrophysical Databases

Author: De Angelis A.
Frailis M.
Roberto V.
Publication venue
Publication date: 01/01/2003
Field of study

We analyse the issues involved in the management and mining of astrophysical data. The traditional approach to data management in the astrophysical field is not able to keep up with the increasing size of the data gathered by modern detectors. An essential role in the astrophysical research will be assumed by automatic tools for information extraction from large datasets, i.e. data mining techniques, such as clustering and classification algorithms. This asks for an approach to data management based on data warehousing, emphasizing the efficiency and simplicity of data access; efficiency is obtained using multidimensional access methods and simplicity is achieved by properly handling metadata. Clustering and classification techniques, on large datasets, pose additional requirements: computational and memory scalability with respect to the data size, interpretability and objectivity of clustering or classification results. In this study we address some possible solutions.Comment: 10 pages, Late

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Università degli Studi di Udine

On Machine-Learned Classification of Variable Stars with Sparse and Noisy Time-Series Data

Author: Arien Crellin-Quick
Bailey
Ball
Barning
Blockeel
Blomme
Breiman
Breiman
Burman
Butler
Cesa-Bianchi
Cheeseman
Covey
Dan L. Starr
Doering
Eyer
Eyer
Eyer
Eyer
Flores
Freund
Friedman
Hastie
Ivezić
John M. Brewer
Joseph W. Richards
Joshua S. Bloom
Justin Higgins
Knerr
LSST Science Collaborations .
Maxime Rischard
Millan-Gabet
Moffat
Nathaniel R. Butler
O'Keefe
Perryman
Press
Quinlan
Rachel Kennedy
Rebbapragada
Sesar
Stankov
Suchkov
Udalski
Vapnik
Walkowicz
Wasserman
Willemsen
Woźniak
Wu
Publication venue: 'IOP Publishing'
Publication date: 10/01/2011
Field of study

With the coming data deluge from synoptic surveys, there is a growing need for frameworks that can quickly and automatically produce calibrated classification probabilities for newly-observed variables based on a small number of time-series measurements. In this paper, we introduce a methodology for variable-star classification, drawing from modern machine-learning techniques. We describe how to homogenize the information gleaned from light curves by selection and computation of real-numbered metrics ("feature"), detail methods to robustly estimate periodic light-curve features, introduce tree-ensemble methods for accurate variable star classification, and show how to rigorously evaluate the classification results using cross validation. On a 25-class data set of 1542 well-studied variable stars, we achieve a 22.8% overall classification error using the random forest classifier; this represents a 24% improvement over the best previous classifier on these data. This methodology is effective for identifying samples of specific science classes: for pulsational variables used in Milky Way tomography we obtain a discovery efficiency of 98.2% and for eclipsing systems we find an efficiency of 99.1%, both at 95% purity. We show that the random forest (RF) classifier is superior to other machine-learned methods in terms of accuracy, speed, and relative immunity to features with no useful class information; the RF classifier can also be used to estimate the importance of each feature in classification. Additionally, we present the first astronomical use of hierarchical classification methods to incorporate a known class taxonomy in the classifier, which further reduces the catastrophic error rate to 7.8%. Excluding low-amplitude sources, our overall error rate improves to 14%, with a catastrophic error rate of 3.5%.Comment: 23 pages, 9 figure

arXiv.org e-Print Archive

Crossref

Multivariate Approaches to Classification in Extragalactic Astronomy

Author: Chattopadhyay Asis Kumar
Fraix-Burnet Didier
Thuillard Marc
Publication venue: 'Frontiers Media SA'
Publication date: 01/01/2015
Field of study

Clustering objects into synthetic groups is a natural activity of any science. Astrophysics is not an exception and is now facing a deluge of data. For galaxies, the one-century old Hubble classification and the Hubble tuning fork are still largely in use, together with numerous mono-or bivariate classifications most often made by eye. However, a classification must be driven by the data, and sophisticated multivariate statistical tools are used more and more often. In this paper we review these different approaches in order to situate them in the general context of unsupervised and supervised learning. We insist on the astrophysical outcomes of these studies to show that multivariate analyses provide an obvious path toward a renewal of our classification of galaxies and are invaluable tools to investigate the physics and evolution of galaxies.Comment: Open Access paper. http://www.frontiersin.org/milky\_way\_and\_galaxies/10.3389/fspas.2015.00003/abstract\>. \<10.3389/fspas.2015.00003 \&g

arXiv.org e-Print Archive

Crossref

Hal - Université Grenoble Alpes

Frontiers - Publisher Connector

HAL Descartes

HAL-INSU

HAL Université de Savoie

Identifying uncertain galaxy morphologies using unsupervised learning

Author: Edwards K.
Gaber M.
Publication venue
Publication date: 09/06/2013
Field of study

Portsmouth University Research Portal (Pure)

Using machine learning techniques to automate sky survey catalog generation

Author: Djorgovski S. G.
Doyle R. J.
Fayyad Usama M.
Roden J. C.
Weir Nicholas
Publication venue
Publication date
Field of study

We describe the application of machine classification techniques to the development of an automated tool for the reduction of a large scientific data set. The 2nd Palomar Observatory Sky Survey provides comprehensive photographic coverage of the northern celestial hemisphere. The photographic plates are being digitized into images containing on the order of 10(exp 7) galaxies and 10(exp 8) stars. Since the size of this data set precludes manual analysis and classification of objects, our approach is to develop a software system which integrates independently developed techniques for image processing and data classification. Image processing routines are applied to identify and measure features of sky objects. Selected features are used to determine the classification of each object. GID3* and O-BTree, two inductive learning techniques, are used to automatically learn classification decision trees from examples. We describe the techniques used, the details of our specific application, and the initial encouraging results which indicate that our approach is well-suited to the problem. The benefits of the approach are increased data reduction throughput, consistency of classification, and the automated derivation of classification rules that will form an objective, examinable basis for classifying sky objects. Furthermore, astronomers will be freed from the tedium of an intensely visual task to pursue more challenging analysis and interpretation problems given automatically cataloged data

NASA Technical Reports Server

Photometric redshift estimation via deep learning

Author: D'Isanto Antonio
Polsterer Kai Lars
Publication venue: 'EDP Sciences'
Publication date: 08/09/2017
Field of study

The need to analyze the available large synoptic multi-band surveys drives the development of new data-analysis methods. Photometric redshift estimation is one field of application where such new methods improved the results, substantially. Up to now, the vast majority of applied redshift estimation methods have utilized photometric features. We aim to develop a method to derive probabilistic photometric redshift directly from multi-band imaging data, rendering pre-classification of objects and feature extraction obsolete. A modified version of a deep convolutional network was combined with a mixture density network. The estimates are expressed as Gaussian mixture models representing the probability density functions (PDFs) in the redshift space. In addition to the traditional scores, the continuous ranked probability score (CRPS) and the probability integral transform (PIT) were applied as performance criteria. We have adopted a feature based random forest and a plain mixture density network to compare performances on experiments with data from SDSS (DR9). We show that the proposed method is able to predict redshift PDFs independently from the type of source, for example galaxies, quasars or stars. Thereby the prediction performance is better than both presented reference methods and is comparable to results from the literature. The presented method is extremely general and allows us to solve of any kind of probabilistic regression problems based on imaging data, for example estimating metallicity or star formation rate of galaxies. This kind of methodology is tremendously important for the next generation of surveys.Comment: 16 pages, 12 figures, 6 tables. Accepted for publication on A&

arXiv.org e-Print Archive

Crossref

EDP Sciences OAI-PMH repository (1.2.0)