1,250 research outputs found

An Active Instance-based Machine Learning method for Stellar Population Studies

Author: Fuentes Olac
Solorio Thamar
Terlevich Elena
Terlevich Roberto
Publication venue: 'Wiley'
Publication date: 21/07/2005

We have developed a method for the fast and accurate determination of stellar population parameters, in order to apply it to high-resolution galaxy spectra. The method is based on an optimization technique that combines active learning with an instance-based machine learning algorithm. We tested the method with the retrieval of the star-formation history and dust content in "synthetic" galaxies with a wide range of S/N ratios. The "synthetic" galaxies were constructed using two different grids of high-resolution theoretical population synthesis models. The results of our controlled experiment show that our method can estimate with good speed and accuracy the parameters of the stellar populations that make up the galaxy, even for very low S/N input. For a spectrum with S/N=5 the typical average deviation between the input and fitted spectrum is less than 10^{-5}. Additional improvements are achieved using prior knowledge.
Comment: 14 pages, 25 figures, accepted by Monthly Notice
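
The idea of combining active learning with an instance-based learner can be illustrated with a small sketch. This is not the authors' implementation: the forward model `toy_spectrum`, the wavelength grid `X_GRID`, the distance-weighted nearest-neighbour estimator, and all numeric settings are assumptions chosen for illustration, and the example is noise-free. The loop estimates parameters from the nearest stored spectra, then generates new training examples around the current estimate and repeats.

```python
import numpy as np

X_GRID = np.linspace(0.0, 1.0, 50)  # hypothetical wavelength grid


def toy_spectrum(params):
    """Hypothetical forward model: amplitude times an exponential decay."""
    params = np.atleast_2d(params)
    amp, decay = params[:, :1], params[:, 1:2]
    return amp * np.exp(-3.0 * decay * X_GRID)


def knn_estimate(train_params, train_spectra, query, k=5):
    """Instance-based step: distance-weighted mean of the k nearest examples."""
    dist = np.linalg.norm(train_spectra - query, axis=1)
    nearest = np.argsort(dist)[:k]
    weights = 1.0 / (dist[nearest] + 1e-12)  # small offset avoids division by zero
    return weights @ train_params[nearest] / weights.sum()


def active_fit(query, n_new=200, radii=(0.2, 0.1, 0.05), seed=0):
    """Active step: add new examples in a shrinking box around the estimate."""
    rng = np.random.default_rng(seed)
    train_params = rng.uniform(0.0, 1.0, size=(n_new, 2))
    train_spectra = toy_spectrum(train_params)
    estimate = knn_estimate(train_params, train_spectra, query)
    for radius in radii:
        offsets = rng.uniform(-radius, radius, size=(n_new, 2))
        new_params = np.clip(estimate + offsets, 0.0, 1.0)
        train_params = np.vstack([train_params, new_params])
        train_spectra = np.vstack([train_spectra, toy_spectrum(new_params)])
        estimate = knn_estimate(train_params, train_spectra, query)
    return estimate


true_params = np.array([0.6, 0.3])          # assumed "true" toy parameters
observed = toy_spectrum(true_params)[0]     # noise-free toy observation
fitted = active_fit(observed)
print("largest parameter error:", np.abs(fitted - true_params).max())
```

Concentrating new examples near the current estimate keeps the training set small while refining the fit locally, which is the general motivation for the active step.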

arXiv.org e-Print Archive

Crossref

CERN Document Server

The Data Big Bang and the Expanding Digital Universe: High-Dimensional, Complex and Massive Data Sets in an Inflationary Epoch

Author: McCollum Bruce
Pesenson Isaac Z.
Pesenson Meyer Z.
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2010

Recent and forthcoming advances in instrumentation, and giant new surveys, are creating astronomical data sets that are not amenable to the methods of analysis familiar to astronomers. Traditional methods are often inadequate not merely because of the size in bytes of the data sets, but also because of the complexity of modern data sets. Mathematical limitations of familiar algorithms and techniques in dealing with such data sets create a critical need for new paradigms for the representation, analysis and scientific visualization (as opposed to illustrative visualization) of heterogeneous, multiresolution data across application domains. Some of the problems presented by the new data sets have been addressed by other disciplines such as applied mathematics, statistics and machine learning and have been utilized by other sciences such as space-based geosciences. Unfortunately, valuable results pertaining to these problems are mostly to be found only in publications outside of astronomy. Here we offer brief overviews of a number of concepts, techniques and developments, some "old" and some new. These are generally unknown to most of the astronomical community, but are vital to the analysis and visualization of complex datasets and images. In order for astronomers to take advantage of the richness and complexity of the new era of data, and to be able to identify, adopt, and apply new solutions, the astronomical community needs a certain degree of awareness and understanding of the new concepts. One of the goals of this paper is to help bridge the gap between applied mathematics, artificial intelligence and computer science on the one side and astronomy on the other.
Comment: 24 pages, 8 Figures, 1 Table. Accepted for publication: "Advances in Astronomy, special issue "Robotic Astronomy

arXiv.org e-Print Archive

Crossref

Directory of Open Access Journals

Caltech Authors

Proceedings of the 2011 New York Workshop on Computer, Earth and Space Science

Author: Naud Catherine
Way Michael J.
Publication date: 11/04/2011

The purpose of the New York Workshop on Computer, Earth and Space Sciences is to bring together the New York area's finest Astronomers, Statisticians, Computer Scientists, Space and Earth Scientists to explore potential synergies between their respective fields. The 2011 edition (CESS2011) was a great success, and we would like to thank all of the presenters and participants for attending. This year was also special as it included authors from the upcoming book titled "Advances in Machine Learning and Data Mining for Astronomy". Over two days, the latest advanced techniques used to analyze the vast amounts of information now available for the understanding of our universe and our planet were presented. These proceedings attempt to provide a small window into what the current state of research is in this vast interdisciplinary field and we'd like to thank the speakers who spent the time to contribute to this volume.
Comment: Author lists modified. 82 pages. Workshop Proceedings from CESS 2011 in New York City, Goddard Institute for Space Studie

arXiv.org e-Print Archive

CERN Document Server

Data Mining and Machine Learning in Astronomy

Author: Aha D. W.
Aizerman M. A.
Benjamini Y.
Bertin E.
Borne K.
Breiman L.
de Vaucouleurs G.
Dempster A.
Drake A. J.
Ebisuzaki T.
Faundez-Abans M.
Goebel J.
Karhunen K.
Levy S.
Li L.-L.
Maddox S. J.
Molinari E.
Moore G. E.
Naim A.
NICHOLAS M. BALL
P. A.
Patterson F. S.
ROBERT J. BRUNNER
Salzberg S. L.
Scaringi S.
Serra-Ricart M.
Steinhaus H.
Urunkar N.
Wells D. C.
Won E.
Publication venue: 'World Scientific Pub Co Pte Lt'
Publication date: 10/08/2010

We review the current state of data mining and machine learning in astronomy. 'Data Mining' can have a somewhat mixed connotation from the point of view of a researcher in this field. If used correctly, it can be a powerful approach, holding the potential to fully exploit the exponentially increasing amount of available data, promising great scientific advance. However, if misused, it can be little more than the black-box application of complex computing algorithms that may give little physical insight, and provide questionable results. Here, we give an overview of the entire data mining process, from data collection through to the interpretation of results. We cover common machine learning algorithms, such as artificial neural networks and support vector machines, applications from a broad range of astronomy, emphasizing those where data mining techniques directly resulted in improved science, and important current and future directions, including probability density functions, parallel algorithms, petascale computing, and the time domain. We conclude that, so long as one carefully selects an appropriate algorithm, and is guided by the astronomical problem at hand, data mining can be very much the powerful tool, and not the questionable black box.
Comment: Published in IJMPD. 61 pages, uses ws-ijmpd.cls. Several extra figures, some minor additions to the tex

arXiv.org e-Print Archive

Crossref

Photometric identification of blue horizontal branch stars

Author: Abazajain
Adelman-McCarthy
Bailer-Jones
C. A. L. Bailer-Jones
Cortes
Dorman
Gao
Harris
Hayfield
Huertas-Company
K. W. Smith
Kinman
Marengo
R. J. Klement
Richards
Richards
Schlegel
Sirko
Tsalmantza
Tsalmantza
X. X. Xue
Xue
Xue
Yanny
Publication venue: 'EDP Sciences'
Publication date: 01/01/2010

We investigate the performance of some common machine learning techniques in identifying BHB stars from photometric data. To train the machine learning algorithms, we use previously published spectroscopic identifications of BHB stars from SDSS data. We investigate the performance of three different techniques, namely k nearest neighbour classification, kernel density estimation and a support vector machine (SVM). We discuss the performance of the methods in terms of both completeness and contamination. We discuss the prospect of trading off these values, achieving lower contamination at the expense of lower completeness, by adjusting probability thresholds for the classification. We also discuss the role of prior probabilities in the classification performance, and we assess via simulations the reliability of the dataset used for training. Overall it seems that using no prior gives the best completeness, but adopting a prior lowers the contamination. We find that the SVM generally delivers the lowest contamination for a given level of completeness, and so is our method of choice. Finally, we classify a large sample of SDSS DR7 photometry using the SVM trained on the spectroscopic sample. We identify 27,074 probable BHB stars out of a sample of 294,652 stars. We derive photometric parallaxes and demonstrate that our results are reasonable by comparing to known distances for a selection of globular clusters. We attach our classifications, including probabilities, as an electronic table, so that they can be used either directly as a BHB star catalogue, or as priors to a spectroscopic or other classification method. We also provide our final models so that they can be directly applied to new data.
Comment: To appear in A&A. 19 pages, 22 figures. Tables 7, A3 and A4 available electronically onlin
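
The trade-off between completeness and contamination described above can be made concrete with a short sketch. It assumes the usual definitions (completeness: fraction of true BHB stars that are selected; contamination: fraction of the selected sample that is not BHB), which the abstract itself does not spell out. The function name, labels and probabilities are hypothetical illustration data, not values from the paper.

```python
import numpy as np


def completeness_contamination(y_true, p_bhb, threshold=0.5):
    """Completeness and contamination of a thresholded probabilistic classifier.

    y_true: 1 for true BHB stars, 0 otherwise (hypothetical labels).
    p_bhb:  classifier probability that each star is a BHB star.
    """
    y_true = np.asarray(y_true, dtype=bool)
    selected = np.asarray(p_bhb, dtype=float) >= threshold
    true_pos = np.sum(selected & y_true)
    n_true = np.sum(y_true)
    n_selected = np.sum(selected)
    completeness = true_pos / n_true if n_true else float("nan")
    contamination = (n_selected - true_pos) / n_selected if n_selected else float("nan")
    return float(completeness), float(contamination)


# Made-up example: four true BHB stars among eight candidates.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
probs = [0.9, 0.8, 0.4, 0.7, 0.2, 0.1, 0.6, 0.3]

for threshold in (0.5, 0.75):
    print(threshold, completeness_contamination(labels, probs, threshold))
```

In this toy data, raising the threshold from 0.5 to 0.75 lowers contamination from 0.25 to 0.0 while completeness drops from 0.75 to 0.5, which is the kind of trade-off the abstract refers to.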

arXiv.org e-Print Archive

CiteSeerX

Crossref

EDP Sciences OAI-PMH repository (1.2.0)