An Active Instance-based Machine Learning method for Stellar Population Studies
We have developed a method for the fast and accurate determination of stellar
population parameters, intended for application to high-resolution galaxy
spectra. The method is based on an optimization technique that combines active
learning with an instance-based machine learning algorithm. We tested the
method with the retrieval of the star-formation history and dust content in
"synthetic" galaxies with a wide range of S/N ratios. The "synthetic" galaxies
were constructed using two different grids of high-resolution theoretical
population synthesis models. The results of our controlled experiment show
that our method can estimate with good speed and accuracy the parameters of the
stellar populations that make up the galaxy even for very low S/N input. For a
spectrum with S/N=5 the typical average deviation between the input and fitted
spectrum is less than 10^{-5}. Additional improvements are achieved using
prior knowledge.
Comment: 14 pages, 25 figures, accepted by Monthly Notice
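The idea of combining instance-based learning with active learning can be illustrated with a minimal sketch. This is not the authors' implementation: `toy_spectrum` is a hypothetical one-parameter stand-in for a population synthesis model, the single "age" parameter, the grid bounds and the number of rounds are all assumptions chosen for illustration, and no noise is added. The fit looks up the nearest stored model spectrum (the instance-based step), then requests new model instances only around the current best match (a loose analogue of the active-learning step).

```python
import math

def toy_spectrum(age, n_pix=50):
    """Hypothetical stand-in for a population synthesis model (assumption)."""
    return [math.exp(-age * i / n_pix) for i in range(n_pix)]

def distance(a, b):
    """Euclidean distance between two spectra of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_instance(observed, instances):
    """1-nearest-neighbour lookup: parameter whose stored spectrum is closest."""
    return min(instances, key=lambda age: distance(observed, instances[age]))

def active_fit(observed, lo=0.1, hi=10.0, n_grid=11, n_rounds=5):
    """Refine the estimate by generating new instances near the current best."""
    best = None
    for _ in range(n_rounds):
        step = (hi - lo) / (n_grid - 1)
        # Build a small set of labelled instances on the current interval.
        instances = {lo + k * step: toy_spectrum(lo + k * step)
                     for k in range(n_grid)}
        best = nearest_instance(observed, instances)
        # Query the model again only in the neighbourhood of the best match.
        lo, hi = max(best - step, 1e-6), best + step
    return best

observed = toy_spectrum(3.7)      # "synthetic galaxy" with a known input age
estimate = active_fit(observed)
print(f"input age = 3.7, recovered age = {estimate:.4f}")
```

In this toy setting each round shrinks the search interval around the best match, so a handful of rounds recovers the input parameter far more cheaply than a single dense grid of the same final resolution would.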
The Data Big Bang and the Expanding Digital Universe: High-Dimensional, Complex and Massive Data Sets in an Inflationary Epoch
Recent and forthcoming advances in instrumentation, and giant new surveys,
are creating astronomical data sets that are not amenable to the methods of
analysis familiar to astronomers. Traditional methods are often inadequate not
merely because of the size in bytes of the data sets, but also because of the
complexity of modern data sets. Mathematical limitations of familiar algorithms
and techniques in dealing with such data sets create a critical need for new
paradigms for the representation, analysis and scientific visualization (as
opposed to illustrative visualization) of heterogeneous, multiresolution data
across application domains. Some of the problems presented by the new data sets
have been addressed by other disciplines such as applied mathematics,
statistics and machine learning and have been utilized by other sciences such
as space-based geosciences. Unfortunately, valuable results pertaining to these
problems are mostly to be found only in publications outside of astronomy. Here
we offer brief overviews of a number of concepts, techniques and developments,
some "old" and some new. These are generally unknown to most of the
astronomical community, but are vital to the analysis and visualization of
complex datasets and images. In order for astronomers to take advantage of the
richness and complexity of the new era of data, and to be able to identify,
adopt, and apply new solutions, the astronomical community needs a certain
degree of awareness and understanding of the new concepts. One of the goals of
this paper is to help bridge the gap between applied mathematics, artificial
intelligence and computer science on the one side and astronomy on the other.
Comment: 24 pages, 8 Figures, 1 Table. Accepted for publication: "Advances in
Astronomy, special issue "Robotic Astronomy
Proceedings of the 2011 New York Workshop on Computer, Earth and Space Science
The purpose of the New York Workshop on Computer, Earth and Space Sciences is
to bring together the New York area's finest Astronomers, Statisticians,
Computer Scientists, Space and Earth Scientists to explore potential synergies
between their respective fields. The 2011 edition (CESS2011) was a great
success, and we would like to thank all of the presenters and participants for
attending. This year was also special as it included authors from the upcoming
book titled "Advances in Machine Learning and Data Mining for Astronomy". Over
two days, the latest advanced techniques used to analyze the vast amounts of
information now available for the understanding of our universe and our planet
were presented. These proceedings attempt to provide a small window into what
the current state of research is in this vast interdisciplinary field and we'd
like to thank the speakers who spent the time to contribute to this volume.
Comment: Author lists modified. 82 pages. Workshop Proceedings from CESS 2011
in New York City, Goddard Institute for Space Studie
Data Mining and Machine Learning in Astronomy
We review the current state of data mining and machine learning in astronomy.
'Data Mining' can have a somewhat mixed connotation from the point of view of a
researcher in this field. If used correctly, it can be a powerful approach,
holding the potential to fully exploit the exponentially increasing amount of
available data, promising great scientific advance. However, if misused, it can
be little more than the black-box application of complex computing algorithms
that may give little physical insight, and provide questionable results. Here,
we give an overview of the entire data mining process, from data collection
through to the interpretation of results. We cover common machine learning
algorithms, such as artificial neural networks and support vector machines,
applications from a broad range of astronomy, emphasizing those where data
mining techniques directly resulted in improved science, and important current
and future directions, including probability density functions, parallel
algorithms, petascale computing, and the time domain. We conclude that, so long
as one carefully selects an appropriate algorithm, and is guided by the
astronomical problem at hand, data mining can be very much the powerful tool,
and not the questionable black box.
Comment: Published in IJMPD. 61 pages, uses ws-ijmpd.cls. Several extra
figures, some minor additions to the tex
Photometric identification of blue horizontal branch stars
We investigate the performance of some common machine learning techniques in
identifying BHB stars from photometric data. To train the machine learning
algorithms, we use previously published spectroscopic identifications of BHB
stars from SDSS data. We investigate the performance of three different
techniques, namely k nearest neighbour classification, kernel density
estimation and a support vector machine (SVM). We discuss the performance of
the methods in terms of both completeness and contamination. We discuss the
prospect of trading off these values, achieving lower contamination at the
expense of lower completeness, by adjusting probability thresholds for the
classification. We also discuss the role of prior probabilities in the
classification performance, and we assess via simulations the reliability of
the dataset used for training. Overall, it seems that using no prior gives the
best completeness, but adopting a prior lowers the contamination. We find that the
SVM generally delivers the lowest contamination for a given level of
completeness, and so is our method of choice. Finally, we classify a large
sample of SDSS DR7 photometry using the SVM trained on the spectroscopic
sample. We identify 27,074 probable BHB stars out of a sample of 294,652 stars.
We derive photometric parallaxes and demonstrate that our results are
reasonable by comparing to known distances for a selection of globular
clusters. We attach our classifications, including probabilities, as an
electronic table, so that they can be used either directly as a BHB star
catalogue, or as priors to a spectroscopic or other classification method. We
also provide our final models so that they can be directly applied to new data.
Comment: To appear in A&A. 19 pages, 22 figures. Tables 7, A3 and A4 available
electronically onlin
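The trade-off between completeness and contamination described in this abstract can be made concrete with a short sketch. The probabilities and labels below are invented toy values, not data from the paper, and `completeness_contamination` is a hypothetical helper. It assumes the usual definitions: completeness is the fraction of true BHB stars that are selected, and contamination is the fraction of selected stars that are not BHB stars.

```python
def completeness_contamination(probs, labels, threshold):
    """Return (completeness, contamination) for a given probability cut.

    probs  : classifier probabilities that each star is a BHB star
    labels : True for a confirmed BHB star, False otherwise
    """
    selected = [p >= threshold for p in probs]
    true_pos = sum(1 for s, y in zip(selected, labels) if s and y)
    false_pos = sum(1 for s, y in zip(selected, labels) if s and not y)
    n_true = sum(1 for y in labels if y)
    n_selected = true_pos + false_pos
    completeness = true_pos / n_true if n_true else 0.0
    contamination = false_pos / n_selected if n_selected else 0.0
    return completeness, contamination

# Toy values for illustration only (assumption, not from the paper).
probs = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20]
labels = [True, True, False, True, False, True, False, False]

for threshold in (0.5, 0.75, 0.85):
    comp, cont = completeness_contamination(probs, labels, threshold)
    print(f"threshold={threshold:.2f}  completeness={comp:.2f}  "
          f"contamination={cont:.2f}")
```

On these toy values, raising the threshold from 0.5 to 0.85 drops the contamination from 0.40 to 0.00 while the completeness falls from 0.75 to 0.50, which is the kind of exchange the abstract describes.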