2,266 research outputs found
Data Mining and Machine Learning in Astronomy
We review the current state of data mining and machine learning in astronomy.
'Data Mining' can have a somewhat mixed connotation from the point of view of a
researcher in this field. If used correctly, it can be a powerful approach,
holding the potential to fully exploit the exponentially increasing amount of
available data, promising great scientific advance. However, if misused, it can
be little more than the black-box application of complex computing algorithms
that may give little physical insight, and provide questionable results. Here,
we give an overview of the entire data mining process, from data collection
through to the interpretation of results. We cover common machine learning
algorithms, such as artificial neural networks and support vector machines,
applications from a broad range of astronomy, emphasizing those where data
mining techniques directly resulted in improved science, and important current
and future directions, including probability density functions, parallel
algorithms, petascale computing, and the time domain. We conclude that, so long
as one carefully selects an appropriate algorithm, and is guided by the
astronomical problem at hand, data mining can be very much the powerful tool,
and not the questionable black box.Comment: Published in IJMPD. 61 pages, uses ws-ijmpd.cls. Several extra
figures, some minor additions to the tex
Multivariate Approaches to Classification in Extragalactic Astronomy
Clustering objects into synthetic groups is a natural activity of any
science. Astrophysics is not an exception and is now facing a deluge of data.
For galaxies, the one-century old Hubble classification and the Hubble tuning
fork are still largely in use, together with numerous mono-or bivariate
classifications most often made by eye. However, a classification must be
driven by the data, and sophisticated multivariate statistical tools are used
more and more often. In this paper we review these different approaches in
order to situate them in the general context of unsupervised and supervised
learning. We insist on the astrophysical outcomes of these studies to show that
multivariate analyses provide an obvious path toward a renewal of our
classification of galaxies and are invaluable tools to investigate the physics
and evolution of galaxies.Comment: Open Access paper.
http://www.frontiersin.org/milky\_way\_and\_galaxies/10.3389/fspas.2015.00003/abstract\>.
\<10.3389/fspas.2015.00003 \&g
SkyDOT (Sky Database for Objects in the Time Domain): A Virtual Observatory for Variability Studies at LANL
The mining of Virtual Observatories (VOs) is becoming a powerful new method
for discovery in astronomy. Here we report on the development of SkyDOT (Sky
Database for Objects in the Time domain), a new Virtual Observatory, which is
dedicated to the study of sky variability. The site will confederate a number
of massive variability surveys and enable exploration of the time domain in
astronomy. We discuss the architecture of the database and the functionality of
the user interface. An important aspect of SkyDOT is that it is continuously
updated in near real time so that users can access new observations in a timely
manner. The site will also utilize high level machine learning tools that will
allow sophisticated mining of the archive. Another key feature is the real time
data stream provided by RAPTOR (RAPid Telescopes for Optical Response), a new
sky monitoring experiment under construction at Los Alamos National Laboratory
(LANL).Comment: to appear in SPIE proceedings vol. 4846, 11 pages, 5 figure
Genetic Classification of Populations using Supervised Learning
There are many instances in genetics in which we wish to determine whether
two candidate populations are distinguishable on the basis of their genetic
structure. Examples include populations which are geographically separated,
case--control studies and quality control (when participants in a study have
been genotyped at different laboratories). This latter application is of
particular importance in the era of large scale genome wide association
studies, when collections of individuals genotyped at different locations are
being merged to provide increased power. The traditional method for detecting
structure within a population is some form of exploratory technique such as
principal components analysis. Such methods, which do not utilise our prior
knowledge of the membership of the candidate populations. are termed
\emph{unsupervised}. Supervised methods, on the other hand are able to utilise
this prior knowledge when it is available.
In this paper we demonstrate that in such cases modern supervised approaches
are a more appropriate tool for detecting genetic differences between
populations. We apply two such methods, (neural networks and support vector
machines) to the classification of three populations (two from Scotland and one
from Bulgaria). The sensitivity exhibited by both these methods is considerably
higher than that attained by principal components analysis and in fact
comfortably exceeds a recently conjectured theoretical limit on the sensitivity
of unsupervised methods. In particular, our methods can distinguish between the
two Scottish populations, where principal components analysis cannot. We
suggest, on the basis of our results that a supervised learning approach should
be the method of choice when classifying individuals into pre-defined
populations, particularly in quality control for large scale genome wide
association studies.Comment: Accepted PLOS On
- …