Automated novelty detection in the WISE survey with one-class support vector machines
Wide-angle photometric surveys of previously uncharted sky areas or
wavelength regimes will always bring in unexpected sources whose existence and
properties cannot be easily predicted from earlier observations: novelties or
even anomalies. Such objects can be efficiently sought with novelty
detection algorithms. Here we present an application of such a method, called
one-class support vector machines (OCSVM), to search for anomalous patterns
among sources preselected from the mid-infrared AllWISE catalogue covering the
whole sky. To create a model of expected data we train the algorithm on a set
of objects with spectroscopic identifications from the SDSS DR13 database,
present also in AllWISE. OCSVM detects as anomalous those sources whose
patterns - WISE photometric measurements in this case - are inconsistent with
the model. Among the detected anomalies we find artefacts, such as objects with
spurious photometry due to blending, but most importantly also real sources of
genuine astrophysical interest. Among the latter, OCSVM has identified a sample
of heavily reddened AGN/quasar candidates distributed uniformly over the sky
and in large part absent from other WISE-based AGN catalogues. It also
allowed us to find a specific group of sources of mixed types, mostly stars and
compact galaxies. By combining the semi-supervised OCSVM algorithm with
standard classification methods it will be possible to improve the latter by
accounting for sources which are not present in the training sample but are
otherwise well-represented in the target set. Anomaly detection adds
flexibility to automated source separation procedures and helps verify the
reliability and representativeness of the training samples. It should thus be
considered an essential step in supervised classification schemes to ensure
completeness and purity of produced catalogues.
Comment: 14 pages, 15 figures
Finding rare objects and building pure samples: Probabilistic quasar classification from low resolution Gaia spectra
We develop and demonstrate a probabilistic method for classifying rare
objects in surveys with the particular goal of building very pure samples. It
works by modifying the output probabilities from a classifier so as to
accommodate our expectation (priors) concerning the relative frequencies of
different classes of objects. We demonstrate our method using the Discrete
Source Classifier, a supervised classifier currently based on Support Vector
Machines, which we are developing in preparation for the Gaia data analysis.
DSC classifies objects using their very low resolution optical spectra. We look
in detail at the problem of quasar classification, because identification of a
pure quasar sample is necessary to define the Gaia astrometric reference frame.
By varying a posterior probability threshold in DSC we can trade off sample
completeness and contamination. We show, using our simulated data, that it is
possible to achieve a pure sample of quasars (upper limit on contamination of 1
in 40,000) with a completeness of 65% at magnitudes of G=18.5, and 50% at
G=20.0, even when quasars have a frequency of only 1 in every 2000 objects. The
star sample completeness is simultaneously 99% with a contamination of 0.7%.
Including parallax and proper motion in the classifier barely changes the
results. We further show that not accounting for class priors in the target
population leads to serious misclassifications and poor predictions for sample
completeness and contamination. (Truncated)
Comment: MNRAS accepted
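The prior adjustment and threshold trade-off described in this abstract can be sketched as follows. This is a minimal illustration using the standard Bayes reweighting of posteriors under a change of class priors; it is not necessarily the exact procedure used in DSC. The function names, the equal training priors, and the toy probabilities are all assumptions made for the example; only the 1-in-2000 quasar frequency comes from the abstract.

```python
def adjust_for_priors(posteriors, train_priors, target_priors):
    """Reweight classifier posteriors from training priors to target priors.

    Each posterior is multiplied by target_prior / train_prior for its class,
    and the results are renormalized so they sum to 1.
    """
    weighted = {
        cls: posteriors[cls] * target_priors[cls] / train_priors[cls]
        for cls in posteriors
    }
    total = sum(weighted.values())
    return {cls: w / total for cls, w in weighted.items()}


def completeness_and_contamination(quasar_probs, is_quasar, threshold):
    """Select objects with P(quasar) >= threshold and score the selection."""
    selected = [truth for p, truth in zip(quasar_probs, is_quasar) if p >= threshold]
    true_positives = sum(selected)
    # Completeness: fraction of all true quasars that were selected.
    completeness = true_positives / sum(is_quasar)
    # Contamination: fraction of the selected sample that is not a quasar.
    contamination = (
        (len(selected) - true_positives) / len(selected) if selected else 0.0
    )
    return completeness, contamination


# Hypothetical classifier output, trained on a balanced sample (assumption).
raw = {"quasar": 0.9, "star": 0.1}
adjusted = adjust_for_priors(
    raw,
    train_priors={"quasar": 0.5, "star": 0.5},
    target_priors={"quasar": 1 / 2000, "star": 1999 / 2000},
)

# Hypothetical adjusted quasar probabilities with known true classes.
probs = [0.99, 0.8, 0.6, 0.4, 0.1]
truth = [True, True, False, True, False]
loose = completeness_and_contamination(probs, truth, threshold=0.5)
strict = completeness_and_contamination(probs, truth, threshold=0.9)
```

In this made-up case a raw quasar posterior of 0.9 drops to roughly 0.0045 once the 1-in-2000 prior is applied. Raising the threshold from 0.5 to 0.9 lowers contamination at the cost of completeness, which is the trade-off the abstract describes.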
Data Mining and Machine Learning in Astronomy
We review the current state of data mining and machine learning in astronomy.
'Data Mining' can have a somewhat mixed connotation from the point of view of a
researcher in this field. If used correctly, it can be a powerful approach,
holding the potential to fully exploit the exponentially increasing amount of
available data, promising great scientific advance. However, if misused, it can
be little more than the black-box application of complex computing algorithms
that may give little physical insight, and provide questionable results. Here,
we give an overview of the entire data mining process, from data collection
through to the interpretation of results. We cover common machine learning
algorithms, such as artificial neural networks and support vector machines,
applications from a broad range of astronomy, emphasizing those where data
mining techniques directly resulted in improved science, and important current
and future directions, including probability density functions, parallel
algorithms, petascale computing, and the time domain. We conclude that, so long
as one carefully selects an appropriate algorithm, and is guided by the
astronomical problem at hand, data mining can be very much the powerful tool,
and not the questionable black box.
Comment: Published in IJMPD. 61 pages, uses ws-ijmpd.cls. Several extra
figures, some minor additions to the text
The Extremely Luminous Quasar Survey (ELQS) in the SDSS footprint I.: Infrared Based Candidate Selection
Studies of the most luminous quasars at high redshift directly probe the
evolution of the most massive black holes in the early Universe and their
connection to massive galaxy formation. However, extremely luminous quasars at
high redshift are very rare objects. Only wide area surveys have a chance to
constrain their population. The Sloan Digital Sky Survey (SDSS) has so far
provided the most widely adopted measurements of the quasar luminosity function
(QLF) at . However, a careful re-examination of the SDSS quasar sample
revealed that the SDSS quasar selection is in fact missing a significant
fraction of quasars at the brightest end. We have identified the
purely optical color selection of SDSS, where quasars at these redshifts are
strongly contaminated by late-type dwarfs, and the spectroscopic incompleteness
of the SDSS footprint as the main reasons. Therefore we have designed the
Extremely Luminous Quasar Survey (ELQS), based on a novel near-infrared JKW2
color cut using WISE AllWISE and 2MASS all-sky photometry, to yield high
completeness for very bright () quasars in the redshift
range of . It effectively uses random forest machine-learning
algorithms on SDSS and WISE photometry for quasar-star classification and
photometric redshift estimation. The ELQS will spectroscopically follow up
new quasar candidates in an area of in the
SDSS footprint, to obtain a well-defined and complete quasar sample for an
accurate measurement of the bright-end quasar luminosity function at . In this paper we present the quasar selection algorithm and the
quasar candidate catalog.
Comment: 16 pages, 8 figures, 9 tables; ApJ in press
Estimating Photometric Redshifts of Quasars via K-nearest Neighbor Approach Based on Large Survey Databases
We apply a lazy learning method, the k-nearest neighbor algorithm
(kNN) to estimate the photometric redshifts of quasars, based on various
datasets from the Sloan Digital Sky Survey (SDSS), UKIRT Infrared Deep Sky
Survey (UKIDSS) and Wide-field Infrared Survey Explorer (WISE) (the SDSS
sample, the SDSS-UKIDSS sample, the SDSS-WISE sample and the SDSS-UKIDSS-WISE
sample). The influence of the k value and different input patterns on the
performance of kNN is discussed. The k value that gives the best performance
differs with the input pattern and the dataset. The best result
belongs to the SDSS-UKIDSS-WISE sample. The experimental results show that,
in general, the more information from more bands, the better the performance of
photometric redshift estimation with kNN. The results also demonstrate that kNN
using multiband data can effectively solve the catastrophic failure problem of
photometric redshift estimation, which many machine learning methods encounter.
By comparing the performance of various methods for photometric redshift
estimation of quasars, kNN based on KD-Tree shows its superiority with the best
accuracy for our case.
Comment: 28 pages, 4 figures, 3 tables, accepted for publication in A
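The kNN photometric redshift idea in this abstract can be sketched as below. This is a minimal illustration only: it uses a brute-force neighbor search rather than the KD-Tree mentioned in the abstract, an unweighted mean of the neighbors' redshifts, and invented two-colour features and redshifts. The function name `knn_photo_z` and all values are assumptions for the example.

```python
import math


def knn_photo_z(train_features, train_redshifts, query, k=3):
    """Estimate a redshift as the mean redshift of the k nearest training objects.

    Distances are Euclidean in feature space (e.g. colours built from
    multiband magnitudes). The search is brute force for clarity.
    """
    by_distance = sorted(
        (math.dist(features, query), z)
        for features, z in zip(train_features, train_redshifts)
    )
    nearest = by_distance[:k]
    return sum(z for _, z in nearest) / len(nearest)


# Hypothetical training set: two colours per object, with known redshifts.
train_features = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (1.0, 1.0), (1.1, 1.0)]
train_redshifts = [0.5, 0.6, 0.7, 2.0, 2.2]

# The query sits next to the three low-redshift objects.
estimate = knn_photo_z(train_features, train_redshifts, query=(0.05, 0.05), k=3)
```

Here the three nearest neighbours are the low-redshift objects, so the estimate is about 0.6. Varying `k` and the set of input features is the kind of experiment the abstract reports.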
Machine Learning in Astronomy: A Case Study in Quasar-Star Classification
We present the results of various automated classification methods, based on
machine learning (ML), of objects from data releases 6 and 7 (DR6 and DR7) of
the Sloan Digital Sky Survey (SDSS), primarily distinguishing stars from
quasars. We carefully scrutinize approaches available in the
literature and highlight the pitfalls in those approaches based on the
nature of data used for the study. The aim is to investigate the
appropriateness of the application of certain ML methods. The manuscript argues
convincingly in favor of the efficacy of asymmetric AdaBoost to classify
photometric data. The paper presents a critical review of existing studies and
puts forward an application of asymmetric AdaBoost as an outgrowth of that
exercise.
Comment: 10 pages, 8 figures
Support Vector Machine classification of strong gravitational lenses
The imminent advent of very large-scale optical sky surveys, such as Euclid
and LSST, makes it important to find efficient ways of discovering rare objects
such as strong gravitational lens systems, where a background object is
multiply gravitationally imaged by a foreground mass. As well as finding the
lens systems, it is important to reject false positives due to intrinsic
structure in galaxies, and much work is in progress with machine learning
algorithms such as neural networks in order to achieve both these aims. We
present and discuss a Support Vector Machine (SVM) algorithm which makes use of
a Gabor filterbank in order to provide learning criteria for separation of
lenses and non-lenses, and demonstrate using blind challenges that under
certain circumstances it is a particularly efficient algorithm for rejecting
false positives. We compare the SVM engine with a large-scale human examination
of 100000 simulated lenses in a challenge dataset, and also apply the SVM
method to survey images from the Kilo-Degree Survey.
Comment: Accepted by MNRAS