Applying Machine Learning to Catalogue Matching in Astrophysics
We present the results of applying automated machine learning techniques to
the problem of matching different object catalogues in astrophysics. In this
study we take two partially matched catalogues, one of which has a large
positional uncertainty. The two catalogues used here were taken from the
HI Parkes All Sky Survey (HIPASS) and the SuperCOSMOS optical survey.
Previous work had matched 44% (1887 objects) of HIPASS to the SuperCOSMOS
catalogue.
A supervised learning algorithm was then applied to construct a model of the
matched portion of our catalogue. Validation of the model shows good
classification performance (99.12% correct).
Applying this model to the unmatched portion of the catalogue found 1209 new
matches. This increases the catalogue size from 1887 matched objects to 3096.
The combination of these procedures yields a catalogue that is 72% matched.
Comment: 8 pages, 5 figures
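As a quick consistency check on the figures quoted above, the short sketch below recomputes the final matched fraction from the stated counts. The abstract does not give the total HIPASS catalogue size, so the sketch infers it approximately from the rounded 44% figure; that inferred size is an assumption of the sketch, not a number from the paper.

```python
# Counts quoted in the abstract.
previously_matched = 1887   # HIPASS objects matched by earlier work
new_matches = 1209          # additional matches found by the classifier
prior_fraction = 0.44       # rounded fraction matched by earlier work

total_matched = previously_matched + new_matches

# Assumption: the catalogue size is inferred from the rounded 44% figure,
# so it is only approximate.
implied_catalogue_size = previously_matched / prior_fraction
final_fraction = total_matched / implied_catalogue_size

print(total_matched)             # 3096
print(round(final_fraction, 2))  # 0.72
```

The two printed values reproduce the 3096 matched objects and the 72% matched fraction stated in the abstract.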
MiraBest : a data set of morphologically classified radio galaxies for machine learning
The volume of data from current and future observatories has motivated the increased development and application of automated machine learning methodologies for astronomy. However, less attention has been given to the production of standardised datasets for assessing the performance of different machine learning algorithms within astronomy and astrophysics. Here we describe in detail the MiraBest dataset, a publicly available batched dataset of 1256 radio-loud AGN from NVSS and FIRST, filtered to , manually labelled by Miraghaei and Best (2017) according to the Fanaroff-Riley morphological classification, created for machine learning applications and compatible with standard deep learning libraries. We outline the principles underlying the construction of the dataset, the sample selection and pre-processing methodology, dataset structure and composition, as well as a comparison of MiraBest to other datasets used in the literature. Existing applications that use the MiraBest dataset are reviewed, and an extended dataset of 2100 sources is created by cross-matching MiraBest with other catalogues of radio-loud AGN that have been used more widely in the literature for machine learning applications.
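To illustrate what a "batched" dataset means in practice, here is a minimal sketch that splits a list of labelled sources into contiguous batches of near-equal size and holds one batch out for testing. The abstract does not describe MiraBest's actual batch layout, so the choice of 8 batches, the last-batch test split and the placeholder FR labels are all hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_batches(items: Sequence[T], n_batches: int) -> List[List[T]]:
    """Split items into n_batches contiguous batches whose sizes differ by at most one."""
    if n_batches <= 0:
        raise ValueError("n_batches must be positive")
    base, extra = divmod(len(items), n_batches)
    batches, start = [], 0
    for i in range(n_batches):
        size = base + (1 if i < extra else 0)  # spread any remainder over the first batches
        batches.append(list(items[start:start + size]))
        start += size
    return batches


# Hypothetical records: (source index, placeholder FR label), not the real MiraBest data.
sources = [(i, "FRI" if i % 2 == 0 else "FRII") for i in range(1256)]

# Hypothetical layout: 8 batches, with the last one held out as a test batch.
batches = split_into_batches(sources, n_batches=8)
train_batches, test_batch = batches[:-1], batches[-1]
print([len(b) for b in batches])
```

Contiguous batches keep the split reproducible; a real pipeline would normally shuffle (with a fixed seed) before splitting so that each batch has a similar class balance.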
XMMPZCAT: A catalogue of photometric redshifts for X-ray sources
The third version of the XMM-Newton serendipitous catalogue (3XMM),
containing almost half a million sources, is now the largest X-ray catalogue.
However, its full scientific potential remains untapped due to the lack of
distance information (i.e. redshifts) for the majority of its sources. Here we
present XMMPZCAT, a catalogue of photometric redshifts (photo-z) for 3XMM
sources. We searched for optical counterparts of 3XMM-DR6 sources outside the
Galactic plane in the SDSS and Pan-STARRS surveys, with the addition of near-infrared
(NIR) and mid-infrared (MIR) data whenever possible (2MASS, UKIDSS, VISTA-VHS,
and AllWISE). We used this photometry data set in combination with a training
sample of 5157 X-ray selected sources and the MLZ-TPZ package, a supervised
machine learning algorithm based on decision trees and random forests for the
calculation of photo-z. We have estimated photo-z for 100,178 X-ray sources,
about 50% of the total number of 3XMM sources (205,380) in the XMM-Newton
fields selected to build this catalogue (4208 out of 9159). The accuracy of our
results depends strongly on the available photometric data, with an outlier
rate ranging from 4% for sources with optical+NIR+MIR data up to
40% for sources with only optical data. We also assessed the reliability
of our results by studying the shape of the photo-z probability density
distributions.
Comment: 16 pages, 14 figures, A&A accepted
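One simple way to turn the shape of a binned photo-z probability density into a reliability number is to measure how much probability lies close to the peak. The sketch below does that; the window half-width `width * (1 + z_peak)` and the default `width = 0.1` are assumptions chosen for illustration, and this is not necessarily the statistic used for XMMPZCAT.

```python
from typing import Sequence, Tuple


def pdf_peak_concentration(z_grid: Sequence[float], pdf: Sequence[float],
                           width: float = 0.1) -> Tuple[float, float]:
    """Return (z_peak, mass): the redshift of the highest PDF bin and the
    fraction of total probability within +/- width * (1 + z_peak) of it.

    A single narrow peak gives mass close to 1; broad or multi-modal PDFs
    (less reliable photo-z) give smaller values.
    """
    total = sum(pdf)
    if total <= 0:
        raise ValueError("PDF must have positive total probability")
    norm = [p / total for p in pdf]
    i_peak = max(range(len(norm)), key=norm.__getitem__)  # first bin with the maximum
    z_peak = z_grid[i_peak]
    half_width = width * (1.0 + z_peak)
    mass = sum(p for z, p in zip(z_grid, norm) if abs(z - z_peak) <= half_width)
    return z_peak, mass


z_grid = [0.1 * i for i in range(31)]                      # 0.0 ... 3.0 in steps of 0.1
single_peak = [1.0 if i == 10 else 0.0 for i in range(31)]  # all probability at z = 1.0
bimodal = [1.0 if i in (5, 25) else 0.0 for i in range(31)]  # split between z = 0.5 and 2.5
print(pdf_peak_concentration(z_grid, single_peak))
print(pdf_peak_concentration(z_grid, bimodal))
```

Scaling the window by (1 + z_peak) mirrors the usual practice of quoting photo-z errors as Δz/(1 + z), so a fixed `width` means a comparable tolerance at all redshifts.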
Decision table for classifying point sources based on FIRST and 2MASS databases
With the availability of multiwavelength, multiscale and multiepoch
astronomical catalogues, the number of features available to describe
astronomical objects has increased. The better the features selected to
classify objects, the higher the classification accuracy. In this paper, we
use data sets of stars and quasars from the near-infrared and radio bands.
The best-first search method was applied to select features, and a decision
table classifier was then applied to the data with the selected features. The
classification accuracy is above 95.9%. These results show that the feature
selection method improves the effectiveness and efficiency of the
classification. Moreover, the results show that the decision table is robust
and effective for discriminating celestial objects and can be used to
preselect quasar candidates for large survey projects.
Comment: 10 pages, accepted by Advances in Space Research
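The two steps described above (best-first feature selection, then a decision table) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's configuration: the features are toy, already-discretised integers, each candidate subset is scored by leave-one-out accuracy of the decision table, and the search stops after `max_stale` expansions in a row without a new best subset.

```python
import heapq
from collections import Counter


def fit_decision_table(rows, labels, features):
    """Map each combination of the selected (discretised) feature values to the
    majority class for that combination; unseen combinations fall back to the
    overall majority class."""
    cells = {}
    for row, label in zip(rows, labels):
        key = tuple(row[f] for f in features)
        cells.setdefault(key, Counter())[label] += 1
    table = {key: counts.most_common(1)[0][0] for key, counts in cells.items()}
    default = Counter(labels).most_common(1)[0][0]
    return table, default


def predict(model, features, row):
    table, default = model
    return table.get(tuple(row[f] for f in features), default)


def loo_accuracy(rows, labels, subset):
    """Leave-one-out accuracy of a decision table built on a feature subset."""
    features = sorted(subset)
    correct = 0
    for i in range(len(rows)):
        model = fit_decision_table(rows[:i] + rows[i + 1:],
                                   labels[:i] + labels[i + 1:], features)
        correct += predict(model, features, rows[i]) == labels[i]
    return correct / len(rows)


def best_first_select(rows, labels, n_features, max_stale=3):
    """Best-first forward search over feature subsets, scored by loo_accuracy."""
    start = frozenset()
    best_subset, best_score = start, loo_accuracy(rows, labels, start)
    tie = 0  # unique counter so the heap never has to compare frozensets
    frontier = [(-best_score, tie, start)]
    visited = {start}
    stale = 0
    while frontier and stale < max_stale:
        _, _, subset = heapq.heappop(frontier)  # expand the best-scoring open subset
        improved = False
        for f in range(n_features):
            child = subset | {f}
            if child in visited:
                continue
            visited.add(child)
            score = loo_accuracy(rows, labels, child)
            tie += 1
            heapq.heappush(frontier, (-score, tie, child))
            if score > best_score:
                best_subset, best_score, improved = child, score, True
        stale = 0 if improved else stale + 1
    return best_subset, best_score


# Toy discretised features: (colour bin, radio-detection flag, noise bit).
# By construction the class depends only on feature 0.
rows = [(0, 0, 0), (0, 1, 1), (0, 0, 1), (1, 1, 0),
        (1, 0, 1), (1, 1, 1), (0, 1, 0), (1, 0, 0)]
labels = ["star" if r[0] == 0 else "quasar" for r in rows]
subset, score = best_first_select(rows, labels, n_features=3)
print(sorted(subset), score)
```

Scoring with leave-one-out rather than training-set accuracy matters here: a decision table's training accuracy can only grow as more features make each table cell more specific, so it would always favour the largest subset.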
Identification of Young Stellar Object candidates in the DR2 x AllWISE catalogue with machine learning methods
The second Data Release (DR2) contains astrometric and photometric
data for more than 1.6 billion objects with mean magnitude 20.7,
including many Young Stellar Objects (YSOs) in different evolutionary stages.
To explore the YSO population of the Milky Way, we combined the
DR2 database with WISE and Planck measurements and made an all-sky
probabilistic catalogue of YSOs using machine learning techniques, such as
Support Vector Machines, Random Forests, or Neural Networks. Our input
catalogue contains 103 million objects from the DR2xAllWISE cross-match table.
We classified each object into four main classes: YSOs, extragalactic objects,
main-sequence stars and evolved stars. At a 90% probability threshold we
identified 1,129,295 YSO candidates. To demonstrate the quality and potential
of our YSO catalogue, here we present two applications of it. (1) We explore
the 3D structure of the Orion A star forming complex and show that the spatial
distribution of the YSOs classified by our procedure is in agreement with
recent results from the literature. (2) We use our catalogue to classify
published Science Alerts. As the survey measures sources at multiple
epochs, it can efficiently discover transient events, including sudden
brightness changes of YSOs caused by dynamic processes of their circumstellar
disk. However, in many cases the physical nature of the published alert sources
is not known. A cross-check with our new catalogue shows that about 30% more
of the published alerts can most likely be attributed to YSO activity.
The catalogue can also be useful for identifying YSOs among future alerts.
Comment: 19 pages, 12 figures, 3 tables
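The 90% probability threshold mentioned above amounts to keeping only those objects whose predicted YSO probability clears the cut. A minimal sketch is shown below; the source IDs and probabilities are hypothetical, and only the four class names come from the text.

```python
from typing import Dict, List


def select_candidates(class_probs: Dict[str, Dict[str, float]],
                      target: str = "YSO", threshold: float = 0.9) -> List[str]:
    """Return the IDs of objects whose probability for `target` is at least `threshold`."""
    return [obj_id for obj_id, probs in class_probs.items()
            if probs.get(target, 0.0) >= threshold]


# Hypothetical classifier outputs over the four classes named in the text.
class_probs = {
    "src-1": {"YSO": 0.95, "extragalactic": 0.02, "main-sequence": 0.02, "evolved": 0.01},
    "src-2": {"YSO": 0.60, "extragalactic": 0.30, "main-sequence": 0.05, "evolved": 0.05},
    "src-3": {"YSO": 0.05, "extragalactic": 0.01, "main-sequence": 0.90, "evolved": 0.04},
}
print(select_candidates(class_probs))                 # ['src-1']
print(select_candidates(class_probs, threshold=0.5))  # ['src-1', 'src-2']
```

Raising the threshold trades completeness for purity: fewer candidates are kept, but those that remain are more likely to be genuine YSOs.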
Machine-learning identification of galaxies in the WISExSuperCOSMOS all-sky catalogue
The two currently largest all-sky photometric datasets, WISE and SuperCOSMOS,
were cross-matched by Bilicki et al. (2016) (B16) to construct a novel
photometric redshift catalogue on 70% of the sky. Galaxies were therein
separated from stars and quasars through colour cuts, which may leave
misclassifications because different source types overlap in colour
space. The aim of the present work is to identify galaxies in the
WISExSuperCOSMOS catalogue through an alternative approach of machine learning.
This allows us to define more complex separations in the multi-colour space
than possible with simple colour cuts, and should provide more reliable source
classification. For the automated classification we use the support vector
machine (SVM) learning algorithm, employing SDSS spectroscopic sources cross-matched
with WISExSuperCOSMOS as the training and verification set. We perform a number
of tests to examine the behaviour of the classifier (completeness, purity and
accuracy) as a function of source apparent magnitude and Galactic latitude. We
then apply the classifier to the full-sky data and analyse the resulting
catalogue of candidate galaxies. We also compare the resulting dataset with the
one presented in B16. The tests indicate very high accuracy, completeness and
purity (>95%) of the classifier at the bright end, deteriorating for the
faintest sources, but still retaining acceptable levels of 85%. No significant
variation of classification quality with Galactic latitude is observed.
Application of the classifier to all-sky WISExSuperCOSMOS data gives 15 million
galaxies after masking problematic areas. The resulting sample is purer than
the one in B16, at the price of lower completeness over the sky. The automatic
classification thus provides a successful alternative to colour cuts for
defining a reliable galaxy sample.
Comment: 12 pages, 15 figures, accepted for publication in A&A. The resulting
catalogue will be included in the public release of the WISExSuperCOSMOS
galaxy catalogue available from http://ssa.roe.ac.uk/WISExSCO
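The completeness, purity and accuracy tests described above, including their dependence on apparent magnitude, can be computed as in this sketch. It treats "galaxy" as the positive class and every other label as "not galaxy". The labels, magnitudes and bin edges are toy values and do not come from the paper.

```python
def completeness_purity_accuracy(true_labels, predicted, positive="galaxy"):
    """Binary metrics with `positive` as the target class and all other labels negative."""
    pairs = list(zip(true_labels, predicted))
    if not pairs:
        raise ValueError("need at least one labelled source")
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = len(pairs) - tp - fn - fp
    return {
        "completeness": tp / (tp + fn) if tp + fn else float("nan"),  # recovered galaxies
        "purity": tp / (tp + fp) if tp + fp else float("nan"),        # true galaxies among selected
        "accuracy": (tp + tn) / len(pairs),
    }


def metrics_by_magnitude(mags, true_labels, predicted, edges, positive="galaxy"):
    """Compute the metrics separately in each half-open magnitude bin [lo, hi)."""
    results = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        idx = [i for i, m in enumerate(mags) if lo <= m < hi]
        if idx:
            results.append(((lo, hi), completeness_purity_accuracy(
                [true_labels[i] for i in idx], [predicted[i] for i in idx], positive)))
    return results


# Toy sample: spectroscopic ("true") classes, classifier output and magnitudes.
truth = ["galaxy", "galaxy", "star", "galaxy", "quasar", "star"]
pred = ["galaxy", "star", "star", "galaxy", "galaxy", "star"]
mags = [14.0, 15.0, 16.0, 17.0, 18.0, 19.0]
print(completeness_purity_accuracy(truth, pred))
for magnitude_bin, metrics in metrics_by_magnitude(mags, truth, pred, edges=[14, 17, 20]):
    print(magnitude_bin, metrics)
```

The same binning could be repeated over Galactic latitude instead of magnitude to check whether classification quality varies with position on the sky.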
Estimating Photometric Redshifts for X-ray sources in the X-ATLAS field, using machine-learning techniques
We present photometric redshifts for 1,031 X-ray sources in the X-ATLAS
field, using the machine learning technique TPZ (Carrasco Kind & Brunner 2013).
X-ATLAS covers 7.1 deg² observed with XMM-Newton within the Science
Demonstration Phase (SDP) of the H-ATLAS field, making it one of the largest
contiguous areas of the sky with both XMM-Newton and Herschel coverage. All of
the sources have SDSS photometry, while 810 also have mid-IR
and/or near-IR photometry. A spectroscopic sample of 5,157 sources primarily in
the XMM/XXL field, but also from several X-ray surveys and the SDSS DR13
redshift catalogue, is used for the training of the algorithm. Our analysis
reveals that the algorithm performs best when the sources are split, based on
their optical morphology, into point-like and extended sources. Optical
photometry alone is not sufficient for estimating accurate photometric
redshifts, but the results improve greatly when at least mid-IR photometry is
added in the training process. In particular, our measurements show that the
estimated photometric redshifts for the X-ray sources of the training sample
have a normalized median absolute deviation of σ_NMAD = 0.06 and an outlier
fraction of η = 10-14 per cent, depending on whether the sources are extended
or point-like. Our final catalogue contains photometric redshifts for 933 of
the 1,031 X-ray sources, with a median redshift of 0.9.
Comment: 10 pages, 13 figures, A&A accepted
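The abstract quotes σ_NMAD and the outlier fraction η without defining them. The sketch below uses one common convention, which is an assumption here and not necessarily the paper's exact definition: δ = (z_phot - z_spec)/(1 + z_spec), σ_NMAD = 1.48 × median(|δ - median(δ)|), and a source counts as an outlier if |δ| > 0.15. The redshift values are toy numbers.

```python
import statistics
from typing import Sequence, Tuple


def photoz_metrics(z_spec: Sequence[float], z_phot: Sequence[float],
                   outlier_cut: float = 0.15) -> Tuple[float, float]:
    """Return (sigma_nmad, outlier_fraction) for paired spectroscopic/photometric redshifts.

    Assumed convention:
      delta      = (z_phot - z_spec) / (1 + z_spec)
      sigma_nmad = 1.48 * median(|delta - median(delta)|)
      outlier    : |delta| > outlier_cut
    """
    if not z_spec or len(z_spec) != len(z_phot):
        raise ValueError("need two non-empty sequences of equal length")
    delta = [(zp - zs) / (1.0 + zs) for zs, zp in zip(z_spec, z_phot)]
    med = statistics.median(delta)
    sigma_nmad = 1.48 * statistics.median([abs(d - med) for d in delta])
    outlier_fraction = sum(abs(d) > outlier_cut for d in delta) / len(delta)
    return sigma_nmad, outlier_fraction


# Toy training-sample redshifts (not from the paper).
z_spec = [0.5, 1.0, 0.2, 2.0, 0.8]
z_phot = [0.55, 0.9, 0.2, 1.4, 0.8]
print(photoz_metrics(z_spec, z_phot))
```

The factor 1.48 scales the median absolute deviation so that it matches the standard deviation for Gaussian errors, while the median keeps the statistic insensitive to the catastrophic outliers counted separately by η.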
Compact continuum source-finding for next generation radio surveys
We present a detailed analysis of four of the most widely used radio
source-finding packages in radio astronomy, and a program being developed for the
Australian Square Kilometer Array Pathfinder (ASKAP) telescope. The four
packages (SExtractor, SFind, IMSAD and Selavy) are shown to produce source
catalogues with high completeness and reliability. In this paper we analyse the
small fraction (~1%) of cases in which these packages do not perform well. This
small fraction of sources will be of concern for next-generation radio
surveys, which will produce many thousands of sources daily, and in particular
for blind radio transient surveys. From our analysis we identify the ways in
which the underlying source-finding algorithms fail. We demonstrate a new
source-finding algorithm, Aegean, based on the application of a Laplacian
kernel, which can avoid these problems and produce complete and reliable
source catalogues for the next generation of radio surveys.
Comment: 14 pages, 12 figures, accepted for publication in MNRAS
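The abstract does not describe Aegean's implementation in detail. The sketch below is therefore only a simplified illustration of why a Laplacian (curvature) map helps with blended sources. It makes several assumptions: a toy image of two overlapping Gaussians, the standard 5-point Laplacian kernel and a fixed flux threshold. Pixels above the threshold form a single connected "island", but the pixels that are both above the threshold and have negative curvature split into one region per peak.

```python
import math
from collections import deque
from typing import List

Image = List[List[float]]


def laplacian(img: Image) -> Image:
    """Convolve with the 5-point kernel [[0, 1, 0], [1, -4, 1], [0, 1, 0]];
    border pixels are left at 0."""
    rows, cols = len(img), len(img[0])
    out = [[0.0] * cols for _ in range(rows)]
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            out[y][x] = (img[y - 1][x] + img[y + 1][x] + img[y][x - 1]
                         + img[y][x + 1] - 4.0 * img[y][x])
    return out


def count_components(mask: List[List[bool]]) -> int:
    """Count 4-connected regions of True pixels with a breadth-first flood fill."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for y in range(rows):
        for x in range(cols):
            if not mask[y][x] or seen[y][x]:
                continue
            count += 1
            seen[y][x] = True
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = cy + dy, cx + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count


def gaussian_blend(rows=5, cols=9, peaks=((2, 2), (2, 6)), sigma=1.0) -> Image:
    """Toy image: sum of unit-amplitude circular Gaussians centred on `peaks`."""
    return [[sum(math.exp(-((y - py) ** 2 + (x - px) ** 2) / (2 * sigma ** 2))
                 for py, px in peaks)
             for x in range(cols)]
            for y in range(rows)]


img = gaussian_blend()
threshold = 0.1
lap = laplacian(img)
above = [[v > threshold for v in row] for row in img]
curved = [[img[y][x] > threshold and lap[y][x] < 0 for x in range(len(img[0]))]
          for y in range(len(img))]
print(count_components(above), count_components(curved))
```

A plain flood fill above the threshold would report one island here. The positive curvature at the saddle between the two peaks is what separates them into two components, which is the kind of blending case the abstract identifies as a failure mode.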