3,012 research outputs found

    Applying Machine Learning to Catalogue Matching in Astrophysics

    We present the results of applying automated machine learning techniques to the problem of matching different object catalogues in astrophysics. In this study we take two partially matched catalogues, one of which has a large positional uncertainty. The two catalogues used here were taken from the HI Parkes All Sky Survey (HIPASS) and the SuperCOSMOS optical survey. Previous work had matched 44% (1887 objects) of HIPASS to the SuperCOSMOS catalogue. A supervised learning algorithm was then applied to construct a model of the matched portion of our catalogue. Validation of the model shows that we achieved good classification performance (99.12% correct). Applying this model to the unmatched portion of the catalogue found 1209 new matches. This increases the catalogue size from 1887 matched objects to 3096. The combination of these procedures yields a catalogue that is 72% matched. Comment: 8 pages, 5 figures
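The quoted figures are mutually consistent. A quick arithmetic check, assuming the 44% is a rounded value (so the implied size of the HIPASS sample below is only an estimate, not a number reported by the authors):

```python
# Numbers quoted in the abstract. The 44% is rounded, so the implied
# catalogue size is an estimate, not a value reported by the authors.
matched_before = 1887
new_matches = 1209
frac_before = 0.44

matched_after = matched_before + new_matches   # 3096 matched objects
implied_total = matched_before / frac_before    # roughly 4289 HIPASS objects
frac_after = matched_after / implied_total      # roughly 0.72

print(matched_after, round(implied_total), f"{frac_after:.0%}")  # 3096 4289 72%
```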

    MiraBest: a data set of morphologically classified radio galaxies for machine learning

    The volume of data from current and future observatories has motivated the increased development and application of automated machine learning methodologies for astronomy. However, less attention has been given to the production of standardised datasets for assessing the performance of different machine learning algorithms within astronomy and astrophysics. Here we describe in detail the MiraBest dataset, a publicly available batched dataset of 1256 radio-loud AGN from NVSS and FIRST, filtered to 0.03 < z < 0.1, manually labelled by Miraghaei and Best (2017) according to the Fanaroff-Riley morphological classification, created for machine learning applications and compatible with standard deep learning libraries. We outline the principles underlying the construction of the dataset, the sample selection and pre-processing methodology, and the dataset structure and composition, and we compare MiraBest with other datasets used in the literature. Existing applications that utilise the MiraBest dataset are reviewed, and an extended dataset of 2100 sources is created by cross-matching MiraBest with other catalogues of radio-loud AGN that have been used more widely in the literature for machine learning applications.

    XMMPZCAT: A catalogue of photometric redshifts for X-ray sources

    The third version of the XMM-Newton serendipitous catalogue (3XMM), containing almost half a million sources, is now the largest X-ray catalogue. However, its full scientific potential remains untapped due to the lack of distance information (i.e. redshifts) for the majority of its sources. Here we present XMMPZCAT, a catalogue of photometric redshifts (photo-z) for 3XMM sources. We searched for optical counterparts of 3XMM-DR6 sources outside the Galactic plane in the SDSS and Pan-STARRS surveys, adding near-infrared (NIR) and mid-infrared (MIR) data whenever possible (2MASS, UKIDSS, VISTA-VHS, and AllWISE). We used this photometric data set, in combination with a training sample of 5157 X-ray selected sources and the MLZ-TPZ package (a supervised machine learning algorithm based on decision trees and random forests), to calculate photo-z. We have estimated photo-z for 100,178 X-ray sources, about 50% of the total number of 3XMM sources (205,380) in the XMM-Newton fields selected to build this catalogue (4208 out of 9159). The accuracy of our results depends strongly on the available photometric data, with an outlier rate ranging from 4% for sources with optical+NIR+MIR data up to ~40% for sources with only optical data. We also assessed the reliability of our results by studying the shape of the photo-z probability density distributions. Comment: 16 pages, 14 figures, A&A accepted
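The abstract does not say how the PDF shape was quantified. One simple way to characterise it, sketched here under assumed choices (a uniform redshift grid, a window of ±0.15(1+z) around the highest peak, and a hypothetical bimodal PDF), is to report the peak redshift, the probability mass near that peak, and the number of local maxima. This is an illustration, not necessarily the metric used for XMMPZCAT.

```python
import math

def pdf_summary(z_grid, pdf, window=0.15):
    """Summarise a photo-z PDF sampled on a uniform redshift grid.

    Returns (z_peak, conf, n_peaks): the redshift of the highest peak, the
    probability within +/- window * (1 + z_peak) of it, and the number of
    local maxima. The default window of 0.15 is an assumed choice.
    """
    dz = z_grid[1] - z_grid[0]          # assumes uniform spacing
    total = sum(pdf) * dz
    norm = [p / total for p in pdf]     # normalise so the PDF integrates to 1
    i_peak = max(range(len(norm)), key=norm.__getitem__)
    z_peak = z_grid[i_peak]
    half_width = window * (1 + z_peak)
    conf = sum(p for z, p in zip(z_grid, norm) if abs(z - z_peak) <= half_width) * dz
    # Count interior points that rise above the left neighbour and do not fall below the right one.
    n_peaks = sum(1 for i in range(1, len(norm) - 1)
                  if norm[i] > norm[i - 1] and norm[i] >= norm[i + 1])
    return z_peak, conf, n_peaks

# Hypothetical bimodal PDF: a narrow peak at z = 0.5 and a broader one at
# z = 2.0, both carrying the same total probability.
z = [i * 0.01 for i in range(301)]
p = [math.exp(-(zi - 0.5) ** 2 / (2 * 0.05 ** 2))
     + 0.5 * math.exp(-(zi - 2.0) ** 2 / (2 * 0.1 ** 2)) for zi in z]
z_peak, conf, n_peaks = pdf_summary(z, p)
print(f"peak z = {z_peak:.2f}, conf = {conf:.2f}, peaks = {n_peaks}")
```

A low `conf` or more than one peak flags a source whose single best photo-z value should be treated with caution.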

    Decision table for classifying point sources based on FIRST and 2MASS databases

    With the availability of multiwavelength, multiscale and multiepoch astronomical catalogues, the number of features available to describe astronomical objects has increased. The better the features selected to classify objects, the higher the classification accuracy. In this paper, we have used data sets of stars and quasars from the near-infrared and radio bands. The best-first search method was then applied to select features. The decision table algorithm was then applied to the data with the selected features. The classification accuracy is more than 95.9%. As a result, the feature selection method improves the effectiveness and efficiency of the classification method. Moreover, the results show that the decision table is robust and effective for discriminating celestial objects and can be used to preselect quasar candidates for large survey projects. Comment: 10 pages. Accepted by Advances in Space Research
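A decision table classifies a source by looking up the combination of its (discretised) feature values and returning the majority class seen for that combination in training. The sketch below pairs one with greedy forward feature selection scored by leave-one-out accuracy. The greedy search is a simplified stand-in for best-first search (it never backtracks), and the feature names, bins and labels are hypothetical toy data, not the FIRST/2MASS sample.

```python
from collections import Counter, defaultdict

def fit_table(rows, labels, features):
    """Decision table: majority label for each combination of feature values."""
    cells = defaultdict(Counter)
    for row, label in zip(rows, labels):
        cells[tuple(row[f] for f in features)][label] += 1
    table = {key: counts.most_common(1)[0][0] for key, counts in cells.items()}
    default = Counter(labels).most_common(1)[0][0]  # used for unseen combinations
    return table, default

def loo_accuracy(rows, labels, features):
    """Leave-one-out accuracy of a decision table built on `features`."""
    hits = 0
    for i in range(len(rows)):
        table, default = fit_table(rows[:i] + rows[i + 1:],
                                   labels[:i] + labels[i + 1:], features)
        hits += table.get(tuple(rows[i][f] for f in features), default) == labels[i]
    return hits / len(rows)

def forward_select(rows, labels, candidates):
    """Greedy forward selection: add the best feature until LOO accuracy stops improving."""
    selected, best = [], loo_accuracy(rows, labels, [])
    while True:
        scored = [(loo_accuracy(rows, labels, selected + [f]), f)
                  for f in candidates if f not in selected]
        if not scored:
            break
        score, feature = max(scored)
        if score <= best:
            break
        selected.append(feature)
        best = score
    return selected, best

# Hypothetical discretised colours ("jh", "hk") and a radio-detection flag.
rows = [
    {"jh": "lo", "hk": "lo", "radio": 0}, {"jh": "lo", "hk": "hi", "radio": 0},
    {"jh": "hi", "hk": "lo", "radio": 0}, {"jh": "lo", "hk": "lo", "radio": 0},
    {"jh": "hi", "hk": "hi", "radio": 1}, {"jh": "lo", "hk": "hi", "radio": 1},
    {"jh": "hi", "hk": "hi", "radio": 1}, {"jh": "hi", "hk": "lo", "radio": 1},
]
labels = ["star"] * 4 + ["quasar"] * 4
print(forward_select(rows, labels, ["jh", "hk", "radio"]))  # (['radio'], 1.0)
```

In this toy set, adding either colour to the radio flag creates table cells with a single training example, so leave-one-out accuracy drops and the search stops. This is the overfitting that feature selection guards against.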

    Identification of Young Stellar Object candidates in the Gaia DR2 x AllWISE catalogue with machine learning methods

    The second Gaia Data Release (DR2) contains astrometric and photometric data for more than 1.6 billion objects with mean Gaia G magnitude < 20.7, including many Young Stellar Objects (YSOs) in different evolutionary stages. In order to explore the YSO population of the Milky Way, we combined the Gaia DR2 database with WISE and Planck measurements and made an all-sky probabilistic catalogue of YSOs using machine learning techniques such as Support Vector Machines, Random Forests, or Neural Networks. Our input catalogue contains 103 million objects from the DR2xAllWISE cross-match table. We classified each object into four main classes: YSOs, extragalactic objects, main-sequence stars and evolved stars. At a 90% probability threshold we identified 1,129,295 YSO candidates. To demonstrate the quality and potential of our YSO catalogue, we present two applications of it. (1) We explore the 3D structure of the Orion A star-forming complex and show that the spatial distribution of the YSOs classified by our procedure is in agreement with recent results from the literature. (2) We use our catalogue to classify published Gaia Science Alerts. As Gaia measures the sources at multiple epochs, it can efficiently discover transient events, including sudden brightness changes of YSOs caused by dynamic processes of their circumstellar disks. However, in many cases the physical nature of the published alert sources is not known. A cross-check with our new catalogue shows that about 30% more of the published Gaia alerts can most likely be attributed to YSO activity. The catalogue can also be useful for identifying YSOs among future Gaia alerts. Comment: 19 pages, 12 figures, 3 tables
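Selecting candidates "at a 90% probability threshold" amounts to keeping every object whose predicted YSO probability reaches 0.9. A minimal sketch follows. The dictionary layout, the source ids and the use of `>=` rather than `>` are assumptions. The four class names follow the abstract.

```python
import math

# The four classes named in the abstract; the data layout is an assumption.
CLASSES = ("YSO", "extragalactic", "main_sequence", "evolved")

def yso_candidates(probabilities, threshold=0.9):
    """Return ids whose YSO probability reaches `threshold` (assumed inclusive).

    `probabilities` maps a (hypothetical) source id to a dict of
    per-class probabilities, which are checked to sum to 1.
    """
    selected = []
    for source_id, probs in probabilities.items():
        if set(probs) != set(CLASSES):
            raise ValueError(f"{source_id}: unexpected classes {sorted(probs)}")
        if not math.isclose(sum(probs.values()), 1.0, abs_tol=1e-6):
            raise ValueError(f"{source_id}: probabilities do not sum to 1")
        if probs["YSO"] >= threshold:
            selected.append(source_id)
    return selected

example = {
    "src-1": {"YSO": 0.95, "extragalactic": 0.01, "main_sequence": 0.03, "evolved": 0.01},
    "src-2": {"YSO": 0.60, "extragalactic": 0.30, "main_sequence": 0.05, "evolved": 0.05},
}
print(yso_candidates(example))  # ['src-1']
```

Raising the threshold trades completeness for purity. A lower threshold would admit sources such as `src-2`, whose classification is far less certain.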

    Machine-learning identification of galaxies in the WISExSuperCOSMOS all-sky catalogue

    The two currently largest all-sky photometric datasets, WISE and SuperCOSMOS, were cross-matched by Bilicki et al. (2016) (B16) to construct a novel photometric redshift catalogue covering 70% of the sky. Galaxies were therein separated from stars and quasars through colour cuts, which may leave imperfections because different source types overlap in colour space. The aim of the present work is to identify galaxies in the WISExSuperCOSMOS catalogue through an alternative, machine learning approach. This allows us to define more complex separations in the multi-colour space than is possible with simple colour cuts, and should provide more reliable source classification. For the automated classification we use the support vector machine learning algorithm, employing SDSS spectroscopic sources cross-matched with WISExSuperCOSMOS as the training and verification set. We perform a number of tests to examine the behaviour of the classifier (completeness, purity and accuracy) as a function of source apparent magnitude and Galactic latitude. We then apply the classifier to the full-sky data and analyse the resulting catalogue of candidate galaxies. We also compare the resulting dataset with the one presented in B16. The tests indicate very high accuracy, completeness and purity (>95%) of the classifier at the bright end, deteriorating for the faintest sources but still retaining acceptable levels of 85%. No significant variation of classification quality with Galactic latitude is observed. Application of the classifier to all-sky WISExSuperCOSMOS data gives 15 million galaxies after masking problematic areas. The resulting sample is purer than the one in B16, at the price of lower completeness over the sky. Automatic classification thus offers a successful alternative to colour cuts for defining a reliable galaxy sample. Comment: 12 pages, 15 figures, accepted for publication in A&A.
The obtained catalogue will be included in the public release of the WISExSuperCOSMOS galaxy catalogue available from http://ssa.roe.ac.uk/WISExSCO
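The tests above report completeness, purity and accuracy as a function of apparent magnitude. The sketch below shows how such per-bin figures can be computed from a verification set, using the usual definitions and hypothetical toy labels. It does not reproduce the SVM itself, and the bin edges are assumed.

```python
from collections import defaultdict

def metrics_by_bin(mags, true_labels, pred_labels, bin_edges, positive="galaxy"):
    """Completeness, purity and accuracy of the `positive` class per magnitude bin.

    completeness = TP / (TP + FN), purity = TP / (TP + FP),
    accuracy = correctly classified sources / all sources in the bin.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0, "correct": 0, "n": 0})
    for m, t, p in zip(mags, true_labels, pred_labels):
        # Find the half-open bin [lo, hi) holding this magnitude; skip out-of-range sources.
        for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
            if lo <= m < hi:
                c = counts[(lo, hi)]
                break
        else:
            continue
        c["n"] += 1
        c["correct"] += t == p
        c["tp"] += t == positive and p == positive
        c["fp"] += t != positive and p == positive
        c["fn"] += t == positive and p != positive
    result = {}
    for b, c in sorted(counts.items()):
        completeness = c["tp"] / (c["tp"] + c["fn"]) if c["tp"] + c["fn"] else float("nan")
        purity = c["tp"] / (c["tp"] + c["fp"]) if c["tp"] + c["fp"] else float("nan")
        result[b] = (completeness, purity, c["correct"] / c["n"])
    return result

# Hypothetical verification set: a bright bin and a faint bin.
mags = [14, 15, 16, 17, 18, 19]
truth = ["galaxy", "galaxy", "star", "galaxy", "galaxy", "star"]
pred = ["galaxy", "galaxy", "star", "star", "galaxy", "galaxy"]
print(metrics_by_bin(mags, truth, pred, [13, 16, 20]))
# {(13, 16): (1.0, 1.0, 1.0), (16, 20): (0.5, 0.5, 0.5)}
```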

    Estimating Photometric Redshifts for X-ray sources in the X-ATLAS field, using machine-learning techniques

    We present photometric redshifts for 1,031 X-ray sources in the X-ATLAS field, using the machine learning technique TPZ (Carrasco Kind & Brunner 2013). X-ATLAS covers 7.1 deg² observed with XMM-Newton within the Science Demonstration Phase (SDP) of the H-ATLAS field, making it one of the largest contiguous areas of the sky with both XMM-Newton and Herschel coverage. All of the sources have available SDSS photometry, while 810 additionally have mid-IR and/or near-IR photometry. A spectroscopic sample of 5,157 sources, primarily in the XMM/XXL field but also from several X-ray surveys and the SDSS DR13 redshift catalogue, is used to train the algorithm. Our analysis reveals that the algorithm performs best when the sources are split, based on their optical morphology, into point-like and extended sources. Optical photometry alone is not enough to estimate accurate photometric redshifts, but the results greatly improve when at least mid-IR photometry is added in the training process. In particular, our measurements show that the estimated photometric redshifts for the X-ray sources of the training sample have a normalized median absolute deviation n_mad = 0.06 and an outlier percentage η = 10–14 per cent, depending on whether the sources are extended or point-like. Our final catalogue contains photometric redshifts for 933 of the 1,031 X-ray sources, with a median redshift of 0.9. Comment: 10 pages, 13 figures, A&A accepted
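The abstract quotes n_mad and the outlier percentage η without giving formulas. The sketch below uses one widely used convention: σ_NMAD = 1.48 × median(|Δz − median(Δz)| / (1 + z_spec)), with outliers defined by |Δz| / (1 + z_spec) > 0.15 and Δz = z_phot − z_spec. Papers differ in the details; some omit the median subtraction. Both the convention and the toy redshifts are assumptions here.

```python
from statistics import median

def photoz_metrics(z_spec, z_phot, outlier_cut=0.15):
    """Return (sigma_NMAD, outlier fraction) for paired spec-z / photo-z values.

    Assumed conventions (papers differ in the details):
      sigma_NMAD = 1.48 * median(|dz - median(dz)| / (1 + z_spec))
      outlier    : |dz| / (1 + z_spec) > outlier_cut
    where dz = z_phot - z_spec.
    """
    dz = [zp - zs for zs, zp in zip(z_spec, z_phot)]
    med = median(dz)
    nmad = 1.48 * median([abs(d - med) / (1 + zs) for d, zs in zip(dz, z_spec)])
    n_out = sum(abs(d) / (1 + zs) > outlier_cut for d, zs in zip(dz, z_spec))
    return nmad, n_out / len(dz)

# Hypothetical toy values: three good estimates and one catastrophic outlier.
nmad, eta = photoz_metrics([0.5, 1.0, 1.5, 2.0], [0.5, 1.1, 1.5, 3.0])
print(f"n_mad = {nmad:.4f}, eta = {eta:.0%}")  # n_mad = 0.0432, eta = 25%
```

The median-based scatter is barely affected by the single catastrophic failure, which is why it is reported alongside, rather than instead of, the outlier fraction.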

    Compact continuum source-finding for next generation radio surveys

    We present a detailed analysis of four of the most widely used radio source-finding packages in radio astronomy, and a program being developed for the Australian Square Kilometer Array Pathfinder (ASKAP) telescope. The four packages, SExtractor, SFind, IMSAD and Selavy, are shown to produce source catalogues with high completeness and reliability. In this paper we analyse the small fraction (~1%) of cases in which these packages do not perform well. This small fraction of sources will be of concern for the next generation of radio surveys, which will produce many thousands of sources on a daily basis, in particular for blind radio transient surveys. From our analysis we identify the ways in which the underlying source-finding algorithms fail. We demonstrate a new source-finding algorithm, Aegean, based on the application of a Laplacian kernel, which can avoid these problems and can produce complete and reliable source catalogues for the next generation of radio surveys. Comment: 14 pages, 12 figures, accepted for publication in MNRAS
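The abstract says only that Aegean is based on a Laplacian kernel. The sketch below is a much-simplified illustration of why curvature helps with blended sources, not Aegean's implementation. It thresholds a toy image into islands, computes a 5-point discrete Laplacian, and counts connected regions of negative curvature (the "summits") as separate components. The image, σ and thresholds are hypothetical.

```python
import math

def gaussian_image(height, width, sources, sigma):
    """Toy image: sum of circular 2-D Gaussians given as (y0, x0, amplitude)."""
    return [[sum(amp * math.exp(-((y - y0) ** 2 + (x - x0) ** 2) / (2 * sigma ** 2))
                 for y0, x0, amp in sources)
             for x in range(width)]
            for y in range(height)]

def laplacian(img):
    """5-point discrete Laplacian; border pixels are left at 0."""
    h, w = len(img), len(img[0])
    lap = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lap[y][x] = (img[y - 1][x] + img[y + 1][x] + img[y][x - 1]
                         + img[y][x + 1] - 4 * img[y][x])
    return lap

def count_regions(mask):
    """Number of 4-connected True regions in a boolean mask (iterative flood fill)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    regions = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                regions += 1
                seen[y][x] = True
                stack = [(y, x)]
                while stack:
                    cy, cx = stack.pop()
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
    return regions

def islands_and_components(img, threshold):
    """Count islands above `threshold` and negative-curvature regions within them."""
    lap = laplacian(img)
    above = [[v > threshold for v in row] for row in img]
    summits = [[a and l < 0 for a, l in zip(arow, lrow)]
               for arow, lrow in zip(above, lap)]
    return count_regions(above), count_regions(summits)

# Two hypothetical sources 6 pixels apart with sigma = 1.5 pixels.
img = gaussian_image(9, 15, [(4, 4, 1.0), (4, 10, 1.0)], sigma=1.5)
print(islands_and_components(img, 0.2))  # (1, 2): one blended island, two components
print(islands_and_components(img, 0.5))  # (2, 2): separate islands
```

At the lower threshold the two sources merge into a single island, which a purely flood-fill finder would report as one object. The negative-curvature map still separates them.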