37 research outputs found

    Unsupervised Machine Learning for the Classification of Astrophysical X-ray Sources

    The automatic classification of X-ray detections is a necessary step in extracting astrophysical information from compiled catalogs of astrophysical sources. Classification is useful for the study of individual objects, statistics for population studies, as well as for anomaly detection, i.e., the identification of new unexplored phenomena, including transients and spectrally extreme sources. Despite the importance of this task, classification remains challenging in X-ray astronomy due to the lack of optical counterparts and representative training sets. We develop an alternative methodology that employs an unsupervised machine learning approach to provide probabilistic classes to Chandra Source Catalog sources with a limited number of labeled sources, and without ancillary information from optical and infrared catalogs. We provide a catalog of probabilistic classes for 8,756 sources, comprising a total of 14,507 detections, and demonstrate the success of the method at identifying emission from young stellar objects, as well as distinguishing between small-scale and large-scale compact accretors with a significant level of confidence. We investigate the consistency between the distribution of features among classified objects and well-established astrophysical hypotheses such as the unified AGN model. This provides interpretability to the probabilistic classifier. Code and tables are available publicly through GitHub. We provide a web playground for readers to explore our final classification at https://umlcaxs-playground.streamlit.app. Comment: 21 pages, 11 figures. Accepted in MNRAS

    The submillimetre and near-infrared properties of Herschel-ATLAS sources.

    In this thesis I investigate a sample of galaxies that are detected with the Herschel Space Telescope in the sub-millimetre wavelength range and that also have near-infrared detections with the VISTA telescope in Chile as part of the VIKING survey. The first necessity is to find the near-infrared galaxies that are most likely to be the counterparts to the Herschel galaxies. I accomplish this by using a likelihood ratio method, which I modify to allow for an appropriate estimate of the probability of finding a genuine near-infrared counterpart above the magnitude limit to a SPIRE source. This probability is found to be Q_0 0.73. 51% of the SPIRE sources have a best VIKING counterpart with a reliability R = 0.8, and the false identification rate of these is estimated to be 4.2%. I expect to miss 5 per cent of true VIKING counterparts. There is evidence from Z - J and J - Ks colours that the reliable counterparts to SPIRE galaxies are marginally redder than the field population. I obtain photometric redshifts for 68% of all (non-stellar) VIKING candidates, with a median redshift of z̃ = 0.405. I have spectroscopic redshifts for 3147 ( 28%) of the reliable counterparts from existing redshift surveys. Comparing to the results of the optical identifications supplied with the Phase I catalogue, I find that the use of medium-deep near-infrared data improves the identification rate of reliable counterparts from 36% to 51%. I investigate the evolution of the sub-millimetre luminosity function (LF) using the sample of SPIRE sources with reliable counterparts in VIKING with z 1. I find strong evolution of the 250 μm LF out to about redshift z = 0.6 and possibly out to z = 0.8, in broad agreement with previous studies. A double power law seems to fit the local LF (z 0.2) slightly better than a Schechter function, and I find a flatter slope at lower luminosities compared to recent studies. Finally, I construct the star formation rates (SFR) from far-infrared (FIR) and ultra-violet (UV) luminosities of the SPIRE sample with reliable VIKING counterparts (SFR_FIR and SFR_UV respectively) and show that the contribution of SFR_FIR increases with increasing luminosity. UV observations are hence crucial for all but the brightest SPIRE galaxies in calculating a total SFR. Calculating the slope of the UV continuum and comparing it with the ratio L_FIR/L_UV leads to dust-attenuation-corrected SFR_UV values that represent the total SFR well in the low to medium L_FIR range.
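
To make the reliability figure in this abstract concrete, here is a minimal sketch of the reliability formula commonly used in likelihood-ratio cross-matching, R_j = L_j / (sum_i L_i + (1 - Q_0)). It assumes the likelihood ratios L_i of the candidates around one source have already been computed. The function names and the example likelihood ratios are hypothetical; only Q_0 = 0.73 and the 0.8 reliability threshold are taken from the abstract, and the thesis's modified method may differ in detail.

```python
def reliabilities(likelihood_ratios, q0):
    """Reliability of each candidate: R_j = L_j / (sum_i L_i + (1 - q0)).

    likelihood_ratios: precomputed L_i for all candidates near one source
    (assumed input); q0: probability that a true counterpart is detectable.
    """
    if not 0.0 <= q0 <= 1.0:
        raise ValueError("q0 must lie in [0, 1]")
    denom = sum(likelihood_ratios) + (1.0 - q0)
    return [lr / denom for lr in likelihood_ratios]


def best_reliable_counterpart(likelihood_ratios, q0, threshold=0.8):
    """Return (index, reliability) of the best candidate, or None if it
    falls below the reliability threshold or there are no candidates."""
    rel = reliabilities(likelihood_ratios, q0)
    if not rel:
        return None
    best = max(range(len(rel)), key=rel.__getitem__)
    return (best, rel[best]) if rel[best] >= threshold else None


# Hypothetical likelihood ratios for three candidates around one source.
example = best_reliable_counterpart([5.0, 0.3, 0.05], q0=0.73)
```

The (1 - Q_0) term in the denominator accounts for the chance that the true counterpart is not in the catalogue at all, so the reliabilities of the candidates sum to less than one.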

    Statistical Methods in the Era of Large Astronomical Surveys

    Statistical methods play a crucial role in modern astronomical research. The development and understanding of these methods will be of fundamental importance to future work on large astronomical surveys. In this thesis I showcase three different statistical approaches to survey data. I first apply a semi-supervised dimensionality reduction technique to cluster similar high-resolution spectra from the GALAH survey to identify 54 candidate extremely metal-poor stars. The approach shows promising potential for implementation in future large-scale stellar spectroscopic surveys. Next, I employ a method to classify sources in the Gaia survey as stars, galaxies or quasars, making use of additional infrared photometry from CatWISE2020 and discussing the importance of applying adjusted priors to probabilistic classification. Lastly, I utilise a method to estimate the rotational parameters of star clusters in Gaia, with an application to open clusters. This is done by considering the rotation of a cluster as a 3D solid body, and finding the best-fitting parameters by sampling constructed likelihood functions. The methods developed in this thesis underscore the significant contributions statistical methodologies make to astronomy, and illustrate how the development and application of statistical methods will be essential for extracting meaningful insights from future large-scale astronomical surveys.
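
The remark about adjusted priors can be illustrated with the standard Bayesian re-weighting of classifier outputs: each class probability is multiplied by the ratio of the target prior to the training prior, then the result is renormalised. This is a minimal sketch of that general technique, not the thesis's implementation. The function name, the class order (star, galaxy, quasar) and all numerical values are hypothetical.

```python
def adjust_priors(posteriors, train_priors, target_priors):
    """Re-weight class probabilities from the training-set class balance
    to a different (target) class balance, then renormalise to sum to 1."""
    if not (len(posteriors) == len(train_priors) == len(target_priors)):
        raise ValueError("all inputs must have one entry per class")
    weights = [p * t / s
               for p, s, t in zip(posteriors, train_priors, target_priors)]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("adjusted probabilities sum to zero")
    return [w / total for w in weights]


# Hypothetical example: a classifier trained on a balanced set, applied to
# a survey in which stars vastly outnumber galaxies and quasars.
adjusted = adjust_priors(
    posteriors=[0.5, 0.3, 0.2],            # star, galaxy, quasar
    train_priors=[1 / 3, 1 / 3, 1 / 3],    # balanced training set
    target_priors=[0.99, 0.005, 0.005],    # assumed survey class fractions
)
```

With these assumed numbers, a source the balanced classifier considers only 50% likely to be a star becomes overwhelmingly likely to be one once the rarity of the other classes is taken into account.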

    On the application of machine learning approaches in astronomy: Exploring novel representations of high-dimensional and complex astronomical data

    The goal of the presented work is the application of data-driven methods on complex and high-dimensional astronomical databases. The focus of the work is the exploration of novel data representations in order to enable the use of statistical learning approaches in the analysis of data. With the help of diverse science cases, the advantages of the introduced approaches for classification, visualization and regression tasks are shown by applying the developed methodology to astronomical data. In the first part, an alternative approach for estimating redshifts of spectra by using the knowledge about the redshifts provided by the SDSS pipeline is presented. A novel data representation is employed which contains only information relevant for estimating the redshift and the detection of multiple redshift systems. Subsequently, a novel data representation for regularly sampled light curves based on recurrent networks is presented. This allows an explorative investigation of huge databases with unlabeled data. Finally, a new way of representing the static part of irregularly sampled light curves by a mixture of Gaussians is discussed. This representation is more general than the extraction of features, as it allows the inclusion of photometric uncertainties and avoids the introduction of observational biases.
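
One simple way to picture a mixture-of-Gaussians representation that includes photometric uncertainties is to let each measurement contribute one Gaussian component, centred on the measured magnitude with the measurement uncertainty as its width. The sketch below assumes exactly that, with equal weights and the observation times ignored. It is an illustration only: the construction used in the thesis may differ, and the function name and data values are hypothetical.

```python
import math


def mixture_density(x, magnitudes, uncertainties):
    """Equal-weight Gaussian mixture with one component per measurement.

    Each component is centred on a measured magnitude and has the
    photometric uncertainty as its standard deviation. Observation times
    are not used, so only the static magnitude distribution is described.
    """
    if not magnitudes or len(magnitudes) != len(uncertainties):
        raise ValueError("need equal-length, non-empty inputs")
    total = 0.0
    for m, s in zip(magnitudes, uncertainties):
        total += math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
    return total / len(magnitudes)


# Hypothetical light-curve measurements (magnitudes and 1-sigma errors).
mags = [15.0, 15.2, 15.1]
errs = [0.05, 0.10, 0.05]
density_at_mean = mixture_density(15.1, mags, errs)
```

Because less precise measurements produce broader components, they spread their weight over a wider magnitude range instead of being treated as exact values.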

    Radio and infrared surveys for active galactic nuclei behind the Magellanic Clouds

    I present an analysis of a new 120 deg² radio continuum image of the Large Magellanic Cloud (LMC) at 888 MHz with a bandwidth of 288 MHz and beam size of 13.″9 × 12.″1, from the Australian Square Kilometre Array Pathfinder (ASKAP). I constructed a catalogue of 54,612 sources reaching down to <0.2 mJy and explore the sources by cross-matching with surveys at other wavelengths. I find that the sources are predominantly extragalactic and display synchrotron emission associated with AGN, with star-forming galaxies becoming more prominent relative to AGN below 3 mJy. I employ machine learning to separate the stellar from the extragalactic sources in the Magellanic Clouds. The t-SNE algorithm is used with multi-wavelength data from Gaia EDR3, the VISTA survey of the Magellanic Clouds (VMC), AllWISE and ASKAP to cluster similar radio sources together. This separates AGN, galaxies, blazars and stellar sources. The probabilistic random forest classifier is trained on known sources with data from optical to mid-IR. This yielded accuracies of 0.93 ± 0.01 (SMC) and 0.91 ± 0.01 (LMC) when tested on known sources. I classify the 31,169,627 sources in the VMC SMC field and find that the classes are distributed across colour-colour plots and the SMC field as expected, except in the highest-density regions, where there is an over-density of AGN due to blending and photometry mismatches. Following the discovery of SAGE0536AGN (z ∼ 0.14), with the strongest 10-μm silicate emission ever observed for an AGN, I discovered SAGE0534AGN (z ∼ 1.01), a similar AGN but with less extreme silicate emission. Both were originally mistaken for evolved stars in the Magellanic Clouds. The lack of star formation implies we are seeing the central engine of the AGN without contribution from the host galaxy. They could be a key link in galaxy evolution. I searched for more of these sources using the SMC t-SNE clusters and found that they are grouped with AGN (0.13 < z < 1.23), separated from the rest, suggesting a rare class. Their host galaxies appear to be either in or transitioning into the green valley, where AGN properties, such as the torus width, X-ray luminosity, radio loudness/spectral index and Eddington ratio, appear to be tracing the transition.

    National Astronomy Meeting 2019 Abstract Book

    The National Astronomy Meeting 2019 Abstract Book. Abstracts accepted and presented, including both oral and poster presentations, at the Royal Astronomical Society's NAM2019 conference, held at Lancaster University between 30 June and 4 July 2019