Unsupervised Machine Learning for the Classification of Astrophysical X-ray Sources
The automatic classification of X-ray detections is a necessary step in
extracting astrophysical information from compiled catalogs of astrophysical
sources. Classification is useful for the study of individual objects,
statistics for population studies, as well as for anomaly detection, i.e., the
identification of new unexplored phenomena, including transients and spectrally
extreme sources. Despite the importance of this task, classification remains
challenging in X-ray astronomy due to the lack of optical counterparts and
representative training sets. We develop an alternative methodology that
employs an unsupervised machine learning approach to provide probabilistic
classes to Chandra Source Catalog sources with a limited number of labeled
sources, and without ancillary information from optical and infrared catalogs.
We provide a catalog of probabilistic classes for 8,756 sources, comprising a
total of 14,507 detections, and demonstrate the success of the method at
identifying emission from young stellar objects, as well as distinguishing
between small-scale and large-scale compact accretors with a significant level
of confidence. We investigate the consistency between the distribution of
features among classified objects and well-established astrophysical hypotheses
such as the unified AGN model. This provides interpretability to the
probabilistic classifier. Code and tables are available publicly through
GitHub. We provide a web playground for readers to explore our final
classification at https://umlcaxs-playground.streamlit.app.
Comment: 21 pages, 11 figures. Accepted in MNRA
The submillimetre and near-infrared properties of Herschel-ATLAS sources
PhD
In this thesis I investigate a sample of galaxies that are detected with the Herschel
Space Telescope in the sub-millimetre wavelength range and that also have near-infrared
detections with the VISTA telescope in Chile as part of the VIKING survey.
The first necessity is to find the near-infrared galaxies that are most likely to be
the counterparts to the Herschel galaxies. I accomplish this by using a likelihood ratio
method which I modify to allow for an appropriate estimate of the probability of finding
a genuine near-infrared counterpart above the magnitude limit to a SPIRE source. This
probability is found to be Q0 ≈ 0.73. 51% of the SPIRE sources have a best VIKING
counterpart with a reliability R = 0.8, and the false identification rate of these is
estimated to be 4.2%. I expect to miss 5 per cent of true VIKING counterparts.
There is evidence from Z - J and J - Ks colours that the reliable counterparts to SPIRE
galaxies are marginally redder than the field population.
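The likelihood ratio technique mentioned above can be illustrated with a minimal Python sketch. It uses a commonly quoted formulation, LR = q(m) f(r) / n(m), with reliability R_j = LR_j / (sum_i LR_i + 1 - Q0). This is not necessarily the modified version developed in the thesis. The function names, the Gaussian positional error model, the value of SIGMA and the candidate values are all hypothetical; only Q0 = 0.73 is taken from the abstract.

```python
import math


def positional_pdf(r_arcsec, sigma_arcsec):
    """Radially symmetric 2D Gaussian density (per arcsec^2) at offset r."""
    return math.exp(-r_arcsec ** 2 / (2.0 * sigma_arcsec ** 2)) / (
        2.0 * math.pi * sigma_arcsec ** 2
    )


def likelihood_ratio(q_m, n_m, r_arcsec, sigma_arcsec):
    """LR = q(m) * f(r) / n(m).

    q_m: magnitude distribution of true counterparts (normalised to Q0)
    n_m: surface density of background objects at that magnitude
    """
    return q_m * positional_pdf(r_arcsec, sigma_arcsec) / n_m


def reliabilities(lrs, q0):
    """R_j = LR_j / (sum_i LR_i + (1 - Q0)) for all candidates of one source."""
    denom = sum(lrs) + (1.0 - q0)
    return [lr / denom for lr in lrs]


Q0 = 0.73     # value quoted in the abstract
SIGMA = 2.4   # hypothetical positional uncertainty in arcsec

# Hypothetical candidates: (q(m), n(m), offset in arcsec)
candidates = [(0.20, 0.002, 1.0), (0.05, 0.004, 4.0)]

lrs = [likelihood_ratio(q, n, r, SIGMA) for q, n, r in candidates]
for lr, rel in zip(lrs, reliabilities(lrs, Q0)):
    print(f"LR = {lr:.2f}, R = {rel:.2f}")
```

Because the term 1 - Q0 sits in the denominator, the reliabilities of all candidates for one source always sum to less than one, which leaves room for the case where the true counterpart lies below the magnitude limit.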
I obtain photometric redshifts for 68% of all (non-stellar) VIKING candidates
with a median redshift of z̃ = 0.405. I have spectroscopic redshifts for 3147 (≈28%)
of the reliable counterparts from existing redshift surveys. Comparing to the results of
the optical identifications supplied with the Phase I catalogue, I find that the use of
medium-deep near-infrared data improves the identification rate of reliable counterparts
from 36% to 51%.
I investigate the evolution of the sub-millimetre luminosity function (LF) using the
sample of SPIRE sources with reliable counterparts in VIKING with z < 1. I find
strong evolution of the 250 μm LF out to about redshift z = 0.6 and possibly out to
z = 0.8 in broad agreement with previous studies. A double-power law seems to fit the
local LF (z < 0.2) slightly better than a Schechter function and I find a flatter slope
at lower luminosities as compared to recent studies.
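For readers unfamiliar with the two functional forms compared above, the following Python sketch evaluates a Schechter function and one common double-power-law parameterisation. The abstract does not state which parameterisation the thesis adopts, so the exact forms, the function names and the parameter values below are assumptions for illustration only.

```python
import math


def schechter(lum, lum_star, phi_star, alpha):
    """Schechter form: phi* (L/L*)^alpha exp(-L/L*)."""
    x = lum / lum_star
    return phi_star * x ** alpha * math.exp(-x)


def double_power_law(lum, lum_star, phi_star, faint_slope, bright_slope):
    """One common double power law: phi* / ((L/L*)^a + (L/L*)^b)."""
    x = lum / lum_star
    return phi_star / (x ** faint_slope + x ** bright_slope)


# Hypothetical parameters, in units where L* = 1 and phi* = 1
for lum in (0.1, 1.0, 10.0, 100.0):
    s = schechter(lum, 1.0, 1.0, -1.2)
    d = double_power_law(lum, 1.0, 1.0, 1.2, 3.0)
    print(f"L/L* = {lum:6.1f}  Schechter = {s:.3e}  double power law = {d:.3e}")
```

The main qualitative difference is at the bright end: the Schechter function cuts off exponentially above L*, whereas the double power law declines only as a power of L.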
Finally, I construct the star formation rates (SFR) from far-infrared (FIR) and
ultra-violet (UV) luminosities of the SPIRE sample with reliable VIKING counterparts
(SFR_FIR and SFR_UV respectively) and show that the contribution of the SFR_FIR
increases with increasing luminosity. UV observations are hence crucial for all but the
brightest SPIRE galaxies in calculating a total SFR. Calculating the slope of the UV
continuum and comparing with the ratio L_FIR/L_UV leads to dust attenuation corrected
SFR_UV that represent the total SFR well in the low to medium L_FIR range.
STF
Statistical Methods in the Era of Large Astronomical Surveys
Statistical methods play a crucial role in modern astronomical research. The development and understanding of these methods will be of fundamental importance to future work on large astronomical surveys. In this thesis I showcase three different statistical approaches to survey data. I first apply a semi-supervised dimensionality reduction technique to cluster similar high-resolution spectra from the GALAH survey to identify 54 candidate extremely metal-poor stars. The approach shows promising potential for implementation in future large-scale stellar spectroscopic surveys. Next, I employ a method to classify sources in the Gaia survey as stars, galaxies or quasars, making use of additional infrared photometry from CatWISE2020 and discussing the importance of applying adjusted priors to probabilistic classification. Lastly, I utilise a method to estimate the rotational parameters of star clusters in Gaia, with an application to open clusters. This is done by considering the rotation of a cluster as a 3D solid body, and finding the best-fitting parameters by sampling constructed likelihood functions. The methods developed in this thesis underscore the significant contributions statistical methodologies make to astronomy, and illustrate how the development and application of statistical methods will be essential for extracting meaningful insights from future large-scale astronomical surveys.
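The remark about adjusted priors in probabilistic classification can be illustrated with a short Python sketch. It applies the standard Bayesian re-weighting, in which classifier outputs obtained under the training-set class fractions are rescaled to a different assumed class prior and renormalised. This is a generic illustration, not necessarily the procedure used in the thesis; the function name and all numerical values are hypothetical.

```python
def adjust_priors(posteriors, train_prior, target_prior):
    """Re-weight class probabilities from one class prior to another.

    p_new(c | x) is proportional to p(c | x) * target_prior[c] / train_prior[c].
    """
    weights = [
        p * t / s for p, s, t in zip(posteriors, train_prior, target_prior)
    ]
    total = sum(weights)
    return [w / total for w in weights]


classes = ["star", "galaxy", "quasar"]

# Hypothetical classifier output from a class-balanced training set
posteriors = [0.5, 0.3, 0.2]
train_prior = [1 / 3, 1 / 3, 1 / 3]

# Hypothetical prior for a survey in which extragalactic sources are rare
target_prior = [0.99, 0.005, 0.005]

adjusted = adjust_priors(posteriors, train_prior, target_prior)
for name, p in zip(classes, adjusted):
    print(f"{name}: {p:.4f}")
```

When rare classes are over-represented in the training set, skipping this step tends to inflate their predicted probabilities in the full survey.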
On the application of machine learning approaches in astronomy: Exploring novel representations of high-dimensional and complex astronomical data
The goal of the presented work is the application of data-driven methods to complex and
high-dimensional astronomical databases. The focus of the work is the exploration of novel data
representations in order to enable the use of statistical learning approaches in the analysis of
data. With the help of diverse science cases, the advantages of the introduced approaches for
classification, visualization and regression tasks are shown by applying the developed methodology
to astronomical data.
In the first part, an alternative approach for estimating redshifts of spectra by using the
knowledge about the redshifts provided by the SDSS pipeline is presented. A novel data
representation is employed which contains only information relevant for estimating the redshift and
the detection of multiple redshift systems. Subsequently, a novel data representation for
regularly sampled light curves based on recurrent networks is presented. This allows an explorative
investigation of huge databases with unlabeled data. Finally, a new way of representing the static
part of irregularly sampled light curves by a mixture of Gaussians is discussed. This
representation is more general than the extraction of features, as it allows the inclusion of photometric
uncertainties and avoids the introduction of observational biases.
Radio and infrared surveys for active galactic nuclei behind the Magellanic Clouds
I present an analysis of a new 120 deg² radio continuum image of the Large Magellanic Cloud (LMC) at 888 MHz with a bandwidth of 288 MHz and beam size of 13.″9 × 12.″1, from the Australian Square Kilometre Array Pathfinder (ASKAP). I constructed a catalogue of 54,612 sources reaching down to <0.2 mJy and explore the sources by cross-matching with surveys at other wavelengths. I find that the sources are predominantly extragalactic and display synchrotron emission associated with AGN, with star-forming galaxies becoming more prominent relative to AGN below 3 mJy. I employ machine learning to separate the stellar from the extragalactic in the Magellanic Clouds. The t-SNE algorithm is used with multi-wavelength data from Gaia EDR3, the VISTA survey of the Magellanic Clouds (VMC), AllWISE and ASKAP to cluster similar radio sources together. This separates AGN, galaxies, blazars and stellar sources. The probabilistic random forest classifier is trained on known sources with data from optical to mid-IR. This yielded accuracies of 0.93 ± 0.01 (SMC) and 0.91 ± 0.01 (LMC) when tested on known sources. I classify the 31,169,627 sources in the VMC SMC field and find that classes distribute across colour-colour plots and the SMC field as expected, except in the highest-density regions, where there is an over-density of AGN due to blending and photometry mismatches. Following the discovery of SAGE0536AGN (z ∼ 0.14), with the strongest 10-μm silicate emission ever observed for an AGN, I discovered SAGE0534AGN (z ∼ 1.01), a similar AGN but with less extreme silicate emission. Both were originally mistaken for evolved stars in the Magellanic Clouds. Lack of star formation implies we are seeing the central engine of the AGN without contribution from the host galaxy. They could be a key link in galaxy evolution. I searched for more of these sources using the SMC t-SNE clusters and found that they are grouped with AGN (0.13 < z < 1.23) separated from the rest, suggesting a rare class.
Their host galaxies appear to be either in or transitioning into the green valley, where AGN properties, such as the torus width, X-ray luminosity, radio loudness/spectral index and Eddington ratio, appear to be tracing the transition.
National Astronomy Meeting 2019 Abstract Book
The National Astronomy Meeting 2019 Abstract Book. Abstracts accepted and presented, including both oral and poster presentations, at the Royal Astronomical Society's NAM2019 conference, held at Lancaster University between 30 June and 4 July 2019.