11 research outputs found
Radio Galaxy Zoo: Knowledge Transfer Using Rotationally Invariant Self-Organising Maps
With the advent of large-scale surveys, the manual analysis and classification
of individual radio source morphologies is rendered impossible as existing
approaches do not scale. The analysis of complex morphological features in the
spatial domain is a particularly important task. Here we discuss the challenges
of transferring crowdsourced labels obtained from the Radio Galaxy Zoo project
and introduce a proper transfer mechanism via quantile random forest
regression. By using parallelized rotation and flipping invariant Kohonen-maps,
image cubes of Radio Galaxy Zoo selected galaxies formed from the FIRST radio
continuum and WISE infrared all sky surveys are first projected down to a
two-dimensional embedding in an unsupervised way. This embedding can be seen as
a discretised space of shapes with the coordinates reflecting morphological
features as expressed by the automatically derived prototypes. We find that
these prototypes have reconstructed physically meaningful processes across two
channel images at radio and infrared wavelengths in an unsupervised manner. In
the second step, images are compared with those prototypes to create a
heat-map, which is the morphological fingerprint of each object and the basis
for transferring the user-generated labels. These heat-maps reduce the
feature space by a factor of 248 and can serve as the basis for
subsequent ML methods. Using an ensemble of decision trees we achieve upwards
of 85.7% and 80.7% accuracy when predicting the number of components and peaks
in an image, respectively, using these heat-maps. We also question the
currently used discrete classification schema and introduce a continuous scale
that better reflects the uncertainty in transition between two classes, caused
by sensitivity and resolution limits.
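The heat-map step described in this abstract can be illustrated with a minimal sketch. It is not the PINK implementation: it assumes square single-channel images, restricts the invariance to the eight 90-degree rotations and mirror flips (so no interpolation is needed), and uses hypothetical names (`invariant_distance`, `heat_map`) and random prototypes in place of a trained map.

```python
import numpy as np

def invariant_distance(image, prototype):
    """Smallest Euclidean distance over the 8 rotated/flipped copies of `image`."""
    best = np.inf
    for flipped in (image, np.fliplr(image)):
        for k in range(4):
            candidate = np.rot90(flipped, k)
            best = min(best, float(np.linalg.norm(candidate - prototype)))
    return best

def heat_map(image, prototypes):
    """Distance from `image` to every prototype on a (rows, cols) grid."""
    rows, cols = prototypes.shape[:2]
    out = np.empty((rows, cols))
    for r in range(rows):
        for c in range(cols):
            out[r, c] = invariant_distance(image, prototypes[r, c])
    return out

# Toy data: a 3x3 grid of random 8x8 "prototypes" (assumption, not a trained SOM).
rng = np.random.default_rng(0)
prototypes = rng.random((3, 3, 8, 8))

# A rotated copy of one prototype should still match that node exactly.
image = np.rot90(prototypes[1, 2], 1)
hm = heat_map(image, prototypes)
best_node = np.unravel_index(np.argmin(hm), hm.shape)
```

In this toy setting the heat-map has one value per map node, which is how a small grid of distances can stand in for the full pixel space as input to later models.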
Big Universe, Big Data: Machine Learning and Image Analysis for Astronomy
Astrophysics and cosmology are rich with data. The advent of wide-area
digital cameras on large aperture telescopes has led to ever more ambitious
surveys of the sky. Data volumes of entire surveys a decade ago can now be
acquired in a single night and real-time analysis is often desired. Thus,
modern astronomy requires big data know-how; in particular, it demands highly
efficient machine learning and image analysis algorithms. But scalability is
not the only challenge: Astronomy applications touch several current machine
learning research questions, such as learning from biased data and dealing with
label and measurement noise. We argue that this makes astronomy a great domain
for computer science research, as it pushes the boundaries of data analysis. In
the following, we will present this exciting application area for data
scientists. We will focus on exemplary results, discuss main challenges, and
highlight some recent methodological advancements in machine learning and image
analysis triggered by astronomical applications.
Cataloging the radio-sky with unsupervised machine learning: a new approach for the SKA era
We develop a new analysis approach towards identifying related radio
components and their corresponding infrared host galaxy based on unsupervised
machine learning methods. By exploiting PINK, a self-organising map algorithm,
we are able to associate radio and infrared sources without the a priori
requirement of training labels. We present an example of this method using
images from the FIRST and WISE surveys centred towards positions
described by the FIRST catalogue. We produce a set of catalogues that
complement FIRST and describe 802,646 objects, including their radio components
and their corresponding AllWISE infrared host galaxy. Using these data products
we (i) demonstrate the ability to identify objects with rare and unique radio
morphologies (e.g. 'X'-shaped galaxies, hybrid FR-I/FR-II morphologies), (ii)
identify the potentially resolved radio components that are associated with
a single infrared host, (iii) introduce a "curliness" statistic to search
for bent and disturbed radio morphologies, and (iv) extract a set of 17 giant
radio galaxies between 700-1100 kpc. As we require no training labels, our
method can be applied to any radio-continuum survey, provided a sufficiently
representative SOM can be trained.
Advances on the morphological classification of radio galaxies: A review
Modern radio telescopes will generate, on a daily basis, data sets on the scale of exabytes for systems like the Square Kilometre Array (SKA). Massive data sets are a source of unknown and rare astrophysical phenomena that lead to discoveries. Nonetheless, this is only plausible with the exploitation of machine learning to complement human-aided and traditional statistical techniques. Recently, there has been a surge in scientific publications focusing on the use of machine/deep learning in radio astronomy, addressing challenges such as source extraction, morphological classification, and anomaly detection. This study provides a comprehensive and concise overview of the use of machine learning techniques for the morphological classification of radio galaxies. It summarizes the recent literature on this topic, highlighting the main challenges, achievements, state-of-the-art methods, and the future research directions in the field. The application of machine learning in radio astronomy has led to a paradigm shift and a revolution in the automation of complex data processes. However, the optimal exploitation of machine/deep learning in radio astronomy calls for continued collaborative efforts in the creation of high-resolution annotated data sets. This is especially true in the case of modern telescopes like MeerKAT and the LOw-Frequency ARray (LOFAR). Additionally, it is important to consider the potential benefits of utilizing multi-channel data cubes and algorithms that can leverage massive datasets without relying solely on annotated datasets for radio galaxy classification.
A statistical approach to automated detection of multi-component radio sources
Advances in radio astronomy are allowing for deeper and wider areas of the sky to be observed than ever before. Source counts of future radio surveys are expected to number in the tens of millions. Source finding techniques are used to identify sources in a radio image; however, these techniques identify single distinct sources and are challenged to identify multi-component sources, that is to say, cases where two or more distinct sources belong to the same underlying physical phenomenon, such as a radio galaxy. Identification of such phenomena is an important step in generating catalogues from surveys on which much of the radio astronomy science is based. Historically, identifying multi-component sources was conducted by visual inspection; however, the size of future surveys makes manual identification prohibitive. An algorithm to automate this process using statistical techniques is proposed. The algorithm is demonstrated on two radio images. The output of the algorithm is a catalogue where nearest neighbour source pairs are assigned a probability score of being a component of the same physical object. By applying several selection criteria, pairs of sources which are likely to be multi-component sources can be determined. Radio image cutouts are then generated from this selection and may be used as input into radio source classification techniques. Successful identification of multi-component sources using this method is demonstrated.
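The nearest-neighbour pairing that this abstract takes as its starting point can be sketched as follows. This is only an illustration under stated assumptions: positions are treated as flat-sky Cartesian coordinates, the function name `nearest_neighbour_pairs` is hypothetical, and the probability score and selection criteria of the paper are not reproduced because the abstract does not specify them.

```python
import numpy as np

def nearest_neighbour_pairs(xy):
    """Return, for each source, the index of and distance to its nearest neighbour."""
    xy = np.asarray(xy, dtype=float)
    # Pairwise separations via broadcasting: shape (n, n).
    diff = xy[:, None, :] - xy[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # A source must not be paired with itself.
    np.fill_diagonal(dist, np.inf)
    nn = dist.argmin(axis=1)
    sep = dist[np.arange(len(xy)), nn]
    return nn, sep

# Toy catalogue (assumed positions in arbitrary units): two close pairs.
positions = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 10.5)]
nn, sep = nearest_neighbour_pairs(positions)
```

A scoring step would then take each pair and its separation as input; the brute-force distance matrix used here is only practical for small catalogues.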
Unsupervised machine learning clustering and data exploration of radio-astronomical images
In this thesis, I demonstrate a novel and efficient unsupervised clustering and data exploration method with the combination of a Self-Organising Map (SOM) and a Convolutional Autoencoder, applied to radio-astronomical images from the Radio Galaxy Zoo (RGZ) dataset. The rapidly increasing volume and complexity of radio-astronomical data have ushered in a new era of big-data astronomy which has increased the demand for Machine Learning (ML) solutions. In this era, the sheer amount of image data produced with modern instruments has resulted in a significant data deluge. Furthermore, the morphologies of objects captured in these radio-astronomical images are highly complex and challenging to classify conclusively due to their intricate and indiscrete nature. Additionally, major radio-astronomical discoveries are unplanned and found in the unexpected, making unsupervised ML highly desirable by operating with few assumptions and without labelled training data. In this thesis, I developed a novel unsupervised ML approach as a practical solution to these astronomy challenges. Using this system, I demonstrated the use of convolutional autoencoders and SOMs as a dimensionality reduction method to delineate the complexity and volume of astronomical data. My optimised system shows that the coupling of these methods is a powerful method of data exploration and unsupervised clustering of radio-astronomical images. The results of this thesis show this approach is capable of accurately separating features by complexity on a SOM manifold and unified distance matrix with neighbourhood similarity and hierarchical clustering of the mapped astronomical features. This method provides an effective means to explore the high-level topological relationships of image features and morphology in large datasets automatically with minimal processing time and computational resources.
I achieved these capabilities with a new and innovative method of SOM training using the autoencoder compressed latent feature vector representations of radio-astronomical data, rather than raw images. Using this system, I successfully investigated SOM affine transformation invariance and analysed the true nature of rotational effects on this manifold using autoencoder random rotation training augmentations. Throughout this thesis, I present my method as a powerful new data exploration technique and a contribution to the field. The speed and effectiveness of this method indicate excellent scalability and hold implications for use on large future surveys, large-scale instruments such as the Square Kilometre Array and in other big-data and complexity analysis applications.
Probabilistic photometric redshift estimation in massive digital sky surveys via machine learning
The problem of photometric redshift estimation is a major subject in astronomy, driven by the need to estimate distances for the huge number of sources delivered by the data deluge of recent years. The ability to estimate redshifts through spectroscopy does not scale with this avalanche of data. Photometric redshifts provide the required redshift estimates at the cost of some precision. The success of several forthcoming missions is highly dependent on the availability of photometric redshifts.
The purpose of this thesis is to provide innovative methods for photometric redshift estimation. Two models are proposed. The first is fully automated, based on the combination of a convolutional neural network with a mixture density network, to predict probabilistic multimodal redshifts directly from images. The second model is feature-based, performing a massive combination of photometric parameters to apply a forward selection in a huge feature space. The proposed models perform very efficiently compared to some of the most common models used in the literature. An important part of the work is dedicated to the correct estimation of the errors and prediction quality.
The proposed models are very general and can be applied to different topics in astronomy and beyond.
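To make the idea of a probabilistic multimodal redshift concrete, the sketch below evaluates the density that a mixture density network's outputs would define. It assumes Gaussian components, which is a common choice but is not stated in the abstract. The weights, means and widths are made-up illustrative numbers rather than results from the thesis, and `mixture_pdf` is a hypothetical helper name.

```python
import numpy as np

def mixture_pdf(z, weights, means, sigmas):
    """Evaluate a 1-D Gaussian mixture density at the redshift values `z`."""
    z = np.asarray(z, dtype=float)[:, None]
    w = np.asarray(weights, dtype=float)
    mu = np.asarray(means, dtype=float)
    s = np.asarray(sigmas, dtype=float)
    # One Gaussian per column, then a weighted sum over the components.
    comp = np.exp(-0.5 * ((z - mu) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))
    return (w * comp).sum(axis=1)

# Illustrative two-mode prediction for a single object (assumed values).
z_grid = np.linspace(0.0, 3.0, 3001)
pdf = mixture_pdf(z_grid, weights=[0.7, 0.3], means=[0.5, 1.8], sigmas=[0.05, 0.1])

# Simple Riemann-sum check that the density integrates to about one.
area = float(pdf.sum() * (z_grid[1] - z_grid[0]))
z_mode = float(z_grid[pdf.argmax()])
```

A point estimate such as `z_mode` discards the secondary peak, which is why a full density is more informative when the prediction is multimodal.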
Parallelized rotation and flipping INvariant Kohonen maps (PINK) on GPUs
Contains fulltext: 159586.pdf (publisher's version) (Open Access)
ESANN 2016: European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning. Bruges (Belgium), 27-29 April 201