11 research outputs found

    Radio Galaxy Zoo: Knowledge Transfer Using Rotationally Invariant Self-Organising Maps

    With the advent of large-scale surveys, the manual analysis and classification of individual radio source morphologies is rendered impossible, as existing approaches do not scale. The analysis of complex morphological features in the spatial domain is a particularly important task. Here we discuss the challenges of transferring crowdsourced labels obtained from the Radio Galaxy Zoo project and introduce a proper transfer mechanism via quantile random forest regression. By using parallelized rotation and flipping invariant Kohonen maps, image cubes of Radio Galaxy Zoo selected galaxies formed from the FIRST radio continuum and WISE infrared all-sky surveys are first projected down to a two-dimensional embedding in an unsupervised way. This embedding can be seen as a discretised space of shapes with the coordinates reflecting morphological features as expressed by the automatically derived prototypes. We find that these prototypes have reconstructed physically meaningful processes across two-channel images at radio and infrared wavelengths in an unsupervised manner. In the second step, images are compared with those prototypes to create a heat-map, which is the morphological fingerprint of each object and the basis for transferring the user-generated labels. These heat-maps have reduced the feature space by a factor of 248 and can be used as the basis for subsequent ML methods. Using an ensemble of decision trees, we achieve upwards of 85.7% and 80.7% accuracy when predicting the number of components and peaks in an image, respectively, using these heat-maps. We also question the currently used discrete classification schema and introduce a continuous scale that better reflects the uncertainty in transition between two classes, caused by sensitivity and resolution limits.
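The heat-map step described in this abstract can be illustrated with a minimal sketch. It is not the PINK implementation: it assumes single-channel square images, considers only 90-degree rotations and mirror flips (the published method uses finer rotation steps and two-channel image cubes), and all function names are hypothetical.

```python
import numpy as np


def dihedral_variants(image):
    """Return the 8 copies of a square 2-D image under 90-degree rotations and flips."""
    variants = []
    for k in range(4):
        rotated = np.rot90(image, k)
        variants.append(rotated)
        variants.append(np.fliplr(rotated))
    return variants


def invariant_distance(image, prototype):
    """Smallest Euclidean distance between the prototype and any variant of the image."""
    return min(np.linalg.norm(v - prototype) for v in dihedral_variants(image))


def heat_map(image, prototypes):
    """Distance from one image to every prototype on the map grid.

    `prototypes` is assumed to have shape (rows, cols, height, width);
    the result has shape (rows, cols) and acts as the image's fingerprint.
    """
    rows, cols = prototypes.shape[:2]
    out = np.empty((rows, cols))
    for r in range(rows):
        for c in range(cols):
            out[r, c] = invariant_distance(image, prototypes[r, c])
    return out
```

In this sketch an image that is a rotated copy of one prototype produces a heat-map whose minimum sits at that prototype's grid position, which is the property that lets labels be transferred from the map back to individual objects.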

    Big Universe, Big Data: Machine Learning and Image Analysis for Astronomy

    Astrophysics and cosmology are rich with data. The advent of wide-area digital cameras on large aperture telescopes has led to ever more ambitious surveys of the sky. Data volumes of entire surveys a decade ago can now be acquired in a single night and real-time analysis is often desired. Thus, modern astronomy requires big data know-how; in particular, it demands highly efficient machine learning and image analysis algorithms. But scalability is not the only challenge: astronomy applications touch several current machine learning research questions, such as learning from biased data and dealing with label and measurement noise. We argue that this makes astronomy a great domain for computer science research, as it pushes the boundaries of data analysis. In the following, we will present this exciting application area for data scientists. We will focus on exemplary results, discuss main challenges, and highlight some recent methodological advancements in machine learning and image analysis triggered by astronomical applications.

    Cataloging the radio-sky with unsupervised machine learning: a new approach for the SKA era

    We develop a new analysis approach towards identifying related radio components and their corresponding infrared host galaxy based on unsupervised machine learning methods. By exploiting PINK, a self-organising map algorithm, we are able to associate radio and infrared sources without the a priori requirement of training labels. We present an example of this method using 894,415 images from the FIRST and WISE surveys centred towards positions described by the FIRST catalogue. We produce a set of catalogues that complement FIRST and describe 802,646 objects, including their radio components and their corresponding AllWISE infrared host galaxy. Using these data products we (i) demonstrate the ability to identify objects with rare and unique radio morphologies (e.g. 'X'-shaped galaxies, hybrid FR-I/FR-II morphologies), (ii) identify the potentially resolved radio components that are associated with a single infrared host, (iii) introduce a "curliness" statistic to search for bent and disturbed radio morphologies, and (iv) extract a set of 17 giant radio galaxies between 700-1100 kpc. As we require no training labels, our method can be applied to any radio-continuum survey, provided a sufficiently representative SOM can be trained.

    Advances on the morphological classification of radio galaxies: A review

    Modern radio telescopes will generate, on a daily basis, data sets on the scale of exabytes for systems like the Square Kilometre Array (SKA). Massive data sets are a source of unknown and rare astrophysical phenomena that lead to discoveries. Nonetheless, this is only plausible with the exploitation of machine learning to complement human-aided and traditional statistical techniques. Recently, there has been a surge in scientific publications focusing on the use of machine/deep learning in radio astronomy, addressing challenges such as source extraction, morphological classification, and anomaly detection. This study provides a comprehensive and concise overview of the use of machine learning techniques for the morphological classification of radio galaxies. It summarizes the recent literature on this topic, highlighting the main challenges, achievements, state-of-the-art methods, and the future research directions in the field. The application of machine learning in radio astronomy has led to a paradigm shift and a revolution in the automation of complex data processes. However, the optimal exploitation of machine/deep learning in radio astronomy calls for continued collaborative efforts in the creation of high-resolution annotated data sets. This is especially true in the case of modern telescopes like MeerKAT and the LOw-Frequency ARray (LOFAR). Additionally, it is important to consider the potential benefits of utilizing multi-channel data cubes and algorithms that can leverage massive datasets without relying solely on annotated datasets for radio galaxy classification.

    A statistical approach to automated detection of multi-component radio sources

    Advances in radio astronomy are allowing for deeper and wider areas of the sky to be observed than ever before. Source counts of future radio surveys are expected to number in the tens of millions. Source finding techniques are used to identify sources in a radio image; however, these techniques identify single distinct sources and are challenged to identify multi-component sources, that is to say, cases where two or more distinct sources belong to the same underlying physical phenomenon, such as a radio galaxy. Identification of such phenomena is an important step in generating catalogues from surveys on which much of the radio astronomy science is based. Historically, identifying multi-component sources was conducted by visual inspection; however, the size of future surveys makes manual identification prohibitive. An algorithm to automate this process using statistical techniques is proposed. The algorithm is demonstrated on two radio images. The output of the algorithm is a catalogue where nearest neighbour source pairs are assigned a probability score of being a component of the same physical object. By applying several selection criteria, pairs of sources which are likely to be multi-component sources can be determined. Radio image cutouts are then generated from this selection and may be used as input into radio source classification techniques. Successful identification of multi-component sources using this method is demonstrated.

    Unsupervised machine learning clustering and data exploration of radio-astronomical images

    In this thesis, I demonstrate a novel and efficient unsupervised clustering and data exploration method with the combination of a Self-Organising Map (SOM) and a Convolutional Autoencoder, applied to radio-astronomical images from the Radio Galaxy Zoo (RGZ) dataset. The rapidly increasing volume and complexity of radio-astronomical data have ushered in a new era of big-data astronomy which has increased the demand for Machine Learning (ML) solutions. In this era, the sheer amount of image data produced with modern instruments has resulted in a significant data deluge. Furthermore, the morphologies of objects captured in these radio-astronomical images are highly complex and challenging to classify conclusively due to their intricate and indiscrete nature. Additionally, major radio-astronomical discoveries are unplanned and found in the unexpected, making unsupervised ML highly desirable because it operates with few assumptions and without labelled training data. In this thesis, I developed a novel unsupervised ML approach as a practical solution to these astronomy challenges. Using this system, I demonstrated the use of convolutional autoencoders and SOMs as a dimensionality reduction method to delineate the complexity and volume of astronomical data. My optimised system shows that the coupling of these methods is a powerful method of data exploration and unsupervised clustering of radio-astronomical images. The results of this thesis show this approach is capable of accurately separating features by complexity on a SOM manifold and unified distance matrix with neighbourhood similarity and hierarchical clustering of the mapped astronomical features. This method provides an effective means to explore the high-level topological relationships of image features and morphology in large datasets automatically with minimal processing time and computational resources.
    I achieved these capabilities with a new and innovative method of SOM training using the autoencoder compressed latent feature vector representations of radio-astronomical data, rather than raw images. Using this system, I successfully investigated SOM affine transformation invariance and analysed the true nature of rotational effects on this manifold using autoencoder random rotation training augmentations. Throughout this thesis, I present my method as a powerful new approach to data exploration and a contribution to the field. The speed and effectiveness of this method indicate excellent scalability and hold implications for use on large future surveys, large-scale instruments such as the Square Kilometre Array and in other big-data and complexity analysis applications.

    Probabilistic photometric redshift estimation in massive digital sky surveys via machine learning

    The problem of photometric redshift estimation is a major subject in astronomy, given the need to estimate distances for a huge number of sources, as required by the data deluge of recent years. The ability to estimate redshifts through spectroscopy does not scale with this avalanche of data. Photometric redshifts provide the required redshift estimates at the cost of some precision. The success of several forthcoming missions is highly dependent on the availability of photometric redshifts. The purpose of this thesis is to provide innovative methods for photometric redshift estimation. Two models are proposed. The first is fully automated, based on the combination of a convolutional neural network with a mixture density network, to predict probabilistic multimodal redshifts directly from images. The second model is feature-based, performing a massive combination of photometric parameters to apply a forward selection in a huge feature space. The proposed models perform very efficiently compared to some of the most common models used in the literature. An important part of the work is dedicated to the correct estimation of the errors and prediction quality. The proposed models are very general and can be applied to different topics in astronomy and beyond.

    Parallelized rotation and flipping INvariant Kohonen maps (PINK) on GPUs

    Contains fulltext: 159586.pdf (publisher's version) (Open Access). ESANN 2016: European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning. Bruges (Belgium), 27-29 April 201