
    Radio Galaxy Zoo: Knowledge Transfer Using Rotationally Invariant Self-Organising Maps

    With the advent of large-scale surveys, the manual analysis and classification of individual radio source morphologies has become impossible, as existing approaches do not scale. The analysis of complex morphological features in the spatial domain is a particularly important task. Here we discuss the challenges of transferring crowdsourced labels obtained from the Radio Galaxy Zoo project and introduce a proper transfer mechanism via quantile random forest regression. Using parallelized rotation- and flipping-invariant Kohonen maps, image cubes of Radio Galaxy Zoo-selected galaxies, formed from the FIRST radio continuum and WISE infrared all-sky surveys, are first projected down to a two-dimensional embedding in an unsupervised way. This embedding can be seen as a discretised space of shapes, with the coordinates reflecting morphological features as expressed by the automatically derived prototypes. We find that these prototypes have reconstructed physically meaningful processes across two-channel images at radio and infrared wavelengths in an unsupervised manner. In the second step, images are compared with those prototypes to create a heat-map, which is the morphological fingerprint of each object and the basis for transferring the user-generated labels. These heat-maps reduce the feature space by a factor of 248 and can serve as the basis for subsequent ML methods. Using an ensemble of decision trees trained on these heat-maps, we achieve upwards of 85.7% and 80.7% accuracy when predicting the number of components and peaks in an image, respectively. We also question the currently used discrete classification schema and introduce a continuous scale that better reflects the uncertainty in the transition between two classes caused by sensitivity and resolution limits.
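
    The heat-map step can be illustrated with a short NumPy sketch. It is a simplified illustration and not the authors' parallelized implementation. It assumes square, channel-last image cubes (for example one radio and one infrared channel) and a Euclidean distance. It also covers only the eight 90-degree rotations and flips, whereas the method above may sample rotations far more finely. The helper names `dihedral_variants` and `heat_map` are hypothetical.

```python
import numpy as np


def dihedral_variants(img):
    """Return the 8 rotations/flips of a square, channel-last image (H, W, C)."""
    variants = []
    for k in range(4):
        rotated = np.rot90(img, k, axes=(0, 1))
        variants.append(rotated)
        variants.append(np.flip(rotated, axis=1))  # mirror image of each rotation
    return np.stack(variants)  # shape (8, H, W, C)


def heat_map(img, prototypes):
    """Distance from `img` to every SOM prototype, minimised over the 8 transforms.

    `prototypes` has shape (rows, cols, H, W, C): one image cube per node of a
    2-D map. The result has shape (rows, cols), one value per prototype, and
    acts as the object's low-dimensional morphological fingerprint.
    """
    variants = dihedral_variants(img)
    # Broadcast (rows, cols, 1, H, W, C) against (8, H, W, C).
    diff = prototypes[:, :, None] - variants
    dist = np.sqrt((diff ** 2).sum(axis=(3, 4, 5)))  # shape (rows, cols, 8)
    return dist.min(axis=2)  # best-matching orientation per prototype
```

    The heat-map has one entry per prototype, so its size depends on the map dimensions rather than on the image size. That is where the feature-space reduction comes from. Flattened heat-maps can then be passed to a tree ensemble such as a random forest.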

    Beyond the Hubble sequence – exploring galaxy morphology with unsupervised machine learning

    We explore unsupervised machine learning for galaxy morphology analyses using a combination of feature extraction with a vector-quantized variational autoencoder (VQ-VAE) and hierarchical clustering (HC). We propose a new methodology that includes: (1) consideration of the clustering performance simultaneously when learning features from images; (2) allowing for various distance thresholds within the HC algorithm; (3) using the galaxy orientation to determine the number of clusters. This set-up yields 27 clusters, which we show are well separated based on galaxy shape and structure (e.g. Sérsic index, concentration, asymmetry, Gini coefficient). These clusters also correlate well with physical properties such as the colour–magnitude diagram, and span the range of scaling relations such as mass versus size amongst the different machine-defined clusters. When we merge these clusters into two large preliminary clusters to provide a binary classification, an accuracy of ∼87 per cent is reached on an imbalanced data set that matches real galaxy distributions, comprising 22.7 per cent early-type galaxies and 77.3 per cent late-type galaxies. Comparing the clusters with classic Hubble types (ellipticals, lenticulars, early spirals, late spirals, and irregulars), we show that there is an intrinsic vagueness in visual classification systems, in particular for galaxies with transitional features such as lenticulars and early spirals. Based on this, the main result of this work is not how well our unsupervised method matches visual classifications and physical properties, but that the method provides an independent classification that may be more physically meaningful than any visually based one.
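
    The role of the distance threshold in hierarchical clustering can be illustrated with SciPy. This sketch is not the authors' pipeline. Random vectors stand in for VQ-VAE features, and Ward linkage is an assumed choice. A single global threshold replaces the varying thresholds described above. Cutting the same merge tree at different heights yields different numbers of clusters. `clusters_at` is a hypothetical helper.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(42)
# Stand-in "features": three well-separated groups of 50 points in 8-D.
centres = np.array([[0.0] * 8, [5.0] * 8, [-5.0] * 8])
features = np.vstack([c + rng.normal(scale=0.3, size=(50, 8)) for c in centres])

# Build the full merge tree once (Ward linkage on Euclidean distances).
tree = linkage(features, method="ward")


def clusters_at(threshold):
    """Flat clusters whose members have cophenetic distance <= `threshold`."""
    return fcluster(tree, t=threshold, criterion="distance")


coarse = clusters_at(50.0)  # large threshold: few clusters
fine = clusters_at(1.0)     # small threshold: many clusters
```

    Merging many fine clusters into a two-way split, analogous to the early/late-type binary classification above, can be done on the same tree with `fcluster(tree, t=2, criterion="maxclust")`.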

    Structured Variational Inference for Simulating Populations of Radio Galaxies

    We present a model for generating postage-stamp images of synthetic Fanaroff-Riley Class I and Class II radio galaxies suitable for use in simulations of future radio surveys, such as those being developed for the Square Kilometre Array. This model uses a fully connected neural network to implement structured variational inference through a variational auto-encoder (VAE) with an encoder and decoder architecture. To optimise the dimensionality of the latent space for the auto-encoder, we introduce the radio morphology inception score (RAMIS), a quantitative method for assessing the quality of generated images, and discuss in detail how data pre-processing choices can affect the value of this measure. We examine the two-dimensional latent space of the VAEs and discuss how it can be used to control the generation of synthetic populations, while also cautioning that it may lead to biases when used for data augmentation. Comment: 20 pages, 20 figures, accepted MNRA
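
    RAMIS is described above only as an inception-score-style quality measure. The sketch below assumes it follows the standard inception score, IS = exp(E_x[KL(p(y|x) || p(y))]). It further assumes that the class probabilities p(y|x) come from a radio-morphology classifier (for example FRI versus FRII) rather than an ImageNet network. The probability arrays are made up, and `inception_score` is a hypothetical helper.

```python
import numpy as np


def inception_score(probs, eps=1e-12):
    """Standard inception score from an (N, K) array of classifier probabilities.

    Each row is p(y|x) for one generated image over K morphology classes.
    High scores need confident per-image predictions (low-entropy rows) and
    a diverse overall class mix (high-entropy marginal p(y)).
    """
    probs = np.asarray(probs, dtype=float)
    marginal = probs.mean(axis=0)  # p(y), averaged over generated images
    kl = (probs * (np.log(probs + eps) - np.log(marginal + eps))).sum(axis=1)
    return float(np.exp(kl.mean()))


# Confident and evenly split between two classes (e.g. FRI and FRII).
sharp = np.array([[1.0, 0.0], [0.0, 1.0]] * 50)
# Every generated image equally ambiguous between the two classes.
blurry = np.full((100, 2), 0.5)
```

    For K classes the score lies between 1 and K. Comparing it across decoders trained with different latent sizes is one way such a measure could guide the choice of latent dimensionality.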

    Galaxy morphological classification in deep-wide surveys via unsupervised machine learning

    Accepted version. Galaxy morphology is a fundamental quantity that is essential not only for the full spectrum of galaxy-evolution studies but also for a plethora of science in observational cosmology. While a rich literature exists on morphological-classification techniques, the unprecedented data volumes, coupled in some cases with the short cadences of forthcoming 'Big-Data' surveys (e.g. from the LSST), present novel challenges for this field. Large data volumes make such datasets intractable for visual inspection (even via massively distributed platforms like Galaxy Zoo), while short cadences make it difficult to employ techniques like supervised machine learning, since it may be impractical to repeatedly produce training sets on short timescales. Unsupervised machine learning, which does not require training sets, is ideally suited to the morphological analysis of new and forthcoming surveys. Here, we employ an algorithm that performs clustering of graph representations to group image patches with similar visual properties, as well as objects constructed from those patches, such as galaxies. We apply the algorithm to the Hyper-Suprime-Cam Subaru-Strategic-Program Ultra-Deep survey to autonomously reduce the galaxy population to a small number (160) of 'morphological clusters', populated by galaxies with similar morphologies, which are then benchmarked using visual inspection. The morphological classifications (which we release publicly) exhibit a high level of purity and reproduce known trends in key galaxy properties as a function of morphological type at z… Peer reviewe
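
    Purity of the kind reported above is commonly measured as the fraction of objects whose visual label matches the majority label of their cluster. The sketch below uses that textbook definition as an assumption, and the paper's exact benchmarking criterion may differ. The cluster assignments and labels are invented, and `cluster_purity` is a hypothetical helper.

```python
import numpy as np


def cluster_purity(cluster_ids, visual_labels):
    """Fraction of objects that share the majority visual label of their cluster."""
    cluster_ids = np.asarray(cluster_ids)
    visual_labels = np.asarray(visual_labels)
    matched = 0
    for c in np.unique(cluster_ids):
        members = visual_labels[cluster_ids == c]
        _, counts = np.unique(members, return_counts=True)
        matched += counts.max()  # objects agreeing with the cluster's majority label
    return matched / len(cluster_ids)


# Invented example: 3 machine-defined clusters, visually inspected labels.
clusters = [0, 0, 0, 1, 1, 1, 2, 2]
labels = ["E", "E", "S", "S", "S", "S", "Irr", "Irr"]
```

    Here only one object (the "S" in cluster 0) disagrees with its cluster's majority, giving a purity of 7/8.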

    Identifying strong lenses with unsupervised machine learning using convolutional autoencoder

    In this paper, we develop a new unsupervised machine learning technique composed of a feature extractor (a convolutional autoencoder) and a clustering algorithm (a Bayesian Gaussian mixture model). We apply this technique to visual-band, space-based simulated imaging data for the Euclid Space Telescope, using data from the strong gravitational lens finding challenge. Our technique shows promise in capturing a variety of lensing features, such as Einstein rings with different radii and distorted arc structures, without using predefined labels. After the clustering process, we obtain several classification clusters separated by the different visual features seen in the images. Our method successfully picks up ∼63 per cent of lensing images from all lenses in the training set. With the assumed probability proposed in this study, this technique reaches an accuracy of 77.25 ± 0.48 per cent in binary classification on the training set. Additionally, our unsupervised clustering process can be used as a preliminary classification for future lens surveys, to efficiently select targets and to speed up the labelling process. As a first astronomical application of this technique, we not only explore its application to gravitationally lensed systems but also discuss its limitations and potential future uses.
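
    The clustering stage can be sketched with scikit-learn's `BayesianGaussianMixture`. This is a minimal illustration with stated assumptions. Random 2-D vectors stand in for the convolutional autoencoder's latent features. The component count and prior are illustrative rather than the paper's settings. Under a Dirichlet-process weight prior, `n_components` acts as an upper bound, because surplus components can be driven toward negligible weight.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(1)
# Stand-in latent features: two groups, e.g. "lens-like" and "non-lens" images.
group_a = rng.normal(loc=0.0, scale=0.5, size=(100, 2))
group_b = rng.normal(loc=10.0, scale=0.5, size=(100, 2))
latent = np.vstack([group_a, group_b])

bgmm = BayesianGaussianMixture(
    n_components=5,  # upper bound on the number of clusters
    weight_concentration_prior_type="dirichlet_process",
    random_state=0,
    max_iter=500,
)
labels = bgmm.fit_predict(latent)  # one cluster index per image

# Components that end up with non-negligible weight are the "used" clusters.
used = np.flatnonzero(bgmm.weights_ > 0.01)
```

    In a real application the resulting clusters would then be inspected visually to decide which ones contain lensing features.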

    On the Key Processes that Drive Galaxy Evolution: the Role of Galaxy Mergers, Accretion, Local Environment and Feedback in Shaping the Present-Day Universe

    The study of galaxy evolution is a fundamental discipline in modern astrophysics, dealing with how and why galaxies of all types evolve over time. The diversity of present-day galaxies reflects the processes through which these populations were assembled and offers insights into how these processes influence and regulate their mass assembly over the lifetime of the Universe. The currently favoured hierarchical paradigm of structure formation hypothesises that much of a galaxy’s evolution must be driven by mergers. It is therefore important to understand the role of the merger process in shaping the galaxy populations in today’s Universe. Statistical studies of galaxy evolution rely on data from large observational surveys together with comparison to simulations, which can be used to make realistic survey-scale predictions. Combined, these two approaches can offer powerful insights into the processes that drive galaxy evolution over cosmic time. I have used the Horizon-AGN simulation to study the effect of galaxy mergers on the stellar populations and central supermassive black holes of galaxies over cosmic time. I have shown that, while mergers can significantly enhance star formation and black-hole growth in the low-redshift Universe, these enhancements are small at high redshift, when the cosmic star formation history (SFH) peaks. This is because galaxies are already gas-rich at early epochs, and mergers are not able to increase gas densities in the central regions of the galaxy. As a result, mergers are directly responsible for creating only around 30 per cent of the stellar mass and black-hole mass found in today’s galaxies, and they never dominate the budget (e.g. ~35 and ~20 per cent of star formation at z ~ 3 and z ~ 1, respectively, is a result of mergers).
Notwithstanding their relatively minor role in driving stellar and black-hole mass growth, mergers are important drivers of morphological change, with major and minor mergers accounting for essentially all (95 per cent) of the morphological change experienced by massive present-day spheroids over their lifetime. However, at a given stellar mass, the average merger histories of discs and spheroids do not differ strongly enough to explain the survival of discs to the present day. Instead, their survival is largely due to a preponderance of prograde and gas-rich mergers. Prograde mergers trigger milder morphological transformation than retrograde mergers: the average change due to retrograde mergers is around twice that due to their prograde counterparts at z ~ 0. Remnant morphology also depends strongly on the gas fraction of a merger, with gas-rich mergers routinely re-growing discs. My results also emphasise the important role of minor mergers, which dominate the stellar mass and black-hole growth budget after z = 1 and are a potentially important reservoir of cold gas that plays a role in the rejuvenation and survival of discs. I have also investigated the biases that this morphological evolution produces in observational studies of galaxy populations. In particular, I have shown that ‘progenitor bias’, i.e. the bias produced by using only early-type galaxies to define the progenitor population of today’s early-types, is a significant problem at all but the lowest redshifts and an important consideration for large, deep observational surveys (JWST, LSST etc.). For example, while early-types attain their final morphology at relatively early epochs (by z ~ 1, around 60 per cent of today’s early-types have had their last significant merger), progenitor bias is severe at all but the lowest redshifts.
At z ~ 0.6, less than 50 per cent of the stellar mass in today’s early-types is actually in progenitors with early-type morphology, while, at the peak epoch of cosmic star formation (z ~ 2), studying only early-types misses almost all (80 per cent) of the stellar mass that eventually ends up in local early-type systems. I have also explored the significance and formation mechanisms of low-surface-brightness galaxies (LSBGs). For M* > 10^8 M☉, LSBGs contribute 50 per cent of the local number density and exist in significant numbers across all environments. Their progenitors have stronger, burstier star formation at high redshift, which causes stronger supernova feedback. This feedback flattens the gas-density profiles (but does not remove the gas reservoirs). This, in turn, gives rise to flatter stellar profiles, which are more susceptible to environmental processes and galaxy interactions; these processes produce today’s LSBG populations by steadily removing cold gas and gradually increasing galaxy effective radii over time. The ability of these populations to elucidate key questions in the field of galaxy evolution, and to significantly alter our current paradigm, is becoming increasingly clear, especially with the advent of new deep surveys. Finally, I have implemented a new unsupervised machine learning (UML) technique on images from the Hyper-Suprime-Cam Subaru-Strategic-Program Ultra-Deep survey. The algorithm autonomously reduces galaxy populations to a small number of ‘morphological clusters’, populated by galaxies with similar morphologies, which are then benchmarked using visual inspection. The morphological classifications reproduce known trends in key galaxy properties as a function of morphological type (e.g. stellar mass functions and colours). This study demonstrates the power of UML for accurate morphological analysis, which will become indispensable in the forthcoming era of deep-wide surveys.

    Automated Quantitative Description of Spiral Galaxy Arm-Segment Structure

    We describe a system for the automatic quantification of structure in spiral galaxies. This enables the translation of sky survey images into the data needed to address fundamental astrophysical questions such as the origin of spiral structure, a phenomenon that has eluded theoretical description despite 150 years of study (Sellwood 2010). The difficulty of automated measurement is underscored by the fact that, to date, only manual efforts (such as the citizen science project Galaxy Zoo) have been able to extract information about large samples of spiral galaxies. An automated approach will be needed to eliminate measurement subjectivity and to handle the otherwise overwhelming image quantities (up to billions of images) from near-future surveys. Our approach automatically describes spiral galaxy structure as a set of arcs, precisely describing the arrangement of spiral arm segments while retaining the flexibility needed to accommodate the wide variety of observed spiral galaxy structure. The largest existing quantitative measurements were manually guided and encompassed fewer than 100 galaxies, whereas we have already applied our method to more than 29,000 galaxies. Our output agrees with previous results, both quantitatively over small existing samples and qualitatively against human classifications from Galaxy Zoo. Comment: 9 pages; 4 figures; 2 tables; accepted to CVPR (Computer Vision and Pattern Recognition), June 2012, Providence, Rhode Island, June 16-21, 201
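
    A common way to quantify a single arm segment as an arc is to fit a logarithmic spiral, r = a·exp(bθ). Its pitch angle φ satisfies tan φ = |b|. The sketch below assumes that arc model and a plain least-squares fit of ln r against θ on pixels already assigned to one arm segment. It illustrates the general idea only and is not the system described above. `fit_log_spiral` is a hypothetical helper.

```python
import numpy as np


def fit_log_spiral(x, y, x0=0.0, y0=0.0):
    """Fit r = a * exp(b * theta) to arc pixels around a galaxy centre (x0, y0).

    Assumes the pixels are ordered along the arc, so np.unwrap can remove
    2*pi jumps in the polar angle. Returns (a, b, pitch angle in degrees).
    """
    dx, dy = np.asarray(x) - x0, np.asarray(y) - y0
    r = np.hypot(dx, dy)
    theta = np.unwrap(np.arctan2(dy, dx))
    # ln r = ln a + b * theta is linear in theta, so ordinary least squares suffices.
    b, ln_a = np.polyfit(theta, np.log(r), deg=1)
    pitch = np.degrees(np.arctan(abs(b)))
    return np.exp(ln_a), b, pitch


# Synthetic arc: a = 3, b = 0.2, spanning half a turn.
theta_true = np.linspace(0.0, np.pi, 50)
r_true = 3.0 * np.exp(0.2 * theta_true)
a_fit, b_fit, pitch_fit = fit_log_spiral(r_true * np.cos(theta_true), r_true * np.sin(theta_true))
```

    On this noise-free arc the fit recovers a = 3 and b = 0.2, which corresponds to a pitch angle of arctan(0.2), about 11.3 degrees.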

    Astronomaly at Scale: Searching for Anomalies Amongst 4 Million Galaxies

    Modern astronomical surveys are producing datasets of unprecedented size and richness, increasing the potential for high-impact scientific discovery. This possibility, coupled with the challenge of exploring a large number of sources, has led to the development of novel machine-learning-based anomaly detection approaches, such as Astronomaly. For the first time, we test the scalability of Astronomaly by applying it to almost 4 million images of galaxies from the Dark Energy Camera Legacy Survey. We use a trained deep learning algorithm to learn useful representations of the images and pass these to the isolation forest anomaly detection algorithm, coupled with Astronomaly's active learning method, to discover interesting sources. We find that data selection criteria have a significant impact on the trade-off between finding rare sources, such as strong lenses, and introducing artefacts into the dataset. We demonstrate that active learning is required to identify the most interesting sources and reduce artefacts, while anomaly detection methods alone are insufficient. Using Astronomaly, we find 1635 anomalies among the top 2000 sources in the dataset after applying active learning, including 8 strong gravitational lens candidates, 1609 galaxy merger candidates, and 18 previously unidentified sources exhibiting highly unusual morphology. Our results show that, by leveraging the human-machine interface, Astronomaly is able to rapidly identify sources of scientific interest even in large datasets. Comment: 15 pages, 9 figures. Comments welcome, especially suggestions about the anomalous source
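
    The anomaly-scoring stage can be sketched with scikit-learn's `IsolationForest`. In this illustration, random vectors stand in for the deep-learning representations, and planted outliers stand in for unusual sources. The active-learning re-ranking that the abstract above finds essential is omitted. The output is only the initial ranked list a human would then review.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(7)
# Stand-in representations: 1000 "ordinary" galaxies plus 5 planted oddities.
ordinary = rng.normal(size=(1000, 16))
oddities = rng.normal(loc=10.0, size=(5, 16))
features = np.vstack([ordinary, oddities])  # oddities have indices 1000-1004

forest = IsolationForest(n_estimators=200, random_state=0).fit(features)
# score_samples is lower for abnormal points, so negate it: larger = more anomalous.
anomaly_score = -forest.score_samples(features)
ranking = np.argsort(anomaly_score)[::-1]  # most anomalous first
top = ranking[:5]
```

    In Astronomaly's workflow, a ranked list like `ranking` is the starting point that active learning then refines using human feedback, which reduces the artefacts that pure anomaly scores tend to surface.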