    Radio Galaxy Zoo: Knowledge Transfer Using Rotationally Invariant Self-Organising Maps

    With the advent of large-scale surveys, the manual analysis and classification of individual radio source morphologies is rendered impossible, as existing approaches do not scale. The analysis of complex morphological features in the spatial domain is a particularly important task. Here we discuss the challenges of transferring crowdsourced labels obtained from the Radio Galaxy Zoo project and introduce a proper transfer mechanism via quantile random forest regression. By using parallelized rotation- and flipping-invariant Kohonen maps, image cubes of Radio Galaxy Zoo selected galaxies formed from the FIRST radio continuum and WISE infrared all-sky surveys are first projected down to a two-dimensional embedding in an unsupervised way. This embedding can be seen as a discretised space of shapes, with the coordinates reflecting morphological features as expressed by the automatically derived prototypes. We find that these prototypes have reconstructed physically meaningful processes across two-channel images at radio and infrared wavelengths in an unsupervised manner. In the second step, images are compared with those prototypes to create a heat-map, which is the morphological fingerprint of each object and the basis for transferring the user-generated labels. These heat-maps reduce the feature space by a factor of 248 and can be used as the basis for subsequent ML methods. Using an ensemble of decision trees, we achieve upwards of 85.7% and 80.7% accuracy when predicting the number of components and peaks in an image, respectively, using these heat-maps. We also question the currently used discrete classification schema and introduce a continuous scale that better reflects the uncertainty in the transition between two classes caused by sensitivity and resolution limits.
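
    The heat-map step can be sketched in Python with NumPy. This is a minimal illustration, not the parallelized implementation used in the paper: it assumes two-channel images of shape (channels, height, width), a prototype grid of shape (rows, cols, channels, height, width), and only the eight 90-degree rotations and mirror flips, whereas rotation-invariant SOM codes typically search much finer rotation angles. The function names are hypothetical.

```python
import numpy as np


def dihedral_variants(image):
    """Yield the 8 rotations (90-degree steps) and mirror flips of a (c, h, w) image."""
    for k in range(4):
        rotated = np.rot90(image, k, axes=(1, 2))  # rotate in the spatial plane only
        yield rotated
        yield np.flip(rotated, axis=2)  # mirror left-right


def invariant_distance(image, prototype):
    """Smallest Euclidean distance between a prototype and any rotated/flipped image."""
    return min(np.linalg.norm(v - prototype) for v in dihedral_variants(image))


def heat_map(image, prototypes):
    """Distance from one image to every prototype of a (rows, cols, c, h, w) grid."""
    rows, cols = prototypes.shape[:2]
    heat = np.empty((rows, cols))
    for i in range(rows):
        for j in range(cols):
            heat[i, j] = invariant_distance(image, prototypes[i, j])
    return heat


# Usage: an image that is a rotated copy of prototype (1, 2) matches it exactly.
rng = np.random.default_rng(0)
prototypes = rng.random((3, 3, 2, 8, 8))  # 3x3 grid of radio+infrared prototypes
image = np.rot90(prototypes[1, 2], 1, axes=(1, 2))
heat = heat_map(image, prototypes)
best = np.unravel_index(np.argmin(heat), heat.shape)
```

    The flattened heat map (here 9 values per object) is the kind of compact fingerprint that a downstream tree ensemble could take as input.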

    K2 Variable Catalogue II: Machine Learning Classification of Variable Stars and Eclipsing Binaries in K2 Fields 0-4

    We are entering an era of unprecedented quantities of data from current and planned survey telescopes. To maximise the potential of such surveys, automated data analysis techniques are required. Here we implement a new methodology for variable star classification through the combination of Kohonen Self-Organising Maps (SOM, an unsupervised machine learning algorithm) and the more common Random Forest (RF) supervised machine learning technique. We apply this method to data from K2 mission fields 0-4, finding 154 ab-type RR Lyraes (10 newly discovered), 377 Delta Scuti pulsators, 133 Gamma Doradus pulsators, 183 detached eclipsing binaries, 290 semi-detached or contact eclipsing binaries and 9399 other periodic (mostly spot-modulated) sources, once class significance cuts are taken into account. We present lightcurve features for all K2 stellar targets, including their three strongest detected frequencies, which can be used to study stellar rotation periods where the observed variability arises from spot modulation. The resulting catalogue of variable stars, classes and associated data features is made available online. We publish our SOM code in Python as part of the open-source PyMVPA package, which, in combination with already available RF modules, can easily be used to recreate the method.
    Comment: Accepted for publication in MNRAS, 16 pages, 13 figures. Updated with proof corrections. Full catalogue tables available at https://www2.warwick.ac.uk/fac/sci/physics/research/astro/people/armstrong/ or at the CD
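
    The SOM stage of such a pipeline can be illustrated with a minimal Kohonen map written from scratch in NumPy. This is a generic sketch, not the PyMVPA code the authors published; the grid size, the linear decay schedules and the two-feature toy input are assumptions chosen for illustration. The supervised Random Forest step that the method pairs with the SOM is not shown.

```python
import numpy as np


def best_matching_unit(weights, x):
    """Grid position of the node whose weight vector is closest to x."""
    dist = np.linalg.norm(weights - x, axis=-1)
    return np.unravel_index(np.argmin(dist), dist.shape)


def train_som(data, rows, cols, n_iter=2000, lr0=0.5, seed=0):
    """Train a rows x cols Kohonen map on data of shape (n_samples, n_features)."""
    rng = np.random.default_rng(seed)
    weights = rng.random((rows, cols, data.shape[1]))
    # Grid coordinates of every node, shape (rows, cols, 2).
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)
    sigma0 = max(rows, cols) / 2.0
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)              # learning rate decays linearly
        sigma = sigma0 * (1.0 - frac) + 0.5  # neighbourhood shrinks to half a node
        x = data[rng.integers(len(data))]    # one random training sample
        bmu = best_matching_unit(weights, x)
        d2 = np.sum((grid - np.array(bmu)) ** 2, axis=-1)
        h = np.exp(-d2 / (2.0 * sigma ** 2))  # Gaussian neighbourhood on the grid
        weights += lr * h[..., None] * (x - weights)
    return weights


# Usage: two well-separated toy clusters of (e.g. scaled period, amplitude) features.
rng = np.random.default_rng(1)
cluster_a = rng.normal(0.0, 0.1, size=(100, 2))
cluster_b = rng.normal(1.0, 0.1, size=(100, 2))
weights = train_som(np.vstack([cluster_a, cluster_b]), rows=5, cols=5)
bmu_a = best_matching_unit(weights, np.array([0.0, 0.0]))
bmu_b = best_matching_unit(weights, np.array([1.0, 1.0]))
```

    Distinct light-curve types end up on distinct map nodes, so a star's node position is a compact summary that a supervised classifier could use.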

    Organised Randoms: learning and correcting for systematic galaxy clustering patterns in KiDS using self-organising maps

    We present a new method for the mitigation of observational systematic effects in angular galaxy clustering via corrective random galaxy catalogues. Real and synthetic galaxy data, from the Kilo-Degree Survey's (KiDS) 4th Data Release (KiDS-1000) and the Full-sky Lognormal Astro-fields Simulation Kit (FLASK) package respectively, are used to train self-organising maps (SOMs) to learn the multivariate relationships between observed galaxy number density and up to six systematic-tracer variables, including seeing, Galactic dust extinction, and Galactic stellar density. We then create 'organised' randoms, i.e. random galaxy catalogues with spatially variable number densities, mimicking the learnt systematic density modes in the data. Using realistically biased mock data, we show that these organised randoms consistently subtract spurious density modes from the two-point angular correlation function w(ϑ), correcting biases of up to 12σ in the mean clustering amplitude to as low as 0.1σ over a high signal-to-noise angular range of 7-100 arcmin. Their performance is also validated for angular clustering cross-correlations in a bright, flux-limited subset of KiDS-1000, by comparison against an analogous sample constructed from highly complete spectroscopic redshift data. Each organised random catalogue object is a 'clone' carrying the properties of a real galaxy, and is distributed throughout the survey footprint according to the parent galaxy's position in systematics space. Thus, sub-sample randoms are readily derived from a single master random catalogue via the same selection as applied to the real galaxies. Our method is expected to improve in performance with increased survey area, galaxy number density, and systematic contamination, making organised randoms extremely promising for current and future clustering analyses of faint samples.
    Comment: 18 pages (6 appendix pages), 12 figures (8 appendix figures), submitted to A&A
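
    The cloning idea can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' code: the footprint is divided into equal-area pixels, each pixel has already been assigned to a SOM cell in systematics space (for example by a map trained on seeing, extinction and stellar density), and randoms are placed at the pixel level rather than at exact sky coordinates. All names are hypothetical.

```python
import numpy as np


def organised_randoms(pixel_cell, galaxy_pixel, n_clones=10, seed=0):
    """Clone each galaxy into random pixels that share its SOM cell.

    pixel_cell:   SOM cell index of every footprint pixel (equal-area pixels assumed)
    galaxy_pixel: pixel index of every real galaxy
    Returns the parent-galaxy index and the assigned pixel of every clone.
    """
    rng = np.random.default_rng(seed)
    pixel_cell = np.asarray(pixel_cell)
    galaxy_cell = pixel_cell[np.asarray(galaxy_pixel)]
    parents, pixels = [], []
    for cell in np.unique(galaxy_cell):
        cell_pixels = np.flatnonzero(pixel_cell == cell)  # pixels with similar systematics
        for g in np.flatnonzero(galaxy_cell == cell):
            parents.append(np.full(n_clones, g))
            pixels.append(rng.choice(cell_pixels, size=n_clones))  # uniform within the cell
    return np.concatenate(parents), np.concatenate(pixels)


# Usage: pixels 0-1 belong to cell 0, pixels 2-3 to cell 1; galaxies 0 and 1 sit in
# cell 0 and galaxy 2 in cell 1, so cell 0 receives twice as many randoms.
parents, pixels = organised_randoms([0, 0, 1, 1], [0, 1, 2], n_clones=10)
# A sub-sample's randoms follow from the same selection applied to the parent galaxies.
bright = np.array([True, False, True])  # hypothetical selection flag per galaxy
sub_pixels = pixels[bright[parents]]
```

    Because clones stay within their parent's cell, the random density in each region of systematics space tracks the observed galaxy density there.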

    Unveiling the rarest morphologies of the LOFAR Two-metre Sky Survey radio source population with self-organised maps

    Context. The Low Frequency Array (LOFAR) Two-metre Sky Survey (LoTSS) is a low-frequency radio-continuum survey of the Northern sky at an unparalleled resolution and sensitivity. Aims. In order to fully exploit this huge dataset and those produced by the Square Kilometre Array in the next decade, automated methods in machine learning and data-mining will be increasingly essential both for morphological classifications and for identifying optical counterparts to the radio sources. Methods. Using self-organising maps (SOMs), a form of unsupervised machine learning, we created a dimensionality reduction of the radio morphologies for the ~25k extended radio-continuum sources in the LoTSS first data release, which is only ~2 percent of the final LoTSS survey. We made use of PINK, a code which extends the SOM algorithm with rotation and flipping invariance, increasing its suitability and effectiveness for training on astronomical sources. Results. After training, the SOMs can be used for a wide range of science exploitation, and we present an illustration of their potential by finding an arbitrary number of morphologically rare sources in our training data (424 square degrees) and subsequently in an area of the sky (~5300 square degrees) outside the training data. Objects found in this way span a wide range of morphological and physical categories: extended jets of radio active galactic nuclei, diffuse cluster haloes and relics, and nearby spiral galaxies. Finally, to enable accessible, interactive, and intuitive data exploration, we showcase the LOFAR-PyBDSF Visualisation Tool, which allows users to explore the LoTSS dataset through the trained SOMs.
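
    One simple way to rank candidate rare morphologies with a trained SOM is to score each source by its distance to its best-matching prototype and inspect the highest scores first. The sketch below assumes that score and takes as input precomputed distance maps (one per source, for instance from a rotation- and flip-invariant comparison); it is an illustration, not PINK's interface, and the function name is hypothetical.

```python
import numpy as np


def rarest_sources(distance_maps, k):
    """Return the k sources that are worst represented by any SOM prototype.

    distance_maps: array of shape (n_sources, rows, cols) holding the distance of
    every source to every prototype.
    """
    distance_maps = np.asarray(distance_maps, dtype=float)
    # Outlier score: distance to the best-matching (closest) prototype.
    scores = distance_maps.reshape(len(distance_maps), -1).min(axis=1)
    order = np.argsort(scores)[::-1][:k]  # largest scores first
    return order, scores[order]


# Usage with four sources on a 2x2 map.
maps = np.full((4, 2, 2), 10.0)
maps[0, 0, 0], maps[1, 1, 1], maps[2, 0, 1], maps[3, 1, 0] = 1.0, 5.0, 2.0, 9.0
idx, scores = rarest_sources(maps, k=2)
```

    Since the score only needs distances to existing prototypes, the same ranking can be applied to sources outside the training area without retraining the map.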

    Redshift distributions of extragalactic galaxy surveys

    Measurements of the large-scale structure of the universe are key observables for studying fundamental physics. This thesis focuses on the calibration of photometric redshift distributions of extragalactic galaxy surveys and their impact on the analysis of large-scale structure data. First, I examine the impact of redshift distribution uncertainties on the cosmological inference from weak lensing measurements. The weak gravitational lensing effect, known as cosmic shear, distorts the shapes of galaxy images due to the distribution of gravitating matter along the line of sight. Thus, it provides a probe of the matter distribution in the universe. However, modelling the observed cosmic shear signal requires knowledge of the distribution of observed galaxies along the line of sight, which is usually determined through photometric redshifts. I develop a method that accurately propagates residual redshift distribution uncertainties into the weak lensing likelihood and perform a self-calibration of the redshift distribution with cosmic shear data. Second, I develop a new method to assign photometrically observed galaxies to tomographic redshift bins. The goal is to obtain compact distributions and to reduce the overlap between redshift bins caused by catastrophic outliers in the photometric redshift estimation. This is achieved by combining a self-organising map with a simulated annealing algorithm which optimises the clustering cross-correlation signal between a photometric galaxy catalogue and a spectroscopically observed sample of reference galaxies. Finally, I perform consistency tests in cosmological analyses. These tests include a study of the consistency between the constraints on cosmological parameters probed by the five tomographic bins of the Kilo-Degree Survey. Furthermore, I study the internal consistency of the ΛCDM model by dividing the model into two regimes: one that describes the evolution of the isotropic background of the universe and one describing matter density perturbations. This model is constrained by cosmic shear, galaxy clustering, and cosmic microwave background measurements.
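
    The binning step pairs a SOM with simulated annealing. The sketch below shows only the annealing part, under a loudly simplified assumption: instead of optimising the clustering cross-correlation signal with a spectroscopic reference sample, it minimises a toy compactness cost, the weighted scatter of per-SOM-cell mean redshifts within each bin. The per-cell redshifts, counts, cooling schedule and function names are hypothetical.

```python
import math
import random


def bin_cost(assign, z, n, n_bins):
    """Toy objective: weighted scatter of cell redshifts within each tomographic bin."""
    cost = 0.0
    for b in range(n_bins):
        members = [i for i, a in enumerate(assign) if a == b]
        weight = sum(n[i] for i in members)
        if weight == 0:
            continue  # an empty bin contributes nothing
        mean = sum(n[i] * z[i] for i in members) / weight
        cost += sum(n[i] * (z[i] - mean) ** 2 for i in members)
    return cost


def anneal_bins(z, n, n_bins, steps=5000, t0=0.5, cooling=0.999, seed=0):
    """Assign SOM cells to bins by simulated annealing; returns (best assignment, cost)."""
    rng = random.Random(seed)
    assign = [rng.randrange(n_bins) for _ in z]
    cost = bin_cost(assign, z, n, n_bins)
    best, best_cost = list(assign), cost
    t = t0
    for _ in range(steps):
        i = rng.randrange(len(z))
        old = assign[i]
        assign[i] = rng.randrange(n_bins)  # propose moving one cell to a random bin
        new_cost = bin_cost(assign, z, n, n_bins)
        delta = new_cost - cost
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            cost = new_cost
            if cost < best_cost:
                best, best_cost = list(assign), cost
        else:
            assign[i] = old  # reject: undo the move
        t *= cooling  # geometric cooling
    return best, best_cost


# Usage: nine SOM cells whose mean redshifts form three groups.
z = [0.2, 0.25, 0.3, 0.8, 0.85, 0.9, 1.4, 1.45, 1.5]
n = [1] * len(z)
best, best_cost = anneal_bins(z, n, n_bins=3)
```

    Working on SOM cells rather than individual galaxies keeps the search space small, which is what makes a stochastic optimiser like this practical.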

    Unusual quasars from the Sloan Digital Sky Survey selected by means of Kohonen self-organising maps

    We exploit the spectral archive of the Sloan Digital Sky Survey (SDSS) Data Release 7 to select unusual quasar spectra. The selection method combines the power of self-organising maps with the visual inspection of a huge number of spectra. Self-organising maps were applied to nearly 10^5 spectra classified as quasars by the SDSS pipeline. Particular attention was paid to minimising possible contamination by rare peculiar stellar spectral types. We present a catalogue of 1005 quasars with unusual spectra. This large sample provides a useful resource both for studying the properties of, and relations between, different types of unusual quasars and for selecting particularly interesting objects. The spectra are grouped into six types. All these types turn out to be on average more luminous than comparison samples of normal quasars after a statistical correction is made for intrinsic reddening. Both the unusual broad absorption line (BAL) quasars and the strong iron emitters have significantly lower radio luminosities than normal quasars. We also confirm that strong BALs avoid the most radio-luminous quasars. Finally, we create a sample of quasars similar to the two "mysterious" objects discovered by Hall et al. (2002) and briefly discuss the quasar properties and possible explanations of their highly peculiar spectra. (Abstract modified to match the arXiv format)
    Comment: Added reference to section 6; a few typos corrected; corrections according to the version published in Astronomy and Astrophysics
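
    Before spectra observed at different redshifts can be compared by a SOM, they need a common input layout. The sketch below shows one plausible preprocessing step, assumed here rather than taken from the paper: shift each spectrum to its rest frame, interpolate it onto a fixed wavelength grid, zero the grid points it does not cover, and normalise by the mean absolute flux of the covered region. The function name and grid are hypothetical.

```python
import numpy as np


def rest_frame_vector(wavelength, flux, z, grid):
    """Resample an observed spectrum onto a common rest-frame grid for SOM input.

    Returns the normalised flux vector and a mask of grid points the spectrum covers.
    """
    rest = np.asarray(wavelength, dtype=float) / (1.0 + z)  # observed -> rest frame
    grid = np.asarray(grid, dtype=float)
    covered = (grid >= rest[0]) & (grid <= rest[-1])
    vec = np.zeros_like(grid)
    vec[covered] = np.interp(grid[covered], rest, flux)  # linear interpolation
    scale = np.mean(np.abs(vec[covered])) if covered.any() else 1.0
    return vec / scale, covered


# Usage: a three-point toy spectrum observed at z = 1.
vec, covered = rest_frame_vector([4000, 5000, 6000], [1.0, 2.0, 3.0], z=1.0,
                                 grid=[1500, 2000, 2500, 3000, 3500])
```

    Returning the coverage mask lets a SOM distance ignore grid points a given spectrum never observed, rather than treating the zero padding as real flux.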

    Signal transduction-related responses to phytohormones and environmental challenges in sugarcane

    BACKGROUND: Sugarcane is an increasingly economically and environmentally important C4 grass, used for the production of sugar and bioethanol, a low-carbon-emission fuel. Sugarcane originated from crosses of Saccharum species and is noted for its unique capacity to accumulate high amounts of sucrose in its stems. Environmental stresses severely limit sugarcane productivity worldwide. To investigate transcriptome changes in response to environmental inputs that alter yield, we used cDNA microarrays to profile the expression of 1,545 genes in plants subjected to drought, phosphate starvation, herbivory and N₂-fixing endophytic bacteria. We also investigated the response to phytohormones (abscisic acid and methyl jasmonate). The arrayed elements correspond mostly to genes involved in signal transduction, hormone biosynthesis, transcription factors, novel genes and genes corresponding to unknown proteins. RESULTS: Using an outlier-search method, 179 genes with strikingly different expression levels were identified as differentially expressed in at least one of the treatments analysed. Self-Organizing Maps were used to cluster the expression profiles of 695 genes that showed a highly correlated expression pattern among replicates. The expression data for 22 genes were evaluated for 36 experimental data points by quantitative RT-PCR, indicating a validation rate of 80.5% using three biological experimental replicates. The SUCAST database was created to provide public access to the data described in this work, linked to tissue expression profiling and the SUCAST gene category and sequence analysis. The SUCAST database also includes a categorization of the sugarcane kinome based on a phylogenetic grouping that included 182 undefined kinases. CONCLUSION: An extensive study of the sugarcane transcriptome was performed. Sugarcane genes responsive to phytohormones and to challenges that sugarcane commonly faces in the field were identified. Additionally, the protein kinases were annotated based on a phylogenetic approach. The experimental design and statistical analysis applied proved robust in uncovering genes associated with a diverse array of conditions, attributing novel functions to previously unknown or undefined genes. The data consolidated in the SUCAST database resource can guide further studies and be useful for the development of improved sugarcane varieties.
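
    The abstract restricts SOM clustering to genes whose profiles are highly correlated among replicates. A minimal sketch of such a filter, assuming two replicate matrices of shape (genes, conditions), a Pearson correlation criterion and a hypothetical threshold of 0.8, followed by per-gene standardisation so that a SOM would group genes by the shape of their response rather than its magnitude (a common choice, assumed here):

```python
import numpy as np


def reproducible_profiles(rep_a, rep_b, threshold=0.8):
    """Keep genes whose expression profiles correlate across two replicates.

    rep_a, rep_b: arrays of shape (n_genes, n_conditions).
    Returns the kept gene indices and their z-scored mean profiles.
    """
    rep_a = np.asarray(rep_a, dtype=float)
    rep_b = np.asarray(rep_b, dtype=float)
    a = rep_a - rep_a.mean(axis=1, keepdims=True)
    b = rep_b - rep_b.mean(axis=1, keepdims=True)
    # Pearson correlation per gene; constant profiles give nan and are dropped.
    with np.errstate(invalid="ignore", divide="ignore"):
        r = (a * b).sum(axis=1) / np.sqrt((a ** 2).sum(axis=1) * (b ** 2).sum(axis=1))
        keep = np.flatnonzero(r >= threshold)
    mean_profile = (rep_a[keep] + rep_b[keep]) / 2.0
    # Standardise each kept gene to zero mean and unit variance across conditions.
    z = (mean_profile - mean_profile.mean(axis=1, keepdims=True)) / mean_profile.std(
        axis=1, keepdims=True
    )
    return keep, z


# Usage: gene 0 agrees across replicates, gene 1 is anti-correlated, gene 2 is flat.
keep, z = reproducible_profiles([[1, 2, 3], [1, 2, 3], [5, 5, 5]],
                                [[2, 4, 6], [3, 2, 1], [5, 5, 5]])
```

    The standardised profiles in `z` are what would then be passed to a SOM for clustering.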