Radio Galaxy Zoo: Knowledge Transfer Using Rotationally Invariant Self-Organising Maps
With the advent of large-scale surveys, the manual analysis and classification
of individual radio source morphologies is rendered impossible as existing
approaches do not scale. The analysis of complex morphological features in the
spatial domain is a particularly important task. Here we discuss the challenges
of transferring crowdsourced labels obtained from the Radio Galaxy Zoo project
and introduce a proper transfer mechanism via quantile random forest
regression. By using parallelised rotation- and flipping-invariant Kohonen maps,
image cubes of Radio Galaxy Zoo selected galaxies formed from the FIRST radio
continuum and WISE infrared all-sky surveys are first projected down to a
two-dimensional embedding in an unsupervised way. This embedding can be seen as
a discretised space of shapes with the coordinates reflecting morphological
features as expressed by the automatically derived prototypes. We find that
these prototypes have reconstructed physically meaningful processes across two
channel images at radio and infrared wavelengths in an unsupervised manner. In
the second step, images are compared with those prototypes to create a
heat-map, which is the morphological fingerprint of each object and the basis
for transferring the user-generated labels. These heat-maps reduce the
feature space by a factor of 248 and can be used as the basis for
subsequent ML methods. Using an ensemble of decision trees we achieve upwards
of 85.7% and 80.7% accuracy when predicting the number of components and peaks
in an image, respectively, using these heat-maps. We also question the
currently used discrete classification schema and introduce a continuous scale
that better reflects the uncertainty in transition between two classes, caused
by sensitivity and resolution limits.
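The heat-map step described in this abstract can be illustrated with a minimal sketch. It assumes square single-channel images stored as nested lists and restricts the invariance to 90-degree rotations plus mirroring, which is a simplification; the function names (`variants`, `invariant_distance`, `heat_map`) are hypothetical and are not taken from the authors' code.

```python
import math


def rotate90(img):
    """Rotate a square image (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def flip(img):
    """Mirror an image left to right."""
    return [row[::-1] for row in img]


def variants(img):
    """Yield the 8 copies of an image under 90-degree rotations and flips."""
    current = [list(row) for row in img]
    for _ in range(4):
        yield current
        yield flip(current)
        current = rotate90(current)


def distance(a, b):
    """Euclidean distance between two images of equal shape."""
    return math.sqrt(
        sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    )


def invariant_distance(img, prototype):
    """Smallest distance between a prototype and any variant of the image."""
    return min(distance(v, prototype) for v in variants(img))


def heat_map(img, prototypes):
    """Distance from one image to every prototype on a 2-D grid of prototypes."""
    return [[invariant_distance(img, p) for p in row] for row in prototypes]
```

Each image is thereby reduced to one number per prototype, and such a grid of distances is the kind of compact representation that a downstream classifier or regressor could take as input.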
K2 Variable Catalogue II: Machine Learning Classification of Variable Stars and Eclipsing Binaries in K2 Fields 0-4
We are entering an era of unprecedented quantities of data from current and
planned survey telescopes. To maximise the potential of such surveys, automated
data analysis techniques are required. Here we implement a new methodology for
variable star classification, through the combination of Kohonen Self
Organising Maps (SOM, an unsupervised machine learning algorithm) and the more
common Random Forest (RF) supervised machine learning technique. We apply this
method to data from the K2 mission fields 0-4, finding 154 ab-type RR Lyraes
(10 newly discovered), 377 Delta Scuti pulsators, 133 Gamma Doradus pulsators,
183 detached eclipsing binaries, 290 semi-detached or contact eclipsing
binaries and 9399 other periodic (mostly spot-modulated) sources, once class
significance cuts are taken into account. We present lightcurve features for
all K2 stellar targets, including their three strongest detected frequencies,
which can be used to study stellar rotation periods where the observed
variability arises from spot modulation. The resulting catalogue of variable
stars, classes, and associated data features is made available online. We
publish our SOM code in Python as part of the open source PyMVPA package, which
in combination with already available RF modules can be easily used to recreate
the method.
Comment: Accepted for publication in MNRAS, 16 pages, 13 figures. Updated with
proof corrections. Full catalogue tables available at
https://www2.warwick.ac.uk/fac/sci/physics/research/astro/people/armstrong/
or at the CD
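As a rough illustration of the unsupervised half of the SOM-plus-RF approach in this abstract, the following is a minimal one-dimensional Kohonen map trained on fixed-length feature vectors (for example, binned phase-folded light curves). It is a sketch under assumptions, not the published PyMVPA implementation: the learning-rate and neighbourhood schedules, parameter names, and defaults are all illustrative choices.

```python
import math
import random


def best_matching_unit(weights, x):
    """Index of the SOM unit whose weight vector is closest to x."""
    return min(
        range(len(weights)),
        key=lambda j: sum((w - v) ** 2 for w, v in zip(weights[j], x)),
    )


def train_som(samples, n_units, n_iter=500, lr0=0.5, sigma0=None, seed=0):
    """Train a 1-D SOM; returns a list of n_units weight vectors."""
    rng = random.Random(seed)
    dim = len(samples[0])
    sigma0 = sigma0 or n_units / 2
    # Random initial weights in [0, 1); inputs are assumed scaled to that range.
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    for t in range(n_iter):
        frac = 1 - t / n_iter              # decays linearly from 1 towards 0
        lr = lr0 * frac                    # learning rate
        sigma = max(sigma0 * frac, 0.5)    # neighbourhood width
        x = rng.choice(samples)
        bmu = best_matching_unit(weights, x)
        for j, w in enumerate(weights):
            # Gaussian neighbourhood: units near the BMU move the most.
            h = math.exp(-((j - bmu) ** 2) / (2 * sigma ** 2))
            for k in range(dim):
                w[k] += lr * h * (x[k] - w[k])
    return weights
```

One plausible way to combine the two techniques is to pass the best-matching-unit index of each light curve to a random forest as an extra feature, alongside quantities such as the detected frequencies.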
Organised Randoms: learning and correcting for systematic galaxy clustering patterns in KiDS using self-organising maps
We present a new method for the mitigation of observational systematic
effects in angular galaxy clustering via corrective random galaxy catalogues.
Real and synthetic galaxy data, from the Kilo Degree Survey's (KiDS)
4th Data Release (KiDS-) and the Full-sky Lognormal
Astro-fields Simulation Kit (FLASK) package respectively, are used to train
self-organising maps (SOMs) to learn the multivariate relationships between
observed galaxy number density and up to six systematic-tracer variables,
including seeing, Galactic dust extinction, and Galactic stellar density. We
then create `organised' randoms, i.e. random galaxy catalogues with spatially
variable number densities, mimicking the learnt systematic density modes in the
data. Using realistically biased mock data, we show that these organised
randoms consistently subtract spurious density modes from the two-point angular
correlation function , correcting biases of up to in
the mean clustering amplitude to as low as , over a high
signal-to-noise angular range of 7-100 arcmin. Their performance is also
validated for angular clustering cross-correlations in a bright, flux-limited
subset of KiDS-, comparing against an analogous sample constructed from
highly-complete spectroscopic redshift data. Each organised random catalogue
object is a `clone' carrying the properties of a real galaxy, and is
distributed throughout the survey footprint according to the parent galaxy's
position in systematics-space. Thus, sub-sample randoms are readily derived
from a single master random catalogue via the same selection as applied to the
real galaxies. Our method is expected to improve in performance with increased
survey area, galaxy number density, and systematic contamination, making
organised randoms extremely promising for current and future clustering
analyses of faint samples.
Comment: 18 pages (6 appendix pages), 12 figures (8 appendix figures),
submitted to A&
Unveiling the rarest morphologies of the LOFAR Two-metre Sky Survey radio source population with self-organised maps
Context. The Low Frequency Array (LOFAR) Two-metre Sky Survey (LoTSS) is a low-frequency radio continuum survey of the Northern sky at an unparalleled resolution and sensitivity. Aims. In order to fully exploit this huge dataset and those produced by the Square Kilometre Array in the next decade, automated methods in machine learning and data-mining will be increasingly essential both for morphological classifications and for identifying optical counterparts to the radio sources. Methods. Using self-organising maps (SOMs), a form of unsupervised machine learning, we created a dimensionality reduction of the radio morphologies for the ∼25k extended radio continuum sources in the LoTSS first data release, which is only ∼2 percent of the final LoTSS survey. We made use of PINK, a code which extends the SOM algorithm with rotation and flipping invariance, increasing its suitability and effectiveness for training on astronomical sources. Results. After training, the SOMs can be used for a wide range of science exploitation and we present an illustration of their potential by finding an arbitrary number of morphologically rare sources in our training data (424 square degrees) and subsequently in an area of the sky (∼5300 square degrees) outside the training data. Objects found in this way span a wide range of morphological and physical categories: extended jets of radio active galactic nuclei, diffuse cluster haloes and relics, and nearby spiral galaxies. Finally, to enable accessible, interactive, and intuitive data exploration, we showcase the LOFAR-PyBDSF Visualisation Tool, which allows users to explore the LoTSS dataset through the trained SOMs.
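A common way to pick out rare objects with a trained SOM is to rank sources by how poorly their best-matching prototype represents them. The sketch below assumes that criterion and plain Euclidean distance on flattened images; the abstract does not specify the scoring, and the function names are hypothetical.

```python
import math


def bmu_distance(source, prototypes):
    """Distance from a source to its closest (best-matching) prototype."""
    return min(math.dist(source, p) for p in prototypes)


def rarest_sources(sources, prototypes, n):
    """Indices of the n sources least well represented by any prototype."""
    scores = [bmu_distance(s, prototypes) for s in sources]
    order = sorted(range(len(sources)), key=lambda i: scores[i], reverse=True)
    return order[:n]
```

Because `n` is a free parameter, the same ranking yields an arbitrary number of candidates, and it can be applied unchanged to sources outside the training area.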
Redshift distributions of extragalactic galaxy surveys
Measurements of the large-scale structure of the universe are key observables to study fundamental physics. This thesis focuses on the calibration of photometric redshift distributions of extragalactic galaxy surveys and their impact on the analysis of large-scale structure data.
First, I examine the impact of redshift distribution uncertainties on the cosmological inference from weak lensing measurements. The weak gravitational lensing effect, known as cosmic shear, distorts the shape of galaxy images due to the distribution of gravitating matter along the line of sight. Thus, it provides a probe of the matter distribution in the universe. However, modelling the observed cosmic shear signal requires knowledge about the distribution of observed galaxies along the line of sight, which is usually determined through photometric redshifts. I develop a method that accurately propagates residual redshift distribution uncertainties into the weak lensing likelihood and perform a self-calibration of the redshift distribution with cosmic shear data.
Second, I develop a new method to assign photometrically observed galaxies to tomographic redshift bins. The goal is to obtain compact distributions and to reduce the overlap between redshift bins caused by catastrophic outliers in the photometric redshift estimation. This is achieved by combining a self-organising map with a simulated annealing algorithm which optimises the clustering cross-correlation signal between a photometric galaxy catalogue and a spectroscopically observed sample of reference galaxies.
Finally, I perform consistency tests in cosmological analyses. These tests include a study of the consistency between the constraints on cosmological parameters probed by the five tomographic bins of the Kilo-Degree Survey. Furthermore, I study the internal consistency of the ΛCDM model by dividing the model into regimes: one that describes the evolution of the isotropic background of the universe and one describing matter density perturbations. This model is constrained by cosmic shear, galaxy clustering, and cosmic microwave background measurements.
Unusual quasars from the Sloan Digital Sky Survey selected by means of Kohonen self-organising maps
We exploit the spectral archive of the Sloan Digital Sky Survey (SDSS) Data
Release 7 to select unusual quasar spectra. The selection method is based on a
combination of the power of self-organising maps and the visual inspection of a
huge number of spectra. Self-organising maps were applied to nearly 10^5
spectra classified as quasars by the SDSS pipeline. Particular attention was
paid to minimise possible contamination by rare peculiar stellar spectral
types. We present a catalogue of 1005 quasars with unusual spectra. This large
sample provides a useful resource for both studying properties and relations
of/between different types of unusual quasars and selecting particularly
interesting objects. The spectra are grouped into six types. All these types
turn out to be on average more luminous than comparison samples of normal
quasars after a statistical correction is made for intrinsic reddening. Both
the unusual broad absorption line (BAL) quasars and the strong iron emitters
have significantly lower radio luminosities than normal quasars. We also
confirm that strong BALs avoid the most radio-luminous quasars. Finally, we
create a sample of quasars similar to the two "mysterious" objects discovered
by Hall et al. (2002) and briefly discuss the quasar properties and possible
explanations of their highly peculiar spectra. (Abstract modified to match the
arXiv format)
Comment: Added reference to section 6; a few typos corrected; corrections
according to the version published in Astronomy and Astrophysic
Signal transduction-related responses to phytohormones and environmental challenges in sugarcane
BACKGROUND: Sugarcane is an increasingly economically and environmentally important C4 grass, used for the production of sugar and bioethanol, a low-carbon emission fuel. Sugarcane originated from crosses of Saccharum species and is noted for its unique capacity to accumulate high amounts of sucrose in its stems. Environmental stresses severely limit sugarcane productivity worldwide. To investigate transcriptome changes in response to environmental inputs that alter yield, we used cDNA microarrays to profile expression of 1,545 genes in plants subjected to drought, phosphate starvation, herbivory and N2-fixing endophytic bacteria. We also investigated the response to phytohormones (abscisic acid and methyl jasmonate). The arrayed elements correspond mostly to genes involved in signal transduction, hormone biosynthesis, transcription factors, novel genes and genes corresponding to unknown proteins. RESULTS: Using an outlier-searching method, 179 genes with strikingly different expression levels were identified as differentially expressed in at least one of the treatments analysed. Self Organizing Maps were used to cluster the expression profiles of 695 genes that showed a highly correlated expression pattern among replicates. The expression data for 22 genes was evaluated for 36 experimental data points by quantitative RT-PCR, indicating a validation rate of 80.5% using three biological experimental replicates. The SUCAST Database was created to provide public access to the data described in this work, linked to tissue expression profiling and the SUCAST gene category and sequence analysis. The SUCAST database also includes a categorization of the sugarcane kinome based on a phylogenetic grouping that included 182 undefined kinases. CONCLUSION: An extensive study on the sugarcane transcriptome was performed. Sugarcane genes responsive to phytohormones and to challenges sugarcane commonly deals with in the field were identified.
Additionally, the protein kinases were annotated based on a phylogenetic approach. The experimental design and statistical analysis applied proved robust in identifying genes associated with a diverse array of conditions, attributing novel functions to previously unknown or undefined genes. The data consolidated in the SUCAST database resource can guide further studies and be useful for the development of improved sugarcane varieties.