Estimating Spectroscopic Redshifts by Using k Nearest Neighbors Regression I. Description of Method and Analysis
Context: In astronomy, new approaches to process and analyze the
exponentially increasing amount of data are inevitable. While classical
approaches (e.g. template fitting) are fine for objects of well-known classes,
alternative techniques have to be developed to determine those that do not fit.
A classification scheme should therefore be based on individual properties
instead of fitting to a global model, which loses valuable information.
An important issue when dealing with large data sets is outlier detection,
which at the moment is often treated in a problem-oriented way. Aims: In this paper we
present a method to statistically estimate the redshift z based on a similarity
approach. This allows us to determine redshifts in spectra in emission as well
as in absorption without using any predefined model. Additionally we show how
an estimate of the redshift based on single features is possible. As a
consequence we are e.g. able to filter objects which show multiple redshift
components. We propose to apply this general method to all similar problems in
order to identify objects where traditional approaches fail. Methods: The
redshift estimation is performed by comparing predefined regions in the spectra
and applying a k nearest neighbor regression model for every predefined
emission and absorption region, individually. Results: We estimated a redshift
for more than 50% of the analyzed 16,000 spectra of our reference and test
sample. The redshift estimate yields a precision for every individually tested
feature that is comparable with the overall precision of the redshifts of SDSS.
In 14 spectra we find a significant shift between emission and absorption or
emission and emission lines. The results already show the immense power of this
simple machine learning approach for investigating huge databases such as the
SDSS. Comment: accepted for publication in A&
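The per-region regression step described in the Methods can be illustrated with a minimal k nearest neighbor regression sketch. It is not the authors' implementation: the feature vectors, the Euclidean distance, the unweighted mean, and the name `knn_regress` are all assumptions made for illustration, and the toy values are invented.

```python
import math

def knn_regress(train_features, train_targets, query, k=3):
    """Predict a target for `query` as the mean target of its k nearest training points."""
    if not 1 <= k <= len(train_features):
        raise ValueError("k must be between 1 and the number of training points")
    # Euclidean distance from the query to every training feature vector
    # (the distance measure is an assumption, not taken from the abstract)
    scored = [
        (math.dist(x, query), y)
        for x, y in zip(train_features, train_targets)
    ]
    scored.sort(key=lambda pair: pair[0])
    neighbours = scored[:k]
    # Unweighted mean of the neighbours' targets
    return sum(y for _, y in neighbours) / k

# Invented toy data: two-component feature vectors for one spectral region,
# each with a known reference redshift.
train_features = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
train_redshifts = [0.10, 0.12, 0.11, 0.50, 0.52]

z_est = knn_regress(train_features, train_redshifts, (0.2, 0.1), k=3)
print(round(z_est, 3))
```

Running one such model per predefined emission or absorption region, as the abstract describes, would give one estimate per feature; disagreement between those estimates is the kind of signal that could flag multiple redshift components.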
Radio Galaxy Zoo: Knowledge Transfer Using Rotationally Invariant Self-Organising Maps
With the advent of large scale surveys the manual analysis and classification
of individual radio source morphologies is rendered impossible as existing
approaches do not scale. The analysis of complex morphological features in the
spatial domain is a particularly important task. Here we discuss the challenges
of transferring crowdsourced labels obtained from the Radio Galaxy Zoo project
and introduce a proper transfer mechanism via quantile random forest
regression. By using parallelized rotation and flipping invariant Kohonen-maps,
image cubes of Radio Galaxy Zoo selected galaxies formed from the FIRST radio
continuum and WISE infrared all sky surveys are first projected down to a
two-dimensional embedding in an unsupervised way. This embedding can be seen as
a discretised space of shapes with the coordinates reflecting morphological
features as expressed by the automatically derived prototypes. We find that
these prototypes have reconstructed physically meaningful processes across two
channel images at radio and infrared wavelengths in an unsupervised manner. In
the second step, images are compared with those prototypes to create a
heat-map, which is the morphological fingerprint of each object and the basis
for transferring the user-generated labels. These heat-maps reduce the
feature space by a factor of 248 and can serve as the basis for
subsequent ML methods. Using an ensemble of decision trees we achieve upwards
of 85.7% and 80.7% accuracy when predicting the number of components and peaks
in an image, respectively, using these heat-maps. We also question the
currently used discrete classification schema and introduce a continuous scale
that better reflects the uncertainty in transition between two classes, caused
by sensitivity and resolution limits.
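The second step, comparing an image with each prototype to obtain a heat-map, can be sketched as follows. This is a simplified illustration, not the method used in the paper: it assumes single-channel square images, a squared Euclidean distance, and invariance only under 90-degree rotations plus a horizontal flip, and every function name is hypothetical.

```python
def rotate90(img):
    """Rotate a square image (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def flip(img):
    """Mirror an image horizontally."""
    return [row[::-1] for row in img]

def variants(img):
    """All 8 orientations reachable by 90-degree rotations and a horizontal flip."""
    out = []
    current = img
    for _ in range(4):
        out.append(current)
        out.append(flip(current))
        current = rotate90(current)
    return out

def sq_dist(a, b):
    """Sum of squared pixel differences between two equally sized images."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def invariant_dist(img, prototype):
    """Smallest distance between the prototype and any orientation of the image."""
    return min(sq_dist(v, prototype) for v in variants(img))

def heat_map(img, prototypes):
    """One invariant distance per prototype: the image's 'fingerprint'."""
    return [invariant_dist(img, p) for p in prototypes]

# Invented 2x2 toy prototypes and an image that is a rotated copy of the first one.
prototypes = [
    [[1, 0], [0, 0]],
    [[1, 1], [0, 0]],
]
image = [[0, 0], [0, 1]]

heat = heat_map(image, prototypes)
print(heat)
```

Because the heat-map has one entry per prototype rather than one per pixel, it is a much smaller representation of the image, which is the sense in which the feature space is reduced before the labels are transferred.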
Bigger Buffer k-d Trees on Multi-Many-Core Systems
A buffer k-d tree is a k-d tree variant for massively parallel nearest neighbor search. While they provide valuable speed-ups on modern many-core devices when both the reference set and the query set are large, buffer k-d trees are limited by the number of points that can fit on a single device. In this work, we show how to modify the original data structure and the associated workflow to make the overall approach capable of dealing with massive data sets. We further provide a simple yet efficient way of using multiple devices in a single workstation. The applicability of the modified framework is demonstrated in the context of astronomy, a field that is faced with huge amounts of data.
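For orientation, the underlying data structure can be sketched as a plain, single-threaded k-d tree with a nearest neighbor query. This is not the buffer variant: it has no query buffering, no device offloading, and no multi-device workflow, and the dictionary-based node layout and function names are assumptions chosen for brevity.

```python
import math

def build_kdtree(points, depth=0):
    """Recursively build a k-d tree, splitting on the median along a cycling axis."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }

def nearest(node, query, best=None):
    """Return (distance, point) of the reference point closest to `query`."""
    if node is None:
        return best
    d = math.dist(node["point"], query)
    if best is None or d < best[0]:
        best = (d, node["point"])
    axis = node["axis"]
    diff = query[axis] - node["point"][axis]
    # Descend first into the side of the splitting plane that contains the query
    if diff < 0:
        near, far = node["left"], node["right"]
    else:
        near, far = node["right"], node["left"]
    best = nearest(near, query, best)
    # Visit the other side only if the splitting plane is closer than the best match
    if abs(diff) < best[0]:
        best = nearest(far, query, best)
    return best

# Invented toy reference set and query point.
reference = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_kdtree(reference)
dist, point = nearest(tree, (9, 2))
print(point, round(dist, 3))
```

The pruning test on the splitting plane is what lets a query skip most of the tree; the abstract's point is that this kind of search becomes worthwhile on many-core devices only when many queries are processed together.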
Cataloging the radio-sky with unsupervised machine learning: a new approach for the SKA era
We develop a new analysis approach towards identifying related radio
components and their corresponding infrared host galaxy based on unsupervised
machine learning methods. By exploiting PINK, a self-organising map algorithm,
we are able to associate radio and infrared sources without the a priori
requirement of training labels. We present an example of this method using
images from the FIRST and WISE surveys centred towards positions
described by the FIRST catalogue. We produce a set of catalogues that
complement FIRST and describe 802,646 objects, including their radio components
and their corresponding AllWISE infrared host galaxy. Using these data products
we (i) demonstrate the ability to identify objects with rare and unique radio
morphologies (e.g. 'X'-shaped galaxies, hybrid FR-I/FR-II morphologies), (ii)
identify the potentially resolved radio components that are associated with
a single infrared host, (iii) introduce a "curliness" statistic to search
for bent and disturbed radio morphologies, and (iv) extract a set of 17 giant
radio galaxies between 700 and 1100 kpc. As we require no training labels, our
method can be applied to any radio-continuum survey, provided a sufficiently
representative SOM can be trained.
A Comparison of Photometric Redshift Techniques for Large Radio Surveys
Future radio surveys will generate catalogs of tens of millions of radio sources, for which redshift estimates will be essential to achieve many of the science goals. However, spectroscopic data will be available for only a small fraction of these sources, and in most cases even the optical and infrared photometry will be of limited quality. Furthermore, radio sources tend to be at higher redshift than most optical sources (most radio surveys have a median redshift greater than 1) and so a significant fraction of radio source hosts differ from those for which most photometric redshift templates are designed. We therefore need to develop new techniques for estimating the redshifts of radio sources. As a starting point in this process, we evaluate a number of machine-learning techniques for estimating redshift, together with a conventional template-fitting technique. We pay special attention to how the performance is affected by the incompleteness of the training sample, by sparseness of the parameter space, and by limited availability of ancillary multiwavelength data. As expected, we find that the quality of the photometric redshifts degrades as the quality of the photometry decreases, but that even with the limited quality of photometry available for all-sky surveys, useful redshift information is available for the majority of sources, particularly at low redshift. We find that a template-fitting technique performs best in the presence of high-quality and almost complete multi-band photometry, especially if radio sources that are also X-ray emitting are treated separately, using specific templates and priors. When we reduced the quality of photometry to match that available for the EMU all-sky radio survey, the quality of the template fitting degraded and became comparable to some of the machine-learning methods. Machine-learning techniques currently perform better at low redshift than at high redshift, because the currently available training data are incomplete at high redshifts.
Unveiling the rarest morphologies of the LOFAR Two-metre Sky Survey radio source population with self-organised maps
Context. The Low Frequency Array (LOFAR) Two-metre Sky Survey (LoTSS) is a low-frequency radio continuum survey of the Northern sky at an unparalleled resolution and sensitivity. Aims. In order to fully exploit this huge dataset and those produced by the Square Kilometre Array in the next decade, automated methods in machine learning and data-mining will be increasingly essential both for morphological classifications and for identifying optical counterparts to the radio sources. Methods. Using self-organising maps (SOMs), a form of unsupervised machine learning, we created a dimensionality reduction of the radio morphologies for the ∼25k extended radio continuum sources in the LoTSS first data release, which is only ∼2 percent of the final LoTSS survey. We made use of PINK, a code which extends the SOM algorithm with rotation and flipping invariance, increasing its suitability and effectiveness for training on astronomical sources. Results. After training, the SOMs can be used for a wide range of science exploitation and we present an illustration of their potential by finding an arbitrary number of morphologically rare sources in our training data (424 square degrees) and subsequently in an area of the sky (∼5300 square degrees) outside the training data. Objects found in this way span a wide range of morphological and physical categories: extended jets of radio active galactic nuclei, diffuse cluster haloes and relics, and nearby spiral galaxies. Finally, to enable accessible, interactive, and intuitive data exploration, we showcase the LOFAR-PyBDSF Visualisation Tool, which allows users to explore the LoTSS dataset through the trained SOMs.
Black Hole Mass Estimates Based on CIV are Consistent with Those Based on the Balmer Lines
Using a sample of high-redshift lensed quasars from the CASTLES project with
observed-frame ultraviolet or optical and near-infrared spectra, we have
searched for possible biases between supermassive black hole (BH) mass
estimates based on the CIV, Halpha and Hbeta broad emission lines. Our sample
is based upon that of Greene, Peng & Ludwig, expanded with new near-IR
spectroscopic observations, consistently analyzed high S/N optical spectra, and
consistent continuum luminosity estimates at 5100A. We find that BH mass
estimates based on the FWHM of CIV show a systematic offset with respect to
those obtained from the line dispersion, sigma_l, of the same emission line,
but not with those obtained from the FWHM of Halpha and Hbeta. The magnitude of
the offset depends on the treatment of the HeII and FeII emission blended with
CIV, but there is little scatter for any fixed measurement prescription. While
we otherwise find no systematic offsets between CIV and Balmer line mass
estimates, we do find that the residuals between them are strongly correlated
with the ratio of the UV and optical continuum luminosities. Removing this
dependency reduces the scatter between the UV- and optical-based BH mass
estimates by a factor of approximately 2, from roughly 0.35 to 0.18 dex. The
dispersion is smallest when comparing the CIV sigma_l mass estimate, after
removing the offset from the FWHM estimates, and either Balmer line mass
estimate. The correlation with the continuum slope is likely due to a
combination of reddening, host contamination and object-dependent SED shapes.
When we add additional heterogeneous measurements from the literature, the
results are unchanged. Comment: Accepted for publication in The Astrophysical Journal. 37 text pages
+ 8 tables + 23 figures. Updated with comments by the referee and with an
expanded discussion on literature data including new observation