39 research outputs found

    Estimating Spectroscopic Redshifts by Using k Nearest Neighbors Regression I. Description of Method and Analysis

    Context: In astronomy, new approaches to process and analyze the exponentially increasing amount of data are inevitable. While classical approaches (e.g. template fitting) work well for objects of well-known classes, alternative techniques have to be developed for those that do not fit. A classification scheme should therefore be based on individual properties instead of a fit to a global model, which loses valuable information. An important issue when dealing with large data sets is outlier detection, which at the moment is often treated in a problem-oriented way. Aims: In this paper we present a method to statistically estimate the redshift z based on a similarity approach. This allows us to determine redshifts from spectra in emission as well as in absorption without using any predefined model. Additionally, we show how the redshift can be estimated from single features; as a consequence we are able, for example, to filter objects that show multiple redshift components. We propose to apply this general method to all similar problems in order to identify objects where traditional approaches fail. Methods: The redshift estimation is performed by comparing predefined regions in the spectra and applying a k nearest neighbor regression model to every predefined emission and absorption region individually. Results: We estimated a redshift for more than 50% of the analyzed 16,000 spectra of our reference and test sample. The redshift estimate yields a precision for every individually tested feature that is comparable with the overall precision of the SDSS redshifts. In 14 spectra we find a significant shift between emission and absorption lines or between two emission lines. The results already show the immense power of this simple machine learning approach for investigating huge databases such as the SDSS. Comment: accepted for publication in A&
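The per-feature k nearest neighbor regression described above can be sketched as follows. This is a minimal illustration on synthetic spectra, not the paper's pipeline: the reference sample, the single Gaussian "feature" whose position shifts with redshift, and all parameter values are assumptions for demonstration.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)

# Toy "spectra": flux sampled on a fixed wavelength grid around one feature.
# The paper fits a separate k-NN model per predefined emission/absorption
# region; here we sketch a single region with synthetic data.
n_ref, n_pix = 500, 64
z_ref = rng.uniform(0.0, 0.5, n_ref)
grid = np.linspace(0.0, 1.0, n_pix)
# Feature position shifts with redshift (crude stand-in for a real line).
X_ref = np.exp(-0.5 * ((grid[None, :] - (0.2 + z_ref[:, None])) / 0.02) ** 2)

# k-NN regression: the redshift estimate is a distance-weighted average of
# the redshifts of the most similar reference spectra.
model = KNeighborsRegressor(n_neighbors=5, weights="distance")
model.fit(X_ref, z_ref)

z_true = 0.3
X_query = np.exp(-0.5 * ((grid - (0.2 + z_true)) / 0.02) ** 2)[None, :]
z_est = model.predict(X_query)[0]
```

Because each feature region gets its own model, objects whose per-feature estimates disagree (e.g. multiple redshift components) can be flagged by comparing the individual predictions.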

    Radio Galaxy Zoo: Knowledge Transfer Using Rotationally Invariant Self-Organising Maps

    With the advent of large-scale surveys, the manual analysis and classification of individual radio source morphologies becomes impossible, as existing approaches do not scale. The analysis of complex morphological features in the spatial domain is a particularly important task. Here we discuss the challenges of transferring crowdsourced labels obtained from the Radio Galaxy Zoo project and introduce a proper transfer mechanism via quantile random forest regression. By using parallelized rotation- and flipping-invariant Kohonen maps, image cubes of Radio Galaxy Zoo selected galaxies formed from the FIRST radio continuum and WISE infrared all-sky surveys are first projected down to a two-dimensional embedding in an unsupervised way. This embedding can be seen as a discretised space of shapes, with the coordinates reflecting morphological features as expressed by the automatically derived prototypes. We find that these prototypes reconstruct physically meaningful processes across two-channel images at radio and infrared wavelengths in an unsupervised manner. In the second step, images are compared with those prototypes to create a heat-map, which is the morphological fingerprint of each object and the basis for transferring the user-generated labels. These heat-maps reduce the feature space by a factor of 248 and can be used as the basis for subsequent ML methods. Using an ensemble of decision trees we achieve upwards of 85.7% and 80.7% accuracy when predicting the number of components and peaks in an image, respectively, using these heat-maps. We also question the currently used discrete classification schema and introduce a continuous scale that better reflects the uncertainty in the transition between two classes, caused by sensitivity and resolution limits.
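The label-transfer idea above (prototype heat-maps as a reduced feature space, fed to an ensemble of decision trees) can be sketched with synthetic data. Everything here is a stand-in: the "prototypes" are random vectors rather than trained SOM neurons, the labels are synthetic, and a plain random forest classifier replaces the paper's quantile random forest regression.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)

# Hypothetical prototypes standing in for trained SOM neurons; each image is
# summarised by its distance to every prototype (the "heat-map" fingerprint).
n_proto, n_pix = 16, 100
prototypes = rng.normal(size=(n_proto, n_pix))

def heat_map(image):
    # Morphological fingerprint: distance of the image to each prototype.
    return np.linalg.norm(prototypes - image[None, :], axis=1)

# Synthetic crowdsourced labels: number of components per training image,
# with images built near a label-dependent prototype so they are separable.
n_train = 300
labels = rng.integers(1, 4, n_train)
images = prototypes[labels * 4] + 0.1 * rng.normal(size=(n_train, n_pix))
X = np.array([heat_map(im) for im in images])

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, labels)
acc = clf.score(X, labels)
```

The key point is that the tree ensemble never sees the raw pixels, only the much smaller heat-map vectors, which is what makes transferring labels across large surveys tractable.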

    Bigger Buffer k-d Trees on Multi-Many-Core Systems

    A buffer k-d tree is a k-d tree variant for massively parallel nearest neighbor search. While providing valuable speed-ups on modern many-core devices when large numbers of both reference and query points are given, buffer k-d trees are limited by the number of points that can fit on a single device. In this work, we show how to modify the original data structure and the associated workflow to make the overall approach capable of dealing with massive data sets. We further provide a simple yet efficient way of using the multiple devices given in a single workstation. The applicability of the modified framework is demonstrated in the context of astronomy, a field that is faced with huge amounts of data.
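The batched nearest-neighbor query pattern that buffer k-d trees accelerate can be illustrated with SciPy's CPU-side k-d tree; the buffer variant itself targets many-core devices and is not shown here, and the feature dimensionality and point counts below are arbitrary.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(2)

# Reference set (e.g. photometric feature vectors) and a batch of queries.
reference = rng.uniform(size=(10_000, 5))
queries = rng.uniform(size=(1_000, 5))

# Build once, then answer many queries; the buffer k-d tree reorganises
# exactly this workload so that queries are processed in large batches
# suited to GPUs/many-core hardware.
tree = cKDTree(reference)
dist, idx = tree.query(queries, k=3)  # 3 nearest neighbours per query point
```

The returned `dist` and `idx` arrays have one row per query and one column per requested neighbor, sorted by increasing distance.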

    Cataloging the radio-sky with unsupervised machine learning: a new approach for the SKA era

    We develop a new analysis approach towards identifying related radio components and their corresponding infrared host galaxy based on unsupervised machine learning methods. By exploiting PINK, a self-organising map algorithm, we are able to associate radio and infrared sources without the a priori requirement of training labels. We present an example of this method using 894,415 images from the FIRST and WISE surveys centred towards positions described by the FIRST catalogue. We produce a set of catalogues that complement FIRST and describe 802,646 objects, including their radio components and their corresponding AllWISE infrared host galaxy. Using these data products we (i) demonstrate the ability to identify objects with rare and unique radio morphologies (e.g. 'X'-shaped galaxies, hybrid FR-I/FR-II morphologies), (ii) identify potentially resolved radio components that are associated with a single infrared host, (iii) introduce a "curliness" statistic to search for bent and disturbed radio morphologies, and (iv) extract a set of 17 giant radio galaxies with sizes between 700 and 1100 kpc. As we require no training labels, our method can be applied to any radio-continuum survey, provided a sufficiently representative SOM can be trained.
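PINK's distinguishing trick is that image-to-prototype similarity is computed under rotations and flips, keeping the best match. A minimal sketch, assuming tiny 8x8 images and only 90-degree rotations (PINK itself samples rotations far more finely):

```python
import numpy as np

def invariant_distance(image, prototype):
    # Best Euclidean match over the 8 symmetries generated by 90-degree
    # rotations and a horizontal flip (the dihedral group D4).
    best = np.inf
    for flipped in (image, np.fliplr(image)):
        for k in range(4):
            best = min(best, float(np.linalg.norm(np.rot90(flipped, k) - prototype)))
    return best

# A horizontal-bar prototype matches a vertical-bar image exactly once
# rotation invariance is applied, but not under a naive comparison.
proto = np.zeros((8, 8)); proto[3, :] = 1.0
img = np.zeros((8, 8)); img[:, 3] = 1.0
```

Here `invariant_distance(img, proto)` is zero while the naive distance `np.linalg.norm(img - proto)` is not, which is why the resulting map coordinates reflect intrinsic morphology rather than chance orientation on the sky.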

    A Comparison of Photometric Redshift Techniques for Large Radio Surveys

    Future radio surveys will generate catalogs of tens of millions of radio sources, for which redshift estimates will be essential to achieve many of the science goals. However, spectroscopic data will be available for only a small fraction of these sources, and in most cases even the optical and infrared photometry will be of limited quality. Furthermore, radio sources tend to be at higher redshift than most optical sources (most radio surveys have a median redshift greater than 1), and so a significant fraction of radio source hosts differ from those for which most photometric redshift templates are designed. We therefore need to develop new techniques for estimating the redshifts of radio sources. As a starting point in this process, we evaluate a number of machine-learning techniques for estimating redshift, together with a conventional template-fitting technique. We pay special attention to how the performance is affected by the incompleteness of the training sample, by sparseness of the parameter space, and by limited availability of ancillary multiwavelength data. As expected, we find that the quality of the photometric redshifts degrades as the quality of the photometry decreases, but that even with the limited quality of photometry available for all-sky surveys, useful redshift information is available for the majority of sources, particularly at low redshift. We find that a template-fitting technique performs best in the presence of high-quality and almost complete multi-band photometry, especially if radio sources that are also X-ray emitting are treated separately, using specific templates and priors. When we reduced the quality of photometry to match that available for the EMU all-sky radio survey, the quality of the template fitting degraded and became comparable to some of the machine-learning methods. Machine-learning techniques currently perform better at low redshift than at high redshift because of the incompleteness of the currently available training data at high redshifts.
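A machine-learning photometric-redshift estimator of the kind compared above can be sketched with a random forest on synthetic photometry. The band definitions, noise levels, and the linear magnitude-redshift relation are all illustrative assumptions; real training sets come from spectroscopic surveys and, as noted above, are incomplete at high redshift.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)

# Synthetic photometry: three bands whose colours correlate with redshift.
n = 2000
z = rng.uniform(0.0, 2.0, n)
mags = np.column_stack([
    20.0 + 0.8 * z + 0.1 * rng.normal(size=n),
    20.0 + 1.5 * z + 0.1 * rng.normal(size=n),
    20.0 + 2.5 * z + 0.1 * rng.normal(size=n),
])

# Train on a "spectroscopic" subset, predict for the rest.
X_tr, X_te, z_tr, z_te = train_test_split(mags, z, random_state=0)
reg = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_tr, z_tr)
scatter = float(np.std(reg.predict(X_te) - z_te))
```

Unlike template fitting, such a model can only interpolate within the redshift range covered by its training labels, which is the root of the high-redshift degradation discussed above.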

    Unveiling the rarest morphologies of the LOFAR Two-metre Sky Survey radio source population with self-organised maps

    Context. The Low Frequency Array (LOFAR) Two-metre Sky Survey (LoTSS) is a low-frequency radio continuum survey of the Northern sky at an unparalleled resolution and sensitivity. Aims. In order to fully exploit this huge dataset and those produced by the Square Kilometre Array in the next decade, automated methods in machine learning and data-mining will be increasingly essential both for morphological classifications and for identifying optical counterparts to the radio sources. Methods. Using self-organising maps (SOMs), a form of unsupervised machine learning, we created a dimensionality reduction of the radio morphologies for the ∼25k extended radio continuum sources in the LoTSS first data release, which is only ∼2 percent of the final LoTSS survey. We made use of PINK, a code which extends the SOM algorithm with rotation and flipping invariance, increasing its suitability and effectiveness for training on astronomical sources. Results. After training, the SOMs can be used for a wide range of science exploitation, and we present an illustration of their potential by finding an arbitrary number of morphologically rare sources in our training data (424 square degrees) and subsequently in an area of the sky (∼5300 square degrees) outside the training data. Objects found in this way span a wide range of morphological and physical categories: extended jets of radio active galactic nuclei, diffuse cluster haloes and relics, and nearby spiral galaxies. Finally, to enable accessible, interactive, and intuitive data exploration, we showcase the LOFAR-PyBDSF Visualisation Tool, which allows users to explore the LoTSS dataset through the trained SOMs.
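One common way to flag morphologically rare sources with a trained SOM is to score each image by its distance to the best-matching prototype: an image that matches no prototype well is unlike everything the map has learned. A minimal sketch, with random vectors standing in for trained prototypes and flattened images:

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical trained SOM: a 5x5 map of prototypes over flattened images.
prototypes = rng.normal(size=(25, 64))

def rarity(image):
    # Distance to the best-matching unit: large => unlike every prototype,
    # i.e. a candidate rare morphology worth visual inspection.
    return float(np.min(np.linalg.norm(prototypes - image[None, :], axis=1)))

common = prototypes[7] + 0.05 * rng.normal(size=64)  # close to one prototype
rare = 10.0 * rng.normal(size=64)                    # far from all prototypes
```

Ranking a survey by this score and inspecting the tail is how an "arbitrary number" of rare sources can be pulled out, both inside and outside the training footprint.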

    Black Hole Mass Estimates Based on CIV are Consistent with Those Based on the Balmer Lines

    Using a sample of high-redshift lensed quasars from the CASTLES project with observed-frame ultraviolet or optical and near-infrared spectra, we have searched for possible biases between supermassive black hole (BH) mass estimates based on the CIV, Halpha and Hbeta broad emission lines. Our sample is based upon that of Greene, Peng & Ludwig, expanded with new near-IR spectroscopic observations, consistently analyzed high S/N optical spectra, and consistent continuum luminosity estimates at 5100A. We find that BH mass estimates based on the FWHM of CIV show a systematic offset with respect to those obtained from the line dispersion, sigma_l, of the same emission line, but not with those obtained from the FWHM of Halpha and Hbeta. The magnitude of the offset depends on the treatment of the HeII and FeII emission blended with CIV, but there is little scatter for any fixed measurement prescription. While we otherwise find no systematic offsets between CIV and Balmer line mass estimates, we do find that the residuals between them are strongly correlated with the ratio of the UV and optical continuum luminosities. Removing this dependency reduces the scatter between the UV- and optical-based BH mass estimates by a factor of approximately 2, from roughly 0.35 to 0.18 dex. The dispersion is smallest when comparing the CIV sigma_l mass estimate, after removing the offset from the FWHM estimates, and either Balmer line mass estimate. The correlation with the continuum slope is likely due to a combination of reddening, host contamination and object-dependent SED shapes. When we add additional heterogeneous measurements from the literature, the results are unchanged. Comment: Accepted for publication in The Astrophysical Journal. 37 text pages + 8 tables + 23 figures. Updated with comments by the referee and with an expanded discussion on literature data including new observations.
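Single-epoch virial BH mass estimators of the kind compared here combine a line width with a continuum luminosity. A sketch of the generic form; the zero point `a` below is an illustrative placeholder, not one of the paper's calibrations, which differ per line and per width measure (FWHM vs sigma_l):

```python
import numpy as np

def virial_log_mass(fwhm_kms, lam_L_lam_erg_s, a=6.7):
    # Generic single-epoch virial estimator:
    #   log(M_BH / M_sun) = a + 0.5*log10(lam*L_lam / 1e44 erg/s)
    #                         + 2*log10(FWHM / 1000 km/s)
    # The luminosity term is a proxy for the broad-line-region radius and the
    # squared velocity width comes from assuming virialised gas motions.
    return (a
            + 0.5 * np.log10(lam_L_lam_erg_s / 1e44)
            + 2.0 * np.log10(fwhm_kms / 1000.0))

# Hypothetical CIV-based estimate: FWHM = 5000 km/s, lam*L_lam = 1e45 erg/s.
m_civ = virial_log_mass(5000.0, 1e45)
```

Because the mass scales as the square of the line width, systematic differences in how FWHM or sigma_l is measured (e.g. the treatment of blended HeII and FeII emission) propagate directly into offsets between line-based mass scales, which is what the comparison above quantifies.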