1,088 research outputs found

    On the selection of dimension reduction techniques for scientific applications

    Many dimension reduction methods have been proposed to discover the intrinsic, lower dimensional structure of a high-dimensional dataset. However, determining critical features in datasets that consist of a large number of features is still a challenge. In this paper, through a series of carefully designed experiments on real-world datasets, we investigate the performance of different dimension reduction techniques, ranging from feature subset selection to methods that transform the features into a lower dimensional space. We also discuss methods that calculate the intrinsic dimensionality of a dataset in order to understand the reduced dimension. Using several evaluation strategies, we show how these different methods can provide useful insights into the data. These comparisons enable us to provide guidance to a user on the selection of a technique for their dataset.
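One simple way to estimate the intrinsic dimensionality of a dataset is to count how many principal components are needed to retain a chosen fraction of the total variance. The sketch below assumes a NumPy array `X` of shape (samples, features) and a hypothetical 95 per cent variance threshold; it is only one of many possible estimators and is not necessarily one of those evaluated in the paper.

```python
import numpy as np


def pca_intrinsic_dim(X, threshold=0.95):
    """Number of principal components needed to retain `threshold` of the variance."""
    Xc = X - X.mean(axis=0)                    # centre each feature
    s = np.linalg.svd(Xc, compute_uv=False)    # singular values, in descending order
    var = s ** 2                               # proportional to the component variances
    cumulative = np.cumsum(var) / var.sum()    # cumulative explained-variance ratio
    # first index where the cumulative ratio reaches the threshold, converted to a count
    return int(np.searchsorted(cumulative, threshold) + 1)


# Synthetic check: 3 latent factors embedded in 10 features, plus tiny noise.
rng = np.random.default_rng(0)
latent = rng.standard_normal((500, 3)) * np.array([3.0, 2.0, 1.5])
basis, _ = np.linalg.qr(rng.standard_normal((10, 3)))   # 10x3 matrix, orthonormal columns
X = latent @ basis.T + 1e-3 * rng.standard_normal((500, 10))
print(pca_intrinsic_dim(X))  # 3
```

A variance threshold is easy to interpret but sensitive to feature scaling, so features are usually standardised first when they carry different units.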

    Data Mining and Machine Learning in Astronomy

    We review the current state of data mining and machine learning in astronomy. 'Data Mining' can have a somewhat mixed connotation from the point of view of a researcher in this field. If used correctly, it can be a powerful approach, holding the potential to fully exploit the exponentially increasing amount of available data, promising great scientific advance. However, if misused, it can be little more than the black-box application of complex computing algorithms that may give little physical insight, and provide questionable results. Here, we give an overview of the entire data mining process, from data collection through to the interpretation of results. We cover common machine learning algorithms, such as artificial neural networks and support vector machines, applications from a broad range of astronomy, emphasizing those where data mining techniques directly resulted in improved science, and important current and future directions, including probability density functions, parallel algorithms, petascale computing, and the time domain. We conclude that, so long as one carefully selects an appropriate algorithm and is guided by the astronomical problem at hand, data mining can indeed be a powerful tool, and not a questionable black box. Comment: Published in IJMPD. 61 pages, uses ws-ijmpd.cls. Several extra figures, some minor additions to the text

    Unveiling the rarest morphologies of the LOFAR Two-metre Sky Survey radio source population with self-organised maps

    Context. The Low Frequency Array (LOFAR) Two-metre Sky Survey (LoTSS) is a low-frequency radio continuum survey of the Northern sky at an unparalleled resolution and sensitivity. Aims. In order to fully exploit this huge dataset and those produced by the Square Kilometre Array in the next decade, automated methods in machine learning and data mining will be increasingly essential both for morphological classifications and for identifying optical counterparts to the radio sources. Methods. Using self-organising maps (SOMs), a form of unsupervised machine learning, we created a dimensionality reduction of the radio morphologies for the ∼25k extended radio continuum sources in the LoTSS first data release, which is only ∼2 percent of the final LoTSS survey. We made use of PINK, a code which extends the SOM algorithm with rotation and flipping invariance, increasing its suitability and effectiveness for training on astronomical sources. Results. After training, the SOMs can be used for a wide range of science exploitation and we present an illustration of their potential by finding an arbitrary number of morphologically rare sources in our training data (424 square degrees) and subsequently in an area of the sky (∼5300 square degrees) outside the training data. Objects found in this way span a wide range of morphological and physical categories: extended jets of radio active galactic nuclei, diffuse cluster haloes and relics, and nearby spiral galaxies. Finally, to enable accessible, interactive, and intuitive data exploration, we showcase the LOFAR-PyBDSF Visualisation Tool, which allows users to explore the LoTSS dataset through the trained SOMs.
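The rotation- and flip-invariant matching that PINK adds to the SOM algorithm can be illustrated with a small NumPy sketch. This is not PINK's implementation: it assumes square single-channel cutouts, considers only the eight 90-degree rotations and mirror images (PINK itself can use finer rotation angles), and updates each neuron towards the input orientation that best matches that particular neuron, which is one plausible choice. The function names, grid size and learning-rate schedule are hypothetical.

```python
import numpy as np


def dihedral_variants(img):
    """Stack of the 8 rotations (90-degree steps) and mirror images of a square image."""
    return np.stack([np.rot90(base, k) for base in (img, np.fliplr(img)) for k in range(4)])


def best_match(protos, img):
    """Distance of `img` to every neuron, minimised over the 8 orientations.

    Returns the best-matching unit (row, col), its distance, and, for every
    neuron, the orientation of `img` that fits that neuron best."""
    variants = dihedral_variants(img)                                          # (8, s, s)
    d = ((protos[:, :, None] - variants[None, None]) ** 2).sum(axis=(-2, -1))  # (rows, cols, 8)
    best_k = d.argmin(axis=-1)                                                 # (rows, cols)
    best_d = d.min(axis=-1)
    bmu = np.unravel_index(best_d.argmin(), best_d.shape)
    return (int(bmu[0]), int(bmu[1])), float(best_d[bmu]), variants[best_k]


def train_invariant_som(images, grid=(3, 3), epochs=10, lr0=0.5, sigma0=1.5, seed=0):
    """Train a small SOM whose matching ignores 90-degree rotations and flips."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    # initialise the prototypes from randomly chosen training images
    protos = images[rng.integers(0, len(images), rows * cols)].astype(float)
    protos = protos.reshape(rows, cols, *images.shape[1:])
    coords = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)
    total, step = epochs * len(images), 0
    for _ in range(epochs):
        for i in rng.permutation(len(images)):
            frac = step / total
            lr = lr0 * (1.0 - frac)                    # linearly decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)    # shrinking neighbourhood radius
            bmu, _, aligned = best_match(protos, images[i])
            grid_d2 = ((coords - np.array(bmu)) ** 2).sum(axis=-1)
            h = np.exp(-grid_d2 / (2.0 * sigma ** 2))  # Gaussian neighbourhood on the grid
            protos += lr * h[..., None, None] * (aligned - protos)
            step += 1
    return protos


rng = np.random.default_rng(1)
images = rng.random((30, 8, 8))
protos = train_invariant_som(images)
bmu, dist, _ = best_match(protos, images[0])
bmu_rot, dist_rot, _ = best_match(protos, np.rot90(images[0]))
print(bmu == bmu_rot, np.isclose(dist, dist_rot))  # rotating the input leaves the match unchanged
```

Because the eight variants of a rotated or flipped image are the same set as those of the original, the best-matching unit and its distance do not depend on the source's orientation on the sky.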

    Radio Galaxy Zoo: CLARAN - A deep learning classifier for radio morphologies

    The upcoming next-generation large-area radio continuum surveys can expect tens of millions of radio sources, rendering the traditional method of radio morphology classification through visual inspection unfeasible. We present CLARAN (Classifying Radio sources Automatically with Neural networks), a proof-of-concept radio source morphology classifier based upon the Faster Region-based Convolutional Neural Networks (Faster R-CNN) method. Specifically, we train and test CLARAN on the FIRST and WISE (Wide-field Infrared Survey Explorer) images from the Radio Galaxy Zoo Data Release 1 catalogue. CLARAN provides end users with automated identification of radio source morphology classifications from a simple input of a radio image and a counterpart infrared image of the same region. CLARAN is the first open-source, end-to-end radio source morphology classifier that is capable of locating and associating discrete and extended components of radio sources in a fast (<200 ms per image) and accurate (≥90 per cent) fashion. Future work will improve CLARAN's relatively lower success rates in dealing with multisource fields and will enable CLARAN to identify sources on much larger fields without loss in classification accuracy.
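Region-based detectors of the Faster R-CNN family score overlapping candidate boxes with intersection-over-union (IoU) and prune duplicate detections with non-maximum suppression (NMS). The NumPy sketch below shows only these two generic steps; it is not CLARAN's code, and the (x1, y1, x2, y2) pixel box format and the 0.5 IoU threshold are assumptions.

```python
import numpy as np


def iou(box, boxes):
    """IoU between one box and an (N, 4) array of boxes, all given as (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)  # zero if no overlap
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping it too much, repeat."""
    order = np.argsort(scores)[::-1]          # indices sorted by descending score
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        order = rest[iou(boxes[best], boxes[rest]) <= iou_threshold]
    return keep


boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # [0, 2]
```

The second box overlaps the first with IoU = 81/119 ≈ 0.68, so it is suppressed, while the isolated third box survives.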

    A study of extended radio galaxies in the Shapley concentration core

    Extended cluster radio galaxies show different morphologies compared to those found isolated in the field. Indeed, symmetric double radio galaxies are only a small percentage of the total content of radio-loud cluster galaxies, which show mainly tailed morphologies (e.g. O’Dea & Owen, 1985). Moreover, cluster mergers can deeply affect the statistical properties of their radio activity. In order to better understand the morphological and radio activity differences of the radio galaxies in major merging and non/tidal-merging clusters, we performed a multifrequency study of extended radio galaxies inside two cluster complexes, A3528 and A3558. They belong to the innermost region of the Shapley Concentration, the most massive concentration of galaxy clusters (termed supercluster) in the local Universe, at average redshift z ≈ 0.043. We analysed low-frequency radio data obtained at 235 and 610 MHz with the Giant Metrewave Radio Telescope (GMRT) and combined them with proprietary and literature observations, in order to cover a wide frequency range (150 MHz to 8.4 GHz) for the spectral analysis. The low-frequency images allowed us to carry out a detailed study of the radio tails and, in some cases, of the diffuse emission. The results in the radio band were also qualitatively compared with the X-ray information from XMM-Newton observations, in order to test the interaction between radio galaxies and cluster weather. We found that the brightest central galaxies (BCGs) in the A3528 cluster complex are powerful and present substantial emission from old relativistic plasma characterized by a steep spectrum (α > 2). In light of the observational evidence, we suggest they are possibly restarted radio galaxies. On the other hand, the tailed radio galaxies trace the host galaxy motion with respect to the intracluster medium (ICM), and our findings are consistent with the dynamical interpretation of a tidal interaction (Gastaldello et al. 2003). On the contrary, the BCGs in the A3558 cluster complex are either quiet or very faint radio galaxies, supporting the hypothesis that cluster mergers quench the radio emission from AGN.
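The steep-spectrum criterion (α > 2) refers to the radio spectral index. Assuming the common convention S ∝ ν^(-α) (not stated in the abstract, although a steep spectrum with α > 2 implies this sign), α can be estimated from two flux densities or from a straight-line fit in log-log space across several frequencies, such as the 150 MHz to 8.4 GHz range used in this study. The flux densities below are a synthetic power law, not measurements from the paper.

```python
import numpy as np


def two_point_index(s1, nu1, s2, nu2):
    """Spectral index alpha between two frequencies, for S proportional to nu**(-alpha)."""
    return -np.log(s2 / s1) / np.log(nu2 / nu1)


def fitted_index(freqs_mhz, fluxes_mjy):
    """Least-squares alpha from a straight-line fit of log S against log nu."""
    slope, _intercept = np.polyfit(np.log10(freqs_mhz), np.log10(fluxes_mjy), 1)
    return -slope


# Synthetic power law with alpha = 2.5, normalised to 100 mJy at 235 MHz.
freqs = np.array([150.0, 235.0, 610.0, 1400.0])
fluxes = 100.0 * (freqs / 235.0) ** -2.5
print(round(float(two_point_index(fluxes[1], 235.0, fluxes[2], 610.0)), 3))  # 2.5
print(round(float(fitted_index(freqs, fluxes)), 3))                           # 2.5
```

With real data, curvature from ageing plasma makes the two-point index frequency dependent, which is why a fit over a wide frequency range is informative.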

    Unsupervised spectral classification of astronomical x-ray sources based on independent component analysis

    By virtue of the sensitivity of the XMM-Newton and Chandra X-ray telescopes, astronomers are capable of probing increasingly faint X-ray sources in the universe. On the other hand, we now face a tremendous amount of X-ray imaging data collected by these observatories. We developed an efficient framework to classify astronomical X-ray sources through natural grouping of their reduced dimensionality profiles, which can faithfully represent the high-dimensional spectral information. X-ray imaging spectral extraction techniques, which use standard astronomical software (e.g., SAS, FTOOLS and CIAO), provide an efficient means to investigate multiple X-ray sources in one or more observations at the same time. After applying independent component analysis (ICA), the high-dimensional spectra can be expressed by reduced dimensionality profiles in an independent space. An infrared spectral data set obtained for the stars in the Large Magellanic Cloud, observed by the Spitzer Space Telescope Infrared Spectrograph, has been used to test the unsupervised classification algorithms. The lowest classification error is achieved by the hierarchical clustering algorithm with average linkage, in which each spectrum is scaled by its maximum amplitude. We then applied a similar ICA-based hierarchical clustering algorithm to a deep XMM-Newton X-ray observation of the field of the eruptive young star V1647 Ori. Our classification method establishes that V1647 Ori is a spectrally distinct X-ray source in this field. Finally, we classified the X-ray sources in the central field of a large survey, the Subaru/XMM-Newton deep survey, which contains a large population of high-redshift extragalactic sources. A small group of sources with maximum spectral peak above 1 keV are easily picked out from the spectral data set, and these sources appear to be associated with active galaxies. In general, these experiments confirm that our classification framework is an efficient X-ray imaging spectral analysis tool that gives astronomers insight into the fundamental physical mechanisms responsible for X-ray emission and, furthermore, can be applied to a wide range of the electromagnetic spectrum.
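The classification pipeline in this abstract (scale each spectrum by its maximum amplitude, project it onto a few independent components, then group the projections by average-linkage hierarchical clustering) can be sketched with NumPy alone. The compact FastICA routine (symmetric decorrelation, tanh non-linearity) and the naive O(n³) clustering loop are illustrative stand-ins, not the authors' implementation, and the Gaussian-peaked spectra are hypothetical test data.

```python
import numpy as np


def sym_decorrelate(W):
    """Symmetric decorrelation W <- (W W^T)^(-1/2) W, which keeps the rows orthonormal."""
    s, u = np.linalg.eigh(W @ W.T)
    return (u / np.sqrt(s)) @ u.T @ W


def fast_ica(X, n_components, n_iter=200, seed=0):
    """Minimal FastICA: whiten with PCA, then fixed-point updates with g = tanh."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ Vt[:n_components].T / S[:n_components] * np.sqrt(n)   # whitened data, cov = I
    W = sym_decorrelate(np.random.default_rng(seed).standard_normal((n_components, n_components)))
    for _ in range(n_iter):
        g = np.tanh(Z @ W.T)                                         # (n, n_components)
        W = sym_decorrelate(g.T @ Z / n - (1.0 - g ** 2).mean(axis=0)[:, None] * W)
    return Z @ W.T                                                   # estimated independent components


def average_linkage(P, n_clusters):
    """Agglomerative clustering: repeatedly merge the pair with the smallest mean distance."""
    D = np.linalg.norm(P[:, None] - P[None], axis=-1)               # pairwise Euclidean distances
    clusters = [[i] for i in range(len(P))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = D[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = clusters[a] + clusters[b]
        del clusters[b]                                              # b > a, so index a is unaffected
        clusters[a] = merged
    labels = np.empty(len(P), dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels


def classify_spectra(spectra, n_components=2, n_clusters=3):
    scaled = spectra / spectra.max(axis=1, keepdims=True)           # scale by maximum amplitude
    return average_linkage(fast_ica(scaled, n_components), n_clusters)


# Hypothetical spectra: three peak positions, random brightness, small noise.
rng = np.random.default_rng(0)
energy = np.arange(50)
spectra = np.vstack([
    rng.uniform(1, 5, (15, 1)) * np.exp(-0.5 * ((energy - c) / 3.0) ** 2)
    + 0.01 * rng.standard_normal((15, 50))
    for c in (10, 25, 40)
])
labels = classify_spectra(spectra)
print(labels)
```

Scaling by the maximum removes brightness differences, so the clusters reflect spectral shape rather than flux; in practice a library implementation of ICA and linkage would replace these hand-written loops.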

    Deep WFPC2 and Ground-based Imaging of a Complete Sample of 3C Quasars and Galaxies

    We present the results of an HST and ground-based imaging study of a complete 3C sample of z ~ 1 sources, including 5 quasars and 5 radio galaxies. We have resolved continuum structure around all of our quasars in the WFPC2 images and in four of the five ground-based K' images. All of the quasars have some optical continuum structure that is aligned with the radio axis. In 3 of these cases, some of this optical structure is most likely due to optical synchrotron radiation, including optical counterparts to two radio jets and one radio lobe. Two quasars have aligned continuum and emission-line structures that are probably not due to beamed optical synchrotron emission. In another quasar, we see a red aligned object that lies 3 arcsec beyond the radio lobe; it may be an unassociated foreground galaxy, but it has a remarkable morphological resemblance to the radio lobe itself. The radio galaxies and the quasars in this small sample have a similar incidence of alignment, and their optical and K' flux densities are consistent within the high dispersion. The average quasar host galaxy luminosity is equivalent to, or a little fainter than, L*. All components around the quasars have optical-infrared colors that are redder than or similar to the colors of their respective nuclei; this is generally more consistent with a stellar than with a scattered origin for the emission. This study provides qualitative support for the unification of FRII quasars and galaxies. Comment: 69 pages, LaTeX (aaspp4.sty); 10 tables (aj_pt4.sty); 22 figures; accepted to A.J., August 199