Data Deluge in Astrophysics: Photometric Redshifts as a Template Use Case
Astronomy has entered the big data era and Machine Learning based methods
have found widespread use in a large variety of astronomical applications. This
is demonstrated by the recent huge increase in the number of publications
making use of this new approach. The use of machine learning methods, however,
is still far from trivial, and many problems remain to be solved. Using the
evaluation of photometric redshifts as a case study, we outline the main
problems and some ongoing efforts to solve them.
Comment: 13 pages, 3 figures, Springer's Communications in Computer and
Information Science (CCIS), Vol. 82
Automated physical classification in the SDSS DR10. A catalogue of candidate Quasars
We discuss whether modern machine learning methods can be used to
characterize the physical nature of the large number of objects sampled by the
modern multi-band digital surveys. In particular, we applied the MLPQNA (Multi
Layer Perceptron with Quasi Newton Algorithm) method to the optical data of the
Sloan Digital Sky Survey - Data Release 10, investigating whether photometric
data alone suffice to disentangle different classes of objects as they are
defined in the SDSS spectroscopic classification. We discuss three groups of
classification problems: (i) the simultaneous classification of galaxies,
quasars and stars; (ii) the separation of stars from quasars; (iii) the
separation of galaxies with normal spectral energy distribution from those with
peculiar spectra, such as starburst or starforming galaxies and AGN. While
confirming the difficulty of disentangling AGN from normal galaxies on a
photometric basis only, MLPQNA proved to be quite effective in the three-class
separation. In disentangling quasars from stars and galaxies, our method
achieved an overall efficiency of 91.31% and a QSO class purity of ~95%. The
resulting catalogue of candidate quasars/AGNs consists of ~3.6 million objects,
of which about half a million are also flagged as robust candidates, and will
be made available on the CDS VizieR facility.
Comment: Accepted for publication by MNRAS, 13 pages, 6 figures
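The two figures of merit quoted in this abstract (overall efficiency and QSO class purity) can be illustrated with a minimal sketch. It assumes that efficiency means the fraction of all objects classified correctly and that purity means the fraction of objects assigned to a class that truly belong to it; the abstract does not spell out these definitions, and the function names and labels below are hypothetical.

```python
import math


def overall_efficiency(true_labels, predicted_labels):
    """Fraction of objects whose predicted class matches the true class (assumed definition)."""
    correct = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return correct / len(true_labels)


def purity(true_labels, predicted_labels, cls):
    """Fraction of objects predicted as `cls` that truly are `cls` (assumed definition)."""
    predicted_as_cls = sum(1 for p in predicted_labels if p == cls)
    if predicted_as_cls == 0:
        return math.nan  # purity is undefined when nothing is assigned to the class
    true_positives = sum(
        1 for t, p in zip(true_labels, predicted_labels) if t == cls and p == cls
    )
    return true_positives / predicted_as_cls


# Toy labels for illustration only; these are not SDSS data.
spectroscopic = ["QSO", "QSO", "STAR", "GALAXY", "QSO", "STAR"]
photometric = ["QSO", "QSO", "QSO", "GALAXY", "STAR", "STAR"]
print(overall_efficiency(spectroscopic, photometric))
print(purity(spectroscopic, photometric, "QSO"))
```

Under these assumed definitions, a high purity with a lower efficiency would indicate that the objects flagged as quasars are reliable even though some objects of other classes are misassigned.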
Data-Rich Astronomy: Mining Sky Surveys with PhotoRApToR
In the last decade a new generation of telescopes and sensors has allowed the
production of a very large amount of data and astronomy has become a data-rich
science. New automatic methods largely based on machine learning are needed to
cope with such a data tsunami. We present some results in the fields of
photometric redshifts and galaxy classification, obtained using the MLPQNA
algorithm available in the DAMEWARE (Data Mining and Web Application Resource)
for the SDSS galaxies (DR9 and DR10). We present PhotoRApToR (Photometric
Research Application To Redshift): a Java-based desktop application capable of
solving regression and classification problems and specialized for photo-z
estimation.
Comment: proceedings of the IAU Symposium, Vol. 306, Cambridge University
Press
Photometric redshifts with Quasi Newton Algorithm (MLPQNA). Results in the PHAT1 contest
Context. Since the advent of modern multiband digital sky surveys,
photometric redshifts (photo-z's) have become relevant if not crucial to many
fields of observational cosmology, from the characterization of cosmic
structures, to weak and strong lensing. Aims. We describe an application to an
astrophysical context, namely the evaluation of photometric redshifts, of
MLPQNA, a machine learning method based on the Quasi Newton Algorithm. Methods.
Theoretical methods for photo-z evaluation are based on the interpolation of
a priori knowledge (spectroscopic redshifts or SED templates) and represent an
ideal comparison ground for neural-network-based methods. The MultiLayer
Perceptron with Quasi Newton learning rule (MLPQNA) described here is a
computationally efficient implementation of neural networks, exploited for the
first time to solve regression problems in the astrophysical context, and is
offered to the community through the DAMEWARE (DAta Mining & Exploration Web
Application REsource) infrastructure. Results. The PHAT contest (Hildebrandt et
al. 2010) provides a standard dataset to test old and new methods for
photometric redshift evaluation, together with a set of statistical indicators that
allow a straightforward comparison among different methods. The MLPQNA model
has been applied to the whole PHAT1 dataset of 1984 objects after an
optimization of the model performed by using as training set the 515 available
spectroscopic redshifts. When applied to the PHAT1 dataset, MLPQNA obtains the
best bias accuracy (0.0006) and very competitive accuracies in terms of scatter
(0.056) and outlier percentage (16.3%), scoring as the second most effective
empirical method among those which have so far participated in the contest.
MLPQNA shows better generalization capabilities than most other empirical
methods, especially in the presence of underpopulated regions of the Knowledge Base.
Comment: Accepted for publication in Astronomy & Astrophysics; 9 pages, 2
figures
Photometric redshift estimation based on data mining with PhotoRApToR
Photometric redshifts (photo-z) are crucial to the scientific exploitation of
modern panchromatic digital surveys. In this paper we present PhotoRApToR
(Photometric Research Application To Redshift): a Java/C++ based desktop
application capable of solving non-linear regression and multi-variate
classification problems, and specialized in particular for photo-z estimation. It
embeds a machine learning algorithm, namely a multilayer neural network trained
by the Quasi Newton learning rule, and special tools dedicated to pre- and
postprocessing data. PhotoRApToR has been successfully tested on several
scientific cases. The application is available for free download from the DAME
Program web site.
Comment: To appear in Experimental Astronomy, Springer, 20 pages, 15 figures
Return of the features. Efficient feature selection and interpretation for photometric redshifts
The explosion of data in recent years has generated an increasing need for
new analysis techniques in order to extract knowledge from massive datasets.
Machine learning has proved particularly useful to perform this task. Fully
automated methods have recently gained great popularity, even though those
methods often lack physical interpretability. In contrast, feature based
approaches can provide both well-performing models and understandable
causalities with respect to the correlations found between features and
physical processes. Efficient feature selection is an essential tool to boost
the performance of machine learning models. In this work, we propose a forward
selection method in order to compute, evaluate, and characterize better
performing features for regression and classification problems. Given the
importance of photometric redshift estimation, we adopt it as our case study.
We synthetically created 4,520 features by combining magnitudes, errors, radii,
and ellipticities of quasars, taken from the SDSS. We apply a forward selection
process, a recursive method in which a huge number of feature sets is tested
through a kNN algorithm, leading to a tree of feature sets. The branches of the
tree are then used to perform experiments with the random forest, in order to
validate the best set with an alternative model. We demonstrate that the sets
of features determined with our approach improve the performances of the
regression models significantly when compared to the performance of the classic
features from the literature. The features found are unexpected, being very
different from the classic features. Therefore, a method to
interpret some of the selected features in a physical context is presented. The
methodology described here is very general and can be used to improve the
performance of machine learning models for any regression or classification
task.
Comment: 21 pages, 11 figures, accepted for publication in A&A, final version
after language revision
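The forward selection idea described in this abstract can be sketched as follows. This is a simplified greedy variant that follows a single branch, not the tree of feature sets the authors build, and it omits the random forest validation step. The kNN regressor, the validation RMSE score, and all function names are assumptions made for illustration.

```python
import math


def knn_predict(train_X, train_y, x, k):
    """Average the targets of the k training rows closest to x (Euclidean distance)."""
    nearest = sorted((math.dist(row, x), y) for row, y in zip(train_X, train_y))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def project(X, cols):
    """Keep only the selected feature columns of each row."""
    return [[row[c] for c in cols] for row in X]


def validation_rmse(train_X, train_y, val_X, val_y, cols, k):
    """RMSE of kNN predictions on the validation set, using only columns `cols`."""
    tX, vX = project(train_X, cols), project(val_X, cols)
    sq = [(knn_predict(tX, train_y, x, k) - y) ** 2 for x, y in zip(vX, val_y)]
    return math.sqrt(sum(sq) / len(sq))


def forward_select(train_X, train_y, val_X, val_y, max_features, k=3):
    """Greedily add the feature that most lowers validation RMSE; stop when none helps."""
    selected, best_score = [], math.inf
    remaining = list(range(len(train_X[0])))
    while remaining and len(selected) < max_features:
        scored = [
            (validation_rmse(train_X, train_y, val_X, val_y, selected + [c], k), c)
            for c in remaining
        ]
        score, col = min(scored)
        if score >= best_score:
            break  # no remaining feature improves the score
        selected.append(col)
        remaining.remove(col)
        best_score = score
    return selected, best_score
```

Each round costs one kNN evaluation per remaining feature, so a pool of several thousand candidate features, as in the abstract, would call for a much faster neighbour search than this brute-force version.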
PhotoRaptor - Photometric Research Application To Redshifts
Given the need to evaluate photo-z for a variety of huge sky survey
data sets, it seemed important to provide the astronomical community with an
instrument able to meet this need. Besides the problem of moving massive data
sets over the network, another critical point is that a large part of
astronomical data is stored in private archives that are not fully accessible
online. Evaluating photo-z therefore requires a desktop application
that anyone can download and use locally, i.e. on a personal
computer or, more generally, within the local intranet hosted by a data center.
The name chosen for the application is PhotoRApToR, i.e. Photometric Research
Application To Redshift (Cavuoti et al. 2015, 2014; Brescia 2014b). It embeds a
machine learning algorithm and special tools dedicated to pre- and
post-processing data. The ML model is the MLPQNA (Multi Layer Perceptron
trained by the Quasi Newton Algorithm), which has proved particularly
powerful for photo-z calculation on the basis of a spectroscopic sample
(Cavuoti et al. 2012; Brescia et al. 2013, 2014a; Biviano et al. 2013).
The PhotoRApToR program package is available, for different platforms, at the
official website (http://dame.dsf.unina.it/dame_photoz.html#photoraptor).
Comment: User Manual of the PhotoRaptor tool, 54 pages. arXiv admin note:
substantial text overlap with arXiv:1501.0650
Photometric redshifts for Quasars in multi band Surveys
MLPQNA stands for Multi Layer Perceptron with Quasi Newton Algorithm and it
is a machine learning method which can be used to cope with regression and
classification problems on complex and massive data sets. In this paper we give
the formal description of the method and present the results of its application
to the evaluation of photometric redshifts for quasars. The data set used for
the experiment was obtained by merging four different surveys (SDSS, GALEX,
UKIDSS and WISE), thus covering a wide range of wavelengths from the UV to the
mid-infrared. The method is able i) to achieve a very high accuracy; ii) to
drastically reduce the number of outliers and catastrophic objects; iii) to
discriminate among parameters (or features) on the basis of their significance,
so that the number of features used for training and analysis can be optimized
in order to reduce both the computational demands and the effects of
degeneracy. The best experiment, which makes use of a selected combination of
parameters drawn from the four surveys, leads, in terms of DeltaZnorm (i.e.
(zspec-zphot)/(1+zspec)), to an average of DeltaZnorm = 0.004, a standard
deviation sigma = 0.069 and a Median Absolute Deviation MAD = 0.02 over the
whole redshift range (i.e. zspec <= 3.6), defined by the 4-survey cross-matched
spectroscopic sample. The fraction of catastrophic outliers, i.e. of objects
with photo-z deviating more than 2sigma from the spectroscopic value is < 3%,
leading to a sigma = 0.035 after their removal, over the same redshift range.
The method is made available to the community through the DAMEWARE web
application.
Comment: 38 pages, Submitted to ApJ in February 2013; Accepted by ApJ in May
201
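The statistics quoted in this abstract can be computed along the following lines. This is a minimal sketch: DeltaZnorm follows the formula given in the abstract, while the use of the population standard deviation, the definition of MAD as the median absolute deviation from the median, and the outlier cut |DeltaZnorm| > 2 sigma are assumptions. The function names are hypothetical.

```python
import statistics


def delta_z_norm(z_spec, z_phot):
    """Normalized residuals (zspec - zphot) / (1 + zspec), as defined in the abstract."""
    return [(zs - zp) / (1.0 + zs) for zs, zp in zip(z_spec, z_phot)]


def photoz_stats(z_spec, z_phot, n_sigma=2.0):
    """Summary statistics of DeltaZnorm, before and after removing outliers."""
    dz = delta_z_norm(z_spec, z_phot)
    mean = statistics.fmean(dz)
    sigma = statistics.pstdev(dz)  # population standard deviation (assumption)
    med = statistics.median(dz)
    mad = statistics.median([abs(d - med) for d in dz])  # assumed MAD definition
    # Assumed outlier criterion: |DeltaZnorm| larger than n_sigma * sigma.
    kept = [d for d in dz if abs(d) <= n_sigma * sigma]
    return {
        "mean": mean,
        "sigma": sigma,
        "mad": mad,
        "outlier_fraction": 1.0 - len(kept) / len(dz),
        "sigma_clean": statistics.pstdev(kept) if kept else float("nan"),
    }
```

A call such as `photoz_stats(z_spec, z_phot)` on matched lists of spectroscopic and photometric redshifts returns the mean, sigma and MAD over the full sample, plus the outlier fraction and the sigma recomputed after the outliers are removed.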
Data Driven Discovery in Astrophysics
We review some aspects of the current state of data-intensive astronomy, its
methods, and some outstanding data analysis challenges. Astronomy is at the
forefront of "big data" science, with exponentially growing data volumes and
data rates, and an ever-increasing complexity, now entering the Petascale
regime. Telescopes and observatories from both ground and space, covering a
full range of wavelengths, feed the data via processing pipelines into
dedicated archives, where they can be accessed for scientific analysis. Most of
the large archives are connected through the Virtual Observatory framework,
which provides interoperability standards and services, and effectively
constitutes a global data grid of astronomy. Making discoveries in this
overabundance of data requires the application of novel machine learning tools.
We describe some recent examples of such applications.
Comment: Keynote talk in the proceedings of ESA-ESRIN Conference: Big Data
from Space 2014, Frascati, Italy, November 12-14, 2014, 8 pages, 2 figures