Astroinformatics, data mining and the future of astronomical research
Astronomy, like many other scientific disciplines, is facing a true data deluge
which is bound to change both the praxis and the methodology of everyday
research work. The emerging field of astroinformatics, while on the one hand
crucial for meeting the technological challenges, is on the other opening
exciting new perspectives for astronomical discoveries through the
implementation of advanced data mining procedures. The complexity of
astronomical data and the variety of scientific problems, however, call for
innovative algorithms and methods as well as for intensive use of ICT.
Comment: To appear in the Proceedings of the 2nd International Conference on
Frontiers on diagnostic technologies
Automated physical classification in the SDSS DR10. A catalogue of candidate Quasars
We discuss whether modern machine learning methods can be used to
characterize the physical nature of the large number of objects sampled by the
modern multi-band digital surveys. In particular, we applied the MLPQNA (Multi
Layer Perceptron with Quasi Newton Algorithm) method to the optical data of the
Sloan Digital Sky Survey - Data Release 10, investigating whether photometric
data alone suffice to disentangle different classes of objects as they are
defined in the SDSS spectroscopic classification. We discuss three groups of
classification problems: (i) the simultaneous classification of galaxies,
quasars and stars; (ii) the separation of stars from quasars; (iii) the
separation of galaxies with normal spectral energy distribution from those with
peculiar spectra, such as starburst or starforming galaxies and AGN. While
confirming the difficulty of disentangling AGN from normal galaxies on a
photometric basis only, MLPQNA proved to be quite effective in the three-class
separation. In disentangling quasars from stars and galaxies, our method
achieved an overall efficiency of 91.31% and a QSO class purity of ~95%. The
resulting catalogue of candidate quasars/AGNs consists of ~3.6 million objects,
of which about half a million are also flagged as robust candidates, and will
be made available through the CDS VizieR facility.
Comment: Accepted for publication by MNRAS, 13 pages, 6 figures
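The efficiency and purity figures quoted in this abstract can be read off a confusion matrix. The following is a minimal sketch, assuming that "overall efficiency" means the fraction of all objects classified correctly and that "purity" of a class means the fraction of objects assigned to that class which truly belong to it; the abstract does not spell out these definitions. The labels are toy values, not SDSS data, and all function names are hypothetical.

```python
from typing import List, Sequence


def confusion_matrix(true_labels: Sequence[str],
                     predicted_labels: Sequence[str],
                     classes: Sequence[str]) -> List[List[int]]:
    """Count objects per (true class, predicted class); rows are true classes."""
    index = {name: i for i, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for truth, guess in zip(true_labels, predicted_labels):
        matrix[index[truth]][index[guess]] += 1
    return matrix


def overall_efficiency(matrix: List[List[int]]) -> float:
    """Assumed definition: correctly classified objects over all objects."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def purity(matrix: List[List[int]], k: int) -> float:
    """Assumed definition: true members of class k among objects assigned to k."""
    predicted_as_k = sum(row[k] for row in matrix)
    return matrix[k][k] / predicted_as_k if predicted_as_k else 0.0


def completeness(matrix: List[List[int]], k: int) -> float:
    """Objects of class k that were recovered, over all true members of k."""
    truly_k = sum(matrix[k])
    return matrix[k][k] / truly_k if truly_k else 0.0


# Toy labels for illustration only (hypothetical, not taken from the paper).
CLASSES = ["galaxy", "qso", "star"]
true_labels = ["galaxy"] * 4 + ["qso"] * 3 + ["star"] * 3
predicted_labels = ["galaxy", "galaxy", "galaxy", "qso",
                    "qso", "qso", "star",
                    "star", "star", "star"]

matrix = confusion_matrix(true_labels, predicted_labels, CLASSES)
print("overall efficiency:", overall_efficiency(matrix))
print("QSO purity:", purity(matrix, CLASSES.index("qso")))
print("QSO completeness:", completeness(matrix, CLASSES.index("qso")))
```

Purity and completeness are reported separately because a candidate catalogue can trade one for the other: a stricter cut raises purity at the cost of missing true quasars.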
Data-Rich Astronomy: Mining Sky Surveys with PhotoRApToR
In the last decade a new generation of telescopes and sensors has allowed the
production of a very large amount of data and astronomy has become a data-rich
science. New automatic methods largely based on machine learning are needed to
cope with such a data tsunami. We present some results in the fields of
photometric redshifts and galaxy classification, obtained using the MLPQNA
algorithm available in DAMEWARE (Data Mining and Web Application Resource)
for the SDSS galaxies (DR9 and DR10). We present PhotoRApToR (Photometric
Research Application To Redshift): a Java based desktop application capable of
solving regression and classification problems and specialized for photo-z
estimation.
Comment: proceedings of the IAU Symposium, Vol. 306, Cambridge University
Press
Photometric redshifts with Quasi Newton Algorithm (MLPQNA). Results in the PHAT1 contest
Context. Since the advent of modern multiband digital sky surveys,
photometric redshifts (photo-z's) have become relevant if not crucial to many
fields of observational cosmology, from the characterization of cosmic
structures, to weak and strong lensing. Aims. We describe an application to an
astrophysical context, namely the evaluation of photometric redshifts, of
MLPQNA, a machine learning method based on the Quasi Newton Algorithm. Methods.
Theoretical methods for photo-z evaluation are based on the interpolation of
a priori knowledge (spectroscopic redshifts or SED templates) and represent an
ideal comparison ground for neural network based methods. The MultiLayer
Perceptron with Quasi Newton learning rule (MLPQNA) described here is a
computationally efficient implementation of neural networks, exploited here for
the first time to solve regression problems in the astrophysical context, and is
offered to the community through the DAMEWARE (DAta Mining & Exploration Web
Application REsource) infrastructure. Results. The PHAT contest (Hildebrandt et
al. 2010) provides a standard dataset for testing old and new methods for
photometric redshift evaluation, together with a set of statistical indicators
which allow a straightforward comparison among different methods. The MLPQNA
model has been applied to the whole PHAT1 dataset of 1984 objects after an
optimization of the model performed using the 515 available spectroscopic
redshifts as the training set. When applied to the PHAT1 dataset, MLPQNA obtains
the best bias accuracy (0.0006) and very competitive accuracies in terms of
scatter (0.056) and outlier percentage (16.3%), scoring as the second most
effective empirical method among those which have so far participated in the
contest. MLPQNA shows better generalization capabilities than most other
empirical methods, especially in the presence of underpopulated regions of the
Knowledge Base.
Comment: Accepted for publication in Astronomy & Astrophysics; 9 pages, 2
figures
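The three indicators quoted above (bias, scatter, outlier percentage) can be computed from paired spectroscopic and photometric redshifts. The sketch below is illustrative only. It assumes the normalized residual Δz = (z_spec - z_phot) / (1 + z_spec), an outlier cut at |Δz| > 0.15, and bias and scatter taken as the mean and the root mean square of the residuals after removing outliers. None of these conventions is stated in the abstract, so they are marked as assumptions in the code. The redshift values are toy numbers, not PHAT1 data.

```python
import math
from typing import Dict, Sequence


def photoz_statistics(z_spec: Sequence[float],
                      z_phot: Sequence[float],
                      outlier_threshold: float = 0.15) -> Dict[str, float]:
    """Return bias, scatter and outlier fraction of normalized residuals.

    Assumptions (not taken from the abstract): residuals are normalized by
    (1 + z_spec), outliers are residuals with absolute value above
    `outlier_threshold`, and bias/scatter are computed on non-outliers only.
    """
    if len(z_spec) != len(z_phot) or len(z_spec) == 0:
        raise ValueError("z_spec and z_phot must be non-empty and equally long")

    residuals = [(zs - zp) / (1.0 + zs) for zs, zp in zip(z_spec, z_phot)]
    kept = [r for r in residuals if abs(r) <= outlier_threshold]
    if not kept:
        raise ValueError("every object is an outlier; statistics undefined")

    outlier_fraction = 1.0 - len(kept) / len(residuals)
    bias = sum(kept) / len(kept)                              # mean residual
    scatter = math.sqrt(sum(r * r for r in kept) / len(kept))  # RMS residual
    return {"bias": bias, "scatter": scatter,
            "outlier_fraction": outlier_fraction}


# Toy redshifts for illustration only (hypothetical values).
z_spec = [0.5, 1.0, 0.2, 2.0]
z_phot = [0.53, 0.9, 0.2, 1.1]
stats = photoz_statistics(z_spec, z_phot)
print(stats)
```

Normalizing by (1 + z_spec) keeps high-redshift objects from dominating the statistics, which is why it is a common choice when comparing methods on a shared dataset.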
PhotoRaptor - Photometric Research Application To Redshifts
Due to the necessity to evaluate photo-z for a variety of huge sky survey
data sets, it seemed important to provide the astronomical community with an
instrument able to fill this gap. Besides the problem of moving massive data
sets over the network, another critical point is that a great part of
astronomical data is stored in private archives that are not fully accessible
on line. So, in order to evaluate photo-z, a desktop application is needed
that can be downloaded and used by everyone locally, i.e. on their own personal
computer or, more generally, within the local intranet hosted by a data center.
The name chosen for the application is PhotoRApToR, i.e. Photometric Research
Application To Redshift (Cavuoti et al. 2015, 2014; Brescia 2014b). It embeds a
machine learning algorithm and special tools dedicated to pre- and
post-processing data. The ML model is the MLPQNA (Multi Layer Perceptron
trained by the Quasi Newton Algorithm), which has proved particularly
powerful for the photo-z calculation on the basis of a spectroscopic sample
(Cavuoti et al. 2012; Brescia et al. 2013, 2014a; Biviano et al. 2013).
The PhotoRApToR program package is available, for different platforms, at the
official website (http://dame.dsf.unina.it/dame_photoz.html#photoraptor).
Comment: User Manual of the PhotoRaptor tool, 54 pages. arXiv admin note:
substantial text overlap with arXiv:1501.0650
Photometric redshift estimation based on data mining with PhotoRApToR
Photometric redshifts (photo-z) are crucial to the scientific exploitation of
modern panchromatic digital surveys. In this paper we present PhotoRApToR
(Photometric Research Application To Redshift): a Java/C++ based desktop
application capable of solving non-linear regression and multi-variate
classification problems, in particular specialized for photo-z estimation. It
embeds a machine learning algorithm, namely a multilayer neural network trained
by the Quasi Newton learning rule, and special tools dedicated to pre- and
post-processing data. PhotoRApToR has been successfully tested on several
scientific cases. The application is available for free download from the DAME
Program web site.
Comment: To appear in Experimental Astronomy, Springer, 20 pages, 15 figures
Mining Knowledge in Astrophysical Massive Data Sets
Modern scientific data mainly consist of huge datasets gathered by a very
large number of techniques and stored in very diversified and often
incompatible data repositories. More generally, in the e-science environment,
it is considered a critical and urgent requirement to integrate services
across distributed, heterogeneous, dynamic "virtual organizations" formed by
different resources within a single enterprise. In the last decade, Astronomy
has become an immensely data rich field due to the evolution of detectors
(plates to digital to mosaics), telescopes and space instruments. The Virtual
Observatory approach consists of the federation under common standards of all
astronomical archives available worldwide, as well as data analysis, data
mining and data exploration applications. The main driver behind this effort
is that, once the infrastructure is completed, it will allow a new type
of multi-wavelength, multi-epoch science which can only barely be imagined.
Data Mining, or Knowledge Discovery in Databases, while being the main
methodology to extract the scientific information contained in such MDS
(Massive Data Sets), poses crucial challenges since it has to orchestrate
complex problems of transparent access to different computing environments,
scalability of algorithms, reusability of resources, etc. In the present paper
we summarize the present status of the MDS in the Virtual Observatory and what
is currently being done and planned to bring in advanced Data Mining
methodologies, in the case of the DAME (DAta Mining & Exploration) project.
Comment: Pages 845-849, 1st International Conference on Frontiers in
Diagnostics Technologies
Stellar formation rates in galaxies using Machine Learning models
Global Stellar Formation Rates (SFRs) are crucial to constrain theories of
galaxy formation and evolution. SFRs are usually estimated via spectroscopic
observations which require too much precious telescope time and therefore
cannot match the needs of modern precision cosmology. We therefore propose a
novel method to estimate SFRs for large samples of galaxies using a variety of
supervised ML models.
Comment: ESANN 2018 - Proceedings, ISBN-13 978287587048
Genetic Algorithm Modeling with GPU Parallel Computing Technology
We present a multi-purpose genetic algorithm, designed and implemented with
GPGPU / CUDA parallel computing technology. The model was derived from a
multi-core CPU serial implementation, named GAME, already successfully tested
and scientifically validated on astrophysical massive data classification
problems, through a web application resource (DAMEWARE), specialized in data
mining based on Machine Learning paradigms. Since genetic algorithms are
inherently parallel, the GPGPU computing paradigm has made it possible to
exploit the internal training features of the model, permitting a strong
optimization in terms of processing performance and scalability.
Comment: 11 pages, 2 figures, refereed proceedings; Neural Nets and
Surroundings, Proceedings of 22nd Italian Workshop on Neural Nets, WIRN 2012;
Smart Innovation, Systems and Technologies, Vol. 19, Springer
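The remark that genetic algorithms are inherently parallel can be illustrated with a small serial sketch. This is not the GAME model or its CUDA implementation: it is a generic genetic algorithm on a toy bit-counting problem, with hypothetical names and parameters throughout. The point to notice is the fitness evaluation step, where every individual is scored independently of the others; that is the kind of step a parallel implementation could distribute across many threads.

```python
import random
from typing import Callable, List, Tuple

Genome = List[int]


def one_max(genome: Genome) -> int:
    """Toy fitness (an assumption for illustration): number of 1 bits."""
    return sum(genome)


def evolve(fitness: Callable[[Genome], int],
           genome_length: int = 32,
           population_size: int = 40,
           generations: int = 30,
           mutation_rate: float = 0.02,
           seed: int = 0) -> Tuple[Genome, List[int]]:
    """Run a simple generational GA; return the best genome and score history."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(genome_length)]
                  for _ in range(population_size)]
    history: List[int] = []

    def tournament(scores: List[int]) -> Genome:
        # Pick three individuals at random and keep the fittest one.
        contenders = rng.sample(range(population_size), 3)
        return population[max(contenders, key=lambda i: scores[i])]

    for _ in range(generations):
        # Each individual is scored independently: the parallelizable step.
        scores = [fitness(genome) for genome in population]
        best_index = scores.index(max(scores))
        history.append(scores[best_index])

        # Elitism: carry the best genome over unchanged.
        next_population = [population[best_index][:]]
        while len(next_population) < population_size:
            parent_a, parent_b = tournament(scores), tournament(scores)
            cut = rng.randint(1, genome_length - 1)   # single-point crossover
            child = parent_a[:cut] + parent_b[cut:]
            # Flip each bit with probability mutation_rate.
            child = [1 - bit if rng.random() < mutation_rate else bit
                     for bit in child]
            next_population.append(child)
        population = next_population

    scores = [fitness(genome) for genome in population]
    best_index = scores.index(max(scores))
    history.append(scores[best_index])
    return population[best_index], history


best, history = evolve(one_max)
print("best score per generation:", history)
```

Because the best genome is copied unchanged into each new generation, the recorded best score can never decrease, which makes the history a convenient sanity check when porting the evaluation step to parallel hardware.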
METAPHOR: Probability density estimation for machine learning based photometric redshifts
We present METAPHOR (Machine-learning Estimation Tool for Accurate
PHOtometric Redshifts), a method able to provide a reliable PDF for photometric
galaxy redshifts estimated through empirical techniques. METAPHOR is a modular
workflow, mainly based on the MLPQNA neural network as internal engine to
derive photometric galaxy redshifts, but giving the possibility to easily
replace MLPQNA with any other method to predict photo-z's and their PDF. We
present here the results of a validation test of the workflow on the
galaxies from SDSS-DR9, showing also the universality of the method by
replacing MLPQNA with KNN and Random Forest models. The validation test also
includes a comparison with the PDFs derived from a traditional SED template
fitting method (Le Phare).
Comment: proceedings of the International Astronomical Union, IAU-325
symposium, Cambridge University Press
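The modularity described in this abstract, where the internal engine can be swapped for another predictor, can be sketched as a workflow that only depends on a "predict" callable. The example below is a rough illustration and not the METAPHOR code. It assumes that a binned PDF is obtained by adding Gaussian noise to the input magnitudes and collecting the resulting predictions; the abstract does not describe the procedure, so this perturbation scheme, the noise level, the binning and all names are assumptions. A tiny k-nearest-neighbours regressor stands in for the engine, and the training values are toy numbers.

```python
import random
from typing import Callable, List, Sequence

# Any function mapping a vector of magnitudes to a redshift can be plugged in.
Predictor = Callable[[Sequence[float]], float]


def make_knn_predictor(train_x: Sequence[Sequence[float]],
                       train_z: Sequence[float],
                       k: int = 3) -> Predictor:
    """Build a k-nearest-neighbours regressor (stand-in engine, hypothetical)."""
    def predict(x: Sequence[float]) -> float:
        by_distance = sorted(
            range(len(train_x)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)))
        nearest = by_distance[:k]
        return sum(train_z[i] for i in nearest) / len(nearest)
    return predict


def photoz_pdf(predict: Predictor,
               magnitudes: Sequence[float],
               sigma: float = 0.05,
               n_draws: int = 200,
               z_max: float = 1.0,
               n_bins: int = 10,
               seed: int = 0) -> List[float]:
    """Return per-bin probabilities of the predicted redshift.

    Assumption: the spread comes from perturbing the magnitudes with
    Gaussian noise of standard deviation `sigma` and re-running the predictor.
    """
    rng = random.Random(seed)
    counts = [0] * n_bins
    width = z_max / n_bins
    for _ in range(n_draws):
        perturbed = [m + rng.gauss(0.0, sigma) for m in magnitudes]
        z = predict(perturbed)
        # Clamp to the first/last bin so every draw is counted.
        bin_index = min(max(int(z / width), 0), n_bins - 1)
        counts[bin_index] += 1
    return [count / n_draws for count in counts]


# Toy training set for illustration only (hypothetical magnitudes/redshifts).
train_x = [[20.0, 19.5], [21.0, 20.2], [22.0, 21.0], [23.0, 21.7]]
train_z = [0.1, 0.3, 0.5, 0.7]

engine = make_knn_predictor(train_x, train_z, k=2)
pdf = photoz_pdf(engine, [21.5, 20.6])
print(pdf)
```

Swapping the engine only requires passing a different callable to `photoz_pdf`, for example `make_knn_predictor(train_x, train_z, k=1)` or a wrapper around another trained model.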