11 research outputs found
Return of the features. Efficient feature selection and interpretation for photometric redshifts
The explosion of data in recent years has generated an increasing need for
new analysis techniques in order to extract knowledge from massive datasets.
Machine learning has proved particularly useful for this task. Fully
automated methods have recently gained great popularity, even though those
methods often lack physical interpretability. In contrast, feature-based
approaches can provide both well-performing models and understandable
causalities with respect to the correlations found between features and
physical processes. Efficient feature selection is an essential tool to boost
the performance of machine learning models. In this work, we propose a forward
selection method in order to compute, evaluate, and characterize better
performing features for regression and classification problems. Given the
importance of photometric redshift estimation, we adopt it as our case study.
We synthetically created 4,520 features by combining magnitudes, errors, radii,
and ellipticities of quasars, taken from the SDSS. We apply a forward selection
process, a recursive method in which a huge number of feature sets is tested
through a kNN algorithm, leading to a tree of feature sets. The branches of the
tree are then used to perform experiments with the random forest, in order to
validate the best set with an alternative model. We demonstrate that the sets
of features determined with our approach significantly improve the performance
of the regression models when compared to the classic
features from the literature. The selected features are unexpected and
differ markedly from the classic features. Therefore, a method to
interpret some of the selected features in a physical context is presented. The
methodology described here is very general and can be used to improve the
performance of machine learning models for any regression or classification
task.
Comment: 21 pages, 11 figures, accepted for publication in A&A, final version
after language revisio
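The forward selection idea in the abstract above can be illustrated with a minimal sketch. It assumes a plain NumPy kNN regressor, a single held-out validation split, and RMSE as the score; the function names `knn_predict` and `forward_selection` are hypothetical and not taken from the paper. Unlike the method described, which keeps a tree of feature sets, this sketch follows only the single best branch.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=5):
    """Predict by averaging the targets of the k nearest training points."""
    # Pairwise squared Euclidean distances, shape (n_test, n_train).
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)

def forward_selection(X_train, y_train, X_val, y_val, n_select, k=5):
    """Greedily add the feature that gives the lowest validation RMSE."""
    selected, history = [], []
    remaining = list(range(X_train.shape[1]))
    for _ in range(n_select):
        best_feature, best_rmse = None, np.inf
        for f in remaining:
            cols = selected + [f]  # candidate feature set for this step
            pred = knn_predict(X_train[:, cols], y_train, X_val[:, cols], k)
            rmse = float(np.sqrt(np.mean((pred - y_val) ** 2)))
            if rmse < best_rmse:
                best_feature, best_rmse = f, rmse
        selected.append(best_feature)
        remaining.remove(best_feature)
        history.append(best_rmse)
    return selected, history

# Synthetic demo (assumed data, not SDSS): only features 2 and 5 matter.
rng = np.random.default_rng(0)
X = rng.uniform(size=(450, 8))
y = 3.0 * X[:, 2] + X[:, 5]
selected, history = forward_selection(X[:300], y[:300], X[300:], y[300:], n_select=2)
print(selected, history)
```

Each step costs one kNN evaluation per remaining feature, which is why the number of tested feature sets grows quickly with thousands of candidates.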
Detecting Quasars in Large-Scale Astronomical Surveys
We present a classification-based approach to identify quasi-stellar radio
sources (quasars) in the Sloan Digital Sky Survey and evaluate its performance
on a manually labeled training set. While reasonable results can already be
obtained via approaches working only on photometric data, our experiments
indicate that simple but problem-specific features extracted from spectroscopic
data can significantly improve the classification performance. Since our
approach is orthogonal to the existing classification schemes used for building
the spectroscopic catalogs, our classification results are well suited for a
mutual assessment of the approaches' accuracies.
Comment: 6 pages, 8 figures, published in proceedings of 2010 Ninth
International Conference on Machine Learning and Applications (ICMLA) of the
IEE
The LUCIFER control software
This dissertation covers the architecture and implementation of the control software for the near-infrared instrument LUCIFER. It focuses in particular on the core components of the distributed system, the control of the electronics, and the modeling of the motion sequences of the opto-mechanical elements. The entire process is described, from the communication between the different applications to the presentation of the complex interaction options to the technical staff. In addition, a new method for the photometric distance determination of galaxies is presented.
Using LUCIFER data, it was also possible to detect structured extraplanar emission of molecular hydrogen in a dwarf galaxy. Furthermore, the thesis includes a search for high-redshift quasars in large catalogs using machine learning methods.
Improving the performance of photometric regression models via massive parallel feature selection
Speedy Greedy Feature Selection: Better Redshift Estimation via Massive Parallelism
Nearest neighbor models are among the most basic tools in machine learning, and recent work has demonstrated their effectiveness in the field of astronomy. The performance of these models crucially depends on the underlying metric, and in particular on the selection of a meaningful subset of informative features. The feature selection is task-dependent and usually very time-consuming. In this work, we propose an efficient parallel implementation of incremental feature selection for nearest neighbor models utilizing modern graphics processing units. Our framework provides computational speed-ups of up to two orders of magnitude over its sequential single-core counterpart. We demonstrate the applicability of the overall scheme on one of the most challenging tasks in astronomy: redshift estimation for distant galaxies.
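Why incremental selection parallelizes well can be sketched as follows. Within one selection step, every candidate feature is scored independently, and the squared distances for the already selected features can be reused. The sketch below is a CPU stand-in that batches all candidates into one NumPy array operation; it is not the GPU implementation from the paper, and the names `score_candidates` and `greedy_select` are hypothetical.

```python
import numpy as np

def score_candidates(base_d2, X_train, y_train, X_val, y_val, candidates, k=5):
    """Validation RMSE of kNN regression for each candidate, computed in one batch."""
    # Per-feature differences, shape (n_val, n_train, n_candidates).
    diff = X_val[:, None, candidates] - X_train[None, :, candidates]
    # Add each candidate's contribution to the cached distances:
    # shape (n_candidates, n_val, n_train).
    d2 = base_d2[None, :, :] + np.moveaxis(diff ** 2, 2, 0)
    # Indices of the k smallest distances per validation point (unordered).
    nearest = np.argpartition(d2, k - 1, axis=2)[:, :, :k]
    pred = y_train[nearest].mean(axis=2)  # shape (n_candidates, n_val)
    return np.sqrt(((pred - y_val[None, :]) ** 2).mean(axis=1))

def greedy_select(X_train, y_train, X_val, y_val, n_select, k=5):
    """Incremental selection that caches distances of the selected features."""
    selected = []
    remaining = list(range(X_train.shape[1]))
    base_d2 = np.zeros((X_val.shape[0], X_train.shape[0]))
    for _ in range(n_select):
        scores = score_candidates(base_d2, X_train, y_train, X_val, y_val, remaining, k)
        best = remaining[int(np.argmin(scores))]
        # Fold the winning feature into the cached squared distances.
        base_d2 += (X_val[:, [best]] - X_train[:, best][None, :]) ** 2
        selected.append(best)
        remaining.remove(best)
    return selected

# Synthetic demo (assumed data): only features 1 and 4 carry signal.
rng = np.random.default_rng(0)
X = rng.uniform(size=(400, 6))
y = 3.0 * X[:, 1] + X[:, 4]
print(greedy_select(X[:250], y[:250], X[250:], y[250:], n_select=2))
```

The batched array holds one distance matrix per candidate, so memory grows with the number of candidates; a real implementation would process candidates in chunks or distribute them across device threads.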
Massively-parallel best subset selection for ordinary least-squares regression
Selecting an optimal subset of k out of d features for linear regression models given n training instances is often considered intractable for feature spaces with hundreds or thousands of dimensions. We propose an efficient massively-parallel implementation for selecting such optimal feature subsets in a brute-force fashion for small k. By exploiting the enormous compute power provided by modern parallel devices such as graphics processing units, it can deal with thousands of input dimensions even when using only standard commodity hardware. We evaluate the practical runtime using artificial datasets and sketch the applicability of our framework in the context of astronomy.
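The brute-force search described above can be sketched sequentially: enumerate every subset of k features, fit ordinary least squares on each, and keep the subset with the smallest residual sum of squares. This is a single-core illustration only, assuming a model with an intercept term; the function name `best_subset_ols` is hypothetical and the massively-parallel implementation from the paper is not reproduced.

```python
import itertools
import math
import numpy as np

def best_subset_ols(X, y, k):
    """Exhaustively search all k-feature subsets for the lowest OLS residual."""
    n, d = X.shape
    ones = np.ones((n, 1))  # intercept column (an assumption of this sketch)
    best_subset, best_rss = None, np.inf
    for subset in itertools.combinations(range(d), k):
        A = np.hstack([ones, X[:, list(subset)]])
        coef, _, _, _ = np.linalg.lstsq(A, y, rcond=None)
        rss = float(np.sum((y - A @ coef) ** 2))
        if rss < best_rss:
            best_subset, best_rss = subset, rss
    return best_subset, best_rss

# Synthetic demo (assumed data): the target depends on features 3 and 7 only.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
y = 2.0 * X[:, 3] - X[:, 7] + 0.1 * rng.normal(size=200)
subset, rss = best_subset_ols(X, y, k=2)
print(subset, rss, "subsets tested:", math.comb(X.shape[1], 2))
```

The number of fits is the binomial coefficient of d and k, which grows rapidly with both; the fits are mutually independent, which is what makes the search amenable to parallel hardware.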