11 research outputs found

    Return of the features. Efficient feature selection and interpretation for photometric redshifts

    The explosion of data in recent years has generated an increasing need for new analysis techniques to extract knowledge from massive datasets. Machine learning has proved particularly useful for this task. Fully automated methods have recently become very popular, even though they often lack physical interpretability. In contrast, feature-based approaches can provide both well-performing models and an understandable account of the correlations found between features and physical processes. Efficient feature selection is an essential tool for boosting the performance of machine learning models. In this work, we propose a forward selection method to compute, evaluate, and characterize better-performing features for regression and classification problems. Given the importance of photometric redshift estimation, we adopt it as our case study. We synthetically created 4,520 features by combining magnitudes, errors, radii, and ellipticities of quasars taken from the SDSS. We apply a forward selection process, a recursive method in which a huge number of feature sets are tested with a kNN algorithm, leading to a tree of feature sets. The branches of the tree are then used in experiments with a random forest, in order to validate the best set with an alternative model. We demonstrate that the feature sets determined with our approach significantly improve the performance of the regression models compared with the classic features from the literature. The selected features are unexpected and differ markedly from the classic features. Therefore, a method to interpret some of the selected features in a physical context is presented. The methodology described here is very general and can be used to improve the performance of machine learning models for any regression or classification task. Comment: 21 pages, 11 figures, accepted for publication in A&A, final version after language revision
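The forward selection idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it keeps only the single best feature set at each step instead of a tree of sets, uses synthetic data instead of SDSS quasars, and every name in it (`knn_predict`, `validation_mse`, `forward_select`, `make`) is hypothetical.

```python
import random

def knn_predict(train_X, train_y, x, k):
    # Squared Euclidean distance to every training row; average the k nearest targets.
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), y)
        for row, y in zip(train_X, train_y)
    )
    return sum(y for _, y in dists[:k]) / k

def validation_mse(X_tr, y_tr, X_va, y_va, features, k):
    # Restrict both sets to the candidate features and score kNN on the validation set.
    project = lambda row: [row[j] for j in features]
    tr = [project(r) for r in X_tr]
    errs = [(knn_predict(tr, y_tr, project(r), k) - y) ** 2
            for r, y in zip(X_va, y_va)]
    return sum(errs) / len(errs)

def forward_select(X_tr, y_tr, X_va, y_va, max_features, k=3):
    # Greedy forward selection: add the feature that lowers validation error most,
    # and stop as soon as no remaining feature improves on the current set.
    selected, best_err = [], float("inf")
    remaining = list(range(len(X_tr[0])))
    while remaining and len(selected) < max_features:
        err, j = min(
            (validation_mse(X_tr, y_tr, X_va, y_va, selected + [j], k), j)
            for j in remaining
        )
        if err >= best_err:
            break
        selected.append(j)
        remaining.remove(j)
        best_err = err
    return selected, best_err

# Synthetic data (an assumption for illustration): the target depends on
# features 0 and 2 only; features 1 and 3 are pure noise.
rng = random.Random(0)

def make(n):
    X = [[rng.random() for _ in range(4)] for _ in range(n)]
    y = [3 * r[0] + 2 * r[2] for r in X]
    return X, y

X_tr, y_tr = make(200)
X_va, y_va = make(100)
selected, err = forward_select(X_tr, y_tr, X_va, y_va, max_features=3)
print(selected, round(err, 4))
```

Scoring candidates on a held-out validation set rather than on the training data keeps the greedy search from rewarding features that merely memorize the training rows.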

    Detecting Quasars in Large-Scale Astronomical Surveys

    We present a classification-based approach to identify quasi-stellar radio sources (quasars) in the Sloan Digital Sky Survey and evaluate its performance on a manually labeled training set. While reasonable results can already be obtained with approaches that work only on photometric data, our experiments indicate that simple but problem-specific features extracted from spectroscopic data can significantly improve classification performance. Since our approach works orthogonally to the existing classification schemes used for building the spectroscopic catalogs, our classification results are well suited for a mutual assessment of the approaches' accuracies. Comment: 6 pages, 8 figures, published in proceedings of 2010 Ninth International Conference on Machine Learning and Applications (ICMLA) of the IEEE

    The LUCIFER control software

    This dissertation covers the architecture and implementation of the control software for the near-infrared instrument LUCIFER. It focuses in particular on the core components of the distributed system, the control of the electronics, and the modeling of the motion sequences of the opto-mechanical elements. The entire process is described, from the communication between the different applications to the presentation of the complex interaction options for the technical staff. In addition, a new method for the photometric distance determination of galaxies is presented. Using LUCIFER data, it was also possible to detect structured extraplanar emission of molecular hydrogen in a dwarf galaxy. Furthermore, the thesis includes a search for high-redshift quasars in large catalogs using machine learning methods.

    Speedy Greedy Feature Selection: Better Redshift Estimation via Massive Parallelism

    Nearest neighbor models are among the most basic tools in machine learning, and recent work has demonstrated their effectiveness in the field of astronomy. The performance of these models depends crucially on the underlying metric, and in particular on the selection of a meaningful subset of informative features. Feature selection is task-dependent and usually very time-consuming. In this work, we propose an efficient parallel implementation of incremental feature selection for nearest neighbor models that utilizes modern graphics processing units. Our framework provides computational speed-ups of up to two orders of magnitude over its sequential single-core competitor. We demonstrate the applicability of the overall scheme on one of the most challenging tasks in astronomy: redshift estimation for distant galaxies.

    Massively-parallel best subset selection for ordinary least-squares regression

    Selecting an optimal subset of k out of d features for linear regression models, given n training instances, is often considered intractable for feature spaces with hundreds or thousands of dimensions. We propose an efficient massively-parallel implementation that selects such optimal feature subsets in a brute-force fashion for small k. By exploiting the enormous compute power of modern parallel devices such as graphics processing units, it can handle thousands of input dimensions even on standard commodity hardware. We evaluate the practical runtime using artificial datasets and sketch the applicability of our framework in the context of astronomy.
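As a rough illustration of brute-force best subset selection for ordinary least squares, the sketch below enumerates every k-subset of d columns sequentially and keeps the one with the smallest residual sum of squares. It is a single-core toy under stated assumptions, not the massively-parallel implementation the abstract describes: the model has no intercept, the data are synthetic, and the helper names (`solve`, `rss`, `best_subset`) are hypothetical.

```python
from itertools import combinations
import random

def solve(A, b):
    """Solve the square system A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (M[r][n] - sum(M[r][j] * w[j] for j in range(r + 1, n))) / M[r][r]
    return w

def rss(X, y, cols):
    """Residual sum of squares of an OLS fit (no intercept) on the given columns."""
    # Normal equations restricted to the chosen columns: (X_S^T X_S) w = X_S^T y.
    gram = [[sum(row[a] * row[b] for row in X) for b in cols] for a in cols]
    rhs = [sum(row[a] * t for row, t in zip(X, y)) for a in cols]
    w = solve(gram, rhs)
    return sum(
        (sum(wi * row[c] for wi, c in zip(w, cols)) - t) ** 2
        for row, t in zip(X, y)
    )

def best_subset(X, y, k):
    """Exhaustively test all k-subsets of columns and return the one with minimal RSS."""
    d = len(X[0])
    return min(combinations(range(d), k), key=lambda cols: rss(X, y, cols))

# Synthetic data (an assumption for illustration): only columns 1 and 5 carry signal.
rng = random.Random(1)
X = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(60)]
y = [2 * r[1] - 3 * r[5] + rng.gauss(0, 0.1) for r in X]
best = best_subset(X, y, k=2)
print(best)
```

The number of candidate subsets is the binomial coefficient C(d, k), which grows quickly with d. Each subset can be scored independently of the others, which is why the search lends itself to parallel hardware when k is small.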