11 research outputs found

Return of the features. Efficient feature selection and interpretation for photometric redshifts

Authors: Cavuoti Stefano; D'Isanto Antonio; Gieseke Fabian; Polsterer Kai Lars
Publication venue: 'EDP Sciences'
Publication date: 01/01/2018

The explosion of data in recent years has generated an increasing need for new analysis techniques to extract knowledge from massive datasets. Machine learning has proved particularly useful for this task. Fully automated methods have recently gained great popularity, even though they often lack physical interpretability. In contrast, feature-based approaches can provide both well-performing models and understandable causalities with respect to the correlations found between features and physical processes. Efficient feature selection is an essential tool to boost the performance of machine learning models. In this work, we propose a forward selection method to compute, evaluate, and characterize better-performing features for regression and classification problems. Given the importance of photometric redshift estimation, we adopt it as our case study. We synthetically created 4,520 features by combining magnitudes, errors, radii, and ellipticities of quasars taken from the SDSS. We apply a forward selection process, a recursive method in which a huge number of feature sets are tested with a kNN algorithm, leading to a tree of feature sets. The branches of the tree are then used in experiments with a random forest, in order to validate the best set with an alternative model. We demonstrate that the feature sets determined with our approach significantly improve the performance of the regression models compared to the classic features from the literature. The features found are unexpected and surprising, being very different from the classic features. Therefore, a method to interpret some of them in a physical context is presented. The methodology described here is very general and can be used to improve the performance of machine learning models for any regression or classification task.

Comment: 21 pages, 11 figures, accepted for publication in A&A, final version after language revisio

arXiv.org e-Print Archive

Archivio della ricerca - Università degli studi di Napoli Federico II

EDP Sciences OAI-PMH repository (1.2.0)

Copenhagen University Research Information System

OA@INAF - Istituto Nazionale di Astrofisica

Detecting Quasars in Large-Scale Astronomical Surveys

Authors: Bomanns Dominik; Dettmar Ralf-Jürgen; Gieseke Fabian; Kramer Oliver; Polsterer Kai Lars; Thom Andreas; Vahrenhold Jan; Zinn Peter-Christian
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 23/08/2011

We present a classification-based approach to identify quasi-stellar radio sources (quasars) in the Sloan Digital Sky Survey and evaluate its performance on a manually labeled training set. While reasonable results can already be obtained with approaches that work only on photometric data, our experiments indicate that simple but problem-specific features extracted from spectroscopic data can significantly improve the classification performance. Since our approach is orthogonal to the existing classification schemes used for building the spectroscopic catalogs, our classification results are well suited for a mutual assessment of the approaches' accuracies.

Comment: 6 pages, 8 figures, published in proceedings of 2010 Ninth International Conference on Machine Learning and Applications (ICMLA) of the IEE

arXiv.org e-Print Archive

Crossref

The LUCIFER control software

Author: Polsterer Kai Lars (Dipl.)
Publication date: 12/12/2011

This dissertation covers the architecture and implementation of the control software for the near-infrared instrument LUCIFER. It focuses in particular on the core components of the distributed system, the control of the electronics, and the modeling of the motion sequences of the opto-mechanical elements. The entire process is described, from the communication between the different applications to the presentation of the complex interaction options to the technical staff. In addition, a new method for the photometric distance determination of galaxies is presented. Using LUCIFER data, it was also possible to detect structured extraplanar emission of molecular hydrogen in a dwarf galaxy. Furthermore, the thesis includes a search for high-redshift quasars in large catalogs using machine learning methods.

Dokumentenrepositorium der RUB / RUB-Repository

On GPU-based nearest neighbor queries for large-scale photometric catalogs in astronomy

Authors: Gieseke Fabian; Heinermann Justin; Kramer Oliver; Polsterer Kai Lars
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

Copenhagen University Research Information System

Improving the performance of photometric regression models via massive parallel feature selection

Authors: Gieseke Fabian Cristian; Goto Tomotsugu; Igel Christian; Polsterer Kai Lars
Publication venue: 'Astronomical Society of the Pacific Conference Series'
Publication date: 01/01/2014

Copenhagen University Research Information System

Speedy Greedy Feature Selection: Better Redshift Estimation via Massive Parallelism

Authors: Christian Igel; Cosmin Eugen Oancea; Fabian Gieseke; Kai Lars Polsterer
Publication date: 01/01/2014

Nearest neighbor models are among the most basic tools in machine learning, and recent work has demonstrated their effectiveness in the field of astronomy. The performance of these models depends crucially on the underlying metric, and in particular on the selection of a meaningful subset of informative features. Feature selection is task-dependent and usually very time-consuming. In this work, we propose an efficient parallel implementation of incremental feature selection for nearest neighbor models that utilizes modern graphics processing units. Our framework provides computational speed-ups of up to two orders of magnitude over its sequential single-core competitor. We demonstrate the applicability of the overall scheme on one of the most challenging tasks in astronomy: redshift estimation for distant galaxies.

CiteSeerX

Copenhagen University Research Information System
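
As a rough illustration of the incremental feature selection for nearest neighbor models described in the abstract above, the following sequential sketch adds, one at a time, the feature that most reduces a leave-one-out k-NN regression error. It is not the authors' GPU implementation: the function names (`knn_loo_rmse`, `greedy_forward_selection`, `make_toy_data`), the choice of leave-one-out RMSE as the score, and the synthetic data are all assumptions made for this example.

```python
import math
import random


def knn_loo_rmse(X, y, features, k=3):
    """Leave-one-out RMSE of a k-NN regressor restricted to `features`."""
    n = len(X)
    sq_err = 0.0
    for i in range(n):
        # Squared Euclidean distances from point i to every other point,
        # computed only on the selected feature columns.
        dists = []
        for j in range(n):
            if j == i:
                continue
            d = sum((X[i][f] - X[j][f]) ** 2 for f in features)
            dists.append((d, j))
        dists.sort()
        neighbors = [j for _, j in dists[:k]]
        pred = sum(y[j] for j in neighbors) / len(neighbors)
        sq_err += (pred - y[i]) ** 2
    return math.sqrt(sq_err / n)


def greedy_forward_selection(X, y, n_select, k=3):
    """Greedily add the feature that gives the lowest leave-one-out error."""
    remaining = list(range(len(X[0])))
    selected = []
    history = []
    for _ in range(n_select):
        # Each candidate evaluation is independent of the others; this is
        # the loop a parallel implementation could distribute.
        scores = [(knn_loo_rmse(X, y, selected + [f], k), f) for f in remaining]
        best_score, best_f = min(scores)
        selected.append(best_f)
        remaining.remove(best_f)
        history.append(best_score)
    return selected, history


def make_toy_data(n=80, seed=0):
    """Synthetic data (assumption): the target depends on columns 0 and 2 only."""
    rng = random.Random(seed)
    X = [[rng.uniform(-1, 1) for _ in range(5)] for _ in range(n)]
    y = [2.0 * row[0] + row[2] for row in X]
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    selected, history = greedy_forward_selection(X, y, n_select=2)
    print("selected features:", selected)
    print("leave-one-out RMSE per step:", history)
```

The inner candidate loop dominates the cost, since every candidate requires a full set of nearest neighbor queries; that independence between candidates is what makes this kind of search a natural fit for parallel hardware.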

Massively-parallel best subset selection for ordinary least-squares regression

Authors: Gieseke Fabian; Heskes Tom; Igel Christian; Mahabal Ashish; Polsterer Kai Lars
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/12/2017

Selecting an optimal subset of k out of d features for linear regression models given n training instances is often considered intractable for feature spaces with hundreds or thousands of dimensions. We propose an efficient massively-parallel implementation for selecting such optimal feature subsets in a brute-force fashion for small k. By exploiting the enormous compute power provided by modern parallel devices such as graphics processing units, it can deal with thousands of input dimensions even on standard commodity hardware. We evaluate the practical runtime using artificial datasets and sketch the applicability of our framework in the context of astronomy.
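
To make the brute-force idea in the abstract above concrete, here is a small sequential sketch that enumerates every subset of k out of d features, fits an ordinary least-squares model on each, and keeps the subset with the lowest residual sum of squares. It is an illustration only, not the authors' massively-parallel implementation: the helper names (`solve_linear_system`, `ols_rss`, `best_subset`, `make_toy_data`) are hypothetical, the fit uses the normal equations (simple, but less numerically stable than a QR-based solver), and it assumes the selected columns are not collinear.

```python
import itertools
import random


def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def ols_rss(X, y, subset):
    """Residual sum of squares of an OLS fit (with intercept) on `subset`."""
    Z = [[1.0] + [row[f] for f in subset] for row in X]
    p = len(subset) + 1
    # Normal equations: (Z^T Z) w = Z^T y
    gram = [[sum(z[a] * z[b] for z in Z) for b in range(p)] for a in range(p)]
    rhs = [sum(z[a] * t for z, t in zip(Z, y)) for a in range(p)]
    w = solve_linear_system(gram, rhs)
    return sum((sum(wi * zi for wi, zi in zip(w, z)) - t) ** 2
               for z, t in zip(Z, y))


def best_subset(X, y, k):
    """Exhaustively search all size-k subsets; return (subset, rss)."""
    d = len(X[0])
    rss, subset = min((ols_rss(X, y, s), s)
                      for s in itertools.combinations(range(d), k))
    return subset, rss


def make_toy_data(n=60, d=6, seed=1):
    """Synthetic data (assumption): the target depends on columns 1 and 4."""
    rng = random.Random(seed)
    X = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(n)]
    y = [3.0 * row[1] - 2.0 * row[4] + rng.gauss(0, 0.05) for row in X]
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    subset, rss = best_subset(X, y, k=2)
    print("best subset:", subset, "RSS:", rss)
```

The number of candidate subsets is the binomial coefficient "d choose k", which grows rapidly with d; each fit is independent of the others, so the enumeration can in principle be spread across many parallel workers.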