Dagstuhl Annual Report January - December 2011
The International Conference and Research Center for Computer Science is a non-profit organization. Its objective is to promote world-class research in computer science and to host research seminars which enable new ideas to be showcased, problems to be discussed and the course to be set for future development in this field. The work being done to run this informatics center is documented in this report for the business year 2011.
Supervised learning of short and high-dimensional temporal sequences for life science measurements
Measurements of physiological processes over time are often given as
spectrometric or gene expression profiles over time with only few time points
but a large number of measured variables. The analysis of such temporal
sequences is challenging and only few methods have been proposed. The
information can be encoded time-independently, by means of classical expression
differences at a single time point, or in expression profiles over time.
Available methods are limited to unsupervised and semi-supervised settings. The
predictive variables can be identified only by means of wrapper or
post-processing techniques. This is complicated due to the small number of
samples for such studies. Here, we present a supervised learning approach,
termed Supervised Topographic Mapping Through Time (SGTM-TT). It learns a
supervised mapping of the temporal sequences onto a low dimensional grid. We
utilize a hidden Markov model (HMM) to account for the time domain and
relevance learning to identify the relevant feature dimensions most predictive
over time. The learned mapping can be used to visualize the temporal sequences
and to predict the class of a new sequence. The relevance learning permits the
identification of discriminating masses or gene expressions and prunes
dimensions which are unnecessary for the classification task or encode mainly
noise. In this way we obtain a very efficient learning system for temporal
sequences. The results indicate that using simultaneous supervised learning and
metric adaptation significantly improves the prediction accuracy for
synthetic and real-life data in comparison to the standard techniques. The
discriminating features, identified by relevance learning, compare favorably
with the results of alternative methods. Our method permits the visualization
of the data on a low dimensional grid, highlighting the observed temporal
structure.
Dissimilarity-based learning for complex data
Mokbel B. Dissimilarity-based learning for complex data. Bielefeld: Universität Bielefeld; 2016.
Rapid advances in information technology have entailed an ever-increasing amount of digital data, which raises the demand for powerful data mining and machine learning tools. Due to modern methods for gathering, preprocessing, and storing information, the collected data become more and more complex: a simple vectorial representation and comparison in terms of the Euclidean distance are often no longer appropriate to capture relevant aspects of the data. Instead, problem-adapted similarity or dissimilarity measures refer directly to the given encoding scheme, allowing information constituents to be treated in a relational manner.
This thesis addresses several challenges of complex data sets and their representation in the context of machine learning. The goal is to investigate possible remedies, and propose corresponding improvements of established methods, accompanied by examples from various application domains. The main scientific contributions are the following:
(I) Many well-established machine learning techniques are restricted to vectorial input data only. Therefore, we propose the extension of two popular prototype-based clustering and classification algorithms to non-negative symmetric dissimilarity matrices.
(II) Some dissimilarity measures incorporate a fine-grained parameterization, which allows the comparison scheme to be configured with respect to the given data and the problem at hand. However, finding adequate parameters can be hard or even impossible for human users, due to the intricate effects of parameter changes and the lack of detailed prior knowledge. Therefore, we propose to integrate a metric learning scheme into a dissimilarity-based classifier, which can automatically adapt the parameters of a sequence alignment measure according to the given classification task.
(III) Dimensionality reduction techniques are a valuable instrument for making complex data sets accessible: they can provide an approximate low-dimensional embedding of the given data set and, as a special case, a planar map to visualize the data's neighborhood structure. To assess the reliability of such an embedding, we propose the extension of a well-known quality measure to enable a fine-grained, tractable quantitative analysis, which can be integrated into a visualization. This tool can also help to compare different dissimilarity measures (and parameter settings) if ground truth is not available.
(IV) All techniques are demonstrated on real-world examples from a variety of application domains, including bioinformatics, motion capturing, music, and education.
Learning in the context of very high dimensional data (Dagstuhl Seminar 11341)
This report documents the program and the outcomes of Dagstuhl Seminar 11341 "Learning in the context of very high dimensional data". The aim of the seminar was to bring together researchers who develop, investigate, or apply machine learning methods for very high dimensional data to advance this important field of research. The focus was on broadly applicable methods and processing pipelines, which offer efficient solutions for high-dimensional data analysis appropriate for a wide range of application scenarios.
Discriminative dimensionality reduction: variations, applications, interpretations
Schulz A. Discriminative dimensionality reduction: variations, applications, interpretations. Bielefeld: Universität Bielefeld; 2017.
The amount of digital data increases rapidly as a result of advances in information and sensor technology. Because the data sets grow with respect to their size, complexity and dimensionality, they are no longer easily accessible to a human user. The framework of dimensionality reduction addresses this problem by aiming to visualize complex data sets in two dimensions while preserving the relevant structure. While these methods can provide significant insights, the problem formulation of structure preservation is ill-posed in general and can lead to undesired effects.
In this thesis, the concept of discriminative dimensionality reduction is investigated as a particularly promising way to indicate relevant structure by specifying auxiliary data.
The goal is to overcome challenges in data inspection and to investigate to what extent discriminative dimensionality reduction methods can yield an improvement. The main scientific contributions are the following:
(I) The most popular techniques for discriminative dimensionality reduction are based on the Fisher metric. However, their applicability in complex settings is restricted in three ways: they can only be employed for fixed data sets, i.e. new data cannot be included in an existing embedding; only data provided in vectorial representation can be processed; and they are designed for discrete-valued auxiliary data and cannot be applied to real-valued data. We propose solutions to overcome these challenges.
(II) Like complex data, trained machine learning models are often inaccessible to humans, since they frequently constitute black-box models. In order to provide an intuitive interface to such models, we propose a general framework which allows high-dimensional functions, such as regression or classification functions, to be visualized in two dimensions.
(III) Although nonlinear dimensionality reduction techniques illustrate the structure of the data very well, they suffer from the fact that there is no explicit relationship between the original features and the obtained projection. We propose a methodology to create such a connection, thus making it possible to understand the importance of the features.
(IV) Although linear mappings constitute a very popular tool, a direct interpretation of their weights as feature relevance can be misleading. We propose a methodology which enables a valid interpretation by providing relevance bounds for each feature.
(V) The problem of transfer learning without given correspondence information between the source and target space and without labels is particularly challenging. Here, we utilize the structure-preserving property of dimensionality reduction methods to transfer knowledge in a latent space given by dimensionality reduction.
Advances in dissimilarity-based data visualisation
Gisbrecht A. Advances in dissimilarity-based data visualisation. Bielefeld: Universitätsbibliothek Bielefeld; 2015.