18 research outputs found

    Dagstuhl Annual Report January–December 2011

    The International Conference and Research Center for Computer Science is a non-profit organization. Its objective is to promote world-class research in computer science and to host research seminars at which new ideas can be showcased, problems discussed, and the course set for future developments in the field. The work done to run this informatics center is documented in this report for the business year 2011.

    Supervised learning of short and high-dimensional temporal sequences for life science measurements

    The analysis of physiological processes over time is often based on spectrometric or gene-expression profiles with only a few time points but a large number of measured variables. The analysis of such temporal sequences is challenging, and only a few methods have been proposed. The information can be encoded time-independently, by means of classical expression differences for a single time point, or in expression profiles over time. Available methods are limited to unsupervised and semi-supervised settings; the predictive variables can be identified only by means of wrapper or post-processing techniques, which is complicated by the small number of samples in such studies. Here, we present a supervised learning approach, termed Supervised Generative Topographic Mapping Through Time (SGTM-TT). It learns a supervised mapping of the temporal sequences onto a low-dimensional grid. We utilize a hidden Markov model (HMM) to account for the time domain and relevance learning to identify the feature dimensions most predictive over time. The learned mapping can be used to visualize the temporal sequences and to predict the class of a new sequence. The relevance learning permits the identification of discriminating masses or gene expressions and prunes dimensions which are unnecessary for the classification task or encode mainly noise. In this way we obtain a very efficient learning system for temporal sequences. The results indicate that simultaneous supervised learning and metric adaptation significantly improve the prediction accuracy on synthetic and real-life data in comparison to standard techniques. The discriminating features identified by relevance learning compare favorably with the results of alternative methods. Our method also permits the visualization of the data on a low-dimensional grid, highlighting the observed temporal structure.
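    The relevance-learning component (weighting each feature dimension in the metric and pruning dimensions that mainly encode noise) can be illustrated with a minimal sketch. This is a generic GLVQ-style relevance update with assumed prototypes, not the authors' SGTM-TT implementation; the function names and toy data are illustrative only.

```python
# Minimal sketch of relevance learning over feature dimensions.
# Assumption: a relevance-weighted squared Euclidean distance
# d_lambda(x, w) = sum_j lambda_j * (x_j - w_j)^2, lambda >= 0, sum(lambda) = 1.
import numpy as np

def relevance_distance(x, w, lam):
    """Relevance-weighted squared Euclidean distance d_lambda(x, w)."""
    return np.sum(lam * (x - w) ** 2)

def update_relevances(lam, x, w_correct, w_wrong, lr=0.01):
    """Shrink the relevance of dimensions that pull x towards the wrong
    prototype, then project back onto the simplex by renormalizing."""
    lam = lam - lr * ((x - w_correct) ** 2 - (x - w_wrong) ** 2)
    lam = np.maximum(lam, 0.0)
    return lam / lam.sum()

# Toy usage: dimension 2 carries only noise and both prototypes agree on it,
# so its relevance decays relative to the two informative dimensions.
rng = np.random.default_rng(0)
lam = np.ones(3) / 3
x = np.array([1.0, 0.5, rng.normal()])
w_correct = np.array([1.0, 0.5, 0.0])
w_wrong = np.array([-1.0, -0.5, 0.0])
for _ in range(100):
    lam = update_relevances(lam, x, w_correct, w_wrong)
print(lam)  # the noise dimension's relevance shrinks towards zero
```

    In SGTM-TT the same principle operates on top of the HMM-backed temporal mapping; the sketch only shows how metric adaptation suppresses uninformative dimensions.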

    Dissimilarity-based learning for complex data

    Mokbel B. Dissimilarity-based learning for complex data. Bielefeld: Universität Bielefeld; 2016. Rapid advances in information technology have entailed an ever-increasing amount of digital data, which raises the demand for powerful data mining and machine learning tools. Due to modern methods for gathering, preprocessing, and storing information, the collected data become more and more complex: a simple vectorial representation and comparison in terms of the Euclidean distance are often no longer appropriate to capture relevant aspects of the data. Instead, problem-adapted similarity or dissimilarity measures refer directly to the given encoding scheme, making it possible to treat information constituents in a relational manner. This thesis addresses several challenges of complex data sets and their representation in the context of machine learning. The goal is to investigate possible remedies and propose corresponding improvements of established methods, accompanied by examples from various application domains. The main scientific contributions are the following: (I) Many well-established machine learning techniques are restricted to vectorial input data only. We therefore propose the extension of two popular prototype-based clustering and classification algorithms to non-negative symmetric dissimilarity matrices. (II) Some dissimilarity measures incorporate a fine-grained parameterization, which allows the comparison scheme to be configured with respect to the given data and the problem at hand. However, finding adequate parameters can be hard or even impossible for human users, due to the intricate effects of parameter changes and the lack of detailed prior knowledge. We therefore propose to integrate a metric learning scheme into a dissimilarity-based classifier, which can automatically adapt the parameters of a sequence alignment measure according to the given classification task. (III) A valuable instrument for making complex data sets accessible are dimensionality reduction techniques, which can provide an approximate low-dimensional embedding of the given data set and, as a special case, a planar map to visualize the data's neighborhood structure. To assess the reliability of such an embedding, we propose the extension of a well-known quality measure to enable a fine-grained, tractable quantitative analysis, which can be integrated into a visualization. This tool can also help to compare different dissimilarity measures (and parameter settings) if ground truth is not available. (IV) All techniques are demonstrated on real-world examples from a variety of application domains, including bioinformatics, motion capturing, music, and education.
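    Extensions of prototype-based methods to dissimilarity matrices, as in contribution (I), commonly rest on the relational trick: prototypes are expressed as convex combinations of data points, so distances can be computed from the dissimilarity matrix alone via d(x_j, w_k)^2 = [D a_k]_j - (1/2) a_k^T D a_k. The following is a minimal k-means-style sketch of that idea under assumed conventions, not the thesis code.

```python
# Minimal sketch of relational k-means on a symmetric dissimilarity matrix D.
# Each prototype w_k is an implicit convex combination sum_i alpha_ki * x_i.
import numpy as np

def relational_kmeans(D, k, n_iter=50, seed=0):
    n = D.shape[0]
    rng = np.random.default_rng(seed)
    # initialize each prototype as a random convex combination
    alpha = rng.random((k, n))
    alpha /= alpha.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # distances of all points to all implicit prototypes:
        # d(x_j, w_k)^2 = (D @ alpha_k)[j] - 0.5 * alpha_k @ D @ alpha_k
        quad = np.einsum('ki,ij,kj->k', alpha, D, alpha)
        dist = alpha @ D - 0.5 * quad[:, None]
        assign = dist.argmin(axis=0)
        # M-step: each prototype becomes the mean of its assigned points
        for j in range(k):
            members = assign == j
            if members.any():
                alpha[j] = members / members.sum()
    return assign

# Toy usage: two well-separated blobs, compared via squared Euclidean D.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(size=(20, 5)), rng.normal(size=(20, 5)) + 5.0])
D = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
print(relational_kmeans(D, k=2))  # recovers the two blobs
```

    Because only D enters the computation, the same loop works for any non-negative symmetric dissimilarity measure, e.g. alignment distances on sequences.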

    Learning in the context of very high dimensional data (Dagstuhl Seminar 11341)

    This report documents the program and the outcomes of Dagstuhl Seminar 11341 "Learning in the context of very high dimensional data". The aim of the seminar was to bring together researchers who develop, investigate, or apply machine learning methods for very high dimensional data, in order to advance this important field of research. The focus was on broadly applicable methods and processing pipelines which offer efficient solutions for high-dimensional data analysis, appropriate for a wide range of application scenarios.

    Discriminative dimensionality reduction: variations, applications, interpretations

    Schulz A. Discriminative dimensionality reduction: variations, applications, interpretations. Bielefeld: Universität Bielefeld; 2017. The amount of digital data increases rapidly as a result of advances in information and sensor technology. Because data sets grow in size, complexity, and dimensionality, they are no longer easily accessible to a human user. The framework of dimensionality reduction addresses this problem by aiming to visualize complex data sets in two dimensions while preserving the relevant structure. While these methods can provide significant insights, the problem formulation of structure preservation is ill-posed in general and can lead to undesired effects. In this thesis, the concept of discriminative dimensionality reduction is investigated as a particularly promising way to indicate relevant structure by specifying auxiliary data. The goal is to overcome challenges in data inspection and to investigate to what extent discriminative dimensionality reduction methods can yield an improvement. The main scientific contributions are the following: (I) The most popular techniques for discriminative dimensionality reduction are based on the Fisher metric. However, their applicability in complex settings is restricted: they can only be employed for fixed data sets, i.e. new data cannot be included in an existing embedding; only data provided in vectorial representation can be processed; and they are designed for discrete-valued auxiliary data and cannot be applied to real-valued ones. We propose solutions to overcome these challenges. (II) Besides the problem that complex data are not accessible to humans, the same holds for trained machine learning models, which often constitute black-box models. In order to provide an intuitive interface to such models, we propose a general framework which allows high-dimensional functions, such as regression or classification functions, to be visualized in two dimensions. (III) Although nonlinear dimensionality reduction techniques illustrate the structure of the data very well, they suffer from the fact that there is no explicit relationship between the original features and the obtained projection. We propose a methodology to create such a connection, thus making it possible to understand the importance of the features. (IV) Although linear mappings constitute a very popular tool, a direct interpretation of their weights as feature relevance can be misleading. We propose a methodology which enables a valid interpretation by providing relevance bounds for each feature. (V) The problem of transfer learning without given correspondence information between the source and target space, and without labels, is particularly challenging. Here, we utilize the structure-preserving property of dimensionality reduction methods to transfer knowledge in a latent space given by dimensionality reduction.
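    The Fisher metric underlying contribution (I) locally rescales distances according to how strongly the class posterior p(c|x) changes. A hedged sketch of this idea follows: the local Fisher information J(x) = sum_c p(c|x) g_c(x) g_c(x)^T with g_c(x) = grad_x log p(c|x) is estimated from a plain multinomial logistic regression, and the resulting class-informed distances are embedded with off-the-shelf t-SNE. This is a simplified one-step approximation for illustration, not the thesis implementation.

```python
# Sketch: discriminative distances from an approximate Fisher metric,
# then a 2-D embedding of the precomputed distance matrix with t-SNE.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.manifold import TSNE

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000).fit(X, y)
W = clf.coef_             # (n_classes, n_features)
P = clf.predict_proba(X)  # (n_samples, n_classes)

def fisher_matrix(p):
    """J(x) for one sample with posteriors p; uses the softmax gradient
    grad_x log p(c|x) = W_c - sum_c' p(c'|x) W_c'."""
    G = W - p @ W                  # rows: gradients g_c(x)
    return (p[:, None] * G).T @ G  # sum_c p_c * g_c g_c^T

# One-step Fisher distance d(x, x')^2 ~ (x - x')^T J (x - x'),
# approximated here with the average of the two endpoint metrics.
J = np.stack([fisher_matrix(p) for p in P])
n = len(X)
D = np.zeros((n, n))
for i in range(n):
    diff = X - X[i]
    Jm = 0.5 * (J + J[i])
    D[i] = np.einsum('nd,nde,ne->n', diff, Jm, diff)
D = np.sqrt(np.maximum(D, 0.0))

emb = TSNE(metric='precomputed', init='random').fit_transform(D)
print(emb.shape)  # (150, 2): a class-aware embedding of the iris data
```

    Directions along which the posterior is constant contribute nothing to J(x), so label-irrelevant variation is suppressed in the embedding; this is the core effect the thesis extends to out-of-sample points, non-vectorial data, and real-valued auxiliary information.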

    Advances in dissimilarity-based data visualisation

    Gisbrecht A. Advances in dissimilarity-based data visualisation. Bielefeld: Universitätsbibliothek Bielefeld; 2015.