
    Dissimilarity-based learning for complex data

    Mokbel B. Dissimilarity-based learning for complex data. Bielefeld: Universität Bielefeld; 2016.
    Rapid advances in information technology have entailed an ever-increasing amount of digital data, which raises the demand for powerful data mining and machine learning tools. Due to modern methods for gathering, preprocessing, and storing information, the collected data become more and more complex: a simple vectorial representation and a comparison in terms of the Euclidean distance are often no longer appropriate to capture the relevant aspects of the data. Instead, problem-adapted similarity or dissimilarity measures refer directly to the given encoding scheme, allowing information constituents to be treated in a relational manner. This thesis addresses several challenges of complex data sets and their representation in the context of machine learning. The goal is to investigate possible remedies and to propose corresponding improvements of established methods, accompanied by examples from various application domains. The main scientific contributions are the following: (I) Many well-established machine learning techniques are restricted to vectorial input data only. We therefore propose the extension of two popular prototype-based clustering and classification algorithms to non-negative symmetric dissimilarity matrices. (II) Some dissimilarity measures incorporate a fine-grained parameterization, which allows the comparison scheme to be configured with respect to the given data and the problem at hand. However, finding adequate parameters can be hard or even impossible for human users, due to the intricate effects of parameter changes and the lack of detailed prior knowledge. We therefore propose to integrate a metric learning scheme into a dissimilarity-based classifier, which can automatically adapt the parameters of a sequence alignment measure according to the given classification task. (III) Dimensionality reduction techniques are a valuable instrument for making complex data sets accessible: they provide an approximate low-dimensional embedding of the given data set and, as a special case, a planar map to visualize the data's neighborhood structure. To assess the reliability of such an embedding, we propose the extension of a well-known quality measure to enable a fine-grained, tractable quantitative analysis, which can be integrated into a visualization. This tool can also help to compare different dissimilarity measures (and parameter settings) when ground truth is not available. (IV) All techniques are demonstrated on real-world examples from a variety of application domains, including bioinformatics, motion capture, music, and education.
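    Contribution (I) rests on the observation that prototype-based methods can run on a dissimilarity matrix alone once each prototype is expressed as a convex combination of data points. The following is a minimal sketch of that relational trick plus a toy clustering loop around it; it assumes squared-Euclidean-like dissimilarities and is purely illustrative, not a reproduction of the thesis's algorithms.

    ```python
    import numpy as np

    def relational_distances(D, alpha):
        """Distances between all points and prototypes, given only the
        dissimilarity matrix D. Prototype k is an implicit convex combination
        of data points with coefficients alpha[k] (summing to 1), and
        d(x_i, w_k) = [D alpha_k]_i - 0.5 * alpha_k^T D alpha_k."""
        cross = D @ alpha.T                                   # (n, k): [D alpha_k]_i
        self_term = 0.5 * np.einsum('ki,ij,kj->k', alpha, D, alpha)
        return cross - self_term                              # (n, k) distances

    # Toy relational clustering loop (empty clusters glossed over).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 2))
    D = ((X[:, None] - X[None]) ** 2).sum(-1)  # in practice: a structure metric
    alpha = rng.dirichlet(np.ones(100), size=3)               # 3 prototypes
    for _ in range(20):
        labels = np.argmin(relational_distances(D, alpha), axis=1)
        alpha = np.array([(labels == k) / max((labels == k).sum(), 1)
                          for k in range(3)])
    print(np.bincount(labels, minlength=3))
    ```

    The point of the identity is that the vectors X never have to exist: any non-negative symmetric dissimilarity matrix can be plugged in, which is exactly the setting of contribution (I).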

    Efficient Adaptation of Structure Metrics in Prototype-Based Classification

    Mokbel B, Paaßen B, Hammer B. Efficient Adaptation of Structure Metrics in Prototype-Based Classification. In: Wermter S, Weber C, Duch W, et al., eds. Artificial Neural Networks and Machine Learning - ICANN 2014 - 24th International Conference on Artificial Neural Networks, Hamburg, Germany, September 15-19, 2014. Proceedings. Lecture Notes in Computer Science. Vol 8681. Springer; 2014: 571-578.
    More complex data formats and dedicated structure metrics have spurred the development of intuitive machine learning techniques which deal directly with dissimilarity data, such as relational learning vector quantization (RLVQ). The adjustment of metric parameters, such as relevance weights for basic structural elements, constitutes a crucial issue therein, and the first methods to learn metric parameters automatically from given data were proposed only recently. In this contribution, we investigate a robust learning scheme which adapts metric parameters such as the scoring matrix of a sequence alignment in conjunction with prototype learning, and we investigate the suitability of efficient approximations thereof.
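    The structure metric in question is a sequence alignment whose substitution costs are free parameters. As a hedged sketch of where these parameters enter (a plain global-alignment dissimilarity, not the paper's adaptation scheme; all names are illustrative):

    ```python
    import numpy as np

    def alignment_dissimilarity(a, b, score, alphabet, gap=1.0):
        """Global (Needleman-Wunsch-style) alignment distance between the
        sequences a and b. score[i, j] is the substitution cost between
        symbols i and j of `alphabet` -- exactly the metric parameters that
        an adaptive scheme like the paper's would tune."""
        idx = {s: i for i, s in enumerate(alphabet)}
        m, n = len(a), len(b)
        dp = np.zeros((m + 1, n + 1))
        dp[:, 0] = gap * np.arange(m + 1)   # cost of gapping a prefix of a
        dp[0, :] = gap * np.arange(n + 1)   # cost of gapping a prefix of b
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                sub = score[idx[a[i - 1]], idx[b[j - 1]]]
                dp[i, j] = min(dp[i - 1, j - 1] + sub,  # substitute
                               dp[i - 1, j] + gap,      # gap in b
                               dp[i, j - 1] + gap)      # gap in a
        return dp[m, n]

    # Hypothetical usage: uniform costs over a four-letter alphabet.
    alphabet = "ACGT"
    score = 1.0 - np.eye(len(alphabet))     # 0 for a match, 1 for a mismatch
    print(alignment_dissimilarity("GATTACA", "GCATGCT", score, alphabet))
    ```

    Learning the metric then means treating `score` (and `gap`) as parameters of the classifier's cost function rather than fixed constants.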

    Advances in dissimilarity-based data visualisation

    Gisbrecht A. Advances in dissimilarity-based data visualisation. Bielefeld: Universitätsbibliothek Bielefeld; 2015.

    Adaptive prototype-based dissimilarity learning

    Zhu X. Adaptive prototype-based dissimilarity learning. Bielefeld: Universitätsbibliothek Bielefeld; 2015.
    In this thesis we focus on prototype-based learning techniques, namely three unsupervised techniques: generative topographic mapping (GTM), neural gas (NG), and affinity propagation (AP), and two supervised techniques: generalized learning vector quantization (GLVQ) and robust soft learning vector quantization (RSLVQ). We extend their abilities with respect to the following central aspects:
    • Applicability to dissimilarity data: Due to the increased complexity of data, in many cases data are only available in the form of (dis)similarities which describe the relations between objects. Classical methods cannot directly deal with this kind of data. For unsupervised methods this problem has been studied; here we transfer the same idea to the two supervised prototype-based techniques so that they can deal with dissimilarities directly, without an explicit embedding into a vector space.
    • Quadratic complexity issue: When dealing with dissimilarity data, the need for the full dissimilarity matrix makes the complexity quadratic, which is infeasible for large data sets. In this thesis we investigate two linear-time approximation techniques, the Nyström approximation and patch processing, and integrate them into unsupervised and supervised prototype-based techniques (see the sketch after this entry).
    • Reliability of prototype-based classifiers: In practical applications, a reliability measure is beneficial for evaluating the classification quality expected by end users. Here we adopt concepts from conformal prediction (CP), which provides a point-wise confidence measure for each prediction, and combine them with supervised prototype-based techniques.
    • Model complexity: By means of the confidence values provided by CP, the model complexity can be adjusted automatically by adding new prototypes to cover low-confidence regions of the data space.
    • Extendability to semi-supervised problems: Besides its ability to evaluate a classifier, conformal prediction can itself be considered a classifier. This opens a way to extend supervised techniques easily to semi-supervised settings by means of a self-training approach.
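    Of the two linear-time approximations, the Nyström technique is the easier one to sketch: it reconstructs a symmetric matrix from a small set of landmark rows and columns. The snippet below is a generic illustration under the assumption of a symmetric, similarity-like matrix; how the thesis embeds this inside the learning algorithms is not shown here.

    ```python
    import numpy as np

    def nystroem_approx(D, landmarks):
        """Low-rank Nyström reconstruction of a symmetric matrix D from the
        rows/columns indexed by `landmarks`. Only n*m of the n^2 entries ever
        need to be computed, which is what breaks the quadratic barrier.
        (For dissimilarity data the approximation is typically applied to an
        associated similarity matrix; this sketch glosses over that step.)"""
        C = D[:, landmarks]                  # n x m sampled columns
        W = D[np.ix_(landmarks, landmarks)]  # m x m landmark block
        return C @ np.linalg.pinv(W) @ C.T   # rank-<=m approximation of D

    # Hypothetical usage on an exactly low-rank symmetric matrix.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    D = X @ X.T                              # rank 5, so 20 landmarks suffice
    approx = nystroem_approx(D, rng.choice(200, size=20, replace=False))
    print(np.allclose(D, approx, atol=1e-6))
    ```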

    Discriminative dimensionality reduction: variations, applications, interpretations

    Schulz A. Discriminative dimensionality reduction: variations, applications, interpretations. Bielefeld: Universität Bielefeld; 2017.
    The amount of digital data increases rapidly as a result of advances in information and sensor technology. Because data sets grow in size, complexity, and dimensionality, they are no longer easily accessible to a human user. The framework of dimensionality reduction addresses this problem by aiming to visualize complex data sets in two dimensions while preserving the relevant structure. While these methods can provide significant insights, the problem formulation of structure preservation is ill-posed in general and can lead to undesired effects. In this thesis, the concept of discriminative dimensionality reduction is investigated as a particularly promising way to indicate relevant structure by specifying auxiliary data. The goal is to overcome challenges in data inspection and to investigate how far discriminative dimensionality reduction methods can yield an improvement. The main scientific contributions are the following: (I) The most popular techniques for discriminative dimensionality reduction are based on the Fisher metric. However, their applicability in complex settings is restricted: they can only be employed for fixed data sets, i.e. new data cannot be included in an existing embedding; only data provided in vectorial representation can be processed; and they are designed for discrete-valued auxiliary data and cannot be applied to real-valued auxiliary data. We propose solutions to overcome these challenges. (II) Besides the problem that complex data are not accessible to humans, the same holds for trained machine learning models, which often constitute black boxes. In order to provide an intuitive interface to such models, we propose a general framework which allows high-dimensional functions, such as regression or classification functions, to be visualized in two dimensions. (III) Although nonlinear dimensionality reduction techniques illustrate the structure of the data very well, they suffer from the fact that there is no explicit relationship between the original features and the obtained projection. We propose a methodology to create such a connection, thus making it possible to understand the importance of the features. (IV) Although linear mappings constitute a very popular tool, a direct interpretation of their weights as feature relevances can be misleading. We propose a methodology which enables a valid interpretation by providing relevance bounds for each feature. (V) The problem of transfer learning without given correspondence information between the source and target space and without labels is particularly challenging. Here, we utilize the structure-preserving property of dimensionality reduction methods to transfer knowledge in a latent space given by dimensionality reduction.
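    Contribution (II), visualizing a trained model in two dimensions, can be imitated in a few lines whenever the projection has an explicit inverse. The sketch below substitutes plain PCA for the thesis's discriminative mappings (an assumption made purely so that `inverse_transform` exists): grid points in the 2D plane are lifted back to the original space and the classifier is queried there, yielding a plane-filling label field.

    ```python
    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.decomposition import PCA
    from sklearn.svm import SVC

    # Train a "black box" classifier in the original high-dimensional space.
    X, y = load_iris(return_X_y=True)
    clf = SVC().fit(X, y)

    # Project to 2D with a mapping that has an explicit inverse.
    pca = PCA(n_components=2).fit(X)
    Z = pca.transform(X)

    # Lift a 2D grid back to the data space and query the classifier there;
    # the resulting label field visualizes the classification function.
    xs = np.linspace(Z[:, 0].min(), Z[:, 0].max(), 100)
    ys = np.linspace(Z[:, 1].min(), Z[:, 1].max(), 100)
    grid = np.array([[u, v] for v in ys for u in xs])
    labels = clf.predict(pca.inverse_transform(grid)).reshape(100, 100)
    print(labels.shape, np.unique(labels))
    ```

    Plotting `labels` behind the projected points (e.g. with matplotlib's `pcolormesh`) gives the intuitive two-dimensional interface the abstract describes.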

    Exploration of customer churn routes using machine learning probabilistic models

    The ongoing processes of globalization and deregulation are changing the competitive framework in the majority of economic sectors. The appearance of new competitors and technologies entails a sharp increase in competition and a growing preoccupation among service-providing companies with creating stronger bonds with customers. Many of these companies are shifting resources away from the goal of capturing new customers and are instead focusing on retaining existing ones. In this context, anticipating the customer's intention to abandon, a phenomenon also known as churn, and facilitating the launch of retention-focused actions represent clear elements of competitive advantage. Data mining, as applied to market survey information, can assist churn management processes. In this thesis, we mine real market data for churn analysis, placing a strong emphasis on the applicability and interpretability of the results. Statistical machine learning models for simultaneous data clustering and visualization lay the foundations for the analyses, which yield an interpretable segmentation of the surveyed markets. To achieve interpretability, much attention is paid to the intuitive visualization of the experimental results. Given that the modelling techniques under consideration are nonlinear in nature, this represents a non-trivial challenge. Newly developed techniques for data visualization in nonlinear latent models are presented. They are inspired by geographical representation methods and suited to both static and dynamic data representation.

    Approximation techniques for clustering dissimilarity data

    Zhu X, Gisbrecht A, Schleif F-M, Hammer B. Approximation techniques for clustering dissimilarity data. Neurocomputing. 2012;90:72-84.
    Recently, diverse high-quality prototype-based clustering techniques have been developed which can directly deal with data sets given by general pairwise dissimilarities rather than standard Euclidean vectors. Examples include affinity propagation, relational neural gas, and relational generative topographic mapping. Corresponding to the size of the dissimilarity matrix, these techniques scale quadratically with the size of the training set, so that training becomes prohibitive for large data volumes. In this contribution, we investigate two different linear-time approximation techniques, patch processing and the Nyström approximation. We apply these approximations to several representative clustering techniques for dissimilarities, where possible, and compare the results on diverse data sets.
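    Patch processing, the paper's other approximation besides Nyström (sketched under the Zhu 2015 entry above), never materializes the full matrix: the data are visited in fixed-size patches, and only a handful of prototype indices is carried from one patch to the next. The following is a rough, illustrative rendering with medoid prototypes, under assumptions of our own (a hypothetical `dissim` callback computing dissimilarity blocks on demand); it is not the paper's exact procedure.

    ```python
    import numpy as np

    def patch_medoids(dissim, n, k, patch_size, n_iter=50, rng=None):
        """Visit n objects in fixed-size patches, carrying only k medoid
        indices between patches, so at most a (k + patch_size)^2 block of
        dissimilarities is ever materialized."""
        rng = np.random.default_rng(rng)
        carried = []                                  # medoid indices so far
        for start in range(0, n, patch_size):
            patch = carried + list(range(start, min(start + patch_size, n)))
            D = dissim(patch, patch)                  # small local block only
            # Seed with the carried medoids (they sit at the front of patch).
            meds = np.arange(k) if carried else rng.choice(
                len(patch), size=k, replace=False)
            for _ in range(n_iter):                   # plain k-medoids loop
                labels = np.argmin(D[:, meds], axis=1)
                new = meds.copy()
                for j in range(k):
                    members = np.where(labels == j)[0]
                    if members.size:                  # keep old medoid if empty
                        within = D[np.ix_(members, members)].sum(axis=0)
                        new[j] = members[np.argmin(within)]
                if np.array_equal(new, meds):
                    break
                meds = new
            carried = [patch[m] for m in meds]        # back to global indices
        return carried

    # Hypothetical usage: Euclidean dissimilarities computed only on demand.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(1000, 3))
    dissim = lambda I, J: np.linalg.norm(X[I][:, None] - X[J][None], axis=-1)
    print(patch_medoids(dissim, n=1000, k=5, patch_size=100))
    ```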