
    Advances in dissimilarity-based data visualisation

    Gisbrecht A. Advances in dissimilarity-based data visualisation. Bielefeld: Universitätsbibliothek Bielefeld; 2015.

    Γ-stochastic neighbour embedding for feed-forward data visualization

    t-distributed Stochastic Neighbour Embedding (t-SNE) is one of the most popular nonlinear dimension reduction techniques, used in multiple application domains. In this paper we propose a variation on the embedding neighbourhood distribution, resulting in Γ-SNE, which can construct a feed-forward mapping using an RBF network. We compare the visualizations generated by Γ-SNE with those of t-SNE and provide empirical evidence suggesting that the network is capable of robust interpolation and automatic weight regularization.
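    The abstract describes the mechanism only at a high level, so the following is a minimal sketch of how a feed-forward embedding via an RBF network can work in general: Gaussian basis functions over a set of centres, with output weights fitted to reproduce given 2-D embedding coordinates. The ridge-regularized least-squares fit and all names are illustrative assumptions, not the authors' training scheme.

        import numpy as np

        def rbf_features(X, centers, sigma):
            # Gaussian RBF activations of each row of X w.r.t. the centers
            d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
            return np.exp(-d2 / (2.0 * sigma ** 2))

        def fit_rbf_mapping(X, Y, n_centers=50, sigma=1.0, ridge=1e-3, seed=0):
            # Fit output weights W so that rbf_features(X) @ W approximates
            # the 2-D coordinates Y (e.g. produced by t-SNE); the ridge term
            # is a simple stand-in for weight regularization.
            rng = np.random.default_rng(seed)
            centers = X[rng.choice(len(X), n_centers, replace=False)]
            Phi = rbf_features(X, centers, sigma)
            W = np.linalg.solve(Phi.T @ Phi + ridge * np.eye(n_centers), Phi.T @ Y)
            return centers, W

        def map_points(X_new, centers, W, sigma=1.0):
            # Feed-forward embedding of unseen points (out-of-sample extension)
            return rbf_features(X_new, centers, sigma) @ W

    Because the mapping is an explicit function rather than a set of per-point coordinates, new data can be projected without rerunning the optimization, which is what enables the interpolation behaviour the abstract reports.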

    Dissimilarity-based learning for complex data

    Mokbel B. Dissimilarity-based learning for complex data. Bielefeld: Universität Bielefeld; 2016.
    Rapid advances in information technology have entailed an ever increasing amount of digital data, which raises the demand for powerful data mining and machine learning tools. Due to modern methods for gathering, preprocessing, and storing information, the collected data become more and more complex: a simple vectorial representation and comparison in terms of the Euclidean distance are often no longer appropriate to capture relevant aspects of the data. Instead, problem-adapted similarity or dissimilarity measures refer directly to the given encoding scheme, making it possible to treat information constituents in a relational manner. This thesis addresses several challenges of complex data sets and their representation in the context of machine learning. The goal is to investigate possible remedies and propose corresponding improvements of established methods, accompanied by examples from various application domains. The main scientific contributions are the following: (I) Many well-established machine learning techniques are restricted to vectorial input data only. Therefore, we propose the extension of two popular prototype-based clustering and classification algorithms to non-negative symmetric dissimilarity matrices. (II) Some dissimilarity measures incorporate a fine-grained parameterization, which allows the comparison scheme to be configured with respect to the given data and the problem at hand. However, finding adequate parameters can be hard or even impossible for human users, due to the intricate effects of parameter changes and the lack of detailed prior knowledge. Therefore, we propose to integrate a metric learning scheme into a dissimilarity-based classifier, which can automatically adapt the parameters of a sequence alignment measure according to the given classification task. (III) Dimensionality reduction techniques are a valuable instrument for making complex data sets accessible: they provide an approximate low-dimensional embedding of the given data set and, as a special case, a planar map to visualize the data's neighborhood structure. To assess the reliability of such an embedding, we propose the extension of a well-known quality measure to enable a fine-grained, tractable quantitative analysis, which can be integrated into a visualization. This tool can also help to compare different dissimilarity measures (and parameter settings) when ground truth is not available. (IV) All techniques are demonstrated on real-world examples from a variety of application domains, including bioinformatics, motion capturing, music, and education.
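    The thesis itself should be consulted for the actual algorithms; as a hedged illustration of the machinery behind contribution (I), relational prototype methods represent a prototype implicitly as a convex combination alpha of the data points, so its dissimilarity to every point can be evaluated from the symmetric dissimilarity matrix D alone via the standard relational identity d(x_i, w) = (D alpha)_i - 1/2 alpha^T D alpha. A minimal sketch:

        import numpy as np

        def relational_distances(D, alpha):
            # Dissimilarities between all points and an implicit prototype
            # w = sum_j alpha_j x_j, computed from D only:
            #   d(x_i, w) = (D @ alpha)_i - 0.5 * alpha @ D @ alpha
            return D @ alpha - 0.5 * (alpha @ D @ alpha)

        # toy usage: a prototype spread uniformly over the first three points
        D = np.array([[0., 1., 4., 9.],
                      [1., 0., 1., 4.],
                      [4., 1., 0., 1.],
                      [9., 4., 1., 0.]])
        alpha = np.array([1/3, 1/3, 1/3, 0.])
        print(relational_distances(D, alpha))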

    Overlap Removal of Dimensionality Reduction Scatterplot Layouts

    Dimensionality Reduction (DR) scatterplot layouts have become a ubiquitous visualization tool for analyzing multidimensional data across many application areas. Despite their popularity, scatterplots suffer from occlusion, especially when markers convey information, making it hard for users to estimate the sizes of groups of items and, more importantly, potentially obscuring items that are critical to the analysis at hand. Different strategies have been devised to address this issue, either producing overlap-free layouts, which lack the powerful capabilities of contemporary DR techniques in uncovering interesting data patterns, or eliminating overlaps as a post-processing step. Despite the good results of post-processing techniques, the best methods typically expand or distort the scatterplot area, sometimes reducing markers to unreadable sizes and defeating the purpose of removing overlaps. This paper presents a novel post-processing strategy to remove overlaps from DR layouts that faithfully preserves the original layout's characteristics and the markers' sizes. We show that the proposed strategy surpasses the state of the art in overlap removal through an extensive comparative evaluation considering multiple metrics, while being two to three orders of magnitude faster for large datasets.
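    The paper's algorithm is not reproduced in the abstract; purely to make the problem setting concrete, here is a naive baseline sketch that pushes apart overlapping circular markers pairwise until none remain. It is quadratic per pass and distorts the layout freely, which is exactly what the proposed method is designed to avoid.

        import numpy as np

        def remove_overlaps(pos, radii, max_iter=200, pad=1e-3):
            # pos: (n, 2) marker centers; radii: (n,) marker radii.
            # Repeatedly moves each overlapping pair apart symmetrically
            # along the line connecting their centers.
            pos = pos.copy()
            for _ in range(max_iter):
                moved = False
                for i in range(len(pos)):
                    for j in range(i + 1, len(pos)):
                        delta = pos[j] - pos[i]
                        dist = np.linalg.norm(delta)
                        min_dist = radii[i] + radii[j] + pad
                        if dist < min_dist:
                            direction = delta / dist if dist > 0 else np.array([1.0, 0.0])
                            shift = 0.5 * (min_dist - dist) * direction
                            pos[i] -= shift
                            pos[j] += shift
                            moved = True
                if not moved:
                    break
            return pos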

    Studies on dimension reduction and feature spaces

    Today's world produces and stores huge amounts of data, which calls for methods that can tackle both the growing sizes and the growing dimensionalities of data sets. Dimension reduction aims at answering the challenges posed by the latter. Many dimension reduction methods consist of a metric transformation part followed by optimization of a cost function. Several classes of cost functions have been developed and studied, while metrics have received less attention. We promote the view that metrics should be lifted to a more independent role in dimension reduction research. The subject of this work is the interaction of metrics with dimension reduction. The work is built on a series of studies on current topics in dimension reduction and neural network research. Neural networks are used both as a tool and as a target for dimension reduction. When the results of modeling or clustering are represented as a metric, they can be studied using dimension reduction, or they can be used to introduce new properties into a dimension reduction method. We give two examples of such use: visualizing the results of hierarchical clustering, and creating supervised variants of existing dimension reduction methods by using a metric built on the feature space of a neural network. Combining clustering with dimension reduction yields a novel way of creating space-efficient visualizations that convey both the hierarchical structure and the distances between clusters. We study the feature spaces used in a recently developed neural network architecture called the extreme learning machine. We give a novel interpretation of such neural networks, and recognize the need to parameterize extreme learning machines with the variance of the network weights. This has practical implications for the use of extreme learning machines, since current practice emphasizes the role of hidden units and ignores the variance. A current trend in deep neural network research is to use cost functions from dimension reduction methods to train the network for supervised dimension reduction. We show that equally good results can be obtained by training a bottlenecked neural network for classification or regression, which is faster than using a dimension reduction cost; a sketch of this idea follows below. We demonstrate that, contrary to current belief, using sparse distance matrices to create fast dimension reduction methods is feasible if a proper balance between short-distance and long-distance entries in the sparse matrix is maintained. This observation opens up a promising research direction, with the possibility of using modern dimension reduction methods on much larger data sets than are manageable today.
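    As a hedged sketch of the bottleneck idea (architecture and hyperparameters are illustrative assumptions, not those of the thesis): a classifier with a two-unit hidden layer is trained on labels only, and the activations of that layer are read off as a supervised 2-D embedding.

        import torch
        import torch.nn as nn

        class BottleneckClassifier(nn.Module):
            # A classifier with a 2-unit bottleneck; after training, the
            # bottleneck activations serve as a supervised 2-D embedding.
            def __init__(self, n_features, n_classes, hidden=64):
                super().__init__()
                self.encoder = nn.Sequential(
                    nn.Linear(n_features, hidden), nn.Tanh(),
                    nn.Linear(hidden, 2),             # 2-D bottleneck
                )
                self.head = nn.Linear(2, n_classes)   # classification head

            def forward(self, x):
                return self.head(self.encoder(x))

        def train_embedding(X, y, n_classes, epochs=200, lr=1e-2):
            # X: float tensor (n, d); y: long tensor (n,) of class labels
            model = BottleneckClassifier(X.shape[1], n_classes)
            opt = torch.optim.Adam(model.parameters(), lr=lr)
            loss_fn = nn.CrossEntropyLoss()
            for _ in range(epochs):
                opt.zero_grad()
                loss_fn(model(X), y).backward()
                opt.step()
            return model.encoder(X).detach()          # supervised 2-D coordinates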

    Supervised dimension reduction mappings

    Compositional generative mapping for tree-structured data - Part II: Topographic projection model

    We introduce GTM-SD (Generative Topographic Mapping for Structured Data), the first compositional generative model for topographic mapping of tree-structured data. GTM-SD exploits a scalable bottom-up hidden-tree Markov model, introduced in Part I of this paper, to achieve a recursive topographic mapping of hierarchical information. The proposed model allows efficient exploitation of contextual information from shared substructures by a recursive upward propagation on the tree structure, which distributes substructure information across the topographic map. Compared to its noncompositional generative counterpart, GTM-SD is shown to allow the topographic mapping of the full sample tree, including a projection onto the lattice of all the distinct subtrees rooted in each of its nodes. Experimental results show that the continuous projection space generated by the smooth topographic mapping of GTM-SD yields a finer-grained discrimination of the sample structures than the state-of-the-art recursive neural network approach.
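    The hidden-tree Markov machinery is beyond a short sketch, but the final projection step shared by GTM-style models is simple: each item is mapped to the responsibility-weighted mean of a regular 2-D latent lattice, giving a continuous position rather than a hard assignment to one lattice node. In the sketch below the responsibility matrix is assumed to come from a trained model.

        import numpy as np

        def lattice(k):
            # k x k regular grid of latent points in [0, 1]^2
            g = np.linspace(0.0, 1.0, k)
            xx, yy = np.meshgrid(g, g)
            return np.column_stack([xx.ravel(), yy.ravel()])    # (k*k, 2)

        def project(R, latent_points):
            # Continuous topographic projection: item i maps to the mean of
            # the lattice weighted by its responsibilities R[i, :] (rows sum to 1).
            return R @ latent_points                            # (n, 2)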

    Efficient Adaptation of Structure Metrics in Prototype-Based Classification

    Mokbel B, Paaßen B, Hammer B. Efficient Adaptation of Structure Metrics in Prototype-Based Classification. In: Wermter S, Weber C, Duch W, et al., eds. Artificial Neural Networks and Machine Learning - ICANN 2014 - 24th International Conference on Artificial Neural Networks, Hamburg, Germany, September 15-19, 2014. Proceedings. Lecture Notes in Computer Science. Vol 8681. Springer; 2014: 571-578.
    More complex data formats and dedicated structure metrics have spurred the development of intuitive machine learning techniques which deal directly with dissimilarity data, such as relational learning vector quantization (RLVQ). The adjustment of metric parameters, such as relevance weights for basic structural elements, constitutes a crucial issue therein, and the first methods to learn metric parameters automatically from given data were proposed recently. In this contribution, we investigate a robust learning scheme that adapts metric parameters, such as the scoring matrix in sequence alignment, in conjunction with prototype learning, and we assess the suitability of efficient approximations thereof.
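    The paper adapts a sequence-alignment scoring matrix, which requires alignment derivatives and is not sketched here; as a simplified vectorial analogue of the same principle (jointly adapting metric parameters and prototypes), the following GRLVQ-style update learns relevance weights lam of a weighted squared Euclidean metric by gradient steps on the GLVQ cost (d+ - d-)/(d+ + d-). All names and step sizes are illustrative.

        import numpy as np

        def grlvq_step(x, y, protos, proto_labels, lam, eta_w=0.05, eta_l=0.01):
            # One joint update of prototypes and relevance weights lam for the
            # adaptive metric d(x, w) = sum_k lam_k * (x_k - w_k)^2.
            # Assumes at least one correct and one wrong prototype exist.
            d = (lam * (x - protos) ** 2).sum(axis=1)
            correct = proto_labels == y
            jp = np.where(correct, d, np.inf).argmin()   # closest correct prototype
            jm = np.where(~correct, d, np.inf).argmin()  # closest wrong prototype
            dp, dm = d[jp], d[jm]
            chi_p = 2.0 * dm / (dp + dm) ** 2            # d(mu)/d(d+), mu = (d+ - d-)/(d+ + d-)
            chi_m = -2.0 * dp / (dp + dm) ** 2           # d(mu)/d(d-)
            grad_lam = chi_p * (x - protos[jp]) ** 2 + chi_m * (x - protos[jm]) ** 2
            # gradient descent on mu: attract the correct, repel the wrong prototype
            protos[jp] += eta_w * chi_p * 2.0 * lam * (x - protos[jp])
            protos[jm] += eta_w * chi_m * 2.0 * lam * (x - protos[jm])
            # metric adaptation: relevance weights follow the same cost
            lam = np.clip(lam - eta_l * grad_lam, 0.0, None)
            return protos, lam / lam.sum()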