Search CORE

1,339 research outputs found

Nonparametric Feature Extraction from Dendrograms

Author: Chehreghani Morteza Haghir
Chehreghani Mostafa Haghir
Publication venue
Publication date: 18/11/2019
Field of study

We propose feature extraction from dendrograms in a nonparametric way. The Minimax distance measures correspond to building a dendrogram with single linkage criterion, with defining specific forms of a level function and a distance function over that. Therefore, we extend this method to arbitrary dendrograms. We develop a generalized framework wherein different distance measures can be inferred from different types of dendrograms, level functions and distance functions. Via an appropriate embedding, we compute a vector-based representation of the inferred distances, in order to enable many numerical machine learning algorithms to employ such distances. Then, to address the model selection problem, we study the aggregation of different dendrogram-based distances respectively in solution space and in representation space in the spirit of deep representations. In the first approach, for example for the clustering problem, we build a graph with positive and negative edge weights according to the consistency of the clustering labels of different objects among different solutions, in the context of ensemble methods. Then, we use an efficient variant of correlation clustering to produce the final clusters. In the second approach, we investigate the sequential combination of different distances and features sequentially in the spirit of multi-layered architectures to obtain the final features. Finally, we demonstrate the effectiveness of our approach via several numerical studies

arXiv.org e-Print Archive

A review of multi-instance learning assumptions

Author: Foulds James Richard
Frank Eibe
Publication venue: 'Cambridge University Press (CUP)'
Publication date: 01/01/2010
Field of study

Multi-instance (MI) learning is a variant of inductive machine learning, where each learning example contains a bag of instances instead of a single feature vector. The term commonly refers to the supervised setting, where each bag is associated with a label. This type of representation is a natural fit for a number of real-world learning scenarios, including drug activity prediction and image classification, hence many MI learning algorithms have been proposed. Any MI learning method must relate instances to bag-level class labels, but many types of relationships between instances and class labels are possible. Although all early work in MI learning assumes a specific MI concept class known to be appropriate for a drug activity prediction domain; this ‘standard MI assumption’ is not guaranteed to hold in other domains. Much of the recent work in MI learning has concentrated on a relaxed view of the MI problem, where the standard MI assumption is dropped, and alternative assumptions are considered instead. However, often it is not clearly stated what particular assumption is used and how it relates to other assumptions that have been proposed. In this paper, we aim to clarify the use of alternative MI assumptions by reviewing the work done in this area

Research Commons@Waikato

Learning representations from dendrograms

Author: Haghir Chehreghani Morteza
Haghir Chehreghani Mostafa
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2020
Field of study

We propose unsupervised representation learning and feature extraction from dendrograms. The commonly used Minimax distance measures correspond to building a dendrogram with single linkage criterion, with defining specific forms of a level function and a distance function over that. Therefore, we extend this method to arbitrary dendrograms. We develop a generalized framework wherein different distance measures and representations can be inferred from different types of dendrograms, level functions and distance functions. Via an appropriate embedding, we compute a vector-based representation of the inferred distances, in order to enable many numerical machine learning algorithms to employ such distances. Then, to address the model selection problem, we study the aggregation of different dendrogram-based distances respectively in solution space and in representation space in the spirit of deep representations. In the first approach, for example for the clustering problem, we build a graph with positive and negative edge weights according to the consistency of the clustering labels of different objects among different solutions, in the context of ensemble methods. Then, we use an efficient variant of correlation clustering to produce the final clusters. In the second approach, we investigate the combination of different distances and features sequentially in the spirit of multi-layered architectures to obtain the final features. Finally, we demonstrate the effectiveness of our approach via several numerical studies

Chalmers Research

Recommended from our members

New Inference Concepts for Analysing Complex Data

Author
Publication venue: Zürich : EMS Publ. House
Publication date: 01/01/2004
Field of study

The main purpose of this workshop was to assemble international leaders from statistics and machine learning to identify important research problems, to cross-fertilize between the disciplines, and to ultimately start coordinated research efforts toward better solutions. The workshop focused on discussing modern methods for analysis complex high dimensional data with applications to econometrics, finance, biomedicine, genomics etc

Repositorium für Naturwissenschaften und Technik

Risk Bounds for Learning Multiple Components with Permutation-Invariant Losses

Author: Lauer Fabien
Publication venue
Publication date: 01/01/2019
Field of study

This paper proposes a simple approach to derive efficient error bounds for learning multiple components with sparsity-inducing regularization. We show that for such regularization schemes, known decompositions of the Rademacher complexity over the components can be used in a more efficient manner to result in tighter bounds without too much effort. We give examples of application to switching regression and center-based clustering/vector quantization. Then, the complete workflow is illustrated on the problem of subspace clustering, for which decomposition results were not previously available. For all these problems, the proposed approach yields risk bounds with mild dependencies on the number of components and completely removes this dependence for nonconvex regularization schemes that could not be handled by previous methods

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server