Search CORE

1,575 research outputs found

Preference learning and similarity learning perspectives on personalized recommendation

Author: LE Duy Dung
Publication venue: Singapore Management University
Publication date: 01/09/2019
Field of study

Institutional Knowledge at Singapore Management University

Ants constructing rule-based classifiers.

Author: Baesens Bart
De Backer Manu
Haesen Raf
Holvoet Tom
Martens David
Publication venue
Publication date
Field of study

Classifiers; Data; Data mining; Studies;

Research Papers in Economics

(Psycho-)Analysis of Benchmark Experiments

Author: Eugster Manuel J. A.
Leisch Friedrich
Strobl Carolin
Publication venue
Publication date: 09/03/2010
Field of study

It is common knowledge that certain characteristics of data sets -- such as linear separability or sample size -- determine the performance of learning algorithms. In this paper we propose a formal framework for investigations on this relationship. The framework combines three, in their respective scientific discipline well-established, methods. Benchmark experiments are the method of choice in machine and statistical learning to compare algorithms with respect to a certain performance measure on particular data sets. To realize the interaction between data sets and algorithms, the data sets are characterized using statistical and information-theoretic measures; a common approach in the field of meta learning to decide which algorithms are suited to particular data sets. Finally, the performance ranking of algorithms on groups of data sets with similar characteristics is determined by means of recursively partitioning Bradley-Terry models, that are commonly used in psychology to study the preferences of human subjects. The result is a tree with splits in data set characteristics which significantly change the performances of the algorithms. The main advantage is the automatic detection of these important characteristics. The framework is introduced using a simple artificial example. Its real-word usage is demonstrated by means of an application example consisting of thirteen well-known data sets and six common learning algorithms. All resources to replicate the examples are available online

Open Access LMU

Essays on distance metric learning

Author: Yang Xiaochen
Publication venue: UCL (University College London)
Publication date: 28/10/2020
Field of study

Many machine learning methods, such as the k-nearest neighbours algorithm, heavily depend on the distance measure between data points. As each task has its own notion of distance, distance metric learning has been proposed. It learns a distance metric to assign a small distance to semantically similar instances and a large distance to dissimilar instances by formulating an optimisation problem. While many loss functions and regularisation terms have been proposed to improve the discrimination and generalisation ability of the learned metric, the metric may be sensitive to a small perturbation in the input space. Moreover, these methods implicitly assume that features are numerical variables and labels are deterministic. However, categorical variables and probabilistic labels are common in real-world applications. This thesis develops three metric learning methods to enhance robustness against input perturbation and applicability for categorical variables and probabilistic labels. In Chapter 3, I identify that many existing methods maximise a margin in the feature space and such margin is insufficient to withstand perturbation in the input space. To address this issue, a new loss function is designed to penalise the input-space margin for being small and hence improve the robustness of the learned metric. In Chapter 4, I propose a metric learning method for categorical data. Classifying categorical data is difficult due to high feature ambiguity, and to this end, the technique of adversarial training is employed. Moreover, the generalisation bound of the proposed method is established, which informs the choice of the regularisation term. In Chapter 5, I adapt a classical probabilistic approach for metric learning to utilise information on probabilistic labels. The loss function is modified for training stability, and new evaluation criteria are suggested to assess the effectiveness of different methods. At the end of this thesis, two publications on hyperspectral target detection are appended as additional work during my PhD

UCL Discovery