Twin Learning for Similarity and Clustering: A Unified Kernel Approach
Many similarity-based clustering methods work in two separate steps:
similarity matrix computation and subsequent spectral clustering. However,
similarity measurement is challenging because it is usually affected by many
factors, e.g., the choice of similarity metric, neighborhood size, scale of
data, noise and outliers. Thus the learned similarity matrix is often not
suitable, let alone optimal, for the subsequent clustering. In addition,
many real-world datasets exhibit nonlinear similarity, which, however, has
not been effectively considered by most existing methods. To tackle these two
challenges, we propose a model to simultaneously learn the cluster indicator
matrix and similarity information in kernel spaces in a principled way. We show
theoretical relationships to kernel k-means, k-means, and spectral clustering
methods. Then, to address the practical issue of how to select the most
suitable kernel for a particular clustering task, we further extend our model
with a multiple kernel learning ability. With this joint model, we can
automatically accomplish three subtasks of finding the best cluster indicator
matrix, the most accurate similarity relations and the optimal combination of
multiple kernels. By leveraging the interactions between these three subtasks
in a joint framework, each subtask can be iteratively boosted by using the
results of the others towards an overall optimal solution. Extensive
experiments are performed to demonstrate the effectiveness of our method. Comment: Published in AAAI 201
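The abstract relates the model to kernel k-means and extends it with a combination of multiple kernels. The sketch below illustrates only those two building blocks and is not the authors' joint optimization. It runs standard kernel k-means on a fixed weighted sum of an RBF kernel and a linear kernel. The kernel choices, the weights, the starting partition, and all function names are hypothetical choices made for this illustration. In the paper, the kernel combination is found jointly with the clustering rather than fixed in advance.

```python
import math

def rbf_kernel(X, gamma):
    """Gaussian (RBF) kernel matrix: K[i][j] = exp(-gamma * ||x_i - x_j||^2)."""
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(xi, xj)))
             for xj in X] for xi in X]

def linear_kernel(X):
    """Linear kernel matrix: K[i][j] = <x_i, x_j>."""
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in X] for xi in X]

def combine_kernels(kernels, weights):
    """Weighted sum of kernel matrices. The weights are fixed here (assumption), not learned."""
    n = len(kernels[0])
    return [[sum(w * K[i][j] for w, K in zip(weights, kernels)) for j in range(n)]
            for i in range(n)]

def kernel_kmeans(K, init_labels, n_clusters, max_iter=100):
    """Lloyd-style kernel k-means that only reads entries of the kernel matrix K."""
    n = len(K)
    labels = list(init_labels)
    for _ in range(max_iter):
        members = [[i for i in range(n) if labels[i] == c] for c in range(n_clusters)]
        # Squared norm of each cluster mean in feature space: sum_{j,l in C} K[j][l] / |C|^2
        mean_sq = [sum(K[j][l] for j in m for l in m) / len(m) ** 2 if m else 0.0
                   for m in members]
        new_labels = []
        for i in range(n):
            best_c, best_d = -1, math.inf
            for c, m in enumerate(members):
                if not m:
                    continue  # skip empty clusters
                # ||phi(x_i) - mu_c||^2 = K_ii - (2/|C|) * sum_{j in C} K_ij + ||mu_c||^2
                d = K[i][i] - 2.0 * sum(K[i][j] for j in m) / len(m) + mean_sq[c]
                if d < best_d:
                    best_c, best_d = c, d
            new_labels.append(best_c)
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return labels

# Usage on two well-separated toy groups, starting from an arbitrary partition.
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]]
K = combine_kernels([rbf_kernel(X, gamma=0.5), linear_kernel(X)], weights=[0.7, 0.3])
labels = kernel_kmeans(K, init_labels=[0, 1, 0, 1, 0, 1], n_clusters=2)
```

Because the distances are computed from kernel entries alone, swapping in a different kernel combination changes the clustering without touching the k-means loop. That separation is what makes it possible to learn the kernel weights alongside the cluster assignments.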
Efficient Optimization of Performance Measures by Classifier Adaptation
In practical applications, machine learning algorithms are often needed to
learn classifiers that optimize domain-specific performance measures.
Previous research has focused on learning the needed classifier in
isolation, yet learning a nonlinear classifier for nonlinear and nonsmooth
performance measures remains hard. In this paper, rather than learning the
needed classifier by optimizing a specific performance measure directly, we
circumvent this problem by proposing a novel two-step approach called CAPO,
which first trains nonlinear auxiliary classifiers with existing learning
methods and then adapts the auxiliary classifiers to specific performance
measures. In the first step, auxiliary classifiers can be obtained efficiently
with off-the-shelf learning algorithms. For the second step, we show that
the classifier adaptation problem can be reduced to a quadratic program
similar to that of linear SVMperf, which can be solved efficiently. By
exploiting nonlinear auxiliary classifiers, CAPO can generate nonlinear
classifiers that optimize a large variety of performance measures, including
all performance measures based on the contingency table as well as AUC, while
keeping high computational efficiency. Empirical studies show that CAPO is
effective and computationally efficient, and is even more efficient
than linear SVMperf. Comment: 30 pages, 5 figures, to appear in IEEE Transactions on Pattern
Analysis and Machine Intelligence, 201
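CAPO targets two families of measures: those computed from the contingency table (such as F1) and AUC. The sketch below is not the CAPO algorithm. It shows how both kinds of measure are computed from a classifier's real-valued scores, which are the quantities the adaptation step reshapes. As a much simpler stand-in for adaptation, it also includes a threshold search that maximizes F1. The paper instead solves a quadratic program. All function names are hypothetical.

```python
def contingency_table(y_true, y_pred):
    """Counts (TP, FP, FN, TN) for binary labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def f1_score(y_true, y_pred):
    """F1 = 2TP / (2TP + FP + FN), a measure built from the contingency table."""
    tp, fp, fn, _ = contingency_table(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def best_threshold_for_f1(y_true, scores):
    """Simplified stand-in for adaptation (not CAPO): choose the threshold maximizing F1."""
    best_t, best_f1 = None, -1.0
    for t in sorted(set(scores)):
        pred = [1 if s >= t else 0 for s in scores]
        f = f1_score(y_true, pred)
        if f > best_f1:
            best_t, best_f1 = t, f
    return best_t, best_f1

# Usage: scores from some auxiliary classifier (toy values).
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
area = auc(y_true, scores)                           # 7 of 9 pairs ranked correctly
threshold, f1 = best_threshold_for_f1(y_true, scores)  # threshold 0.3, F1 = 6/7
```

The threshold search only moves the decision boundary along the existing scores. A method like CAPO, which adapts the auxiliary classifiers themselves, can change the ranking as well, and AUC is sensitive only to the ranking.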
To go deep or wide in learning?
To achieve acceptable performance on AI tasks, one can either use
sophisticated feature extraction methods as the first layer in a two-layered
supervised learning model, or learn the features directly using a deep
(multi-layered) model. While the first approach is very problem-specific, the
second incurs computational overhead in learning multiple layers and
fine-tuning the model. In this paper, we propose an approach called wide
learning, based on arc-cosine kernels, that learns a single layer of infinite
width. We propose exact and inexact learning strategies for wide learning and
show that wide learning with a single layer outperforms both single-layer and
deep architectures of finite width on some benchmark datasets. Comment: 9 pages, 1 figure, Accepted for publication in Seventeenth
International Conference on Artificial Intelligence and Statistic
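Wide learning rests on the arc-cosine kernel, which equals the inner product of a single hidden layer with infinitely many random Gaussian-weighted units: step units for degree 0 and ReLU units for degree 1. The sketch below uses the standard closed forms for degrees 0 and 1. It also gives a finite-width Monte Carlo estimate that approaches the degree-1 kernel as the width grows. It does not reproduce the paper's exact or inexact learning strategies, and the function names are hypothetical.

```python
import math
import random

def arc_cosine_kernel(x, y, degree=1):
    """Closed form k_n(x, y) = (1/pi) * ||x||^n * ||y||^n * J_n(theta), for n in {0, 1}.

    J_0(theta) = pi - theta                            (step units)
    J_1(theta) = sin(theta) + (pi - theta) cos(theta)  (ReLU units)
    """
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    if nx == 0.0 or ny == 0.0:
        raise ValueError("the angle is undefined for a zero vector")
    cos_t = sum(a * b for a, b in zip(x, y)) / (nx * ny)
    theta = math.acos(max(-1.0, min(1.0, cos_t)))  # clamp rounding error
    if degree == 0:
        j = math.pi - theta
    elif degree == 1:
        j = math.sin(theta) + (math.pi - theta) * math.cos(theta)
    else:
        raise ValueError("only degrees 0 and 1 are implemented in this sketch")
    return (nx * ny) ** degree * j / math.pi

def random_relu_features_kernel(x, y, width, seed=0):
    """Finite-width estimate of the degree-1 kernel:
    2 * mean over `width` hidden units w ~ N(0, I) of relu(w.x) * relu(w.y)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(width):
        w = [rng.gauss(0.0, 1.0) for _ in x]
        hx = max(0.0, sum(wi * xi for wi, xi in zip(w, x)))
        hy = max(0.0, sum(wi * yi for wi, yi in zip(w, y)))
        total += hx * hy
    return 2.0 * total / width

# Usage: the finite-width estimate should approach the closed form as width grows.
x, y = [1.0, 0.0, 0.0], [0.6, 0.8, 0.0]
exact = arc_cosine_kernel(x, y, degree=1)
approx = random_relu_features_kernel(x, y, width=20000)
```

Plugging the closed form into a kernel machine trains the output layer of an infinitely wide network without sampling any hidden weights. The Monte Carlo version shows what a finite-width layer would approximate.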
Optimization with Sparsity-Inducing Penalties
Sparse estimation methods aim at using or obtaining parsimonious
representations of data or models. They were first dedicated to linear variable
selection, but numerous extensions have now emerged, such as structured sparsity
or kernel selection. It turns out that many of the related estimation problems
or kernel selection. It turns out that many of the related estimation problems
can be cast as convex optimization problems by regularizing the empirical risk
with appropriate non-smooth norms. The goal of this paper is to present, from a
general perspective, optimization tools and techniques dedicated to such
sparsity-inducing penalties. We cover proximal methods, block-coordinate
descent, reweighted ℓ2-penalized techniques, working-set and homotopy
methods, as well as non-convex formulations and extensions, and provide an
extensive set of experiments to compare various algorithms from a computational
point of view.
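Proximal methods are the first family of techniques the abstract lists. The minimal sketch below is not taken from the paper. It applies proximal gradient descent (ISTA) to ℓ1-regularized least squares (the Lasso), where the proximal operator of the ℓ1 norm is elementwise soft-thresholding. The step size 1/||A||_F^2 is a conservative choice assumed here for simplicity, and the function names are hypothetical.

```python
import math

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1: shrink each entry toward zero by t."""
    return [math.copysign(max(abs(vi) - t, 0.0), vi) for vi in v]

def ista_lasso(A, b, lam, n_iter=500):
    """Proximal gradient (ISTA) for min_x 0.5 * ||A x - b||^2 + lam * ||x||_1."""
    m, n = len(A), len(A[0])
    # The squared Frobenius norm upper-bounds the largest eigenvalue of A^T A,
    # so 1 / L is a safe, if conservative, step size.
    L = sum(a * a for row in A for a in row)
    x = [0.0] * n
    for _ in range(n_iter):
        residual = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(m)]
        grad = [sum(A[i][j] * residual[i] for i in range(m)) for j in range(n)]
        # Gradient step on the smooth loss, then the prox of the l1 penalty.
        x = soft_threshold([x[j] - grad[j] / L for j in range(n)], lam / L)
    return x

# With A = I, the Lasso solution is soft_threshold(b, lam), which ISTA should reach.
A = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
b = [3.0, 0.5, -2.0]
x_hat = ista_lasso(A, b, lam=1.0)
```

The non-smooth penalty enters only through its proximal operator. Replacing soft-thresholding with, for example, block soft-thresholding turns the same loop into a solver for a group-sparse penalty.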