Wasserstein Barycenter Model Ensembling
In this paper we propose to perform model ensembling in a multiclass or a
multilabel learning setting using Wasserstein (W.) barycenters. Optimal
transport metrics, such as the Wasserstein distance, allow incorporating
semantic side information such as word embeddings. Using W. barycenters to find
the consensus between models allows us to balance confidence and semantics in
finding the agreement between the models. We show applications of Wasserstein
ensembling in attribute-based classification, multilabel learning and image
captioning generation. These results show that W. ensembling is a viable
alternative to basic geometric or arithmetic mean ensembling.
Comment: ICLR 201
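As a rough sketch of the core computation (not the paper's exact algorithm), the following ensembles several models' class-probability vectors by computing their entropy-regularized Wasserstein barycenter via iterative Bregman projections; the class cost matrix, regularization strength, and uniform model weights below are illustrative assumptions.

```python
import numpy as np

def wasserstein_barycenter(P, C, weights, eps=0.5, n_iter=500):
    """Entropy-regularized W. barycenter of the rows of P.

    P: (K, d) output distributions of K models over d classes;
    C: (d, d) ground cost between classes; weights: (K,) model weights.
    """
    Kmat = np.exp(-C / eps)            # Gibbs kernel from the class-to-class cost
    V = np.ones_like(P)                # Sinkhorn scaling vectors, one row per model
    for _ in range(n_iter):
        U = P / (V @ Kmat.T)           # match each plan's first marginal to p_k
        KtU = U @ Kmat                 # second marginals K^T u_k, one row per model
        b = np.exp(weights @ np.log(KtU + 1e-300))  # weighted geometric mean
        V = b[None, :] / KtU           # match each plan's second marginal to b
    return b / b.sum()

# Toy usage: three models over 4 classes whose "semantic" cost is |i - j|.
P = np.array([[0.70, 0.20, 0.05, 0.05],
              [0.50, 0.40, 0.05, 0.05],
              [0.10, 0.70, 0.10, 0.10]])
C = np.abs(np.subtract.outer(np.arange(4.0), np.arange(4.0)))
print(wasserstein_barycenter(P, C, weights=np.full(3, 1 / 3)))
```

With a semantically meaningful cost (e.g. distances between class word embeddings), the barycenter can shift mass toward semantically close classes rather than averaging probabilities coordinate-wise.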
Effective learning is accompanied by high dimensional and efficient representations of neural activity
A fundamental cognitive process is the ability to map value and identity onto
objects as we learn about them. Exactly how such mental constructs emerge and
what kind of space best embeds this mapping remains incompletely understood.
Here we develop tools to quantify the space and organization of such a mapping,
thereby providing a framework for studying the geometric representations of
neural responses as reflected in functional MRI. Considering how human subjects
learn the values of novel objects, we show that quick learners have a higher
dimensional geometric representation than slow learners, and hence more easily
distinguishable whole-brain responses to objects of different value.
Furthermore, we find that quick learners display a more compact embedding of
their neural responses and hence have a higher ratio of their task-based
dimension to their embedding dimension -- consistent with a greater efficiency
of cognitive coding. Lastly, we investigate the neurophysiological drivers of
high dimensional patterns at both regional and voxel levels, and we complete
our study with a complementary test of the distinguishability of associated
whole-brain responses. Our results demonstrate a spatial organization of neural
responses characteristic of learning, and offer a suite of geometric measures
applicable to the study of efficient coding in higher-order cognitive processes
more broadly.
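The paper develops its own suite of geometric measures; purely as a loose illustration of what a "dimension" of neural responses can mean, the sketch below computes one common effective-dimension statistic, the participation ratio of the covariance spectrum of trial-by-voxel response patterns. The data shape and the choice of measure are assumptions, not the authors' method.

```python
import numpy as np

def participation_ratio(X):
    """X: (n_trials, n_voxels) response patterns. Effective dimension
    PR = (sum_i lam_i)^2 / sum_i lam_i^2 of the trial covariance spectrum."""
    Xc = X - X.mean(axis=0, keepdims=True)           # center each voxel
    lam = np.linalg.svd(Xc, compute_uv=False) ** 2   # covariance eigenvalues (unnormalized)
    return lam.sum() ** 2 / (lam ** 2).sum()         # normalization factors cancel

# Patterns confined to ~3 latent directions score near 3, far below the 50 voxels.
X = np.random.randn(200, 3) @ np.random.randn(3, 50)
print(participation_ratio(X))
```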
Differential geometric regularization for supervised learning of classifiers
We study the problem of supervised learning for both binary and multiclass classification from a unified geometric perspective. In particular, we propose a geometric regularization technique to find the submanifold corresponding to an estimator of the class probability P(y|\vec x). The regularization term measures the volume of this submanifold, based on the intuition that overfitting produces rapid local oscillations and hence a large volume of the estimator. This technique can be applied to regularize any classification function that satisfies two requirements: firstly, an estimator of the class probability can be obtained; secondly, first and second derivatives of the class probability estimator can be calculated. In experiments, we apply our regularization technique to standard loss functions for classification; our RBF-based implementation compares favorably to widely used regularization methods for both binary and multiclass classification.
http://proceedings.mlr.press/v48/baia16.pdf
Published version
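A minimal sketch of the stated intuition, assuming a differentiable softmax classifier in PyTorch rather than the paper's RBF-based implementation: for an estimator p: R^n -> R^c, the graph-volume element is sqrt(det(I + J^T J)) with J the Jacobian dp/dx, and the penalty below averages it over a batch.

```python
import torch

def volume_regularizer(model, x):
    """Mean graph-volume element sqrt(det(I + J^T J)) of softmax outputs over x."""
    x = x.clone().requires_grad_(True)
    p = torch.softmax(model(x), dim=1)          # class-probability estimator
    # Batched Jacobian J, shape (batch, n_classes, n_features).
    J = torch.stack([torch.autograd.grad(p[:, k].sum(), x, create_graph=True)[0]
                     for k in range(p.shape[1])], dim=1)
    G = torch.eye(x.shape[1]) + J.transpose(1, 2) @ J   # metric I + J^T J
    return torch.sqrt(torch.linalg.det(G)).mean()

# Illustrative usage: add lam * volume_regularizer(...) to the classification loss.
model = torch.nn.Sequential(torch.nn.Linear(2, 16), torch.nn.Tanh(),
                            torch.nn.Linear(16, 3))
print(volume_regularizer(model, torch.randn(8, 2)))
```

Since I + J^T J is positive definite with determinant at least 1, the penalty is well defined and is smallest for slowly varying estimators, matching the intuition that overfitting shows up as rapid local oscillations.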
Curvature-based Comparison of Two Neural Networks
In this paper we show the similarities and differences of two deep neural
networks by comparing the manifolds composed of activation vectors in each
fully connected layer of them. The main contribution of this paper includes 1)
a new data generating algorithm which is crucial for determining the dimension
of manifolds; 2) a systematic strategy to compare manifolds. In particular, we
take Riemann curvature and sectional curvature as part of the criterion, since
they reflect the intrinsic geometric properties of manifolds. Some interesting
results and phenomena are presented, which help in specifying the similarities
and differences between the features extracted by the two networks and in
demystifying the intrinsic mechanism of deep neural networks.
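Estimating Riemann or sectional curvature from samples is involved, so the sketch below uses a much simpler geometric summary as a stand-in for the paper's criterion: comparing two layers' activation sets by their local-PCA intrinsic dimension. The neighborhood size and variance threshold are illustrative assumptions.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def local_pca_dimension(A, k=20, var_threshold=0.95):
    """A: (n_samples, n_units) activation vectors. Returns the mean number of
    principal components explaining var_threshold of variance per k-neighborhood."""
    _, idx = NearestNeighbors(n_neighbors=k).fit(A).kneighbors(A)
    dims = []
    for neigh in idx:
        X = A[neigh] - A[neigh].mean(axis=0)
        s2 = np.linalg.svd(X, compute_uv=False) ** 2
        dims.append(int(np.searchsorted(np.cumsum(s2) / s2.sum(), var_threshold)) + 1)
    return float(np.mean(dims))

# Compare the activation manifolds of two (hypothetical) networks' layers:
# d1, d2 = local_pca_dimension(acts_net1), local_pca_dimension(acts_net2)
```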
Geometric Losses for Distributional Learning
Building upon recent advances in entropy-regularized optimal transport, and
upon Fenchel duality between measures and continuous functions, we propose a
generalization of the logistic loss that incorporates a metric or cost between
classes. Unlike previous attempts to use optimal transport distances for
learning, our loss results in unconstrained convex objective functions,
supports infinite (or very large) class spaces, and naturally defines a
geometric generalization of the softmax operator. The geometric properties of
this loss make it suitable for predicting sparse and singular distributions,
for instance supported on curves or hyper-surfaces. We study the theoretical
properties of our loss and showcase its effectiveness on two applications:
ordinal regression and drawing generation.
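The paper's geometric logistic loss has its own Fenchel-dual construction; as a simpler cousin in the spirit of the earlier OT-based losses it contrasts with, the sketch below computes an entropy-regularized OT (Sinkhorn) cost between the model's softmax output and a smoothed one-hot target under a class cost matrix. All shapes and hyperparameters are illustrative.

```python
import torch

def sinkhorn_cost(p, q, C, eps=0.1, n_iter=100):
    """Transport cost <T, C> under the entropy-regularized plan between
    histograms p and q (both shape (d,), strictly positive)."""
    Kmat = torch.exp(-C / eps)
    u = torch.ones_like(p)
    for _ in range(n_iter):              # standard Sinkhorn scaling iterations
        v = q / (Kmat.t() @ u)
        u = p / (Kmat @ v)
    T = u[:, None] * Kmat * v[None, :]   # regularized transport plan
    return (T * C).sum()

# Toy usage: 4 ordinal classes with cost |i - j|; gradients flow back to logits.
d = 4
C = torch.abs(torch.arange(d)[:, None] - torch.arange(d)[None, :]).float()
logits = torch.randn(d, requires_grad=True)
target = torch.full((d,), 0.01); target[2] = 0.97   # smoothed one-hot, sums to 1
sinkhorn_cost(torch.softmax(logits, dim=0), target, C).backward()
```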
How Generative Adversarial Networks and Their Variants Work: An Overview
Generative Adversarial Networks (GANs) have received wide attention in the
machine learning field for their potential to learn high-dimensional, complex
real data distributions. Specifically, they do not rely on any assumptions about
the distribution and can generate real-like samples from a latent space in a
simple manner. This powerful property has led GANs to be applied to various
applications such as image synthesis, image attribute editing, image
translation, domain adaptation, and other academic fields. In this paper, we aim
to discuss the details of GANs for those readers who are familiar with them but
do not comprehend them deeply, or who wish to view GANs from various
perspectives. In addition, we explain how GANs operate and the fundamental
meaning of the various objective functions that have been suggested recently. We
then focus on how GANs can be combined with an autoencoder framework. Finally,
we enumerate the GAN variants that are applied to various tasks and other fields
for those who are interested in exploiting GANs for their research.
Comment: 41 pages, 16 figures, Published in ACM Computing Surveys (CSUR
PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space
Few prior works study deep learning on point sets. PointNet by Qi et al. is a
pioneer in this direction. However, by design PointNet does not capture local
structures induced by the metric space points live in, limiting its ability to
recognize fine-grained patterns and generalizability to complex scenes. In this
work, we introduce a hierarchical neural network that applies PointNet
recursively on a nested partitioning of the input point set. By exploiting
metric space distances, our network is able to learn local features with
increasing contextual scales. Further observing that point sets are usually
sampled with varying densities, which greatly degrades the performance of
networks trained on uniform densities, we propose novel set learning layers to
adaptively combine features from multiple scales. Experiments show that our
network, called PointNet++, is able to learn deep point set features efficiently
and robustly. In particular, results significantly better than the state of the
art have been obtained on challenging benchmarks of 3D point clouds.
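A sketch of the two metric-space primitives behind the paper's hierarchical design, under illustrative radius and group-size assumptions: farthest point sampling spreads centroids over the set, and ball query gathers each centroid's local neighborhood, to which a small PointNet would then be applied.

```python
import numpy as np

def farthest_point_sampling(pts, m):
    """pts: (n, 3). Greedily pick m point indices that are mutually far apart."""
    chosen = [0]
    dist = np.linalg.norm(pts - pts[0], axis=1)    # distance to nearest chosen point
    for _ in range(m - 1):
        nxt = int(dist.argmax())                   # farthest from all chosen so far
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(pts - pts[nxt], axis=1))
    return np.array(chosen)

def ball_query(pts, centroids, radius=0.2, k=32):
    """Up to k point indices within radius of each centroid (padded by cycling)."""
    return np.stack([np.resize(np.where(np.linalg.norm(pts - pts[c], axis=1)
                                        < radius)[0], k)
                     for c in centroids])

pts = np.random.rand(2048, 3)
groups = ball_query(pts, farthest_point_sampling(pts, 128))   # (128, 32) indices
```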
Recent Advances in Transfer Learning for Cross-Dataset Visual Recognition: A Problem-Oriented Perspective
This paper takes a problem-oriented perspective and presents a comprehensive
review of transfer learning methods, both shallow and deep, for cross-dataset
visual recognition. Specifically, it categorises the cross-dataset recognition
into seventeen problems based on a set of carefully chosen data and label
attributes. Such a problem-oriented taxonomy has allowed us to examine how
different transfer learning approaches tackle each problem and how well each
problem has been researched to date. This comprehensive problem-oriented review
of the advances in transfer learning has revealed not only the challenges in
transfer learning for visual recognition, but also the problems (e.g. eight of
the seventeen problems) that have been scarcely studied. This survey not only
presents an up-to-date technical review for researchers, but also offers a
systematic approach and a reference for machine learning practitioners to
categorise a real problem and look up a possible solution accordingly.
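The survey is a taxonomy rather than a single method, but the most common cross-dataset baseline it covers, fine-tuning a network trained on a source dataset for a target dataset, can be sketched as below; the backbone choice and target class count are illustrative assumptions.

```python
import torch.nn as nn
from torchvision import models

backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
for p in backbone.parameters():
    p.requires_grad = False                      # freeze source-dataset features
backbone.fc = nn.Linear(backbone.fc.in_features, 10)   # new head: 10 target classes
# Train only backbone.fc (the new, trainable layer) on the target dataset as usual.
```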
Hyperspectral Image Classification and Clutter Detection via Multiple Structural Embeddings and Dimension Reductions
We present a new and effective approach for Hyperspectral Image (HSI)
classification and clutter detection, overcoming a few long-standing challenges
presented by HSI data characteristics. Residing in a high-dimensional spectral
attribute space, HSI data samples are known to be strongly correlated in their
spectral signatures, exhibit nonlinear structure due to several physical laws,
and contain uncertainty and noise from multiple sources. In the presented
approach, we generate an adaptive, structurally enriched representation
environment, and employ the locally linear embedding (LLE) in it. There are two
structure layers external to LLE. One is feature space embedding: the HSI data
attributes are embedded into a discriminatory feature space where
spatio-spectral coherence and distinctive structures are distilled and
exploited to mitigate various difficulties encountered in the native
hyperspectral attribute space. The other structure layer encloses the ranges of
algorithmic parameters for LLE and feature embedding, and supports a
multiplexing and integrating scheme for contending with multi-source
uncertainty. Experiments on two commonly used HSI datasets with a small number
of learning samples have rendered remarkably high-accuracy classification
results, as well as distinctive maps of detected clutter regions.
Comment: 13 pages, 6 figures (30 images), submitted to International Conference
on Computer Vision (ICCV) 201
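A sketch of the embedding step at the core of the approach, without the paper's surrounding structure layers: locally linear embedding applied to pixel spectra, followed by a simple classifier trained on a small labeled subset. Data shapes and parameters are stand-in assumptions.

```python
import numpy as np
from sklearn.manifold import LocallyLinearEmbedding
from sklearn.neighbors import KNeighborsClassifier

X = np.random.rand(1000, 200)           # stand-in: 1000 pixels x 200 spectral bands
y = np.random.randint(0, 5, 1000)       # stand-in class labels
Z = LocallyLinearEmbedding(n_neighbors=15, n_components=20).fit_transform(X)
clf = KNeighborsClassifier(5).fit(Z[:100], y[:100])   # small number of learning samples
print(clf.score(Z[100:], y[100:]))
```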
Learning Rank Functionals: An Empirical Study
Ranking is a key aspect of many applications, such as information retrieval,
question answering, ad placement and recommender systems. Learning to rank has
the goal of estimating a ranking model automatically from training data. In
practical settings, the task often reduces to estimating a rank functional of
an object with respect to a query. In this paper, we investigate key issues in
designing an effective learning-to-rank algorithm. These include data
representation, the choice of rank functionals, and the design of the loss
function so that it correlates with the rank metrics used in evaluation. For the
loss function, we study three techniques: approximating the rank metric by a
smooth function, decomposing the loss into a weighted sum of element-wise
losses, and decomposing it into a weighted sum of pairwise losses. We then
present derivations of piecewise losses using the theory of high-order Markov
chains and Markov random fields. In experiments, we evaluate these design
aspects on two tasks: answer ranking in a Social Question Answering site, and
Web Information Retrieval.
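As a sketch of the third loss-design technique above, decomposing the loss into a weighted sum of pairwise losses: for every document pair where one label exceeds the other, a logistic pairwise loss pushes the higher-labeled document's score up. The weighting scheme here is an illustrative assumption.

```python
import torch

def pairwise_rank_loss(scores, labels, weights=None):
    """scores, labels: (n,) per-document model scores and relevance labels."""
    i, j = torch.where(labels[:, None] > labels[None, :])    # pairs with y_i > y_j
    losses = torch.nn.functional.softplus(scores[j] - scores[i])  # log(1+e^{s_j-s_i})
    if weights is not None:
        losses = losses * weights[i]        # e.g. position- or metric-based weights
    return losses.mean()

scores = torch.randn(6, requires_grad=True)
labels = torch.tensor([2, 1, 0, 0, 1, 0])
pairwise_rank_loss(scores, labels).backward()
```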