Topological Data Analysis with Bregman Divergences
Given a finite set in a metric space, the topological analysis generalizes
hierarchical clustering using a 1-parameter family of homology groups to
quantify connectivity in all dimensions. The connectivity is compactly
described by the persistence diagram. One limitation of the current framework
is the reliance on metric distances, whereas in many practical applications
objects are compared by non-metric dissimilarity measures. Examples are the
Kullback-Leibler divergence, which is commonly used for comparing text and
images, and the Itakura-Saito divergence, popular for speech and sound. These
are two members of the broad family of dissimilarities called Bregman
divergences.
We show that the framework of topological data analysis can be extended to
general Bregman divergences, widening the scope of possible applications. In
particular, we prove that appropriately generalized Čech and Delaunay (alpha)
complexes capture the correct homotopy type, namely that of the corresponding
union of Bregman balls. Consequently, their filtrations give the correct
persistence diagram, namely the one generated by the uniformly growing Bregman
balls. Moreover, we show that, unlike in the metric setting, the filtration of
Vietoris-Rips complexes may fail to approximate the persistence diagram. We
propose algorithms to compute the thus generalized Čech, Vietoris-Rips and
Delaunay complexes and experimentally test their efficiency. Lastly, we explain
their surprisingly good performance by making a connection with discrete Morse
theory.
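As a rough illustration of the Bregman divergences named in the abstract above, the sketch below evaluates the standard definition D_F(p, q) = F(p) - F(q) - <grad F(q), p - q> for two generators: negative Shannon entropy, which yields the (generalized) Kullback-Leibler divergence, and Burg entropy, which yields the Itakura-Saito divergence. The function names and sample vectors are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def bregman(F, grad_F, p, q):
    """Bregman divergence D_F(p, q) = F(p) - F(q) - <grad F(q), p - q>."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return F(p) - F(q) - np.dot(grad_F(q), p - q)

# Generator 1: negative Shannon entropy -> generalized KL divergence.
def neg_entropy(x):
    return np.sum(x * np.log(x))

def grad_neg_entropy(x):
    return np.log(x) + 1.0

# Generator 2: Burg entropy -> Itakura-Saito divergence.
def burg(x):
    return -np.sum(np.log(x))

def grad_burg(x):
    return -1.0 / x

# Closed forms, used only to cross-check the generic definition.
def generalized_kl(p, q):
    return np.sum(p * np.log(p / q) - p + q)

def itakura_saito(p, q):
    return np.sum(p / q - np.log(p / q) - 1.0)

# Hypothetical sample points with strictly positive coordinates.
p = np.array([0.2, 0.3, 0.5])
q = np.array([0.4, 0.4, 0.2])

kl_pq = bregman(neg_entropy, grad_neg_entropy, p, q)
kl_qp = bregman(neg_entropy, grad_neg_entropy, q, p)
is_pq = bregman(burg, grad_burg, p, q)
```

Comparing `kl_pq` with `kl_qp` shows the asymmetry that makes these dissimilarities non-metric, which is the limitation the abstract sets out to address.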
Stochastic Discriminative EM
Stochastic discriminative EM (sdEM) is an online-EM-type algorithm for
discriminative training of probabilistic generative models belonging to the
exponential family. In this work, we introduce and justify this algorithm as a
stochastic natural gradient descent method, i.e. a method which accounts for
the information geometry in the parameter space of the statistical model. We
show how this learning algorithm can be used to train probabilistic generative
models by minimizing different discriminative loss functions, such as the
negative conditional log-likelihood and the hinge loss. The resulting models
trained by sdEM are always generative (i.e. they define a joint probability
distribution) and, in consequence, can handle missing data and latent
variables in a principled way, both during learning and when making
predictions. The performance of this method is illustrated by several text
classification problems for which a multinomial naive Bayes and a latent
Dirichlet allocation based classifier are learned using different
discriminative loss functions.
Comment: UAI 2014 paper + Supplementary Material. In Proceedings of the
Thirtieth Conference on Uncertainty in Artificial Intelligence (UAI 2014),
edited by Nevin L. Zhang and Jian Tian. AUAI Press
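To make the natural-gradient idea in the abstract above concrete, here is a minimal sketch for a single Bernoulli variable, parameterized by its expectation parameter mu. It is not the sdEM algorithm itself: it uses the plain (generative) negative log-likelihood rather than a discriminative loss, and the function names and the step-size schedule rho_t = 1/(t+1) are assumptions made for illustration. Dividing the ordinary gradient by the Fisher information turns the update into a running average of the data, which is the kind of online-EM-style update the abstract alludes to.

```python
def nll_grad_mu(mu, x):
    """Ordinary gradient of the Bernoulli negative log-likelihood w.r.t. mu."""
    return (mu - x) / (mu * (1.0 - mu))

def fisher_mu(mu):
    """Fisher information of a Bernoulli in the expectation parameter mu."""
    return 1.0 / (mu * (1.0 - mu))

def natural_gradient_step(mu, x, rho):
    """One stochastic natural-gradient step: rescale the gradient by 1/Fisher."""
    nat_grad = nll_grad_mu(mu, x) / fisher_mu(mu)  # simplifies to mu - x
    return mu - rho * nat_grad

def fit_bernoulli(xs, mu0=0.5):
    """Process observations one at a time with step size 1/(t+1)."""
    mu = mu0
    for t, x in enumerate(xs, start=1):
        # With this schedule mu stays strictly inside (0, 1) when mu0 does.
        mu = natural_gradient_step(mu, x, rho=1.0 / (t + 1))
    return mu

# Hypothetical data stream of binary observations.
estimate = fit_bernoulli([1, 0, 1, 1])
```

With this step size the estimate equals (mu0 + sum of observations) / (n + 1), i.e. a running average that treats the initial value as one pseudo-observation.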
Multi-Label Dimensionality Reduction
Multi-label learning, which deals with data associated with multiple labels simultaneously, is ubiquitous in real-world applications. To overcome the curse of dimensionality in multi-label learning, in this thesis I study multi-label dimensionality reduction, which extracts a small number of features by removing irrelevant, redundant, and noisy information while taking the correlation among different labels into account. Specifically, I propose Hypergraph Spectral Learning (HSL) to perform dimensionality reduction for multi-label data by exploiting correlations among different labels using a hypergraph. The regularization effect on the classical dimensionality reduction algorithm known as Canonical Correlation Analysis (CCA) is elucidated in this thesis. The relationship between CCA and Orthonormalized Partial Least Squares (OPLS) is also investigated. To perform dimensionality reduction efficiently for large-scale problems, two efficient implementations are proposed for a class of dimensionality reduction algorithms, including canonical correlation analysis, orthonormalized partial least squares, linear discriminant analysis, and hypergraph spectral learning. The first is a direct least squares approach, which allows the use of different regularization penalties but is applicable only under a certain assumption; the second is a two-stage approach, which can be applied in the regularization setting without any assumption. Furthermore, an online implementation for the same class of dimensionality reduction algorithms is proposed for the case where the data arrive sequentially. A Matlab toolbox for multi-label dimensionality reduction has been developed and released. The proposed algorithms have been applied successfully to Drosophila gene expression pattern image annotation. The experimental results on some benchmark data sets in multi-label learning also demonstrate the effectiveness and efficiency of the proposed algorithms.
Dissertation/Thesis. Ph.D. Computer Science 201
Crosslingual Document Embedding as Reduced-Rank Ridge Regression
There has recently been much interest in extending vector-based word
representations to multiple languages, such that words can be compared across
languages. In this paper, we shift the focus from words to documents and
introduce a method for embedding documents written in any language into a
single, language-independent vector space. For training, our approach leverages
a multilingual corpus where the same concept is covered in multiple languages
(but not necessarily via exact translations), such as Wikipedia. Our method,
Cr5 (Crosslingual reduced-rank ridge regression), starts by training a
ridge-regression-based classifier that uses language-specific bag-of-words
features in order to predict the concept that a given document is about. We
show that, when constraining the learned weight matrix to be of low rank, it
can be factored to obtain the desired mappings from language-specific
bags-of-words to language-independent embeddings. As opposed to most prior
methods, which use pretrained monolingual word vectors, postprocess them to
make them crosslingual, and finally average word vectors to obtain document
vectors, Cr5 is trained end-to-end and is thus natively crosslingual as well as
document-level. Moreover, since our algorithm uses the singular value
decomposition as its core operation, it is highly scalable. Experiments show
that our method achieves state-of-the-art performance on a crosslingual
document retrieval task. Finally, although not trained for embedding sentences
and words, it also achieves competitive performance on crosslingual sentence
and word retrieval tasks.
Comment: In The Twelfth ACM International Conference on Web Search and Data
Mining (WSDM '19)
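The idea of factoring a low-rank ridge-regression weight matrix into an embedding map, as described in the abstract above, can be sketched as follows. This is a simplified stand-in and not the Cr5 procedure: it fits an ordinary ridge solution and then truncates its SVD, whereas the exact reduced-rank ridge estimator is derived differently. The data are random, and the names `ridge_weights`, `low_rank_factor`, `lam`, and `r` are hypothetical.

```python
import numpy as np

def ridge_weights(X, Y, lam):
    """Closed-form ridge solution W = (X^T X + lam * I)^{-1} X^T Y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

def low_rank_factor(W, r):
    """Factor W ~ A @ B via truncated SVD (a simplification, see lead-in)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :r] * s[:r]  # maps bag-of-words features to r-dim embeddings
    B = Vt[:r, :]         # maps embeddings to concept scores
    return A, B

# Synthetic stand-in data: 50 documents, 20 vocabulary terms, 5 concepts.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 20))
Y = np.eye(5)[rng.integers(0, 5, size=50)]  # one-hot concept labels

W = ridge_weights(X, Y, lam=1.0)
A, B = low_rank_factor(W, r=3)
embeddings = X @ A  # one 3-dimensional vector per document
```

In a crosslingual setting one would presumably learn one such feature-to-embedding map per language while sharing the concept side, so that documents about the same concept land near each other; that step is omitted here.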