Dependence Maximizing Temporal Alignment via Squared-Loss Mutual Information
The goal of temporal alignment is to establish time correspondence between
two sequences, which has many applications in a variety of areas such as speech
processing, bioinformatics, computer vision, and computer graphics. In this
paper, we propose a novel temporal alignment method called least-squares
dynamic time warping (LSDTW). LSDTW finds an alignment that maximizes
statistical dependency between sequences, measured by a squared-loss variant of
mutual information. The benefit of this novel information-theoretic formulation
is that LSDTW can align sequences with different lengths, different
dimensionality, high non-linearity, and non-Gaussianity in a computationally
efficient manner. In addition, model parameters such as an initial alignment
matrix can be systematically optimized by cross-validation. We demonstrate the
usefulness of LSDTW through experiments on synthetic and real-world Kinect
action recognition datasets.
Comment: 11 pages
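LSDTW itself optimizes a squared-loss mutual information objective, but it operates over the same alignment-path machinery as classic dynamic time warping. As context, here is a minimal sketch of the standard DTW recurrence with an absolute-difference local cost; the function name and the 1-D setting are illustrative, not the paper's method:

```python
import numpy as np

def dtw(x, y):
    """Classic DTW: cumulative cost of the best monotone alignment path
    between 1-D sequences x and y (absolute-difference local cost).

    LSDTW-style methods keep this alignment-path structure but replace the
    summed local cost with a dependence-maximizing objective.
    """
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)  # cumulative cost table
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # Extend the cheapest of the three admissible predecessors.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

Note that DTW already accommodates sequences of different lengths, the property the abstract highlights for LSDTW.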
Deep Matching Autoencoders
Increasingly many real world tasks involve data in multiple modalities or
views. This has motivated the development of many effective algorithms for
learning a common latent space to relate multiple domains. However, most
existing cross-view learning algorithms assume access to paired data for
training. Their applicability is thus limited as the paired data assumption is
often violated in practice: many tasks have only a small subset of data
available with pairing annotation, or even no paired data at all. In this paper
we introduce Deep Matching Autoencoders (DMAE), which learn a common latent
space and pairing from unpaired multi-modal data. Specifically we formulate
this as a cross-domain representation learning and object matching problem. We
simultaneously optimise parameters of representation learning auto-encoders and
the pairing of unpaired multi-modal data. This framework elegantly spans the
full regime from fully supervised through semi-supervised to unsupervised (no
paired data) multi-modal learning. We show promising results in image
captioning, and
on a new task that is uniquely enabled by our methodology: unsupervised
classifier learning.
Comment: 10 pages
Similarity of Neural Network Representations Revisited
Recent work has sought to understand the behavior of neural networks by
comparing representations between layers and between different trained models.
We examine methods for comparing neural network representations based on
canonical correlation analysis (CCA). We show that CCA belongs to a family of
statistics for measuring multivariate similarity, but that neither CCA nor any
other statistic that is invariant to invertible linear transformation can
measure meaningful similarities between representations of higher dimension
than the number of data points. We introduce a similarity index that measures
the relationship between representational similarity matrices and does not
suffer from this limitation. This similarity index is equivalent to centered
kernel alignment (CKA) and is also closely connected to CCA. Unlike CCA, CKA
can reliably identify correspondences between representations in networks
trained from different initializations.
Comment: ICML 201
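For the linear kernel, the CKA index described above reduces to a closed form on column-centered representations; a minimal sketch (the function name is ours):

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between representations X (n x p) and Y (n x q) of the
    same n examples: ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F) after
    centering each feature column."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    num = np.linalg.norm(Y.T @ X, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return num / den
```

The index is invariant to orthogonal transformation and isotropic scaling, but not to arbitrary invertible linear maps; this weaker invariance is what lets it remain meaningful when the representation dimension exceeds the number of data points.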
Sufficient Component Analysis for Supervised Dimension Reduction
The purpose of sufficient dimension reduction (SDR) is to find the
low-dimensional subspace of input features that is sufficient for predicting
output values. In this paper, we propose a novel distribution-free SDR method
called sufficient component analysis (SCA), which is computationally more
efficient than existing methods. In our method, a solution is computed by
iteratively performing dependence estimation and maximization: Dependence
estimation is analytically carried out by recently-proposed least-squares
mutual information (LSMI), and dependence maximization is also analytically
carried out by utilizing the Epanechnikov kernel. Through large-scale
experiments on real-world image classification and audio tagging problems, the
proposed method is shown to compare favorably with existing dimension reduction
approaches.
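The dependence-maximization step above exploits the compact support of the Epanechnikov kernel. As a small illustration, here is the kernel itself (not the SCA update, and the function name is ours):

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel K(u) = (3/4)(1 - u^2) for |u| <= 1, else 0.
    Its compact support and quadratic form are what make closed-form
    maximization steps tractable in kernel-based methods."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)
```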
Multiple-antenna fading coherent channels with arbitrary inputs: Characterization and optimization of the reliable information transmission rate
We investigate the constrained capacity of multiple-antenna fading coherent
channels, where the receiver knows the channel state but the transmitter knows
only the channel distribution, driven by arbitrary equiprobable discrete inputs
in a regime of high signal-to-noise ratio (SNR). In particular, we
capitalize on intersections between information theory and estimation theory to
conceive expansions of the average minimum mean-squared error (MMSE) and the
average mutual information, and in turn of the constrained capacity, that
capture their behavior well in the asymptotic regime of high SNR. We use the
expansions to study the constrained capacity of various
multiple-antenna fading coherent channels, including Rayleigh fading models,
Ricean fading models and antenna-correlated models. The analysis unveils in
detail the impact of the number of transmit and receive antennas, transmit and
receive antenna correlation, line-of-sight components and the geometry of the
signalling scheme on the reliable information transmission rate. We also use
the expansions to design key system elements, such as power allocation and
precoding schemes, as well as to design space-time signalling schemes for
multiple-antenna fading coherent channels. Simulation results demonstrate that
the expansions lead to very sharp designs.
Global Sensitivity Analysis with Dependence Measures
Global sensitivity analysis with variance-based measures suffers from several
theoretical and practical limitations, since such measures focus only on the
variance of the output and handle multivariate variables in a limited way. In this paper,
we introduce a new class of sensitivity indices based on dependence measures
which overcomes these insufficiencies. Our approach originates from the idea to
compare the output distribution with its conditional counterpart when one of
the input variables is fixed. We establish that this comparison yields
previously proposed indices when it is performed with Csiszar f-divergences, as
well as sensitivity indices which are well-known dependence measures between
random variables. This leads us to investigate completely new sensitivity
indices based on recent state-of-the-art dependence measures, such as distance
correlation and the Hilbert-Schmidt independence criterion. We also emphasize
the potential of feature selection techniques relying on such dependence
measures as alternatives to screening in high dimension.
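Of the dependence measures named above, the Hilbert-Schmidt independence criterion has a particularly compact biased estimator, tr(KHLH)/(n-1)^2, where K and L are Gram matrices of the two samples and H is the centering matrix. A minimal sketch for 1-D inputs; the function names and the Gaussian-kernel choice are ours:

```python
import numpy as np

def rbf_gram(x, sigma=1.0):
    """Gaussian-kernel Gram matrix of a 1-D sample x."""
    d = x[:, None] - x[None, :]
    return np.exp(-(d ** 2) / (2.0 * sigma ** 2))

def hsic(x, y, sigma=1.0):
    """Biased empirical HSIC: tr(K H L H) / (n - 1)^2,
    with H = I - (1/n) 11^T centering both Gram matrices."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    H = np.eye(n) - np.ones((n, n)) / n
    K, L = rbf_gram(x, sigma), rbf_gram(y, sigma)
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2
```

Values near zero indicate independence and larger values indicate dependence, which is what makes HSIC usable as a sensitivity index when comparing an output to each fixed input.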
Feature ranking for multi-label classification using Markov Networks
We propose a simple and efficient method for ranking features in multi-label
classification. The method produces a ranking of features showing their
relevance in predicting labels, which in turn allows choosing a final subset
of features. The procedure is based on Markov Networks and models the
dependencies between labels and features in a direct way. In the first step we
build a simple network using only labels and then we test how much adding a
single feature affects the initial network. More specifically, in the first
step we use the Ising model whereas the second step is based on the score
statistic, which allows testing the significance of added features very quickly.
The proposed approach does not require transformation of the label space, gives
interpretable results, and allows for attractive visualization of the dependency
structure. We give a theoretical justification of the procedure by discussing
some theoretical properties of the Ising model and the score statistic. We also
discuss a feature ranking procedure based on fitting the Ising model using
regularized logistic regressions. Numerical experiments show that the proposed
methods outperform the conventional approaches on the considered artificial and
real datasets.
Probabilistic CCA with Implicit Distributions
Canonical Correlation Analysis (CCA) is a classic technique for multi-view
data analysis. To overcome the deficiency of linear correlation in practical
multi-view learning tasks, various CCA variants were proposed to capture
nonlinear dependency. However, it is non-trivial to gain an in-principle
understanding of these variants due to their inherent restrictive assumptions on
the data and latent code distributions. Although some works have studied
probabilistic interpretation for CCA, these models still require the explicit
form of the distributions to achieve a tractable solution for the inference. In
this work, we study probabilistic interpretation for CCA based on implicit
distributions. We present Conditional Mutual Information (CMI) as a new
criterion for CCA to consider both linear and nonlinear dependency for
arbitrarily distributed data. To eliminate direct estimation for CMI, in which
explicit form of the distributions is still required, we derive an objective
which can provide an estimation for CMI with efficient inference methods. To
facilitate Bayesian inference of multi-view analysis, we propose Adversarial
CCA (ACCA), which achieves consistent encoding for multi-view data with the
consistent constraint imposed on the marginalization of the implicit
posteriors. Such a model achieves superior alignment of the multi-view data
with implicit distributions. It is interesting to note that
most of the existing CCA variants can be connected with our proposed CCA model
by assigning specific form for the posterior and likelihood distributions.
Extensive experiments on nonlinear correlation analysis and cross-view
generation on benchmark and real-world datasets demonstrate the superiority of
our model.
Comment: 23 pages, 9 figures; Keywords: Multi-view Learning, Nonlinear Dependency, Deep Generative model
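For reference, the classic linear CCA that these probabilistic variants generalize can be computed as the singular values of the whitened cross-covariance; a minimal sketch (the function name and the small ridge `eps`, added for numerical stability, are ours):

```python
import numpy as np

def cca_correlations(X, Y, eps=1e-10):
    """Canonical correlations of views X (n x p) and Y (n x q): singular
    values of Cxx^{-1/2} Cxy Cyy^{-1/2}, computed via Cholesky whitening."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = X.T @ X / (n - 1) + eps * np.eye(X.shape[1])
    Cyy = Y.T @ Y / (n - 1) + eps * np.eye(Y.shape[1])
    Cxy = X.T @ Y / (n - 1)
    Wx = np.linalg.inv(np.linalg.cholesky(Cxx))  # whitens X: Wx Cxx Wx^T = I
    Wy = np.linalg.inv(np.linalg.cholesky(Cyy))
    return np.linalg.svd(Wx @ Cxy @ Wy.T, compute_uv=False)
```

The deficiency noted in the abstract is visible here: this construction scores only linear dependency, which is what CMI-based criteria and implicit-distribution models aim to relax.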
Why Overfitting Isn't Always Bad: Retrofitting Cross-Lingual Word Embeddings to Dictionaries
Cross-lingual word embeddings (CLWE) are often evaluated on bilingual lexicon
induction (BLI). Recent CLWE methods use linear projections, which underfit the
training dictionary, to generalize on BLI. However, underfitting can hinder
generalization to other downstream tasks that rely on words from the training
dictionary. We address this limitation by retrofitting CLWE to the training
dictionary, which pulls training translation pairs closer in the embedding
space and overfits the training dictionary. This simple post-processing step
often improves accuracy on two downstream tasks, despite lowering BLI test
accuracy. We also retrofit to both the training dictionary and a synthetic
dictionary induced from CLWE, which sometimes generalizes even better on
downstream tasks. Our results confirm the importance of fully exploiting the
training dictionary in downstream tasks and explain why BLI is a flawed CLWE
evaluation.
Comment: ACL 202
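The post-processing step described above can be sketched in the spirit of Faruqui et al.'s retrofitting update: each word vector is pulled toward the average of its dictionary translations while staying anchored to its original position. All names and the specific update below are illustrative, not the paper's exact procedure:

```python
import numpy as np

def retrofit(emb, pairs, alpha=1.0, beta=1.0, iters=10):
    """Retrofit embeddings `emb` (dict word -> vector) to a training
    dictionary `pairs` (list of (word, translation) tuples).

    Update: q_w <- (alpha * q_w_orig + beta * sum of neighbor vectors)
            / (alpha + beta * #neighbors), iterated a fixed number of times.
    """
    orig = {w: v.copy() for w, v in emb.items()}
    new = {w: v.copy() for w, v in emb.items()}
    nbrs = {w: [] for w in emb}
    for a, b in pairs:
        nbrs[a].append(b)
        nbrs[b].append(a)
    for _ in range(iters):
        for w, ns in nbrs.items():
            if ns:
                s = np.sum([new[n] for n in ns], axis=0)
                new[w] = (alpha * orig[w] + beta * s) / (alpha + beta * len(ns))
    return new
```

Words in translation pairs end up closer together (overfitting the dictionary, as intended), while words outside the dictionary are left untouched.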
Multi-view Alignment and Generation in CCA via Consistent Latent Encoding
Multi-view alignment, achieving one-to-one correspondence of multi-view
inputs, is critical in many real-world multi-view applications, especially for
cross-view data analysis problems. Recently, an increasing number of works
study this alignment problem with Canonical Correlation Analysis (CCA).
However, existing CCA models are prone to misaligning the multiple views due to
either the neglect of uncertainty or the inconsistent encoding of the multiple
views. To tackle these two issues, this paper studies multi-view alignment from
the Bayesian perspective. Delving into the impairments of inconsistent
encodings, we propose to recover correspondence of the multi-view inputs by
matching the marginalization of the joint distribution of multi-view random
variables under different forms of factorization. To realize our design, we
present Adversarial CCA (ACCA) which achieves consistent latent encodings by
matching the marginalized latent encodings through the adversarial training
paradigm. Our analysis based on conditional mutual information reveals that
ACCA is flexible for handling implicit distributions. Extensive experiments on
correlation analysis and cross-view generation under noisy input settings
demonstrate the superiority of our model.
Comment: 37 pages, 22 figures