1,280 research outputs found
Learning detectors quickly using structured covariance matrices
Computer vision is increasingly becoming interested in the rapid estimation
of object detectors. Canonical hard negative mining strategies are slow as they
require multiple passes of the large negative training set. Recent work has
demonstrated that if the distribution of negative examples is assumed to be
stationary, then Linear Discriminant Analysis (LDA) can learn comparable
detectors without ever revisiting the negative set. Even with this insight,
however, the time to learn a single object detector can still be on the order
of tens of seconds on a modern desktop computer. This paper proposes to
leverage the resulting structured covariance matrix to obtain detectors with
identical performance in orders of magnitude less time and memory. We elucidate
an important connection to the correlation filter literature, demonstrating
that these can also be trained without ever revisiting the negative set
Tensor Networks for Dimensionality Reduction and Large-Scale Optimizations. Part 2 Applications and Future Perspectives
Part 2 of this monograph builds on the introduction to tensor networks and
their operations presented in Part 1. It focuses on tensor network models for
super-compressed higher-order representation of data/parameters and related
cost functions, while providing an outline of their applications in machine
learning and data analytics. A particular emphasis is on the tensor train (TT)
and Hierarchical Tucker (HT) decompositions, and their physically meaningful
interpretations which reflect the scalability of the tensor network approach.
Through a graphical approach, we also elucidate how, by virtue of the
underlying low-rank tensor approximations and sophisticated contractions of
core tensors, tensor networks have the ability to perform distributed
computations on otherwise prohibitively large volumes of data/parameters,
thereby alleviating or even eliminating the curse of dimensionality. The
usefulness of this concept is illustrated over a number of applied areas,
including generalized regression and classification (support tensor machines,
canonical correlation analysis, higher order partial least squares),
generalized eigenvalue decomposition, Riemannian optimization, and in the
optimization of deep neural networks. Part 1 and Part 2 of this work can be
used either as stand-alone separate texts, or indeed as a conjoint
comprehensive review of the exciting field of low-rank tensor networks and
tensor decompositions.Comment: 232 page
Tensor Networks for Dimensionality Reduction and Large-Scale Optimizations. Part 2 Applications and Future Perspectives
Part 2 of this monograph builds on the introduction to tensor networks and
their operations presented in Part 1. It focuses on tensor network models for
super-compressed higher-order representation of data/parameters and related
cost functions, while providing an outline of their applications in machine
learning and data analytics. A particular emphasis is on the tensor train (TT)
and Hierarchical Tucker (HT) decompositions, and their physically meaningful
interpretations which reflect the scalability of the tensor network approach.
Through a graphical approach, we also elucidate how, by virtue of the
underlying low-rank tensor approximations and sophisticated contractions of
core tensors, tensor networks have the ability to perform distributed
computations on otherwise prohibitively large volumes of data/parameters,
thereby alleviating or even eliminating the curse of dimensionality. The
usefulness of this concept is illustrated over a number of applied areas,
including generalized regression and classification (support tensor machines,
canonical correlation analysis, higher order partial least squares),
generalized eigenvalue decomposition, Riemannian optimization, and in the
optimization of deep neural networks. Part 1 and Part 2 of this work can be
used either as stand-alone separate texts, or indeed as a conjoint
comprehensive review of the exciting field of low-rank tensor networks and
tensor decompositions.Comment: 232 page
Fast and accurate modelling of longitudinal and repeated measures neuroimaging data
Despite the growing importance of longitudinal data in neuroimaging, the standard analysis methods make restrictive or unrealistic assumptions (e.g., assumption of Compound Symmetry—the state of all equal variances and equal correlations—or spatially homogeneous longitudinal correlations). While some new methods have been proposed to more accurately account for such data, these methods are based on iterative algorithms that are slow and failure-prone. In this article, we propose the use of the Sandwich Estimator (SwE) method which first estimates the parameters of interest with a simple Ordinary Least Square model and second estimates variances/covariances with the “so-called” SwE which accounts for the within-subject correlation existing in longitudinal data. Here, we introduce the SwE method in its classic form, and we review and propose several adjustments to improve its behaviour, specifically in small samples. We use intensive Monte Carlo simulations to compare all considered adjustments and isolate the best combination for neuroimaging data. We also compare the SwE method to other popular methods and demonstrate its strengths and weaknesses. Finally, we analyse a highly unbalanced longitudinal dataset from the Alzheimer's Disease Neuroimaging Initiative and demonstrate the flexibility of the SwE method to fit within- and between-subject effects in a single model. Software implementing this SwE method has been made freely available at http://warwick.ac.uk/tenichols/SwE
Inverse optimal transport
Discrete optimal transportation problems arise in various contexts in
engineering, the sciences and the social sciences. Often the underlying cost
criterion is unknown, or only partly known, and the observed optimal solutions
are corrupted by noise. In this paper we propose a systematic approach to infer
unknown costs from noisy observations of optimal transportation plans. The
algorithm requires only the ability to solve the forward optimal transport
problem, which is a linear program, and to generate random numbers. It has a
Bayesian interpretation, and may also be viewed as a form of stochastic
optimization.
We illustrate the developed methodologies using the example of international
migration flows. Reported migration flow data captures (noisily) the number of
individuals moving from one country to another in a given period of time. It
can be interpreted as a noisy observation of an optimal transportation map,
with costs related to the geographical position of countries. We use a
graph-based formulation of the problem, with countries at the nodes of graphs
and non-zero weighted adjacencies only on edges between countries which share a
border. We use the proposed algorithm to estimate the weights, which represent
cost of transition, and to quantify uncertainty in these weights
Toeplitz Inverse Covariance-Based Clustering of Multivariate Time Series Data
Subsequence clustering of multivariate time series is a useful tool for
discovering repeated patterns in temporal data. Once these patterns have been
discovered, seemingly complicated datasets can be interpreted as a temporal
sequence of only a small number of states, or clusters. For example, raw sensor
data from a fitness-tracking application can be expressed as a timeline of a
select few actions (i.e., walking, sitting, running). However, discovering
these patterns is challenging because it requires simultaneous segmentation and
clustering of the time series. Furthermore, interpreting the resulting clusters
is difficult, especially when the data is high-dimensional. Here we propose a
new method of model-based clustering, which we call Toeplitz Inverse
Covariance-based Clustering (TICC). Each cluster in the TICC method is defined
by a correlation network, or Markov random field (MRF), characterizing the
interdependencies between different observations in a typical subsequence of
that cluster. Based on this graphical representation, TICC simultaneously
segments and clusters the time series data. We solve the TICC problem through
alternating minimization, using a variation of the expectation maximization
(EM) algorithm. We derive closed-form solutions to efficiently solve the two
resulting subproblems in a scalable way, through dynamic programming and the
alternating direction method of multipliers (ADMM), respectively. We validate
our approach by comparing TICC to several state-of-the-art baselines in a
series of synthetic experiments, and we then demonstrate on an automobile
sensor dataset how TICC can be used to learn interpretable clusters in
real-world scenarios.Comment: This revised version fixes two small typos in the published versio
- …