Task-Driven Dictionary Learning
Modeling data with linear combinations of a few elements from a learned
dictionary has been the focus of much recent research in machine learning,
neuroscience and signal processing. For signals such as natural images that
admit such sparse representations, it is now well established that these models
are well suited to restoration tasks. In this context, learning the dictionary
amounts to solving a large-scale matrix factorization problem, which can be
done efficiently with classical optimization tools. The same approach has also
been used for learning features from data for other purposes, e.g., image
classification, but tuning the dictionary in a supervised way for these tasks
has proven to be more difficult. In this paper, we present a general
formulation for supervised dictionary learning adapted to a wide variety of
tasks, and present an efficient algorithm for solving the corresponding
optimization problem. Experiments on handwritten digit classification, digital
art identification, nonlinear inverse image problems, and compressed sensing
demonstrate that our approach is effective in large-scale settings, and is well
suited to supervised and semi-supervised classification, as well as regression
tasks for data that admit sparse representations. Comment: final draft post-refereeing
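
As a rough illustration of what "linear combinations of a few elements from a dictionary" means, the sketch below computes a sparse code for one signal with a fixed dictionary, using iterative soft-thresholding (ISTA) on an l1-penalized least-squares objective. It is not the supervised algorithm of the paper: the solver choice, the function names `soft_threshold` and `ista`, and the parameters `lam`, `step`, and `n_iter` are all assumptions made for this example, and the dictionary is taken as given rather than learned.

```python
def soft_threshold(v, t):
    """Shrink v toward zero by t (proximal operator of t * |v|)."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def ista(D, x, lam, step, n_iter=100):
    """Approximately minimize 0.5 * ||x - D a||^2 + lam * ||a||_1 over a.

    D is a list of m rows with p columns (the dictionary atoms are columns).
    step is assumed to be at most 1 / (largest eigenvalue of D^T D).
    """
    m, p = len(D), len(D[0])
    a = [0.0] * p
    for _ in range(n_iter):
        # Residual r = x - D a
        r = [x[i] - sum(D[i][j] * a[j] for j in range(p)) for i in range(m)]
        # Correlation of each atom with the residual: D^T r
        g = [sum(D[i][j] * r[i] for i in range(m)) for j in range(p)]
        # Gradient step on the quadratic term, then shrinkage
        a = [soft_threshold(a[j] + step * g[j], step * lam) for j in range(p)]
    return a


# Hypothetical toy input: with an identity dictionary, the sparse code is
# simply the soft-thresholded signal, so the small entry is zeroed out.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
signal = [3.0, 0.2, -1.5]
code = ista(identity, signal, lam=0.5, step=1.0)
```

Dictionary learning alternates a coding step of this kind with an update of `D`; the supervised setting described in the abstract additionally ties `D` to a task-specific loss, which this sketch does not attempt.
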
Hierarchically-coupled hidden Markov models for learning kinetic rates from single-molecule data
We address the problem of analyzing sets of noisy time-varying signals that
all report on the same process but confound straightforward analyses due to
complex inter-signal heterogeneities and measurement artifacts. In particular,
we consider single-molecule experiments which indirectly measure the distinct
steps in a biomolecular process via observations of noisy time-dependent
signals such as a fluorescence intensity or bead position. Straightforward
hidden Markov model (HMM) analyses attempt to characterize such processes in
terms of a set of conformational states, the transitions that can occur between
these states, and the associated rates at which those transitions occur; but
require ad-hoc post-processing steps to combine multiple signals. Here we
develop a hierarchically coupled HMM that allows experimentalists to deal with
inter-signal variability in a principled and automatic way. Our approach is a
generalized expectation maximization hyperparameter point estimation procedure
with variational Bayes at the level of individual time series that learns a
single interpretable representation of the overall data generating process. Comment: 9 pages, 5 figures
Estimation of pulse heights and arrival times
This paper studies the problem of estimating the arrival times and heights of pulses of known shape observed in additive white noise. The main difficulty is estimating the number of pulses. When a maximum likelihood formulation is used, difficulties arise similar to those encountered when estimating the order of an unknown system. The problem may be overcome using Rissanen's shortest data description approach. An estimation algorithm is described, and its consistency is proved. The results are illustrated by a simulation study using an example from seismic data processing also studied by Mendel.
Stochastic Analysis of the LMS Algorithm for System Identification with Subspace Inputs
This paper studies the behavior of the low-rank LMS adaptive algorithm for the general case in which the input transformation may not capture the exact input subspace. It is shown that the Independence Theory and the independent additive noise model are not applicable to this case. A new theoretical model for the weight mean and fluctuation behaviors is developed which incorporates the correlation between successive data vectors (as opposed to the Independence Theory model). The new theory is applied to a network echo cancellation scheme which uses partial-Haar input vector transformations. Comparison of the new model predictions with Monte Carlo simulations shows good-to-excellent agreement, certainly much better than predicted by the Independence Theory based model available in the literature.
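
For readers unfamiliar with LMS system identification, the following is a minimal sketch of the standard full-rank LMS update applied to a noise-free FIR system. It does not implement the low-rank variant or the partial-Haar transformation analyzed in the paper; the function name `lms_identify`, the step size `mu`, the tap values in `true_w`, and the white Gaussian input are all assumptions chosen for illustration.

```python
import random


def lms_identify(x, d, num_taps, mu):
    """Estimate FIR weights w so that w . [x[n], x[n-1], ...] tracks d[n]."""
    w = [0.0] * num_taps
    for n in range(num_taps - 1, len(x)):
        # Regressor vector, newest sample first
        u = [x[n - k] for k in range(num_taps)]
        # Filter output and a-priori error
        y = sum(wk * uk for wk, uk in zip(w, u))
        e = d[n] - y
        # LMS update: step along the instantaneous gradient estimate
        w = [wk + mu * e * uk for wk, uk in zip(w, u)]
    return w


# Hypothetical unknown system and white Gaussian input (seeded for repeatability)
rng = random.Random(0)
true_w = [0.5, -0.3, 0.1]
x = [rng.gauss(0.0, 1.0) for _ in range(5000)]
d = [
    sum(true_w[k] * x[n - k] for k in range(len(true_w)) if n - k >= 0)
    for n in range(len(x))
]

w_hat = lms_identify(x, d, num_taps=3, mu=0.01)
```

With no measurement noise and a small step size, `w_hat` should end up close to `true_w`. The abstract's point is that once the input is projected onto an inexact subspace, the usual independence assumptions behind this kind of convergence analysis no longer apply.
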
Data-Dependent Stability of Stochastic Gradient Descent
We establish a data-dependent notion of algorithmic stability for Stochastic
Gradient Descent (SGD), and employ it to develop novel generalization bounds.
This is in contrast to previous distribution-free algorithmic stability results
for SGD which depend on the worst-case constants. By virtue of the
data-dependent argument, our bounds provide new insights into learning with SGD
on convex and non-convex problems. In the convex case, we show that the bound
on the generalization error depends on the risk at the initialization point. In
the non-convex case, we prove that the expected curvature of the objective
function around the initialization point has crucial influence on the
generalization error. In both cases, our results suggest a simple data-driven
strategy to stabilize SGD by pre-screening its initialization. As a corollary,
our results allow us to show optimistic generalization bounds that exhibit fast
convergence rates for SGD subject to a vanishing empirical risk and low noise
of stochastic gradient.
Hierarchical Decomposition of Nonlinear Dynamics and Control for System Identification and Policy Distillation
The control of nonlinear dynamical systems remains a major challenge for
autonomous agents. Current trends in reinforcement learning (RL) focus on
complex representations of dynamics and policies, which have yielded impressive
results in solving a variety of hard control tasks. However, this new
sophistication and extremely over-parameterized models have come with the cost
of an overall reduction in our ability to interpret the resulting policies. In
this paper, we take inspiration from the control community and apply the
principles of hybrid switching systems in order to break down complex dynamics
into simpler components. We exploit the rich representational power of
probabilistic graphical models and derive an expectation-maximization (EM)
algorithm for learning a sequence model to capture the temporal structure of
the data and automatically decompose nonlinear dynamics into stochastic
switching linear dynamical systems. Moreover, we show how this framework of
switching models enables extracting hierarchies of Markovian and
auto-regressive locally linear controllers from nonlinear experts in an
imitation learning scenario. Comment: 2nd Annual Conference on Learning for Dynamics and Control