183 research outputs found
Bethe Projections for Non-Local Inference
Many inference problems in structured prediction are naturally solved by
augmenting a tractable dependency structure with complex, non-local auxiliary
objectives. This includes the mean field family of variational inference
algorithms, soft- or hard-constrained inference using Lagrangian relaxation or
linear programming, collective graphical models, and forms of semi-supervised
learning such as posterior regularization. We present a method to
discriminatively learn broad families of inference objectives, capturing
powerful non-local statistics of the latent variables, while maintaining
tractable and provably fast inference using non-Euclidean projected gradient
descent with a distance-generating function given by the Bethe entropy. We
demonstrate the performance and flexibility of our method by (1) extracting
structured citations from research papers by learning soft global constraints,
(2) achieving state-of-the-art results on a widely-used handwriting recognition
task using a novel learned non-convex inference procedure, and (3) providing a
fast and highly scalable algorithm for the challenging problem of inference in
a collective graphical model applied to bird migration.Comment: minor bug fix to appendix. appeared in UAI 201
Variational Approaches for Image Labeling on the Assignment Manifold
The image labeling problem refers to the task of assigning to each pixel a single element from a finite predefined set of labels. In classical approaches the labeling task is formulated as a minimization problem of specifically structured objective functions.
Assignment flows for contextual image labeling are a recently proposed alternative formulation via spatially coupled replicator equations.
In this work, the classical and dynamical viewpoint of image labeling are combined into a variational formulation. This is accomplished by following the induced Riemannian gradient descent flow on an elementary statistical manifold with respect to the underlying information geometry.
Convergence and stability behavior of this approach are investigated using the log-barrier method. A novel parameterization of the assignment flow by its dominant component is derived, revealing a Riemannian gradient flow structure that clearly identifies the two governing processes of the flow: spatial regularization of assignments and gradual enforcement of unambiguous label decisions. Also, a continuous-domain formulation of the corresponding potential is presented and well-posedness of the related optimization problem is established. Furthermore, an alternative smooth variational approach to maximum a-posteriori inference based on discrete graphical models is derived by utilizing local Wasserstein distances. Following the resulting Riemannian gradient flow leads to an inference process which always satisfies the local marginalization constraints and incorporates a smooth rounding mechanism towards unambiguous assignments
Non-Convex and Geometric Methods for Tomography and Label Learning
Data labeling is a fundamental problem of mathematical data analysis in which each data point is assigned exactly one single label (prototype) from a finite predefined set. In this thesis we study two challenging extensions, where either the input data cannot be observed directly or prototypes are not available beforehand.
The main application of the first setting is discrete tomography. We propose several non-convex variational as well as smooth geometric approaches to joint image label assignment and reconstruction from indirect measurements with known prototypes. In particular, we consider spatial regularization of assignments, based on the KL-divergence, which takes into account the smooth geometry of discrete probability distributions endowed with the Fisher-Rao (information) metric, i.e. the assignment manifold. Finally, the geometric point of view leads to a smooth flow evolving on a Riemannian submanifold including the tomographic projection constraints directly into the geometry of assignments. Furthermore we investigate corresponding implicit numerical schemes which amount to solving a sequence of convex problems.
Likewise, for the second setting, when the prototypes are absent, we introduce and study a smooth dynamical system for unsupervised data labeling which evolves by geometric integration on the assignment manifold. Rigorously abstracting from ``data-label'' to ``data-data'' decisions leads to interpretable low-rank data representations, which themselves are parameterized by label assignments. The resulting self-assignment flow simultaneously performs learning of latent prototypes in the very same framework while they are used for inference. Moreover, a single parameter, the scale of regularization in terms of spatial context, drives the entire process. By smooth geodesic interpolation between different normalizations of self-assignment matrices on the positive definite matrix manifold, a one-parameter family of self-assignment flows is defined. Accordingly, the proposed approach can be characterized from different viewpoints such as discrete optimal transport, normalized spectral cuts and combinatorial optimization by completely positive factorizations, each with additional built-in spatial regularization
Exponential families on resource-constrained systems
This work is about the estimation of exponential family models on resource-constrained
systems. Our main goal is learning probabilistic models on devices with highly restricted
storage, arithmetic, and computational capabilities—so called, ultra-low-power
devices. Enhancing the learning capabilities of such devices opens up opportunities for
intelligent ubiquitous systems in all areas of life, from medicine, over robotics, to home
automation—to mention just a few. We investigate the inherent resource consumption of
exponential families, review existing techniques, and devise new methods to reduce the
resource consumption. The resource consumption, however, must not be reduced at all
cost. Exponential families possess several desirable properties that must be preserved:
Any probabilistic model encodes a conditional independence structure—our methods
keep this structure intact. Exponential family models are theoretically well-founded.
Instead of merely finding new algorithms based on intuition, our models are formalized
within the framework of exponential families and derived from first principles. We do
not introduce new assumptions which are incompatible with the formal derivation of the
base model, and our methods do not rely on properties of particular high-level applications.
To reduce the memory consumption, we combine and adapt reparametrization
and regularization in an innovative way that facilitates the sparse parametrization of
high-dimensional non-stationary time-series. The procedure allows us to load models in
memory constrained systems, which would otherwise not fit. We provide new theoretical
insights and prove that the uniform distance between the data generating process
and our reparametrized solution is bounded. To reduce the arithmetic complexity of
the learning problem, we derive the integer exponential family, based on the very definition
of sufficient statistics and maximum entropy estimation. New integer-valued
inference and learning algorithms are proposed, based on variational inference, proximal
optimization, and regularization. The benefit of this technique is larger, the weaker
the underlying system is, e.g., the probabilistic inference on a state-of-the-art ultra-lowpower
microcontroller can be accelerated by a factor of 250. While our integer inference
is fast, the underlying message passing relies on the variational principle, which is inexact
and has unbounded error on general graphs. Since exact inference and other existing
methods with bounded error exhibit exponential computational complexity, we employ
near minimax optimal polynomial approximations to yield new stochastic algorithms
for approximating the partition function and the marginal probabilities. Changing the
polynomial degree allows us to control the complexity and the error of our new stochastic
method. We provide an error bound that is parametrized by the number of samples, the
polynomial degree, and the norm of the model’s parameter vector. Moreover, important
intermediate quantities can be precomputed and shared with the weak computational device
to reduce the resource requirement of our method even further. All new techniques
are empirically evaluated on synthetic and real-world data, and the results confirm the
properties which are predicted by our theoretical derivation. Our novel techniques allow
a broader range of models to be learned on resource-constrained systems and imply
several new research possibilities
Learning Theory and Approximation
The main goal of this workshop – the third one of this type at the MFO – has been to blend mathematical results from statistical learning theory and approximation theory to strengthen both disciplines and use synergistic effects to work on current research questions. Learning theory aims at modeling unknown function relations and data structures from samples in an automatic manner. Approximation theory is naturally used for the advancement and closely connected to the further development of learning theory, in particular for the exploration of new useful algorithms, and for the theoretical understanding of existing methods. Conversely, the study of learning theory also gives rise to interesting theoretical problems for approximation theory such as the approximation and sparse representation of functions or the construction of rich kernel reproducing Hilbert spaces on general metric spaces. This workshop has concentrated on the following recent topics: Pitchfork bifurcation of dynamical systems arising from mathematical foundations of cell development; regularized kernel based learning in the Big Data situation; deep learning; convergence rates of learning and online learning algorithms; numerical refinement algorithms to learning; statistical robustness of regularized kernel based learning
Recommended from our members
Deep Energy-Based Models for Structured Prediction
We introduce structured prediction energy networks (SPENs), a flexible frame- work for structured prediction. A deep architecture is used to define an energy func- tion over candidate outputs and predictions are produced by gradient-based energy minimization. This deep energy captures dependencies between labels that would lead to intractable graphical models, and allows us to automatically discover discrim- inative features of the structured output. Furthermore, practitioners can explore a wide variety of energy function architectures without having to hand-design predic- tion and learning methods for each model. This is because all of our prediction and learning methods interact with the energy only via the standard interface for deep networks: forward and back-propagation. In a variety of applications, we find that we can obtain better accuracy using approximate minimization of non-convex deep energy functions than baseline models that employ simple energy functions for which exact minimization is tractable
- …