Maximum Likelihood Learning With Arbitrary Treewidth via Fast-Mixing Parameter Sets
Inference is typically intractable in high-treewidth undirected graphical
models, making maximum likelihood learning a challenge. One way to overcome
this is to restrict parameters to a tractable set, most typically the set of
tree-structured parameters. This paper explores an alternative notion of a
tractable set, namely a set of "fast-mixing parameters" where Markov chain
Monte Carlo (MCMC) inference can be guaranteed to quickly converge to the
stationary distribution. While it is common in practice to approximate the
likelihood gradient using samples obtained from MCMC, such procedures lack
theoretical guarantees. This paper proves that for any exponential family with
bounded sufficient statistics (not just graphical models), when parameters are
constrained to a fast-mixing set, gradient descent with gradients approximated
by sampling will approximate the maximum likelihood solution inside the set
with high probability. In the unregularized case, finding a solution
epsilon-accurate in log-likelihood requires total effort cubic in 1/epsilon,
disregarding logarithmic factors. With ridge regularization, strong convexity
allows a solution epsilon-accurate in parameter distance with effort quadratic
in 1/epsilon. Both results constitute a fully polynomial-time randomized
approximation scheme.
Comment: Advances in Neural Information Processing Systems 201
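To make the procedure concrete, here is a minimal sketch (in Python) of the learning loop the abstract describes: projected gradient ascent on an exponential-family log-likelihood, with the model's expected sufficient statistics estimated from MCMC samples. The sampler, the statistics, and the projection onto the fast-mixing set are hypothetical placeholders, not the paper's actual constructions.

```python
import numpy as np

def mle_with_mcmc_gradients(data_stats, sample_mcmc, project_fast_mixing,
                            dim, step_size=0.1, n_iters=1000, n_samples=100):
    """Projected gradient ascent on an exponential-family log-likelihood.

    data_stats: empirical mean of the sufficient statistics, shape (dim,).
    sample_mcmc(theta, n): statistics of n approximate MCMC samples, (n, dim).
    project_fast_mixing(theta): projection onto the fast-mixing parameter set.
    """
    theta = np.zeros(dim)
    for _ in range(n_iters):
        # Monte Carlo estimate of the model's expected sufficient statistics.
        model_stats = sample_mcmc(theta, n_samples).mean(axis=0)
        # Exponential-family likelihood gradient: data stats minus model stats.
        grad = data_stats - model_stats
        # Ascent step, then project so the chain is guaranteed to mix quickly.
        theta = project_fast_mixing(theta + step_size * grad)
    return theta
```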
Heuristic Ranking in Tightly Coupled Probabilistic Description Logics
The Semantic Web effort has been steadily gaining traction in recent years.
In particular, Web search companies have recently realized that their products
need to evolve towards richer semantic search capabilities.
Description logics (DLs) have been adopted as the formal underpinnings for
Semantic Web languages used in describing ontologies. Reasoning under
uncertainty has recently taken a leading role in this arena, given the nature
of data found on the Web. In this paper, we present a probabilistic extension of
the DL EL++ (which underlies the OWL 2 EL profile) using Markov logic networks
(MLNs) as probabilistic semantics. This extension is tightly coupled, meaning
that probabilistic annotations in formulas can refer to objects in the
ontology. We show that, even though the tightly coupled nature of our language
means that many basic operations are data-intractable, we can leverage a
sublanguage of MLNs that allows us to rank the atomic consequences of an ontology
relative to their probability values (called ranking queries) even when these
values are not fully computed. We present an anytime algorithm to answer
ranking queries, and provide an upper bound on the error that it incurs, as
well as a criterion to decide when results are guaranteed to be correct.
Comment: Appears in Proceedings of the Twenty-Eighth Conference on Uncertainty
in Artificial Intelligence (UAI 2012)
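The anytime idea can be illustrated with a small sketch: maintain interval bounds on each atom's probability, tighten them incrementally, and rank by interval midpoints; the widest remaining interval bounds the error, and disjoint adjacent intervals certify correctness. The refine() oracle below is an assumed stand-in for the paper's MLN-sublanguage machinery, not its actual algorithm.

```python
def anytime_rank(atoms, refine, budget):
    """atoms: list of atomic consequences.
    refine(atom) -> (lo, hi): tightened probability bounds for one atom.
    budget: number of refinement steps to spend."""
    bounds = {a: (0.0, 1.0) for a in atoms}
    for _ in range(budget):
        # Refine the atom with the widest interval first (most uncertain).
        widest = max(bounds, key=lambda a: bounds[a][1] - bounds[a][0])
        bounds[widest] = refine(widest)
    # Rank by interval midpoint, highest probability first.
    ranking = sorted(atoms, key=lambda a: -(bounds[a][0] + bounds[a][1]) / 2)
    # The widest remaining interval drives the worst-case error.
    max_err = max(hi - lo for lo, hi in bounds.values())
    # The ranking is certified correct when adjacent intervals are disjoint.
    certified = all(bounds[a][0] >= bounds[b][1]
                    for a, b in zip(ranking, ranking[1:]))
    return ranking, max_err, certified
```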
Bethe Projections for Non-Local Inference
Many inference problems in structured prediction are naturally solved by
augmenting a tractable dependency structure with complex, non-local auxiliary
objectives. This includes the mean field family of variational inference
algorithms, soft- or hard-constrained inference using Lagrangian relaxation or
linear programming, collective graphical models, and forms of semi-supervised
learning such as posterior regularization. We present a method to
discriminatively learn broad families of inference objectives, capturing
powerful non-local statistics of the latent variables, while maintaining
tractable and provably fast inference using non-Euclidean projected gradient
descent with a distance-generating function given by the Bethe entropy. We
demonstrate the performance and flexibility of our method by (1) extracting
structured citations from research papers by learning soft global constraints,
(2) achieving state-of-the-art results on a widely-used handwriting recognition
task using a novel learned non-convex inference procedure, and (3) providing a
fast and highly scalable algorithm for the challenging problem of inference in
a collective graphical model applied to bird migration.
Comment: minor bug fix to appendix; appeared in UAI 201
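As a rough illustration of the optimization scheme named above, the sketch below runs mirror descent with a plain negative-entropy distance-generating function on a probability simplex, which yields multiplicative (exponentiated-gradient) updates; the paper's method instead uses the Bethe entropy over locally consistent marginals. All names and the toy objective are illustrative assumptions.

```python
import numpy as np

def entropy_mirror_descent(grad_fn, dim, step_size=0.5, n_iters=200):
    """Minimize a convex objective over the simplex via mirror descent.
    grad_fn(mu): gradient of the objective at marginals mu."""
    mu = np.full(dim, 1.0 / dim)          # start at the uniform distribution
    for _ in range(n_iters):
        g = grad_fn(mu)
        mu = mu * np.exp(-step_size * g)  # mirror step in the dual (log) space
        mu /= mu.sum()                    # Bregman projection onto the simplex
    return mu

# Toy non-local objective: <c, mu> plus the negative entropy of mu,
# whose gradient is c + log(mu) + 1.
c = np.array([1.0, 2.0, 0.5])
mu_star = entropy_mirror_descent(lambda mu: c + np.log(mu) + 1.0, dim=3)
```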
Modeling networks of spiking neurons as interacting processes with memory of variable length
We consider a new class of non-Markovian processes with a countable number of
interacting components, both in discrete and continuous time. Each component is
represented by a point process indicating if it has a spike or not at a given
time. The system evolves as follows. For each component, the rate (in
continuous time) or the probability (in discrete time) of having a spike
depends on the entire time evolution of the system since the last spike time of
the component. In discrete time, this class of systems extends in a
non-trivial way both Spitzer's interacting particle systems, which are
Markovian, and Rissanen's stochastic chains with memory of variable length,
which have a finite state space. In continuous time, they can be seen as a
kind of variable-length-memory version, in Rissanen's sense, of the class of
self-exciting point processes also called "Hawkes processes", but with
infinitely many components. These features make this class a good candidate
to describe the
time evolution of networks of spiking neurons. In this article we present a
critical reader's guide to recent papers dealing with this class of models,
both in discrete and in continuous time. We briefly sketch results concerning
perfect simulation and existence issues, decorrelation between successive
interspike intervals, the long-time behavior of finite non-excited systems,
and propagation of chaos in mean-field systems.
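For intuition, a toy discrete-time simulation in the spirit of this model class is sketched below: each neuron's spike probability depends on the input it has accumulated since its own last spike, and spiking resets that memory (the variable-length feature). The weights, logistic rate function, and offset are illustrative choices, not taken from the papers under review.

```python
import numpy as np

rng = np.random.default_rng(0)
n_neurons, n_steps = 5, 200
W = rng.normal(scale=0.5, size=(n_neurons, n_neurons))  # synaptic weights
np.fill_diagonal(W, 0.0)                                # no self-coupling

def phi(u):
    return 1.0 / (1.0 + np.exp(-u))  # potential -> spiking probability

potential = np.zeros(n_neurons)      # input accumulated since the last spike
spikes = np.zeros((n_steps, n_neurons), dtype=int)
for t in range(1, n_steps):
    prob = phi(potential - 2.0)      # -2.0 is a baseline excitability offset
    spikes[t] = rng.random(n_neurons) < prob
    # A neuron that spikes forgets its past: memory resets at the last spike.
    potential[spikes[t] == 1] = 0.0
    # The others integrate the new spikes of their presynaptic neighbours.
    potential += (W @ spikes[t]) * (1 - spikes[t])
```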
A Stochastic Grammar of Images
This exploratory paper quests for a stochastic and context-sensitive grammar of images. The grammar should achieve the following four objectives and thus serve as a unified framework of representation, learning, and recognition for a large number of object categories.

(i) The grammar represents both the hierarchical decompositions from scenes to objects, parts, primitives, and pixels, by terminal and non-terminal nodes, and the contexts for spatial and functional relations, by horizontal links between the nodes. It formulates each object category as the set of all possible valid configurations produced by the grammar.

(ii) The grammar is embodied in a simple And-Or graph representation where each Or-node points to alternative sub-configurations and each And-node is decomposed into a number of components. This representation supports recursive top-down/bottom-up procedures for image parsing under the Bayesian framework and makes it convenient to scale up in complexity. Given an input image, the image parsing task constructs a most probable parse graph on the fly as the output interpretation; this parse graph is a subgraph of the And-Or graph obtained after making a choice at each Or-node.

(iii) A probabilistic model is defined on this And-Or graph representation to account for the natural occurrence frequency of objects and parts as well as their relations. This model is learned from a relatively small training set per category and then sampled to synthesize a large number of configurations to cover novel object instances in the test set. This generalization capability is mostly missing in discriminative machine learning methods and can substantially improve recognition performance in experiments.

(iv) To fill the well-known semantic gap between symbols and raw signals, the grammar includes a series of visual dictionaries and organizes them through graph composition. At the bottom level, the dictionary is a set of image primitives, each having a number of anchor points with open bonds to link with other primitives. These primitives can be combined to form larger and larger graph structures for parts and objects. The ambiguities in inferring local primitives are resolved through top-down computation using larger structures. Finally, these primitives form a primal sketch representation which generates the input image with every pixel explained.

The proposed grammar integrates three prominent representations in the literature: stochastic grammars for composition, Markov (or graphical) models for contexts, and sparse coding with primitives (wavelets). It also combines the structure-based and appearance-based methods in the vision literature. Finally, the paper presents three case studies to illustrate the proposed grammar.
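A bare-bones sketch of the And-Or graph representation described in (ii) may help: Or-nodes choose among alternative sub-configurations, And-nodes decompose into parts, and a parse graph is the subgraph left after fixing one choice at every Or-node. All class and field names are illustrative, not the paper's notation.

```python
class AndNode:
    def __init__(self, label, parts):
        self.label = label
        self.parts = parts  # child nodes: all components must be realized

class OrNode:
    def __init__(self, label, alternatives):
        self.label = label
        self.alternatives = alternatives  # child nodes: exactly one is chosen

class Terminal:
    def __init__(self, label):
        self.label = label  # image primitive at the bottom level

def parse_graph(node, choose):
    """Extract one parse graph by resolving every Or-node with choose()."""
    if isinstance(node, Terminal):
        return node.label
    if isinstance(node, OrNode):
        return {node.label: parse_graph(choose(node), choose)}
    return {node.label: [parse_graph(p, choose) for p in node.parts]}

# Toy grammar: a "face" decomposes into eyes and a mouth; the mouth is
# either open or closed (an Or-node choice).
mouth = OrNode("mouth", [Terminal("open"), Terminal("closed")])
face = AndNode("face", [Terminal("left-eye"), Terminal("right-eye"), mouth])
print(parse_graph(face, choose=lambda n: n.alternatives[0]))
```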