4,701 research outputs found
Bethe Projections for Non-Local Inference
Many inference problems in structured prediction are naturally solved by
augmenting a tractable dependency structure with complex, non-local auxiliary
objectives. This includes the mean field family of variational inference
algorithms, soft- or hard-constrained inference using Lagrangian relaxation or
linear programming, collective graphical models, and forms of semi-supervised
learning such as posterior regularization. We present a method to
discriminatively learn broad families of inference objectives, capturing
powerful non-local statistics of the latent variables, while maintaining
tractable and provably fast inference using non-Euclidean projected gradient
descent with a distance-generating function given by the Bethe entropy. We
demonstrate the performance and flexibility of our method by (1) extracting
structured citations from research papers by learning soft global constraints,
(2) achieving state-of-the-art results on a widely-used handwriting recognition
task using a novel learned non-convex inference procedure, and (3) providing a
fast and highly scalable algorithm for the challenging problem of inference in
a collective graphical model applied to bird migration.Comment: minor bug fix to appendix. appeared in UAI 201
A Generative Model of Words and Relationships from Multiple Sources
Neural language models are a powerful tool to embed words into semantic
vector spaces. However, learning such models generally relies on the
availability of abundant and diverse training examples. In highly specialised
domains this requirement may not be met due to difficulties in obtaining a
large corpus, or the limited range of expression in average use. Such domains
may encode prior knowledge about entities in a knowledge base or ontology. We
propose a generative model which integrates evidence from diverse data sources,
enabling the sharing of semantic information. We achieve this by generalising
the concept of co-occurrence from distributional semantics to include other
relationships between entities or words, which we model as affine
transformations on the embedding space. We demonstrate the effectiveness of
this approach by outperforming recent models on a link prediction task and
demonstrating its ability to profit from partially or fully unobserved data
training labels. We further demonstrate the usefulness of learning from
different data sources with overlapping vocabularies.Comment: 8 pages, 5 figures; incorporated feedback from reviewers; to appear
in Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence
201
- …