Structured Prediction of Sequences and Trees using Infinite Contexts
Linguistic structures exhibit a rich array of global phenomena; however,
commonly used Markov models are unable to adequately describe these phenomena
due to their strong locality assumptions. We propose a novel hierarchical model
for structured prediction over sequences and trees which exploits global
context by conditioning each generation decision on an unbounded context of
prior decisions. This builds on the success of Markov models but without
imposing a fixed bound in order to better represent global phenomena. To
facilitate learning of this large and unbounded model, we use a hierarchical
Pitman-Yor process prior which provides a recursive form of smoothing. We
propose prediction algorithms based on A* and Markov Chain Monte Carlo
sampling. Empirical results demonstrate the potential of our model compared to
baseline finite-context Markov models on part-of-speech tagging and syntactic
parsing.
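The recursive smoothing mentioned in the abstract above can be illustrated with a small sketch. This is not the authors' model: it assumes a Pitman-Yor-style predictive rule in which each context backs off to the context with its most distant symbol dropped, it uses raw counts with at most one "table" per observed symbol type instead of sampled seating arrangements, and the names `count_contexts`, `predict`, `d`, and `theta` are hypothetical.

```python
from collections import defaultdict


def count_contexts(seq, max_order):
    """Count each symbol under every preceding context of length 0..max_order."""
    counts = defaultdict(lambda: defaultdict(int))
    for i, sym in enumerate(seq):
        for k in range(0, min(i, max_order) + 1):
            ctx = tuple(seq[i - k:i])  # the k symbols just before position i
            counts[ctx][sym] += 1
    return counts


def predict(counts, ctx, sym, vocab_size, d=0.5, theta=1.0):
    """Smoothed P(sym | ctx), backing off recursively to shorter contexts.

    Assumption: one table per observed type, so the discount d is applied
    once per distinct symbol seen in a context.
    """
    def rec(c):
        if c is None:
            # Below the empty context: uniform base distribution.
            return 1.0 / vocab_size
        # Parent context drops the most distant symbol.
        parent = rec(c[1:] if c else None)
        table = counts.get(c, {})
        total = sum(table.values())
        c_w = table.get(sym, 0)
        t_w = 1 if c_w > 0 else 0
        types = len(table)
        return (c_w - d * t_w + (theta + d * types) * parent) / (theta + total)

    return rec(tuple(ctx))


seq = "abracadabra"
# Setting max_order to the sequence length leaves the context unbounded.
counts = count_contexts(seq, max_order=len(seq))
vocab = sorted(set(seq))
probs = {s: predict(counts, "abr", s, len(vocab)) for s in vocab}
```

Because every level redistributes exactly the mass it discounts, `probs` sums to one for any context, including contexts never observed, which simply inherit the prediction of their longest observed suffix.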
Improving PPM with dynamic parameter updates
This article makes several improvements to the classic PPM algorithm, resulting in a new algorithm with superior compression effectiveness on human text. The key differences between our algorithm and classic PPM are that (A) rather than the original escape mechanism, we use a generalised blending method with explicit hyper-parameters that control the way symbol counts are combined to form predictions; (B) different hyper-parameters are used for classes of different contexts; and (C) these hyper-parameters are updated dynamically using gradient information. The resulting algorithm (PPM-DP) compresses human text better than all currently published variants of PPM, CTW, DMC, LZ, CSE and BWT, with runtime only slightly slower than classic PPM. This is the accepted manuscript. The final version is available at http://dx.doi.org/10.1109/DCC.2015.77
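Point (C) above, updating a hyper-parameter with gradient information while compressing, can be sketched on a much simpler model than PPM-DP. The sketch assumes a single order-0 adaptive model with one smoothing parameter `alpha`, updated by a gradient step on the log-probability of each symbol as it arrives; the functions `log_prob`, `grad_log_prob`, and `compress_cost` and the learning rate `lr` are hypothetical and do not reproduce the paper's blending method.

```python
import math


def log_prob(count_s, total, alpha, vocab_size):
    """Log-probability of a symbol under additive smoothing with mass alpha."""
    return math.log((count_s + alpha / vocab_size) / (total + alpha))


def grad_log_prob(count_s, total, alpha, vocab_size):
    """Derivative of log_prob with respect to alpha."""
    return (1.0 / vocab_size) / (count_s + alpha / vocab_size) - 1.0 / (total + alpha)


def compress_cost(seq, vocab, alpha=1.0, lr=0.05):
    """Return (ideal code length in bits, final alpha) for an adaptive coder."""
    counts = {s: 0 for s in vocab}
    total = 0
    bits = 0.0
    for s in seq:
        # Cost of coding s with the current counts and current alpha.
        bits -= log_prob(counts[s], total, alpha, len(vocab)) / math.log(2)
        # Gradient ascent in log(alpha), which keeps alpha positive.
        g = grad_log_prob(counts[s], total, alpha, len(vocab))
        alpha *= math.exp(lr * alpha * g)
        counts[s] += 1
        total += 1
    return bits, alpha


text = "abracadabra abracadabra"
bits, final_alpha = compress_cost(text, sorted(set(text)))
```

A decoder can mirror the same update because each step uses only symbols already coded, so no hyper-parameter needs to be transmitted.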
Mining International Political Norms from the GDELT Database
Researchers have long been interested in the role that norms can play in
governing agent actions in multi-agent systems. Much work has been done on
formalising normative concepts from human society and adapting them for the
government of open software systems, and on the simulation of normative
processes in human and artificial societies. However, there has been
comparatively little work on applying normative MAS mechanisms to understanding
the norms in human society.
This work investigates this issue in the context of international politics.
Using the GDELT dataset, containing machine-encoded records of international
events extracted from news reports, we extracted bilateral sequences of
inter-country events and applied a Bayesian norm mining mechanism to identify
norms that best explained the observed behaviour. A statistical evaluation
showed that the normative model fitted the data significantly better than a
probabilistic discrete event model.
Comment: 16 pages, 2 figures, pre-print for International Workshop on
Coordination, Organizations, Institutions, Norms and Ethics for Governance of
Multi-Agent Systems (COINE), co-located with AAMAS 202
Generative Non-Markov Models for Information Extraction
Learning from unlabeled data is a long-standing challenge in machine learning. A
principled solution involves modeling the full joint distribution over inputs
and the latent structure of interest, and imputing the missing data via
marginalization. Unfortunately, such marginalization is expensive for most
non-trivial problems, which places practical limits on the expressiveness of
generative models. As a result, joint models often encode strict assumptions
about the underlying process such as fixed-order Markovian assumptions and
employ simple count-based features of the inputs. In contrast, conditional
models, which do not directly model the observed data, are free to incorporate
rich overlapping features of the input in order to predict the latent structure
of interest. It would be desirable to develop expressive generative models that
retain tractable inference. This is the topic of this thesis. In particular, we
explore joint models which relax fixed-order Markov assumptions, and investigate
the use of recurrent neural networks for automatic feature induction in the
generative process.
We focus on two structured prediction problems: (1) imputing labeled segmentations
of input character sequences, and (2) imputing directed spanning trees relating
strings in text corpora. These problems arise in many applications of practical
interest, but we are primarily concerned with named-entity recognition and
cross-document coreference resolution in this work.
For named-entity recognition, we propose a generative model in which the
observed characters originate from a latent non-Markov process over words, and
where the characters are themselves produced via a non-Markov process: a
recurrent neural network (RNN). We propose a sampler for this model in
which sequential Monte Carlo is used as a transition kernel for a Gibbs sampler.
The kernel is amenable to a fast parallel implementation, and results in fast
mixing in practice.
For cross-document coreference resolution, we move beyond sequence modeling to
consider string-to-string transduction. We stipulate a generative process for a
corpus of documents in which entity names arise from copying---and optionally
transforming---previous names of the same entity. Our proposed model is
sensitive to both the context in which the names occur as well as their
spelling. The string-to-string transformations correspond to systematic
linguistic processes such as abbreviation, typos, and nicknaming, and by analogy
to biology, we think of them as mutations along the edges of a phylogeny. We
propose a novel block Gibbs sampler for this problem that alternates between
sampling an ordering of the mentions and a spanning tree relating all mentions
in the corpus.
Bayesian nonparametric models of genetic variation
We will develop three new Bayesian nonparametric models for genetic variation. These models are all dynamic-clustering approximations of the ancestral recombination graph (or ARG), a structure that fully describes the genetic history of a population. Due to its complexity, efficient inference for the ARG is not possible. However, different aspects of the ARG can be captured by the approximations discussed in our work. The ARG can be described by a tree-valued HMM where the trees vary along the genetic sequence. Many modern models of genetic variation proceed by approximating these trees with (often finite) clusterings. We will consider Bayesian nonparametric priors for the clustering, thereby providing nonparametric generalizations of these models and avoiding problems with model selection and label switching. Further, we will compare the performance of these models on a wide selection of inference problems in genetics such as phasing, imputation, genome-wide association and admixture or bottleneck discovery. These experiments should provide a common testing ground on which the different approximations inherent in modern genetic models can be compared. The results of these experiments should shed light on the nature of the approximations and guide future application of these models.
Improvements to the sequence memoizer
The sequence memoizer is a model for sequence data with state-of-the-art performance on language modeling and compression. We propose a number of improvements to the model and inference algorithm, including an enlarged range of hyperparameters, a memory-efficient representation, and inference algorithms operating on the new representation. Our derivations are based on precise definitions of the various processes that will also allow us to provide an elementary proof of the "mysterious" coagulation and fragmentation properties used in the original paper on the sequence memoizer by Wood et al. (2009). We present some experimental results supporting our improvements.