Structured Prediction of Sequences and Trees using Infinite Contexts
Linguistic structures exhibit a rich array of global phenomena; however,
commonly used Markov models are unable to adequately describe these phenomena
due to their strong locality assumptions. We propose a novel hierarchical model
for structured prediction over sequences and trees which exploits global
context by conditioning each generation decision on an unbounded context of
prior decisions. This builds on the success of Markov models while removing
their fixed context bound, so as to better represent global phenomena. To
facilitate learning of this large and unbounded model, we use a hierarchical
Pitman-Yor process prior which provides a recursive form of smoothing. We
propose prediction algorithms based on A* and Markov Chain Monte Carlo
sampling. Empirical results demonstrate the potential of our model compared to
baseline finite-context Markov models on part-of-speech tagging and syntactic
parsing.
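The recursive smoothing described above lends itself to a compact recursion over context suffixes. Below is a minimal Python sketch of the hierarchical Pitman-Yor predictive rule over unbounded contexts, using the common one-table-per-type approximation rather than full seating arrangements; the discount, strength, and vocabulary-size defaults are illustrative assumptions, not the paper's settings.

```python
from collections import defaultdict

class HPYLM:
    """Minimal sketch of hierarchical Pitman-Yor smoothing over unbounded
    contexts. Uses the one-table-per-type approximation instead of full
    seating arrangements; all hyperparameter values are assumptions."""

    def __init__(self, discount=0.8, strength=1.0, vocab_size=10000):
        self.d, self.theta, self.V = discount, strength, vocab_size
        self.counts = defaultdict(lambda: defaultdict(int))  # context -> word -> count

    def observe(self, sequence):
        # Record each symbol under every suffix of its (unbounded) history.
        for t, word in enumerate(sequence):
            history = tuple(sequence[:t])
            for i in range(len(history) + 1):
                self.counts[history[i:]][word] += 1

    def prob(self, context, word):
        context = tuple(context)
        # Back off recursively from the full context to its longest proper
        # suffix; the empty context backs off to a uniform base distribution.
        backoff = self.prob(context[1:], word) if context else 1.0 / self.V
        cw = self.counts.get(context)
        if not cw:
            return backoff
        c_uw = cw.get(word, 0)
        c_u = sum(cw.values())
        t_uw = 1 if c_uw > 0 else 0   # tables serving `word` (approximation)
        t_u = len(cw)                 # total tables in this restaurant (approximation)
        return (max(c_uw - self.d * t_uw, 0.0)
                + (self.theta + self.d * t_u) * backoff) / (c_u + self.theta)
```

In practice the context would be truncated or stored in a suffix tree for efficiency; the quadratic `observe` step here is purely for clarity.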
Comparing Probabilistic Models for Melodic Sequences
Modeling the real-world complexity of music is a challenge for machine
learning. We address the task of modeling melodic sequences from the same music
genre. We perform a comparative analysis of two probabilistic models: a
Dirichlet Variable Length Markov Model (Dirichlet-VMM) and a Time Convolutional
Restricted Boltzmann Machine (TC-RBM). We show that the TC-RBM learns
descriptive music features, such as underlying chords and typical melody
transitions and dynamics. We assess the models for future prediction and
compare their performance to a VMM, which is the current state of the art in
melody generation. We show that both models perform significantly better than
the VMM, with the Dirichlet-VMM marginally outperforming the TC-RBM. Finally,
we evaluate the short-order statistics of the models using the
Kullback-Leibler divergence between test sequences and model samples, and show
that our proposed methods match the statistics of the music genre significantly
better than the VMM.
Comment: In Proceedings of ECML-PKDD 2011. Lecture Notes in Computer Science, vol. 6913, pp. 289-304. Springer (2011).
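The short-order evaluation described above compares n-gram statistics of test sequences and model samples via Kullback-Leibler divergence. A minimal sketch of that comparison follows; the epsilon floor for n-grams unseen in the samples is an assumption.

```python
import math
from collections import Counter

def ngram_dist(sequences, n):
    """Empirical distribution over n-grams in a corpus of sequences."""
    counts = Counter(tuple(seq[i:i + n]) for seq in sequences
                     for i in range(len(seq) - n + 1))
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()}

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q), flooring q at eps for n-grams unseen in the samples."""
    return sum(pv * math.log(pv / q.get(g, eps)) for g, pv in p.items())

# e.g. compare bigram statistics of held-out melodies with model samples:
# kl = kl_divergence(ngram_dist(test_melodies, 2), ngram_dist(model_samples, 2))
```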
Language Modeling with Power Low Rank Ensembles
We present power low rank ensembles (PLRE), a flexible framework for n-gram
language modeling where ensembles of low rank matrices and tensors are used to
obtain smoothed probability estimates of words in context. Our method can be
understood as a generalization of n-gram modeling to non-integer n, and
includes standard techniques such as absolute discounting and Kneser-Ney
smoothing as special cases. PLRE training is efficient and our approach
outperforms state-of-the-art modified Kneser-Ney baselines in perplexity on
large corpora, as well as in BLEU score on a downstream machine translation
task.
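As a rough illustration of the idea PLRE generalizes, the sketch below smooths a bigram count matrix with a low-rank approximation and interpolates it with the unigram distribution. This is not the paper's PLRE algorithm; the SVD-based construction, rank, and interpolation weight are assumptions for illustration only.

```python
import numpy as np

def low_rank_bigram(counts, rank=50, interp=0.3):
    """Toy stand-in for low-rank n-gram smoothing (NOT the PLRE algorithm).
    counts[i, j] = number of times word j follows word i."""
    U, S, Vt = np.linalg.svd(counts.astype(float), full_matrices=False)
    approx = (U[:, :rank] * S[:rank]) @ Vt[:rank]      # rank-`rank` reconstruction
    approx = np.clip(approx, 0.0, None)                # keep pseudo-counts nonnegative
    bigram = approx / (approx.sum(axis=1, keepdims=True) + 1e-12)  # P(w | w_prev)
    unigram = counts.sum(axis=0) / counts.sum()        # lower-order estimate
    return (1.0 - interp) * bigram + interp * unigram  # two-member "ensemble"
```

The interpolation with a lower-order estimate is the same role absolute discounting and Kneser-Ney back-off play in the special cases the abstract mentions.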
Reified Context Models
A classic tension exists between exact inference in a simple model and
approximate inference in a complex model. The latter offers expressivity and
thus accuracy, but the former provides coverage of the space, an important
property for confidence estimation and learning with indirect supervision. In
this work, we introduce a new approach, reified context models, to reconcile
this tension. Specifically, we let the amount of context (the arity of the
factors in a graphical model) be chosen "at run-time" by reifying it, that is,
letting this choice itself be a random variable inside the model. Empirically,
we show that our approach obtains expressivity and coverage on three natural
language tasks.
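A toy rendering of the reification idea: treat the amount of context K as a latent variable and marginalize it out when scoring a prediction. The function names and mixture factorization below are assumptions for illustration, not the paper's construction.

```python
def reified_context_prob(y, history, cond_models, context_prior):
    """Score y by mixing over context sizes K, where K is itself random.
    cond_models[k](y, ctx) is assumed to return p(y | last k symbols);
    context_prior[k] is p(K = k)."""
    total = 0.0
    for k, model in enumerate(cond_models):
        ctx = tuple(history[-k:]) if k > 0 else ()  # guard: history[-0:] is everything
        total += context_prior[k] * model(y, ctx)
    return total

# e.g. mixing a unigram and a bigram conditional (both hypothetical tables):
# models = [lambda y, ctx: unigram[y],
#           lambda y, ctx: bigram[ctx[-1]][y]]
# p = reified_context_prob(w, prefix, models, context_prior=[0.3, 0.7])
```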
Neural probabilistic language model for system combination
This paper gives the system description of the neural probabilistic language modeling (NPLM) team of Dublin City University for our participation in the system combination task in the Second Workshop on Applying Machine Learning Techniques to Optimise the Division of Labour in Hybrid MT (ML4HMT-12). We used the information obtained by NPLM as meta information to the system combination module. For the Spanish-English data, our paraphrasing approach achieved 25.81 BLEU points, which lost 0.19 BLEU points absolute compared to the standard confusion network-based system combination. We note that our current usage of NPLM is very limited due to the difficulty in combining NPLM and system combination