14 research outputs found

    Hierarchical Bayesian Nonparametric Models for Power-Law Sequences

    Sequence data that exhibits power-law behavior in its marginal and conditional distributions arises frequently from natural processes, with natural language text being a prominent example. We study probabilistic models for such sequences based on a hierarchical nonparametric Bayesian prior, develop inference and learning procedures that make these models practical and applicable to large, real-world data sets, and empirically demonstrate their excellent predictive performance. In particular, we consider models based on the infinite-depth variant of the hierarchical Pitman-Yor process (HPYP) language model [Teh, 2006b] known as the Sequence Memoizer, as well as Sequence Memoizer-based cache language models and hybrid models combining the HPYP with neural language models. We empirically demonstrate that these models perform well on language modelling and data compression tasks.
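
    As a concrete illustration of the HPYP predictive rule, here is a minimal Python sketch. It assumes the common one-table-per-type simplification (under which the recursion reduces to interpolated Kneser-Ney with a Pitman-Yor discount) rather than the Sequence Memoizer's full seating arrangements; the class and parameter names are illustrative, not from the paper.

```python
from collections import defaultdict

class HPYPNgram:
    """Minimal hierarchical Pitman-Yor n-gram language model (order >= 2).

    Simplifying assumption, not the Sequence Memoizer's seating scheme:
    one "table" per word type, so the PYP predictive rule coincides with
    interpolated Kneser-Ney smoothing with a Pitman-Yor discount.
    """

    def __init__(self, order=3, discount=0.75, concentration=1.0, vocab_size=10_000):
        self.order = order
        self.d = discount           # PYP discount parameter
        self.theta = concentration  # PYP concentration parameter
        self.V = vocab_size
        self.counts = defaultdict(lambda: defaultdict(int))  # counts[ctx][w]
        self.totals = defaultdict(int)                       # totals[ctx]

    def observe(self, context, word):
        # Share statistics along the suffix hierarchy of contexts.
        ctx = tuple(context)[-(self.order - 1):]
        for k in range(len(ctx) + 1):
            self.counts[ctx[k:]][word] += 1
            self.totals[ctx[k:]] += 1

    def prob(self, context, word):
        return self._prob(tuple(context)[-(self.order - 1):], word)

    def _prob(self, ctx, word):
        # Back off to the next-shorter context; the empty context backs
        # off to a uniform base distribution over the vocabulary.
        backoff = self._prob(ctx[1:], word) if ctx else 1.0 / self.V
        c = self.totals.get(ctx, 0)
        if c == 0:
            return backoff
        cw = self.counts[ctx].get(word, 0)
        t = len(self.counts[ctx])   # distinct word types observed after ctx
        return (max(cw - self.d, 0) + (self.theta + self.d * t) * backoff) / (self.theta + c)
```

    For example, after `lm = HPYPNgram(order=2, vocab_size=4)` and a few `lm.observe(["the"], "cat")`-style updates, `lm.prob(["the"], "cat")` interpolates the bigram count with the discounted unigram and uniform levels.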

    Neural probabilistic language model for system combination

    This paper describes the neural probabilistic language modelling (NPLM) system submitted by Dublin City University to the system combination task of the Second Workshop on Applying Machine Learning Techniques to Optimise the Division of Labour in Hybrid MT (ML4HMT-12). We used the information obtained from the NPLM as meta information for the system combination module. On the Spanish-English data, our paraphrasing approach achieved 25.81 BLEU points, 0.19 BLEU points absolute below the standard confusion-network-based system combination. We note that our current use of the NPLM is quite limited, owing to the difficulty of integrating it with system combination.
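
    The paper's own combination pipeline is not reproduced here, but the general idea of using LM scores as meta information can be sketched as a simple reranker. Everything below (the function names, the linear interpolation, the length normalisation) is a hypothetical illustration, not the ML4HMT-12 system.

```python
def rescore(hypotheses, lm_logprob, lm_weight=0.3):
    """Rerank system-combination candidates with a language-model score.

    hypotheses: list of (tokens, combo_score) pairs from the combiner.
    lm_logprob: callable returning the LM log-probability of a token list
                (e.g. an NPLM); a stand-in for illustration only.
    """
    scored = []
    for tokens, combo_score in hypotheses:
        # Length-normalise the LM score so short outputs are not favoured.
        lm = lm_logprob(tokens) / max(len(tokens), 1)
        # Hypothetical linear interpolation of the two scores.
        scored.append(((1 - lm_weight) * combo_score + lm_weight * lm, tokens))
    return max(scored)[1]  # tokens of the best-scoring hypothesis
```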

    Context tree switching

    This paper describes the Context Tree Switching technique, a modification of Context Tree Weighting for the prediction of binary, stationary, n-Markov sources. By modifying Context Tree Weighting's recursive weighting scheme, it is possible to mix over a strictly larger class of models without increasing the asymptotic time or space complexity of the original algorithm. We prove that this generalization preserves the desirable theoretical properties of Context Tree Weighting on stationary n-Markov sources, and show empirically that this new technique leads to consistent improvements over Context Tree Weighting as measured on the Calgary Corpus.
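
    For reference, here is a batch Python sketch of the Context Tree Weighting recursion that Context Tree Switching modifies. It computes the CTW block probability of a binary string using the Krichevsky-Trofimov estimator at each node; the sequential O(depth)-per-symbol update used in practice is omitted for brevity, and the function names are my own.

```python
from collections import defaultdict
from math import lgamma, log, exp

def kt_log(a, b):
    """Log Krichevsky-Trofimov estimator probability of a zeros and b ones."""
    return (lgamma(a + 0.5) + lgamma(b + 0.5) - 2.0 * lgamma(0.5)
            - lgamma(a + b + 1.0))

def ctw_log_prob(bits, depth=3):
    """Log probability of bits[depth:] given bits[:depth] under CTW.

    Batch form of the recursion:
        P_w(s) = 1/2 * P_KT(s) + 1/2 * P_w(0s) * P_w(1s)
    at internal nodes, and P_w(s) = P_KT(s) at depth-`depth` leaves.
    """
    counts = defaultdict(lambda: [0, 0])  # context suffix -> [#zeros, #ones]
    for i in range(depth, len(bits)):
        ctx = tuple(bits[i - depth:i])
        for k in range(depth + 1):
            counts[ctx[k:]][bits[i]] += 1

    def weighted(node):
        a, b = counts.get(node, (0, 0))
        lp_kt = kt_log(a, b)
        if len(node) == depth or a + b == 0:
            return lp_kt
        # Children extend the context one more symbol into the past.
        lp_split = weighted((0,) + node) + weighted((1,) + node)
        hi = max(lp_kt, lp_split)   # log-sum-exp for numerical stability
        return hi + log(0.5 * exp(lp_kt - hi) + 0.5 * exp(lp_split - hi))

    return weighted(())
```

    For example, `ctw_log_prob([0, 1] * 8, depth=2)` assigns the alternating string a substantially higher log probability than the 14 * log(1/2) ~= -9.7 that an i.i.d. fair-coin model gives its 14 predicted bits.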

    Fast, Small and Exact: Infinite-order Language Modelling with Compressed Suffix Trees

    Efficient methods for storing and querying are critical for scaling high-order n-gram language models to large corpora. We propose a language model based on compressed suffix trees, a representation that is highly compact, can easily be held in memory, and supports the queries needed to compute language model probabilities on the fly. We present several optimisations which improve query runtimes by up to 2500x, despite incurring only a modest increase in construction time and memory usage. For large corpora and high Markov orders, our method is highly competitive with the state-of-the-art KenLM package: it imposes much lower memory requirements, often by orders of magnitude, and has runtimes that are either similar (for training) or comparable (for querying).

    Comment: 14 pages, in Transactions of the Association for Computational Linguistics (TACL) 201
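
    The idea of computing probabilities on the fly from an index, rather than from precomputed n-gram tables, can be illustrated with a toy suffix array; this is an assumption-laden stand-in, not the authors' data structure, and the paper's compressed suffix trees are far more space-efficient. The sketch needs Python 3.10+ for `bisect`'s `key` argument.

```python
import bisect

class SuffixArrayLM:
    """Toy on-the-fly n-gram counting via a plain suffix array."""

    def __init__(self, tokens):
        self.tokens = tokens
        # Sort suffix start positions by the suffix they begin.
        # (Quadratic in the corpus size; a toy, not a scalable index.)
        self.sa = sorted(range(len(tokens)), key=lambda i: tokens[i:])

    def count(self, ngram):
        """Number of occurrences of `ngram` in the corpus."""
        n = len(ngram)
        lo = bisect.bisect_left(self.sa, ngram,
                                key=lambda i: self.tokens[i:i + n])
        hi = bisect.bisect_right(self.sa, ngram,
                                 key=lambda i: self.tokens[i:i + n])
        return hi - lo

    def prob(self, context, word):
        """Maximum-likelihood P(word | context), computed per query."""
        c_ctx = self.count(list(context))
        return self.count(list(context) + [word]) / c_ctx if c_ctx else 0.0

lm = SuffixArrayLM("the cat sat on the mat".split())
print(lm.prob(["the"], "cat"))  # 0.5: "the" occurs twice, "the cat" once
```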

    Semantic representation and compression system for GPS using coresets

    Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2013. Cataloged from PDF version of thesis. Includes bibliographical references (pages 76-79).

    We present a semantic approach for compressing mobile sensor data, focusing on GPS streams. Unlike popular text-compression methods, our approach takes advantage of the fact that agents (robotic, personal, or vehicular) perform tasks in a physical space, so the resulting sensor stream usually contains repeated observations of the same locations, actions, or scenes. We model this sensor stream as a Markov process with unobserved states, and our goal is to compute the Hidden Markov Model (HMM) that maximizes the likelihood of generating the stream. Our semantic representation and compression system comprises two main parts: 1) trajectory mapping and 2) trajectory compression. The trajectory-mapping stage extracts a semantic representation (a topological map) from raw sensor data. The trajectory-compression stage uses a recursive binary search algorithm to take advantage of the information captured by the constructed map. To improve efficiency and scalability, we utilize two coresets: we formalize the coreset for the 1-segment problem and apply our system to a small k-segment coreset of the data rather than to the original data. The compressed trajectory compresses the original sensor stream and approximates its likelihood up to a provable (1 + ε)-multiplicative factor for any candidate Markov model. We conclude with experimental results on data sets from several robots, personal smartphones, and taxicabs. In a robotics experiment with more than 72K points, we show that our compression is smaller than the original signal by a factor of 650, and smaller than bzip2 by a factor of 170. We additionally demonstrate the capability of our system to automatically summarize a personal GPS stream, generate a sketch of a city map, and merge trajectories from multiple taxicabs into a more complete map.

    by Cathy Wu, M.Eng.
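
    The thesis's k-segment coreset construction is not reproduced here; as a stand-in, the classic Douglas-Peucker algorithm below illustrates the general shape of recursive trajectory compression (keep the endpoints, recurse at the point of maximum deviation from the chord).

```python
def douglas_peucker(points, eps):
    """Classic Douglas-Peucker simplification of a list of (x, y) points.

    Not the thesis's coreset algorithm: an illustrative stand-in for
    recursive trajectory compression with an error tolerance `eps`.
    """
    if len(points) < 3:
        return list(points)
    (x1, y1), (x2, y2) = points[0], points[-1]
    dx, dy = x2 - x1, y2 - y1
    norm = (dx * dx + dy * dy) ** 0.5

    def dist(p):  # perpendicular distance from p to the endpoint chord
        if norm == 0.0:  # coincident endpoints: fall back to point distance
            return ((p[0] - x1) ** 2 + (p[1] - y1) ** 2) ** 0.5
        return abs(dy * (p[0] - x1) - dx * (p[1] - y1)) / norm

    i, d = max(((i, dist(p)) for i, p in enumerate(points[1:-1], start=1)),
               key=lambda t: t[1])
    if d <= eps:
        return [points[0], points[-1]]
    # Split at the worst point and simplify each half recursively.
    return douglas_peucker(points[:i + 1], eps)[:-1] + douglas_peucker(points[i:], eps)
```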

    Probabilistic machine learning and artificial intelligence

    How can a machine learn from experience? Probabilistic modelling provides a framework for understanding what learning is, and has therefore emerged as one of the principal theoretical and practical approaches for designing machines that learn from data acquired through experience. The probabilistic framework, which describes how to represent and manipulate uncertainty about models and predictions, has a central role in scientific data analysis, machine learning, robotics, cognitive science and artificial intelligence. This Review provides an introduction to this framework, and discusses some of the state-of-the-art advances in the field, namely probabilistic programming, Bayesian optimization, data compression and automatic model discovery.

    The author acknowledges an EPSRC grant EP/I036575/1, the DARPA PPAML programme, a Google Focused Research Award for the Automatic Statistician and support from Microsoft Research. This is the author accepted manuscript; the final version is available from NPG at http://www.nature.com/nature/journal/v521/n7553/full/nature14541.html#abstract
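
    As a minimal instance of the probabilistic framework the Review describes, the sketch below performs exact Bayesian inference for a coin's bias with a conjugate Beta-Bernoulli model; the function names are illustrative only.

```python
def beta_bernoulli_update(heads, tails, a=1.0, b=1.0):
    """Posterior over a coin's bias: Beta(a, b) prior plus Bernoulli data.

    Represent uncertainty as a distribution and update it with data via
    Bayes' rule. Beta is conjugate to the Bernoulli, so the posterior is
    again a Beta with updated parameters.
    """
    return a + heads, b + tails

def predictive_heads(a, b):
    """Posterior predictive probability that the next flip lands heads."""
    return a / (a + b)

a, b = beta_bernoulli_update(heads=7, tails=3)  # uniform Beta(1, 1) prior
print(predictive_heads(a, b))                   # -> 0.666...
```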