14 research outputs found

    Hierarchical Bayesian Nonparametric Models for Power-Law Sequences

    Sequence data that exhibits power-law behavior in its marginal and conditional distributions arises frequently from natural processes, with natural language text being a prominent example. We study probabilistic models for such sequences based on a hierarchical nonparametric Bayesian prior, develop inference and learning procedures that make these models practical and applicable to large, real-world data sets, and empirically demonstrate their excellent predictive performance. In particular, we consider models based on the infinite-depth variant of the hierarchical Pitman-Yor process (HPYP) language model [Teh, 2006b] known as the Sequence Memoizer, as well as Sequence Memoizer-based cache language models and hybrid models combining the HPYP with neural language models. We empirically demonstrate that these models perform well on language modelling and data compression tasks.
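
    As a concrete illustration of the HPYP predictive rule, here is a minimal Python sketch. It assumes the common one-table-per-type simplification (under which the recursion reduces to interpolated Kneser-Ney with a Pitman-Yor discount) rather than the Sequence Memoizer's full seating arrangements; the class and parameter names are illustrative, not from the paper.

```python
from collections import defaultdict

class HPYPNgram:
    """Minimal hierarchical Pitman-Yor n-gram language model (order >= 2).

    Simplifying assumption, not the Sequence Memoizer's seating scheme:
    one "table" per word type, so the PYP predictive rule coincides with
    interpolated Kneser-Ney smoothing with a Pitman-Yor discount.
    """

    def __init__(self, order=3, discount=0.75, concentration=1.0, vocab_size=10_000):
        self.order = order
        self.d = discount           # PYP discount parameter
        self.theta = concentration  # PYP concentration parameter
        self.V = vocab_size
        self.counts = defaultdict(lambda: defaultdict(int))  # counts[ctx][w]
        self.totals = defaultdict(int)                       # totals[ctx]

    def observe(self, context, word):
        # Share statistics along the suffix hierarchy of contexts.
        ctx = tuple(context)[-(self.order - 1):]
        for k in range(len(ctx) + 1):
            self.counts[ctx[k:]][word] += 1
            self.totals[ctx[k:]] += 1

    def prob(self, context, word):
        return self._prob(tuple(context)[-(self.order - 1):], word)

    def _prob(self, ctx, word):
        # Back off to the next-shorter context; the empty context backs
        # off to a uniform base distribution over the vocabulary.
        backoff = self._prob(ctx[1:], word) if ctx else 1.0 / self.V
        c = self.totals.get(ctx, 0)
        if c == 0:
            return backoff
        cw = self.counts[ctx].get(word, 0)
        t = len(self.counts[ctx])   # distinct word types observed after ctx
        return (max(cw - self.d, 0) + (self.theta + self.d * t) * backoff) / (self.theta + c)
```

    For example, after `lm = HPYPNgram(order=2, vocab_size=4)` and a few `lm.observe(["the"], "cat")`-style updates, `lm.prob(["the"], "cat")` interpolates the bigram count with the discounted unigram and uniform levels.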

    Neural probabilistic language model for system combination

    This paper describes the neural probabilistic language modelling (NPLM) system submitted by Dublin City University to the system combination task of the Second Workshop on Applying Machine Learning Techniques to Optimise the Division of Labour in Hybrid MT (ML4HMT-12). We used the information obtained from the NPLM as meta information for the system combination module. On the Spanish-English data, our paraphrasing approach achieved 25.81 BLEU points, 0.19 BLEU points absolute below the standard confusion-network-based system combination. We note that our current use of the NPLM is quite limited, owing to the difficulty of integrating it with system combination.
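
    The paper's own combination pipeline is not reproduced here, but the general idea of using LM scores as meta information can be sketched as a simple reranker. Everything below (the function names, the linear interpolation, the length normalisation) is a hypothetical illustration, not the ML4HMT-12 system.

```python
def rescore(hypotheses, lm_logprob, lm_weight=0.3):
    """Rerank system-combination candidates with a language-model score.

    hypotheses: list of (tokens, combo_score) pairs from the combiner.
    lm_logprob: callable returning the LM log-probability of a token list
                (e.g. an NPLM); a stand-in for illustration only.
    """
    scored = []
    for tokens, combo_score in hypotheses:
        # Length-normalise the LM score so short outputs are not favoured.
        lm = lm_logprob(tokens) / max(len(tokens), 1)
        # Hypothetical linear interpolation of the two scores.
        scored.append(((1 - lm_weight) * combo_score + lm_weight * lm, tokens))
    return max(scored)[1]  # tokens of the best-scoring hypothesis
```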

    Context tree switching

    This paper describes the Context Tree Switching technique, a modification of Context Tree Weighting for the prediction of binary, stationary, n-Markov sources. By modifying Context Tree Weighting's recursive weighting scheme, it is possible to mix over a strictly larger class of models without increasing the asymptotic time or space complexity of the original algorithm. We prove that this generalization preserves the desirable theoretical properties of Context Tree Weighting on stationary n-Markov sources, and show empirically that this new technique leads to consistent improvements over Context Tree Weighting as measured on the Calgary Corpus.
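
    For reference, here is a batch Python sketch of the Context Tree Weighting recursion that Context Tree Switching modifies. It computes the CTW block probability of a binary string using the Krichevsky-Trofimov estimator at each node; the sequential O(depth)-per-symbol update used in practice is omitted for brevity, and the function names are my own.

```python
from collections import defaultdict
from math import lgamma, log, exp

def kt_log(a, b):
    """Log Krichevsky-Trofimov estimator probability of a zeros and b ones."""
    return (lgamma(a + 0.5) + lgamma(b + 0.5) - 2.0 * lgamma(0.5)
            - lgamma(a + b + 1.0))

def ctw_log_prob(bits, depth=3):
    """Log probability of bits[depth:] given bits[:depth] under CTW.

    Batch form of the recursion:
        P_w(s) = 1/2 * P_KT(s) + 1/2 * P_w(0s) * P_w(1s)
    at internal nodes, and P_w(s) = P_KT(s) at depth-`depth` leaves.
    """
    counts = defaultdict(lambda: [0, 0])  # context suffix -> [#zeros, #ones]
    for i in range(depth, len(bits)):
        ctx = tuple(bits[i - depth:i])
        for k in range(depth + 1):
            counts[ctx[k:]][bits[i]] += 1

    def weighted(node):
        a, b = counts.get(node, (0, 0))
        lp_kt = kt_log(a, b)
        if len(node) == depth or a + b == 0:
            return lp_kt
        # Children extend the context one more symbol into the past.
        lp_split = weighted((0,) + node) + weighted((1,) + node)
        hi = max(lp_kt, lp_split)   # log-sum-exp for numerical stability
        return hi + log(0.5 * exp(lp_kt - hi) + 0.5 * exp(lp_split - hi))

    return weighted(())
```

    For example, `ctw_log_prob([0, 1] * 8, depth=2)` assigns the alternating string a substantially higher log probability than the 14 * log(1/2) ~= -9.7 that an i.i.d. fair-coin model gives its 14 predicted bits.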

    Fast, Small and Exact: Infinite-order Language Modelling with Compressed Suffix Trees

    Efficient methods for storing and querying are critical for scaling high-order n-gram language models to large corpora. We propose a language model based on compressed suffix trees, a representation that is highly compact, can easily be held in memory, and supports the queries needed to compute language model probabilities on the fly. We present several optimisations which improve query runtimes by up to 2500x, despite incurring only a modest increase in construction time and memory usage. For large corpora and high Markov orders, our method is highly competitive with the state-of-the-art KenLM package: it imposes much lower memory requirements, often by orders of magnitude, and has runtimes that are either similar (for training) or comparable (for querying).

    Comment: 14 pages, in Transactions of the Association for Computational Linguistics (TACL) 201
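
    The idea of computing probabilities on the fly from an index, rather than from precomputed n-gram tables, can be illustrated with a toy suffix array; this is an assumption-laden stand-in, not the authors' data structure, and the paper's compressed suffix trees are far more space-efficient. The sketch needs Python 3.10+ for `bisect`'s `key` argument.

```python
import bisect

class SuffixArrayLM:
    """Toy on-the-fly n-gram counting via a plain suffix array."""

    def __init__(self, tokens):
        self.tokens = tokens
        # Sort suffix start positions by the suffix they begin.
        # (Quadratic in the corpus size; a toy, not a scalable index.)
        self.sa = sorted(range(len(tokens)), key=lambda i: tokens[i:])

    def count(self, ngram):
        """Number of occurrences of `ngram` in the corpus."""
        n = len(ngram)
        lo = bisect.bisect_left(self.sa, ngram,
                                key=lambda i: self.tokens[i:i + n])
        hi = bisect.bisect_right(self.sa, ngram,
                                 key=lambda i: self.tokens[i:i + n])
        return hi - lo

    def prob(self, context, word):
        """Maximum-likelihood P(word | context), computed per query."""
        c_ctx = self.count(list(context))
        return self.count(list(context) + [word]) / c_ctx if c_ctx else 0.0

lm = SuffixArrayLM("the cat sat on the mat".split())
print(lm.prob(["the"], "cat"))  # 0.5: "the" occurs twice, "the cat" once
```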

    Semantic representation and compression system for GPS using coresets

    Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2013. Cataloged from PDF version of thesis. Includes bibliographical references (pages 76-79).

    We present a semantic approach for compressing mobile sensor data, focusing on GPS streams. Unlike popular text-compression methods, our approach takes advantage of the fact that agents (robotic, personal, or vehicular) perform tasks in a physical space, so the resulting sensor stream usually contains repeated observations of the same locations, actions, or scenes. We model this sensor stream as a Markov process with unobserved states, and our goal is to compute the Hidden Markov Model (HMM) that maximizes the likelihood of generating the stream. Our semantic representation and compression system comprises two main parts: 1) trajectory mapping and 2) trajectory compression. The trajectory-mapping stage extracts a semantic representation (a topological map) from raw sensor data. The trajectory-compression stage uses a recursive binary search algorithm to take advantage of the information captured by the constructed map. To improve efficiency and scalability, we utilize two coresets: we formalize the coreset for the 1-segment problem and apply our system to a small k-segment coreset of the data rather than to the original data. The compressed trajectory compresses the original sensor stream and approximates its likelihood up to a provable (1 + ε)-multiplicative factor for any candidate Markov model. We conclude with experimental results on data sets from several robots, personal smartphones, and taxicabs. In a robotics experiment with more than 72K points, we show that our compression is smaller than the original signal by a factor of 650, and smaller than bzip2 by a factor of 170. We additionally demonstrate the capability of our system to automatically summarize a personal GPS stream, generate a sketch of a city map, and merge trajectories from multiple taxicabs into a more complete map.

    by Cathy Wu, M.Eng.
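
    The thesis's k-segment coreset construction is not reproduced here; as a stand-in, the classic Douglas-Peucker algorithm below illustrates the general shape of recursive trajectory compression (keep the endpoints, recurse at the point of maximum deviation from the chord).

```python
def douglas_peucker(points, eps):
    """Classic Douglas-Peucker simplification of a list of (x, y) points.

    Not the thesis's coreset algorithm: an illustrative stand-in for
    recursive trajectory compression with an error tolerance `eps`.
    """
    if len(points) < 3:
        return list(points)
    (x1, y1), (x2, y2) = points[0], points[-1]
    dx, dy = x2 - x1, y2 - y1
    norm = (dx * dx + dy * dy) ** 0.5

    def dist(p):  # perpendicular distance from p to the endpoint chord
        if norm == 0.0:  # coincident endpoints: fall back to point distance
            return ((p[0] - x1) ** 2 + (p[1] - y1) ** 2) ** 0.5
        return abs(dy * (p[0] - x1) - dx * (p[1] - y1)) / norm

    i, d = max(((i, dist(p)) for i, p in enumerate(points[1:-1], start=1)),
               key=lambda t: t[1])
    if d <= eps:
        return [points[0], points[-1]]
    # Split at the worst point and simplify each half recursively.
    return douglas_peucker(points[:i + 1], eps)[:-1] + douglas_peucker(points[i:], eps)
```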

    Probabilistic machine learning and artificial intelligence

    How can a machine learn from experience? Probabilistic modelling provides a framework for understanding what learning is, and has therefore emerged as one of the principal theoretical and practical approaches for designing machines that learn from data acquired through experience. The probabilistic framework, which describes how to represent and manipulate uncertainty about models and predictions, has a central role in scientific data analysis, machine learning, robotics, cognitive science and artificial intelligence. This Review provides an introduction to this framework, and discusses some of the state-of-the-art advances in the field, namely probabilistic programming, Bayesian optimization, data compression and automatic model discovery.

    The author acknowledges an EPSRC grant EP/I036575/1, the DARPA PPAML programme, a Google Focused Research Award for the Automatic Statistician and support from Microsoft Research. This is the author accepted manuscript; the final version is available from NPG at http://www.nature.com/nature/journal/v521/n7553/full/nature14541.html#abstract
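
    As a minimal instance of the probabilistic framework the Review describes, the sketch below performs exact Bayesian inference for a coin's bias with a conjugate Beta-Bernoulli model; the function names are illustrative only.

```python
def beta_bernoulli_update(heads, tails, a=1.0, b=1.0):
    """Posterior over a coin's bias: Beta(a, b) prior plus Bernoulli data.

    Represent uncertainty as a distribution and update it with data via
    Bayes' rule. Beta is conjugate to the Bernoulli, so the posterior is
    again a Beta with updated parameters.
    """
    return a + heads, b + tails

def predictive_heads(a, b):
    """Posterior predictive probability that the next flip lands heads."""
    return a / (a + b)

a, b = beta_bernoulli_update(heads=7, tails=3)  # uniform Beta(1, 1) prior
print(predictive_heads(a, b))                   # -> 0.666...
```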