Gatsby Computational

By Frank Wood, Lancelot James, Jan Gasthaus, Cédric Archambeau and Yee Whye Teh

Abstract

Probabilistic models of sequences play a central role in most machine translation, automated speech recognition, lossless compression, spell-checking, and gene identification applications, to name but a few. Unfortunately, real-world sequence data often exhibit long-range dependencies that can only be captured by computationally challenging, complex models. Sequence data arising from natural processes also often exhibit power-law properties, yet common sequence models do not capture such properties. The sequence memoizer is a new hierarchical Bayesian model for discrete sequence data that captures long-range dependencies and power-law characteristics while remaining computationally attractive. Its utility as a language model and general-purpose lossless compressor is demonstrated.

Year: 2011
OAI identifier: oai:CiteSeerX.psu:10.1.1.194.2666
Provided by: CiteSeerX