A Statistical Model for Lost Language Decipherment

By Benjamin Snyder, Regina Barzilay and Kevin Knight

Abstract

In this paper we propose a method for the automatic decipherment of lost languages. Given a non-parallel corpus in a known related language, our model produces both alphabetic mappings and translations of words into their corresponding cognates. We employ a non-parametric Bayesian framework to simultaneously capture both low-level character mappings and high-level morphemic correspondences. This formulation enables us to encode some of the linguistic intuitions that have guided human decipherers. When applied to the ancient Semitic language Ugaritic, the model correctly maps 29 of 30 letters to their Hebrew counterparts, and deduces the correct Hebrew cognate for 60% of the Ugaritic words which have cognates in Hebrew.

Year: 2010
OAI identifier: oai:CiteSeerX.psu:10.1.1.169.3493
Provided by: CiteSeerX