
PostCAT - Posterior Constrained Alignment Toolkit

By João Graça, Kuzman Ganchev and Ben Taskar

Abstract

In this paper we present a new open-source toolkit for statistical word alignment, the Posterior Constrained Alignment Toolkit (PostCAT). The toolkit implements three well-known word alignment models (IBM Model 1, IBM Model 2, and HMM) as well as six new models. In addition to the usual Viterbi decoding scheme, the toolkit provides posterior decoding, with several options for tuning the posterior threshold. The toolkit also provides implementations of alignment symmetrization heuristics and a set of utilities for analyzing and pretty-printing alignments. The new models have already been shown to improve intrinsic alignment metrics and to lead to better translations when integrated into a state-of-the-art machine translation system. The toolkit is written in Java, and its source code is available at its website. We encourage other researchers to build on our work by modifying the toolkit and using it in their research.
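The abstract contrasts Viterbi decoding with thresholded posterior decoding. The sketch below illustrates that general distinction only; it is not PostCAT's code or API. It assumes IBM Model 1 (whose link posteriors have the closed form p(a_j = i | e, f) = t(f_j | e_i) / Σ_k t(f_j | e_k)), omits the NULL source word for brevity, and takes the per-sentence translation probabilities as already estimated. All class and method names and the toy probabilities are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of Viterbi vs. posterior decoding (not the PostCAT API). */
public class PosteriorDecodingSketch {

    /** A link between source position i and target position j. */
    record Link(int source, int target) {}

    /**
     * IBM Model 1 link posteriors: p(a_j = i | e, f) = t(f_j | e_i) / sum_k t(f_j | e_k).
     * trans[i][j] is assumed to hold t(f_j | e_i) for this sentence pair; no NULL word.
     */
    static double[][] model1Posteriors(double[][] trans) {
        int srcLen = trans.length;
        int tgtLen = trans[0].length;
        double[][] post = new double[srcLen][tgtLen];
        for (int j = 0; j < tgtLen; j++) {
            double norm = 0.0;
            for (int i = 0; i < srcLen; i++) norm += trans[i][j];
            for (int i = 0; i < srcLen; i++) post[i][j] = norm > 0 ? trans[i][j] / norm : 0.0;
        }
        return post;
    }

    /** Posterior decoding: keep every link whose posterior exceeds the threshold. */
    static List<Link> posteriorDecode(double[][] post, double threshold) {
        List<Link> links = new ArrayList<>();
        for (int i = 0; i < post.length; i++)
            for (int j = 0; j < post[i].length; j++)
                if (post[i][j] > threshold) links.add(new Link(i, j));
        return links;
    }

    /** Viterbi decoding under Model 1: each target word takes its single most probable source word. */
    static List<Link> viterbiDecode(double[][] post) {
        List<Link> links = new ArrayList<>();
        for (int j = 0; j < post[0].length; j++) {
            int best = 0;
            for (int i = 1; i < post.length; i++)
                if (post[i][j] > post[best][j]) best = i;
            links.add(new Link(best, j));
        }
        return links;
    }

    public static void main(String[] args) {
        // Toy translation probabilities t(f_j | e_i); rows = source words, columns = target words.
        double[][] trans = {
            {0.7, 0.1, 0.05},
            {0.2, 0.6, 0.45},
            {0.1, 0.3, 0.5}
        };
        double[][] post = model1Posteriors(trans);
        System.out.println("Viterbi:   " + viterbiDecode(post));
        System.out.println("Posterior: " + posteriorDecode(post, 0.4));
    }
}
```

With these toy values, Viterbi decoding gives each target word exactly one link, whereas posterior decoding at threshold 0.4 links the third target word to both source words 1 and 2. Raising the threshold generally yields fewer, more precise links and lowering it yields more links with higher recall, which is why tuning the threshold matters.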

Year: 2011
OAI identifier: oai:CiteSeerX.psu:10.1.1.192.6026
Provided by: CiteSeerX
Full text is available at the following location(s):
  • http://citeseerx.ist.psu.edu/v... (external link)
  • http://www.seas.upenn.edu/%7Ek... (external link)