We present TMop, the first open-source
tool for automatic Translation Memory
(TM) cleaning. The tool implements a
fully unsupervised approach to the task,
identifying unreliable translation
units (sentence pairs in different languages
that are supposed to be translations
of each other) without requiring
labeled training data. TMop includes a
highly configurable and extensible set of
filters capturing different aspects of translation
quality. It has been evaluated on
a test set composed of 1,000 translation
units (TUs) randomly extracted from the
English-Italian version of MyMemory, a
large-scale public TM. Results indicate its
effectiveness in automatically removing “bad”
TUs, with performance comparable to a
state-of-the-art supervised method (76.3
vs. 77.7 balanced accuracy).
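
For readers unfamiliar with the metric, the following is a minimal sketch of balanced accuracy, assuming the standard definition (the mean of the recall on "good" TUs and the recall on "bad" TUs). The function name and the toy labels are illustrative assumptions, not part of TMop or its test set.

```python
def balanced_accuracy(gold, predicted):
    """Mean of per-class recall for a binary good/bad labelling.

    gold, predicted: equal-length sequences of booleans,
    True = "good" TU, False = "bad" TU (hypothetical encoding).
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    pairs = list(zip(gold, predicted))
    true_pos = sum(1 for g, p in pairs if g and p)
    true_neg = sum(1 for g, p in pairs if not g and not p)
    n_pos = sum(1 for g in gold if g)
    n_neg = len(gold) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present in gold")
    return 0.5 * (true_pos / n_pos + true_neg / n_neg)


# Toy, invented labels: 8 good TUs and 2 bad TUs.
gold = [True] * 8 + [False] * 2
# A classifier that keeps every good TU but catches only one bad TU.
predicted = [True] * 8 + [True, False]
score = balanced_accuracy(gold, predicted)
```

On this toy data, plain accuracy would be 0.9 while balanced accuracy is 0.75, which illustrates why the balanced variant is a sensible choice when "good" and "bad" TUs are unevenly distributed.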