Fast and Accurate Misspelling Correction in Large Corpora

Abstract

There are several NLP systems whose accuracy depends crucially on finding misspellings fast. However, the classical approach is based on a quadratic time algorithm with 80% coverage. We present a novel algorithm for misspelling detection, which runs in constant time and improves the coverage to more than 96%. We use this algorithm together with a cross-document coreference system in order to find proper name misspellings. The experiments confirmed significant improvement over the state of the art.
