Hierarchical Attention Transformer Architecture For Syntactic Spell Correction
Attention mechanisms have driven recent advances in
sequence-to-sequence problems. The Transformer architecture achieved new
state-of-the-art results in machine translation, and its variants have since
been introduced for several other sequence-to-sequence problems. Problems that
involve a shared vocabulary can benefit from the similar semantic and
syntactic structure of the source and target sentences. With the motivation of
building a reliable and fast post-processing textual module to assist all the
text-related use cases in mobile phones, we take on the popular spell
correction problem. In this paper, we propose a multi-encoder, single-decoder
variation of the conventional Transformer. Outputs from the three encoders, with
character-level 1-gram, 2-gram and 3-gram inputs, are attended to in a hierarchical
fashion in the decoder. The context vectors from the encoders, combined with
self-attention, amplify the n-gram properties at the character level and help
in accurate decoding. We demonstrate our model on a spell correction dataset from
Samsung Research and report significant improvements of 0.11%, 0.32% and
0.69% in character (CER), word (WER) and sentence (SER) error rates over
existing state-of-the-art machine-translation architectures. Our architecture
also trains ~7.8 times faster and is only about 1/3 the size of the next
most accurate model.
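
To make the three encoder inputs concrete, the following is a minimal sketch of how a string could be split into character-level 1-grams, 2-grams and 3-grams. The abstract does not say how the n-grams are built, so the overlapping sliding window without padding used here is an assumption, and the names `char_ngrams` and `encoder_inputs` are hypothetical helpers, not part of the authors' code.

```python
from typing import Dict, List


def char_ngrams(text: str, n: int) -> List[str]:
    """Return overlapping character n-grams of `text`.

    Assumption: a stride-1 sliding window with no padding; the paper's
    actual tokenization may differ.
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    # A string shorter than n yields an empty list.
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def encoder_inputs(text: str) -> Dict[int, List[str]]:
    """Build one token sequence per encoder (n = 1, 2, 3)."""
    return {n: char_ngrams(text, n) for n in (1, 2, 3)}


if __name__ == "__main__":
    # Print the three token sequences for a misspelled word.
    for n, grams in encoder_inputs("speling").items():
        print(n, grams)
```

Under this scheme each encoder sees a sequence of slightly different length (a string of length m gives m - n + 1 tokens for n-grams), which the decoder would then attend over one encoder at a time.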