DeepSubQE: Quality estimation for subtitle translations
Quality estimation (QE) for tasks involving language data is hard owing to
the many sources of variation in natural language, such as paraphrasing,
style, and grammar. There can be multiple answers with varying levels of
acceptability depending on the application at hand. In this work, we look at
estimating the quality of translations for video subtitles. We show how
existing QE methods are inadequate and propose DeepSubQE, a system that
estimates translation quality given subtitle data for a pair of languages. We
rely on various data augmentation strategies for automated labelling and
synthesis of training data. We create a hybrid network that learns semantic
and syntactic features of bilingual data and compare it with LSTM-only and
CNN-only networks. Our proposed network outperforms them by a significant
margin.
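The automated-labelling idea in the abstract, synthesizing training labels from aligned subtitle pairs, might be sketched as follows. This is an illustrative assumption of one such strategy (treating aligned pairs as positives and randomly mismatched pairs as negatives); the function name and heuristic are not the paper's exact procedure.

```python
import random

def synthesize_qe_labels(pairs, seed=0):
    """Build a labelled QE dataset from aligned (source, target) subtitle pairs.

    Aligned pairs become positive examples; each source is also paired with a
    randomly chosen wrong target to create a negative (low-quality) example.
    """
    rng = random.Random(seed)  # seeded for reproducible synthesis
    labelled = []
    for src, tgt in pairs:
        labelled.append((src, tgt, 1))  # correct translation -> label 1
        wrong = rng.choice([t for _, t in pairs if t != tgt])
        labelled.append((src, wrong, 0))  # mismatched translation -> label 0
    return labelled

pairs = [("hello", "hola"), ("thank you", "gracias"), ("good night", "buenas noches")]
data = synthesize_qe_labels(pairs)  # 3 positives and 3 negatives
```

A real system would add harder negatives (e.g. partial or noisy translations) rather than only random mismatches, but the labelling mechanics are the same.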
A context sensitive real-time Spell Checker with language adaptability
We present a novel language-adaptable spell-checking system which detects
spelling errors and suggests context-sensitive corrections in real time. We
show that our system can be extended to new languages with minimal
language-specific processing. The available literature mostly discusses spell
checkers for English, and there are no publicly available systems which can
be extended to work for other languages out of the box; most such systems
also do not work in real time. We explain the process of generating a
language's word dictionary and n-gram probability dictionaries using
Wikipedia article data and manually curated video subtitles. We present the
results of generating a list of suggestions for a misspelled word. We also
propose three approaches to create noisy-channel datasets of real-world
typographic errors. We compare our system with industry-accepted
spell-checker tools for 11 languages. Finally, we show the performance of our
system on synthetic datasets for 24 languages.
Comment: 7 pages, 6 images
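The suggestion-generation step described above can be illustrated with a standard noisy-channel-style pipeline: generate edit-distance-1 candidates, filter against the word dictionary, and rank by n-gram (here bigram) context probability. The function names and toy dictionaries below are illustrative assumptions, not the paper's actual implementation.

```python
def edits1(word, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """All strings one edit away: deletes, transposes, replaces, inserts."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [L + R[1:] for L, R in splits if R]
    transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
    replaces = [L + c + R[1:] for L, R in splits if R for c in alphabet]
    inserts = [L + c + R for L, R in splits for c in alphabet]
    return set(deletes + transposes + replaces + inserts)

def suggest(word, prev_word, vocab, bigram_prob):
    """Rank in-vocabulary edit candidates by P(candidate | previous word)."""
    candidates = [w for w in edits1(word) if w in vocab] or [word]
    return sorted(candidates, key=lambda w: -bigram_prob.get((prev_word, w), 0.0))

# Toy word dictionary and bigram probabilities (illustrative only).
vocab = {"than", "then", "thin", "them"}
bigram_prob = {("greater", "than"): 0.9, ("see", "them"): 0.8}
best = suggest("thwn", "greater", vocab, bigram_prob)[0]  # context picks "than"
```

Extending to a new language only requires swapping the alphabet, the word dictionary, and the n-gram tables, which matches the minimal language-specific processing the abstract claims.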