41,126 research outputs found
Improving the translation environment for professional translators
When using computer-aided translation systems in a typical, professional translation workflow, there are several stages at which there is room for improvement. The SCATE (Smart Computer-Aided Translation Environment) project investigated several of these aspects, both from a human-computer interaction point of view, as well as from a purely technological side.
This paper describes the SCATE research with respect to improved fuzzy matching, parallel treebanks, the integration of translation memories with machine translation, quality estimation, terminology extraction from comparable texts, the use of speech recognition in the translation process, and human computer interaction and interface design for the professional translation environment. For each of these topics, we describe the experiments we performed and the conclusions drawn, providing an overview of the highlights of the entire SCATE project
Hierarchical Character-Word Models for Language Identification
Social media messages' brevity and unconventional spelling pose a challenge
to language identification. We introduce a hierarchical model that learns
character and contextualized word-level representations for language
identification. Our method performs well against strong base- lines, and can
also reveal code-switching
Character-Word LSTM Language Models
We present a Character-Word Long Short-Term Memory Language Model which both
reduces the perplexity with respect to a baseline word-level language model and
reduces the number of parameters of the model. Character information can reveal
structural (dis)similarities between words and can even be used when a word is
out-of-vocabulary, thus improving the modeling of infrequent and unknown words.
By concatenating word and character embeddings, we achieve up to 2.77% relative
improvement on English compared to a baseline model with a similar amount of
parameters and 4.57% on Dutch. Moreover, we also outperform baseline word-level
models with a larger number of parameters
- …