A thesis submitted in partial fulfilment of the requirements of the University of Wolverhampton for the degree of Doctor of Philosophy.
Emotions are biological states of feeling that humans may verbally express to
communicate their negative or positive mood, influence others, or even inflict
harm. Although emotions such as anger, happiness, affection, or fear are
supposedly universal experiences, the linguistic realisation of the emotional
experience may vary in subtle ways across different languages. For this reason,
preserving the original sentiment of the source text has always been a challenging
task that draws on a translator's competence and finesse. In the professional
translation industry, an incorrect translation of the sentiment-carrying lexicon is
considered a critical error as it can be either misleading or in some cases harmful
since it misses the fundamental aspect of the source text, i.e. the author's
sentiment.
Since the advent of Neural Machine Translation (NMT), there has been a
tremendous improvement in the quality of automatic translation. This has led to
extensive use of online NMT tools to translate User-Generated Text (UGT)
such as reviews, tweets, and social media posts, where the main message is often
the author's positive or negative attitude towards an entity. In such scenarios, the
process of translating the user's sentiment is entirely automatic with no human
intervention, whether for post-editing or for accuracy checking. However, NMT
output still lacks accuracy in some low-resource languages and sometimes makes
critical translation errors that may not only distort the sentiment but at times flip
the polarity of the source text to its exact opposite.
In this thesis, we tackle the translation of sentiment in UGT by NMT systems from two perspectives: analytical and experimental. First, the analytical approach
introduces a list of linguistic features that can lead to a mistranslation of
fine-grained emotions between different language pairs in the UGT domain. It also
presents an error typology specific to Arabic UGT illustrating the main linguistic
phenomena that can cause mistranslation of sentiment polarity when translating
Arabic UGT into English by NMT systems. Second, the experimental approach
attempts to improve the translation of sentiment by addressing some of the
linguistic challenges identified in the analysis as causing mistranslation of
sentiment at both the word level and the sentence level. At the word level, we
propose a Transformer NMT model trained on a sentiment-oriented vector space
model (VSM) of UGT data that is capable of translating the correct sentiment
polarity of challenging contronyms. At the sentence level, we propose a
semi-supervised approach to overcome the problem of translating sentiment
expressed by dialectal language in UGT data. We take the translation of
dialectal Arabic UGT into English as a case study. Our semi-supervised AR-EN
NMT model shows improved performance over Twitter's online MT tool in
translating dialectal Arabic UGT not only in terms of translation quality but
also in the preservation of the sentiment polarity of the source text. The
experimental section also presents an empirical method to quantify the notion of
sentiment transfer by an MT system and, more concretely, to modify automatic
metrics such that their MT ranking comes closer to a human judgement of a poor or
good translation of sentiment.