Text Representation for Nonconcatenative Morphology

Abstract

The last six years have seen immense improvement in the quality of neural machine translation (NMT). With the help of neural networks, NMT has achieved state-of-the-art results in translation quality. However, NMT is still not able to achieve translation quality near human levels. In this thesis, we propose new approaches to improve the language representation used as input to NMT. This can be achieved by exploiting language-specific knowledge, such as phonetic alterations, morphology, and syntax. We propose a new approach that improves the language representation by exploiting morphological phenomena in Turkish and Hebrew, and we show that the proposed segmentation approaches can improve translation quality. We used several different segmentation approaches and compared them with each other. All of the segmentation approaches are rooted in language-specific morphological analysis of Turkish and Hebrew. We also examined the effect of each specific segmentation approach on translation quality. We trained six transformer models with different segmentation approaches and compared them with each other. For each segmentation approach, we evaluated translation quality using two automatic metrics and human evaluation. We observed that the segmentation approaches can improve translation quality according to human evaluation, but not according to the automatic metrics. We emphasize the importance of human evaluation for NMT and show that automatic metrics can often be misleading.
