Char-RNN and Active Learning for Hashtag Segmentation

Authors: Artemova Ekaterina, Glushkova Taisiya
Publication date: 08/11/2019

We explore the abilities of a character recurrent neural network (char-RNN) for hashtag segmentation. Our approach is as follows: to avoid any manual annotation, we generate a synthetic training dataset from frequent n-grams that satisfy predefined morpho-syntactic patterns. An active learning strategy limits the size of the training dataset and selects an informative training subset. The approach does not require any language-specific settings and is compared across two languages that differ in degree of inflection. Comment: to appear in Cicling201

arXiv.org e-Print Archive

Incorporating Chinese radicals into neural machine translation: deeper than character level

Authors: Han Lifeng, Kuang Shaohui
Publication venue: Association for Logic, Language and Information (FoLLI)
Publication date: 01/08/2018

In neural machine translation (NMT), researchers face the challenge of translating unseen (out-of-vocabulary, OOV) words. To address this, some researchers propose splitting words in Western languages such as English and German into sub-words or compounds. In this paper, we try to address the OOV issue and improve NMT adequacy for a harder language, Chinese, whose characters are even more sophisticated in composition. We integrate Chinese radicals into the NMT model under different settings to address the unseen-word challenge in Chinese-to-English translation. This can also be regarded as a semantic component of the MT system, since Chinese radicals usually carry the essential meaning of the words they are part of. Meaningful radicals and new characters can be integrated into NMT systems with our models. We use an attention-based NMT system as a strong baseline. Experiments on the standard Chinese-to-English NIST translation shared task data from 2006 and 2008 show that our models outperform the baseline on a wide range of state-of-the-art evaluation metrics, including LEPOR, BEER, and CharacTER, in addition to the traditional BLEU and NIST scores, especially on adequacy-level translation. Our various experimental settings also yield some interesting findings about the performance of words and characters in Chinese NMT, which differs from that in other languages. For instance, fully character-level NMT may perform very well, or even reach the state of the art, in some other languages, as researchers have recently demonstrated; in Chinese NMT, however, word boundary knowledge is important for model learning.

arXiv.org e-Print Archive

Irish Universities

DCU Online Research Access Service

The University of Manchester - Institutional Repository

Metafictional anaphora: A comparison of different accounts

Author: Semeijn Merel
Publication venue: ESSLLI
Publication date: 01/01/2018

I argue that pronominal anaphora across mixed parafictional/metafictional discourse (e.g. "In The Lord of the Rings, Frodo_i goes through an immense mental struggle. He_i is an intriguing fictional character!") poses a problem for a workspace account. I evaluate different possible solutions based on a descriptivist approach, Zalta's logic of abstract objects, and Recanati's dot-object theory.

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Dissertations of the University of Groningen
