Character-level Chinese-English Translation through ASCII Encoding
Character-level Neural Machine Translation (NMT) models have recently
achieved impressive results on many language pairs. They mainly do well for
Indo-European language pairs, where the languages share the same writing
system. However, for translating between Chinese and English, the gap between
the two different writing systems poses a major challenge because of a lack of
systematic correspondence between the individual linguistic units. In this
paper, we enable character-level NMT for Chinese by breaking down Chinese
characters into linguistic units similar to those of Indo-European languages. We
use the Wubi encoding scheme, which preserves the original shape and semantic
information of the characters, while also being reversible. We show promising
results from training Wubi-based models at the character and subword levels
with recurrent as well as convolutional models.
Comment: 7 pages, 3 figures, 3rd Conference on Machine Translation (WMT18), 201
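The reversible character-to-ASCII mapping described in the abstract above can be illustrated with a minimal sketch. It assumes a tiny hand-written lookup table (`TOY_TABLE`) whose codes are placeholders and have not been checked against a real Wubi dictionary; the function names `encode` and `decode` are hypothetical and are not taken from the paper. The sketch only shows the round trip from Chinese characters to space-separated ASCII codes and back, not the authors' actual preprocessing.

```python
# Toy lookup table. The codes are illustrative placeholders only and have
# NOT been verified against a real Wubi dictionary (assumption).
TOY_TABLE = {"中": "khk", "国": "lgyi", "人": "wwww"}

# Inverse table for decoding; this requires every code in the table to be unique.
INVERSE = {code: char for char, code in TOY_TABLE.items()}
assert len(INVERSE) == len(TOY_TABLE), "codes must be unique to be reversible"


def encode(text: str) -> str:
    """Replace each character with its ASCII code, separated by spaces."""
    return " ".join(TOY_TABLE[ch] for ch in text)


def decode(encoded: str) -> str:
    """Invert encode(): split on whitespace and look each code up again."""
    return "".join(INVERSE[code] for code in encoded.split())


if __name__ == "__main__":
    original = "中国人"
    ascii_form = encode(original)
    print(ascii_form)
    print(decode(ascii_form) == original)
```

The space delimiter is what makes decoding unambiguous in this sketch; a real encoding table would also need a strategy for characters that share a code, which is not covered here.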
Shuo Wen Jie Zi: Rethinking Dictionaries and Glyphs for Chinese Language Pre-training
We introduce CDBERT, a new learning paradigm that enhances the semantic
understanding ability of Chinese PLMs with dictionary knowledge and
the structure of Chinese characters. We name the two core modules of CDBERT
Shuowen and Jiezi, where Shuowen refers to the process of retrieving the most
appropriate meaning from Chinese dictionaries and Jiezi refers to the process
of enhancing characters' glyph representations with structure understanding. To
facilitate dictionary understanding, we propose three pre-training tasks, i.e.,
Masked Entry Modeling, Contrastive Learning for Synonym and Antonym, and
Example Learning. We evaluate our method on both the modern Chinese understanding
benchmark CLUE and the ancient Chinese benchmark CCLUE. Moreover, we propose a new
polysemy discrimination task, PolyMRC, based on the collected dictionary of
ancient Chinese. Our paradigm demonstrates consistent improvements over previous
Chinese PLMs across all tasks. Moreover, our approach yields a significant
boost in the few-shot setting of ancient Chinese understanding.
Comment: To appear at ACL 2023 Finding
Chinese Character Decomposition for Neural MT with Multi-Word Expressions
Chinese character decomposition has been used as a feature to enhance Machine
Translation (MT) models, combining radicals into character and word level
models. Recent work has investigated ideograph or stroke level embedding.
However, questions remain about which decomposition levels of Chinese
character representations, radicals or strokes, are best suited for MT. To
investigate the impact of Chinese decomposition embedding in detail, i.e.,
radical, stroke, and intermediate levels, and how well these decompositions
represent the meaning of the original character sequences, we carry out
analysis with both automated and human evaluation of MT. Furthermore, we
investigate if the combination of decomposed Multiword Expressions (MWEs) can
enhance the model learning. MWE integration into MT has seen more than a decade
of exploration. However, decomposed MWEs have not previously been explored.
Comment: Accepted to publish in NoDaLiDa202
An Investigation on Cognitive-Linguistic Skills of English-Chinese Bilingual Learners with and without Dyslexia in Singapore
This thesis investigates dyslexia and the cognitive-linguistic skills, namely phonological awareness,
orthographic knowledge, morphological awareness and rapid naming, of bilingual learners in
Singapore whose first language is English and second language is Chinese. The two main research aims
are to investigate whether the English-Chinese bilingual learners with dyslexia diagnosed only in
English are weaker than their typical counterparts in reading and all cognitive-linguistic skills in both
languages or either language, and to investigate which cognitive-linguistic skills are strong predictors
of reading in each language. Results show that the bilingual learners with dyslexia performed
significantly worse than their typical counterparts in reading and all cognitive-linguistic skills in both
languages, although their dyslexia was diagnosed only in English. The results also showed all English
cognitive-linguistic skills predictive of English word reading, especially the unique predictive roles of
morphological awareness and orthographic knowledge after rapid naming and phonological
awareness were controlled. However, only rapid naming and morphological awareness were found to
be predictive of Chinese word reading. The results suggest that dyslexia may manifest differently in
reading and cognitive-linguistic skills of English and Chinese languages in the English-Chinese bilingual
learners, based on the two different predictive models with different empirically and theoretically
supported orders of cognitive-linguistic skills as predictors for reading development in the two
languages. The difference in the unique contributions of the four cognitive-linguistic skills underlying
the reading development of both languages may suggest the difference lies in language structure and
instruction.
Keywords: dyslexia, bilingualism, English reading, Chinese reading, cognitive-linguistic skill