
63 research outputs found

Generative Input: Towards Next-Generation Input Methods Paradigm

Author: Chen Enhong
Ding Keyu
Jia Zhenzhen
Liu Cong
Wang Shijin
Wang Yongcan
Xu Zihang
Publication date: 02/11/2023

Since the release of ChatGPT, generative models have achieved tremendous success and become the de facto approach for various NLP tasks. However, their application in the field of input methods remains under-explored. Many neural network approaches have been applied to the construction of Chinese input method engines (IMEs). Previous research often assumed that the input pinyin was correct and focused on the Pinyin-to-character (P2C) task, which falls significantly short of meeting users' demands. Moreover, previous research could not leverage user feedback to optimize the model and provide personalized results. In this study, we propose a novel Generative Input paradigm named GeneInput. It uses prompts to handle all input scenarios and other intelligent auxiliary input functions, optimizing the model with user feedback to deliver personalized results. The results demonstrate that we have achieved state-of-the-art performance for the first time in the Full-mode Key-sequence to Characters (FK2C) task. We propose a novel reward model training method that eliminates the need for additional manual annotations, and the performance surpasses GPT-4 in tasks involving intelligent association and conversational assistance. Compared to traditional paradigms, GeneInput not only demonstrates superior performance but also exhibits enhanced robustness, scalability, and online learning capabilities.

arXiv.org e-Print Archive

Neural Network Language Model for Chinese Pinyin Input Method Engine

Author: Chen Shen-Yuan
Wang Rui
Zhao Hai
Publication date: 01/01/2015

Waseda University Repository

Bipartite Flat-Graph Network for Nested Named Entity Recognition

Author: Luo Ying
Zhao Hai
Publication date: 01/01/2020

In this paper, we propose a novel bipartite flat-graph network (BiFlaG) for nested named entity recognition (NER), which contains two subgraph modules: a flat NER module for outermost entities and a graph module for all the entities located in inner layers. Bidirectional LSTM (BiLSTM) and graph convolutional network (GCN) are adopted to jointly learn flat entities and their inner dependencies. Unlike previous models, which only consider the unidirectional delivery of information from innermost layers to outer ones (or outside-to-inside), our model effectively captures the bidirectional interaction between them. We first use the entities recognized by the flat NER module to construct an entity graph, which is fed to the next graph module. The richer representation learned from the graph module carries the dependencies of inner entities and can be exploited to improve outermost entity predictions. Experimental results on three standard nested NER datasets demonstrate that our BiFlaG outperforms previous state-of-the-art models. Comment: Accepted by ACL202

arXiv.org e-Print Archive

Crossref

Translating science fiction in a CAT tool: machine translation and segmentation settings

Author: Carl Michael
Nunes Vieira Lucas
Youdale Roy L
Zelenka Natalie R
Zhang Xiaochun
Publication venue: 'University of Western Sydney SOHACA'
Publication date: 28/02/2023

There is increasing interest in machine assistance for literary translation, but research on how computer-assisted translation (CAT) tools and machine translation (MT) combine in the translation of literature is still incipient, especially for non-European languages. This article presents two exploratory studies where English-to-Chinese translators used neural MT to translate science fiction short stories in Trados Studio. One of the studies compares post-editing with a ‘no MT’ condition. The other examines two ways of presenting the texts on screen for post-editing, namely by segmenting them into paragraphs or into sentences. We collected the data with the Qualitivity plugin for Trados Studio and describe a method for analysing data collected with this plugin through the translation process research database of the Center for Research in Translation and Translation Technology (CRITT). While post-editing required less technical effort, we did not find MT to be appreciably timesaving. Paragraph segmentation was associated with less post-editing effort on average, though with high translator variability. We discuss the results in the light of broader concepts, such as status-quo bias, and call for more research on the different ways in which MT may assist literary translation, including its use for comparison purposes or, as mentioned by a participant, for ‘inspiration’.

UCL Discovery

Explore Bristol Research

An Efficient tone classifier for speech recognition of Cantonese.

Author: Cheng Yat Ho
Publication venue: Department of Cultural and Religious Studies, The Chinese University of Hong Kong
Publication date: 01/01/1991

by Cheng Yat Ho.
Thesis (M.Phil.)--Chinese University of Hong Kong, 1991.
Bibliography: leaves 106-108.
Chapter 1 --- Introduction
Chapter 2 --- Preliminary Considerations
2.1 --- Tone System of Cantonese
2.2 --- Tone Classification Systems
2.3 --- Design of a Speech Corpus
Chapter 3 --- Feature Parameters for Tone Classification
3.1 --- Methodology
3.2 --- Endpoint Detection and Time Alignment
3.3 --- Pitch
3.3.1 --- Pitch Profile Extraction
3.3.2 --- Evaluation of Pitch Profile
3.3.3 --- Feature Parameters Derived from Pitch Profile
3.4 --- Duration
3.5 --- Energy
3.5.1 --- Energy Profile Extraction
3.5.2 --- Evaluation of Energy Profile
3.6 --- Summary
Chapter 4 --- Implementation of the Tone Classification System
4.1 --- Intrinsic Pitch Estimation
4.2 --- The Classifier
4.2.1 --- Neural Network
4.2.2 --- Post-Processing Unit
Chapter 5 --- Performance Evaluation on the Tone Classification System
5.1 --- Single Speaker Tone Classification
5.2 --- Multi-Speaker and Speaker Independent Tone Classification
5.2.1 --- Classification with no Phonetic Information
5.2.2 --- Classification with Known Final Consonants
5.3 --- Confidence Improvement of the Recognition Results
5.4 --- Summary
Chapter 6 --- Conclusions and Discussions
References
Appendix A --- Vocabulary of the Speech Corpus
Appendix B --- Statistics of the Pitch Profiles
Appendix C --- Statistics of the Energy Profiles

CUHK Digital Repository