Chinese Spelling Error Detection Using a Fusion Lattice LSTM
Spelling error detection serves as a crucial preprocessing step in many natural
language processing applications. Due to the characteristics of the Chinese
language, Chinese spelling error detection is more challenging than error
detection in English. Existing methods mainly follow a pipeline framework,
which artificially divides the error detection process into two steps. These
methods therefore suffer from error propagation and do not always work well
given the complexity of the language environment. Besides, existing methods
adopt only character or word information, and ignore the positive effect of
fusing character, word, and pinyin information together. We propose an
LF-LSTM-CRF model, an extension of the LSTM-CRF with word lattices and
character-pinyin-fusion inputs. Our model takes advantage of an end-to-end
framework to detect errors as a whole process, and dynamically integrates
character, word, and pinyin information. Experiments on the SIGHAN data show
that our LF-LSTM-CRF consistently outperforms existing methods with similar
external resources, and confirm the feasibility of the end-to-end framework
and the benefit of integrating character, word, and pinyin information.
Comment: 8 pages, 5 figures
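The character-pinyin fusion the abstract describes can be sketched as follows. This is a toy illustration, not the paper's implementation: the embedding dimensions, the lookup tables, and the tanh projection used for fusion are all assumptions; in the actual model these would be trained parameters feeding a BiLSTM-CRF.

```python
import numpy as np

rng = np.random.default_rng(0)

CHAR_DIM, PINYIN_DIM, FUSED_DIM = 50, 30, 64  # assumed sizes

# Toy lookup tables (in the paper these would be learned embeddings).
char_table = {"我": rng.normal(size=CHAR_DIM), "门": rng.normal(size=CHAR_DIM)}
pinyin_table = {"wo3": rng.normal(size=PINYIN_DIM),
                "men2": rng.normal(size=PINYIN_DIM)}

# Fusion projection: concatenate the two views, project to the LSTM input size.
W = rng.normal(size=(CHAR_DIM + PINYIN_DIM, FUSED_DIM))

def fuse(char, pinyin):
    """Concatenate character and pinyin embeddings and project them."""
    x = np.concatenate([char_table[char], pinyin_table[pinyin]])
    return np.tanh(x @ W)  # one fused input vector per character position

# "我门" is a common typo for "我们": the character is wrong but the pinyin
# stream still carries the intended sound, which is exactly the extra signal
# fusion exposes to the downstream detector.
sequence = [fuse("我", "wo3"), fuse("门", "men2")]
print(len(sequence), sequence[0].shape)  # 2 positions, each a FUSED_DIM vector
```

The fused vectors would then be consumed position by position by the lattice LSTM and scored with a CRF layer.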
Domain-shift Conditioning using Adaptable Filtering via Hierarchical Embeddings for Robust Chinese Spell Check
Spell check is a useful application that processes noisy human-generated
text. Spell check for Chinese poses unresolved problems due to the large number
of characters, the sparse distribution of errors, and the dearth of resources
with sufficient coverage of heterogeneous and shifting error domains. For
Chinese spell check, filtering using confusion sets narrows the search space
and makes finding corrections easier. However, most, if not all, confusion sets
used to date are fixed and thus do not include new, shifting error domains. We
propose a scalable adaptable filter that exploits hierarchical character
embeddings to (1) obviate the need to handcraft confusion sets, and (2) resolve
sparsity problems related to infrequent errors. Our approach compares favorably
with competitive baselines and obtains state-of-the-art results on the 2014
and 2015 Chinese Spelling Check Bake-off datasets.
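The core idea of replacing handcrafted confusion sets with an embedding-based filter can be sketched as below. The embeddings here are random toy stand-ins for the paper's hierarchical character embeddings, and the nearest-neighbour candidate generation is an assumed simplification of its filtering step.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy character inventory; real systems cover thousands of characters.
vocab = ["们", "门", "闷", "问", "我"]
emb = {c: rng.normal(size=16) for c in vocab}  # stand-ins for learned embeddings

def dynamic_confusion_set(char, k=2):
    """Return the k nearest neighbours of `char` in embedding space.

    Unlike a fixed, handcrafted confusion set, this candidate list adapts
    whenever the embeddings change, so new error domains need no manual edits.
    """
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    others = [c for c in vocab if c != char]
    return sorted(others, key=lambda c: cos(emb[char], emb[c]), reverse=True)[:k]

candidates = dynamic_confusion_set("门")
print(candidates)  # the two characters closest to 门 under the toy embeddings
```

With trained hierarchical embeddings, visually or phonetically confusable characters cluster together, so the top-k neighbours play the role of a confusion set without anyone enumerating it by hand.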
Chinese Spelling Checker Based on Statistical Machine Translation
Chinese spelling check is an important component of many NLP applications, including word processors and search engines. However, compared to checkers for alphabetical languages (e.g., English or French), Chinese spelling checkers are more difficult to develop, because there are no word boundaries in the Chinese writing system, and errors may be caused by various Chinese input methods. In this paper, we propose a novel method for Chinese spelling checking. Our approach performs error detection and correction based on the phrasal statistical machine translation framework. The results show that the proposed system achieves significantly better accuracy in error detection and more satisfactory performance in error correction.
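The "spelling check as translation" framing can be illustrated with a minimal sketch: treat the typo-laden sentence as the source language and correct Chinese as the target, then decode against a phrase table. The table entries and scores below are invented for illustration; a real phrasal SMT system would learn them from parallel data and combine them with a language model during decoding.

```python
# Toy phrase table: source phrase -> list of (target phrase, score).
phrase_table = {
    "我门": [("我们", 0.9), ("我门", 0.1)],   # "我门" is usually a typo for "我们"
    "在": [("在", 0.95), ("再", 0.05)],
}

def correct(tokens):
    """Greedy decode: for each source phrase, pick the highest-scoring target.

    Unknown phrases pass through unchanged, mirroring how an SMT decoder
    copies out-of-vocabulary source tokens.
    """
    out = []
    for t in tokens:
        options = phrase_table.get(t, [(t, 1.0)])
        best = max(options, key=lambda p: p[1])
        out.append(best[0])
    return out

print(correct(["我门", "在"]))  # → ['我们', '在']
```

Detection falls out of the same decode: any position where the chosen target differs from the source is flagged as an error.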
Chinese Spelling Checker Based on Statistical Machine Translation (Hsun-wen Chiu)
Chinese spell check is an important component of many NLP applications, including word processors, search engines, and automatic essay rating. However, compared to spell checkers for alphabetical languages (e.g., English or French), Chinese spell checkers are more difficult to develop, because there are no word boundaries in the Chinese writing system, and errors may be caused by various Chinese input methods. Chinese spell check involves automatically detecting and correcting typos, roughly corresponding to misspelled words in English. Liu et al. (2011) show that people tend to unintentionally generate typos that sound similar (e.g.,