Statistical Analysis on Large Scale Chinese Short Message Corpus and Automatic Short Message Error Correction

By Rile Hu, Yuezhong Tang, Chen Li and Xia Wang

Abstract

Analysis of short message corpora is an important foundation for research on automatic short message processing. Based on a large-scale short message corpus, this paper first presents detailed statistics on, and analysis of, the basic properties of the corpus and the special language phenomena in it. The distributions of the corpus parameters and of the special language phenomena are also given. The statistical results are useful for research on robust short message understanding and for implementing short-message-based man-machine dialog systems and short-message-based machine translation systems. We also build an automatic error correction system for mobile phones that corrects misused Chinese characters in short messages. Preliminary results show that our method is effective.

Topics: computer application, Chinese information processing, corpus technology, statistical analysis, short message
Year: 2013
OAI identifier: oai:CiteSeerX.psu:10.1.1.359.9894
Provided by: CiteSeerX
Download PDF:
Sorry, we are unable to provide the full text but you may find it at the following location(s):
  • http://citeseerx.ist.psu.edu/v... (external link)
  • http://www.aclweb.org/antholog... (external link)