Abstract. Analysis of short message corpora is an important foundation for research on automatic short message processing. Based on a large-scale short message corpus, this paper first presents statistical data on, and a detailed analysis of, the basic information of the corpus and the special language phenomena found in it. The distributions of the corpus parameters and of the special language phenomena are also given. The statistical results are relevant to research on robust short message understanding and to the implementation of short-message-based man-machine dialog systems and machine translation systems. We also build an automatic error correction system for mobile phones that corrects misused Chinese characters in short messages. Preliminary results show that our method is effective.