
    Cascaded Chinese Weibo Segmentation Based on CRFs

    With the development of Web 2.0, processing Internet data has become necessary. This paper reports our work on Chinese Weibo segmentation for the 2012 CIPS-SIGHAN bakeoff. To improve the recognition accuracy of out-of-vocabulary (OOV) words, we propose a cascaded model that first segments and disambiguates in-vocabulary (IV) words and then recovers OOV words from the resulting fragments. Both stages are trained with a character-based CRF model that uses a user-edited external vocabulary. Results on the test data show that our system achieves promising performance.
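    The abstract does not state the tag set, feature templates, or decoder used by either stage. The sketch below is therefore only an illustration of the general character-based CRF segmentation setup, under explicit assumptions: the common BMES tag scheme, binary features marking where a character falls inside a match from an external vocabulary (standing in for the "user-edited external vocabulary"), and a constrained linear-chain Viterbi decoder. All function names (`words_to_tags`, `tags_to_words`, `lexicon_features`, `viterbi`) are hypothetical, and the emission and transition scores are supplied by hand here, whereas a trained CRF would learn them from features.

```python
from typing import Dict, List, Sequence, Set, Tuple

# ASSUMED BMES scheme: B/M/E = begin/middle/end of a multi-character word,
# S = single-character word. Not confirmed by the paper.
TAGS = ("B", "M", "E", "S")

# Tag bigrams that yield a well-formed segmentation; all others are forbidden.
ALLOWED = {("B", "M"), ("B", "E"), ("M", "M"), ("M", "E"),
           ("E", "B"), ("E", "S"), ("S", "B"), ("S", "S")}
START_OK = ("B", "S")  # a sentence must open a word
END_OK = ("E", "S")    # and must close its last word


def words_to_tags(words: Sequence[str]) -> List[str]:
    """Turn a gold segmentation into per-character BMES training labels."""
    tags: List[str] = []
    for w in words:
        if len(w) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(w) - 2) + ["E"])
    return tags


def tags_to_words(chars: str, tags: Sequence[str]) -> List[str]:
    """Rebuild words from characters and predicted BMES tags."""
    words, buf = [], ""
    for ch, t in zip(chars, tags):
        buf += ch
        if t in ("E", "S"):
            words.append(buf)
            buf = ""
    if buf:  # tolerate a dangling B/M at the end
        words.append(buf)
    return words


def lexicon_features(chars: str, lexicon: Set[str],
                     max_len: int = 4) -> List[Dict[str, int]]:
    """HYPOTHETICAL vocabulary features: mark each character that begins,
    ends, or sits inside any lexicon word of length 2..max_len."""
    feats: List[Dict[str, int]] = [{} for _ in chars]
    for i in range(len(chars)):
        for length in range(2, max_len + 1):
            j = i + length
            if j > len(chars):
                break
            if chars[i:j] in lexicon:
                feats[i]["dict_B"] = 1
                feats[j - 1]["dict_E"] = 1
                for k in range(i + 1, j - 1):
                    feats[k]["dict_M"] = 1
    return feats


def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[Tuple[str, str], float]) -> List[str]:
    """Best BMES path under per-position emission scores and tag-bigram
    transition scores, restricted to well-formed tag sequences."""
    neg = float("-inf")
    score = {t: (emissions[0][t] if t in START_OK else neg) for t in TAGS}
    back: List[Dict[str, str]] = []
    for em in emissions[1:]:
        new_score: Dict[str, float] = {}
        ptr: Dict[str, str] = {}
        for cur in TAGS:
            best_prev, best = TAGS[0], neg
            for prev in TAGS:
                if (prev, cur) not in ALLOWED:
                    continue
                s = score[prev] + transitions.get((prev, cur), 0.0)
                if s > best:
                    best_prev, best = prev, s
            new_score[cur] = best + em[cur]
            ptr[cur] = best_prev
        score = new_score
        back.append(ptr)
    path = [max(END_OK, key=lambda t: score[t])]
    for ptr in reversed(back):  # follow back-pointers to the start
        path.append(ptr[path[-1]])
    return path[::-1]
```

    Under this reading, a cascade would run the same tagging machinery twice: once over the full sentence for IV words, and again over the leftover fragments to assemble OOV words. How the paper actually splits features and training data between the two stages is not described in the abstract.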