Approaching Neural Chinese Word Segmentation as a Low-Resource Machine Translation Task
Supervised Chinese word segmentation has entered the deep learning era, which
reduces the hassle of feature engineering. Recently, some researchers have
attempted to treat it as character-level translation, which further simplifies
model design and construction, but a performance gap remains between the
translation-based approach and other methods. In this work, we apply the best
practices from low-resource neural machine translation to Chinese word
segmentation. We build encoder-decoder models with attention, and examine a
series of techniques including regularization, data augmentation, objective
weighting, transfer learning and ensembling. Our method is generic for word
segmentation, without the need for feature engineering or model implementation.
In the closed test with constrained data, our method ties with the state of the
art on the MSR dataset and is comparable to other methods on the PKU dataset
- …
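A minimal sketch of what "segmentation as character-level translation" can mean at the data level, assuming a representation the abstract does not specify. The source side is the unsegmented sentence as a sequence of characters, and the target side is the same characters with a hypothetical boundary token `|` between words. The helper names `to_translation_pair` and `from_translation_output` are illustrative only and are not the authors' code; an encoder-decoder model with attention would be trained on such pairs.

```python
def to_translation_pair(segmented: str, boundary: str = "|") -> tuple[str, str]:
    """Turn a whitespace-segmented sentence into a (source, target) pair.

    Assumption: the target marks word boundaries with a hypothetical
    `boundary` token; the paper's actual target format may differ.
    """
    words = segmented.split()
    # Source: every character as its own token, with no word boundaries.
    source = " ".join("".join(words))
    # Target: the same characters, with the boundary token between words.
    target = f" {boundary} ".join(" ".join(word) for word in words)
    return source, target


def from_translation_output(target: str, boundary: str = "|") -> list[str]:
    """Recover the list of words from a decoded target sequence."""
    words: list[str] = []
    current: list[str] = []
    for token in target.split():
        if token == boundary:
            # A boundary closes the current word (consecutive boundaries are ignored).
            if current:
                words.append("".join(current))
                current = []
        else:
            current.append(token)
    if current:
        words.append("".join(current))
    return words


if __name__ == "__main__":
    src, tgt = to_translation_pair("我 爱 北京")
    print(src)  # 我 爱 北 京
    print(tgt)  # 我 | 爱 | 北 京
    print(from_translation_output(tgt))  # ['我', '爱', '北京']
```

This sketch does not check that a decoded target reproduces the source characters exactly. A real system would need to handle a decoder that drops, repeats, or substitutes characters.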