Unsupervised Statistical Segmentation of Japanese Kanji Strings

Ando, Rie; Lee, Lillian

Authors: Rie Ando, Lillian Lee
Publication date: 1 January 1999
Publisher: SAGE Publications

Abstract

Word segmentation is an important issue in Japanese language processing because Japanese is written without space delimiters between words. We propose a simple dictionary-less method to segment Japanese kanji sequences into words based solely on character n-gram counts from an unannotated corpus. The performance was often better than that of rule-based morphological analyzers over a variety of both standard and novel error metrics.
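The abstract does not spell out the scoring rule, so the following is only a minimal sketch of how segmentation from character n-gram counts might work, not the authors' exact procedure. It assumes that a boundary is likely where the n-grams lying entirely on either side of a position are more frequent than the n-grams straddling it. The function names, the threshold of 0.5, and the toy corpus are all hypothetical.

```python
from collections import Counter


def ngram_counts(corpus, orders):
    """Count every character n-gram of the given orders in raw, unsegmented text."""
    counts = Counter()
    for line in corpus:
        for n in orders:
            for i in range(len(line) - n + 1):
                counts[line[i:i + n]] += 1
    return counts


def boundary_score(text, k, counts, orders):
    """Fraction of comparisons in which an n-gram adjacent to position k
    is more frequent than an n-gram straddling k, averaged over orders.
    (Assumed scoring rule; n-grams that run off the string are skipped.)"""
    per_order = []
    for n in orders:
        left = text[k - n:k] if k - n >= 0 else None
        right = text[k:k + n] if k + n <= len(text) else None
        adjacent = [g for g in (left, right) if g is not None]
        straddling = [
            text[k - j:k - j + n]
            for j in range(1, n)
            if k - j >= 0 and k - j + n <= len(text)
        ]
        pairs = [(a, s) for a in adjacent for s in straddling]
        if pairs:
            wins = sum(counts[a] > counts[s] for a, s in pairs)
            per_order.append(wins / len(pairs))
    return sum(per_order) / len(per_order) if per_order else 0.0


def segment(text, counts, orders=(2,), threshold=0.5):
    """Split text wherever the boundary score reaches the (hypothetical) threshold."""
    words, start = [], 0
    for k in range(1, len(text)):
        if boundary_score(text, k, counts, orders) >= threshold:
            words.append(text[start:k])
            start = k
    words.append(text[start:])
    return words


# Toy unannotated corpus (invented for illustration only).
corpus = ["東京大学", "東京都", "大学生", "東京", "大学"]
counts = ngram_counts(corpus, orders=(2,))
print(segment("東京大学", counts))  # ['東京', '大学']
```

In this toy run the bigrams 東京 and 大学 each occur three times while the straddling bigram 京大 occurs once, so the only position that reaches the threshold is the one between 京 and 大. No dictionary is consulted at any point; the decision rests on counts alone.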

Available Versions

eCommons@Cornell

oai:ecommons.cornell.edu:1813/...

Last time updated on 08/03/2017

CiteSeerX

oai:CiteSeerX.psu:10.1.1.73.69...

Last time updated on 22/10/2014