1 research output found

    An Improved MDL-Based Compression Algorithm for Unsupervised Word Segmentation

    We study the mathematical properties of a recently proposed MDL-based unsupervised word segmentation algorithm, called regularized compression. Our analysis shows that its objective function can be efficiently approximated using the negative empirical pointwise mutual information. The proposed extension improves the baseline performance in both efficiency and accuracy on a standard benchmark.
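For readers unfamiliar with the quantity the abstract names, the following is a minimal sketch of the standard empirical pointwise mutual information, PMI(x, y) = log2( p(x, y) / (p(x) p(y)) ), estimated from counts over adjacent symbols. The function name `empirical_pmi` and the choice of adjacent-pair counts are assumptions made for illustration; the abstract does not specify how the negative PMI enters the regularized compression objective, so this should not be read as the paper's algorithm.

```python
import math
from collections import Counter


def empirical_pmi(tokens):
    """Return {(x, y): PMI in bits} for every adjacent pair in `tokens`.

    Probabilities are plain relative frequencies (no smoothing), which is
    an illustrative assumption rather than anything stated in the abstract.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)  # guard against division by zero

    pmi = {}
    for (x, y), count in bigrams.items():
        p_xy = count / n_bi
        p_x = unigrams[x] / n_uni
        p_y = unigrams[y] / n_uni
        pmi[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return pmi


# Usage: character-level pairs in a toy string.
scores = empirical_pmi(list("abab"))
```

A high PMI means the two symbols co-occur more often than chance would predict, so a negative-PMI score is lowest for the pairs that are the strongest candidates for merging into one unit.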