Incremental Learning for Fully Unsupervised Word Segmentation Using Penalized Likelihood and Model Selection
We present a novel incremental learning approach for unsupervised word
segmentation that combines features from probabilistic modeling and model
selection. This includes super-additive penalties for addressing the cognitive
burden imposed by long word formation, and new model selection criteria based
on higher-order generative assumptions. Our approach is fully unsupervised; it
relies on a small number of parameters, which permits flexible modeling, and on a
mechanism that automatically learns parameters from the data. Through
experimentation, we show that this intricate design has led to top-tier
performance in both phonemic and orthographic word segmentation.
Comment: 12 pages, 2014, unpublished