Skip to main content
Article thumbnail
Location of Repository

Smoothing for Bracketing Induction

By Xiangyu Duan, Min Zhang and Wenliang Chen


Bracketing induction is the unsupervised learning of hierarchical constituents without labeling their syntactic categories such as verb phrase (VP) from natural raw sentences. Constituent Context Model (CCM) is an effective generative model for the bracketing induction, but the CCM computes probability of a constituent in a very straightforward way no matter how long this constituent is. Such method causes severe data sparse problem because long constituents are more unlikely to appear in test set. To overcome the data sparse problem, this paper proposes to define a non-parametric Bayesian prior distribution, namely the Pitman-Yor Process (PYP) prior, over constituents for constituent smoothing. The PYP prior functions as a back-off smoothing method through using a hierarchical smoothing scheme (HSS). Various kinds of HSS are proposed in this paper. We find that two kinds of HSS are effective, attaining or significantly improving the state-of-the-art performance of the bracketing induction evaluated on standard treebanks of various languages, while another kind of HSS, which is commonly used for smoothing sequences by n-gram Markovization, is not effective for improving the performance of the CCM

Year: 2013
OAI identifier: oai:CiteSeerX.psu:
Provided by: CiteSeerX
Download PDF:
Sorry, we are unable to provide the full text but you may find it at the following location(s):
  • (external link)
  • (external link)
  • Suggested articles

    To submit an update or takedown request for this paper, please submit an Update/Correction/Removal Request.