Learning a Grammar from a Bracketed Corpus

Abstract

In this paper, we propose a method for grouping the brackets in a bracketed corpus (with lexical tags) according to their local contextual information, as a first step toward the automatic acquisition of a context-free grammar. With a bracketed corpus, the learning task reduces to the problem of determining the nonterminal label of each bracket in the corpus. In the grouping process, similar brackets are gathered into groups, and each group is assigned a single nonterminal label. Two techniques, distributional analysis and hierarchical Bayesian clustering, are applied to exploit local contextual information when computing the similarity between two brackets. We also present a technique for determining the appropriate number of bracket groups based on entropy analysis. Finally, we present a set of experimental results and evaluate them against a model solution given by humans.

Key Words: grammar acquisition, distributional analysis, hierarchical Bayesian clustering, local c..
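The abstract does not specify the similarity measure, the clustering procedure, or the entropy criterion, so the following Python sketch is illustrative only and is not the authors' method. It assumes each bracket is characterized by the tag sequence it spans plus one tag of left context and one of right context; the toy tag data, the function names, and the use of Jensen-Shannon divergence as the similarity are hypothetical stand-ins. In place of hierarchical Bayesian clustering it uses plain greedy agglomerative merging, and at every step it reports the frequency-weighted average entropy of each group's context distribution, on the assumption that a sharp rise signals that dissimilar groups are being forced together.

```python
from collections import Counter
from math import log2

# Hypothetical toy data (not from the paper): each bracket is
# (bracket_type, left_tag, right_tag), where bracket_type is the tag
# sequence the bracket spans and the two tags are its local context.
BRACKETS = [
    ("DT NN", "VB", "IN"), ("DT NN", "IN", "."),
    ("DT JJ NN", "VB", "IN"), ("DT JJ NN", "IN", "."),
    ("IN DT NN", "NN", "."), ("IN DT JJ NN", "NN", "."),
]


def context_distributions(brackets):
    """Count the (left, right) contexts observed for each bracket type."""
    dists = {}
    for btype, left, right in brackets:
        dists.setdefault(btype, Counter())[(left, right)] += 1
    return dists


def entropy(counts):
    """Shannon entropy, in bits, of a mapping from outcomes to counts."""
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values() if c)


def js_divergence(p, q):
    """Jensen-Shannon divergence, in bits, between two context Counters.
    Used here as a stand-in similarity; the paper's measure may differ."""
    tp, tq = sum(p.values()), sum(q.values())
    m = {k: (p[k] / tp + q[k] / tq) / 2 for k in set(p) | set(q)}
    return entropy(m) - (entropy(p) + entropy(q)) / 2


def weighted_entropy(clusters):
    """Average context entropy of the groups, weighted by bracket frequency."""
    total = sum(sum(c.values()) for _, c in clusters)
    return sum(sum(c.values()) / total * entropy(c) for _, c in clusters)


def cluster(dists):
    """Greedy agglomerative clustering (NOT hierarchical Bayesian clustering):
    repeatedly merge the two groups with the most similar contexts, and
    record (number of groups, weighted entropy, group members) at each step."""
    clusters = [([btype], Counter(c)) for btype, c in dists.items()]
    history = [(len(clusters), weighted_entropy(clusters), [m for m, _ in clusters])]
    while len(clusters) > 1:
        pairs = ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters)))
        i, j = min(pairs, key=lambda ab: js_divergence(clusters[ab[0]][1], clusters[ab[1]][1]))
        # Pool member lists and context counts of the two closest groups.
        merged = (clusters[i][0] + clusters[j][0], clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
        history.append((len(clusters), weighted_entropy(clusters), [m for m, _ in clusters]))
    return history


if __name__ == "__main__":
    for n, h, groups in cluster(context_distributions(BRACKETS)):
        print(f"{n} groups, weighted context entropy {h:.3f}: {groups}")
```

On this toy data, the two noun-phrase-like types and the two prepositional-phrase-like types are merged first because their contexts match exactly. The weighted entropy then stays flat until the final merge pools both groups, which is the kind of jump one might read as a signal to stop at two groups.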
