
    Quantitative analysis of treebanks using frequent subtree mining methods

    The first task of statistical computational linguistics, or of any other type of data-driven language processing, is the extraction of counts and distributions of phenomena. This is much more difficult for the complex structured data found in treebanks and in corpora with sophisticated annotation than it is for tokenized texts. Recent developments in data mining, particularly in the extraction of frequent subtrees from treebanks, offer some solutions. We have applied a modified version of the TreeMiner algorithm to a small treebank and present some promising results.
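To make the idea of counting structural phenomena in a treebank concrete, the following is a minimal sketch, not the modified TreeMiner algorithm the abstract refers to. It assumes trees in a simple bracketed notation and counts only depth-one fragments (a node label together with the labels of its immediate children), whereas TreeMiner mines more general embedded subtrees. The names `parse`, `fragments`, `frequent_fragments`, and `TOY_TREEBANK` are hypothetical, and the three example trees are invented for illustration.

```python
from collections import Counter


def parse(text):
    """Parse one bracketed tree, e.g. "(S (NP he) (VP runs))", into (label, children)."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] != "(":
            # A bare token is a leaf (here: a word) with no children.
            leaf = (tokens[pos], ())
            pos += 1
            return leaf
        pos += 1  # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            children.append(node())
        pos += 1  # skip ")"
        return (label, tuple(children))

    return node()


def fragments(tree):
    """Yield (parent label, child labels) for every internal node of the tree."""
    label, children = tree
    if children:
        yield (label, tuple(child[0] for child in children))
        for child in children:
            yield from fragments(child)


def frequent_fragments(trees, min_support):
    """Return fragments that occur in at least min_support distinct trees."""
    support = Counter()
    for tree in trees:
        # Use a set so each fragment counts at most once per tree.
        support.update(set(fragments(tree)))
    return {frag: n for frag, n in support.items() if n >= min_support}


# Invented toy treebank, used only to exercise the functions above.
TOY_TREEBANK = [
    "(S (NP (D the) (N dog)) (VP (V barks)))",
    "(S (NP (D a) (N cat)) (VP (V sleeps)))",
    "(S (NP (N birds)) (VP (V sing)))",
]

if __name__ == "__main__":
    trees = [parse(t) for t in TOY_TREEBANK]
    for frag, n in sorted(frequent_fragments(trees, min_support=2).items()):
        print(n, frag)
```

Support is counted per tree rather than per occurrence, which is the usual convention in frequent pattern mining. Extending the sketch to larger or embedded subtrees is where the search space grows quickly and dedicated algorithms become necessary.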