Quantitative analysis of treebanks using frequent subtree mining methods

Author: Scott Martens
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2009
The first task of statistical computational linguistics, or of any other type of data-driven language processing, is the extraction of counts and distributions of phenomena. This is much more difficult for the complex structured data found in treebanks and in corpora with sophisticated annotation than it is for tokenized texts. Recent developments in data mining, particularly in the extraction of frequent subtrees from treebanks, offer some solutions. We have applied a modified version of the TreeMiner algorithm to a small treebank and present some promising results.
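
To make the idea of counting subtrees in a treebank concrete, here is a minimal Python sketch. It is not the modified TreeMiner algorithm the abstract refers to: it only counts complete (bottom-up) subtrees, which is a much simpler problem than the one TreeMiner addresses. The toy treebank, the nested-tuple tree encoding, and the names `bracketed`, `complete_subtrees`, and `frequent_subtrees` are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical toy treebank (not data from the paper).
# A tree is a nested tuple: (label, child, child, ...); a leaf is (label,).
TREEBANK = [
    ("S", ("NP", ("DT",), ("NN",)), ("VP", ("VB",), ("NP", ("DT",), ("NN",)))),
    ("S", ("NP", ("NN",)), ("VP", ("VB",), ("NP", ("DT",), ("NN",)))),
    ("S", ("NP", ("DT",), ("NN",)), ("VP", ("VB",))),
]


def bracketed(tree):
    """Render a tree as a canonical bracketed string, e.g. '(NP DT NN)'."""
    label, *children = tree
    if not children:
        return label
    return "(" + label + " " + " ".join(bracketed(c) for c in children) + ")"


def complete_subtrees(tree):
    """Yield the tree itself and every complete subtree rooted below it."""
    yield tree
    for child in tree[1:]:
        yield from complete_subtrees(child)


def frequent_subtrees(treebank, min_support):
    """Map each subtree pattern to the number of trees containing it,
    keeping only patterns found in at least min_support trees."""
    support = Counter()
    for tree in treebank:
        # Use a set so a pattern counts at most once per tree.
        patterns = {bracketed(sub) for sub in complete_subtrees(tree)}
        support.update(patterns)
    return {p: n for p, n in support.items() if n >= min_support}


if __name__ == "__main__":
    result = frequent_subtrees(TREEBANK, min_support=2)
    for pattern, count in sorted(result.items(), key=lambda kv: (-kv[1], kv[0])):
        print(count, pattern)
```

Support here is counted per tree rather than per occurrence, so a pattern that appears twice in one sentence still contributes one to its count. Even on this toy input the contrast with tokenized text shows up: each sentence yields several overlapping patterns of different sizes, not one count per token.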