Static and Dynamic Feature Selection in Morphosyntactic Analyzers
We study the use of greedy feature selection methods for morphosyntactic
tagging under a number of different conditions. We compare a static ordering of
features to a dynamic ordering based on mutual information statistics, and we
apply the techniques to standalone taggers as well as joint systems for tagging
and parsing. Experiments on five languages show that feature selection can
result in more compact models as well as higher accuracy under all conditions,
but also that a dynamic ordering works better than a static ordering and that
joint systems benefit more than standalone taggers. We also show that the same
techniques can be used to select which morphosyntactic categories to predict in
order to maximize syntactic accuracy in a joint system. Our final results
represent a substantial improvement of the state of the art for several
languages, while at the same time reducing both the number of features and the
running time by up to 80% in some cases.
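
As a rough illustration of the idea summarized above, the following is a minimal sketch of greedy forward feature selection in which candidates are re-ranked at every step by mutual information with the tags. It is not the authors' implementation: the abstract does not specify the exact ranking statistic or acceptance criterion, so the joint-feature ranking, the `evaluate` callback (standing in for something like held-out tagging accuracy), and all names here are assumptions.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def greedy_select(columns, labels, evaluate):
    """Greedy forward selection with dynamically MI-ranked candidates.

    columns:  dict mapping a feature name to its list of values, one per token
    labels:   list of gold tags, one per token
    evaluate: hypothetical callback that scores a list of feature names
              (higher is better), e.g. accuracy on held-out data
    """
    selected, best = [], evaluate([])
    remaining = set(columns)
    while remaining:
        # Dynamic ordering (assumed form): rank each remaining candidate by
        # the MI between the tags and the candidate joined with the features
        # selected so far, so the ranking changes as the model grows.
        def gain(name):
            joint = list(zip(*(columns[f] for f in selected + [name])))
            return mutual_information(joint, labels)

        candidate = max(sorted(remaining), key=gain)
        remaining.remove(candidate)
        score = evaluate(selected + [candidate])
        if score > best:  # keep the feature only if it improves the score
            selected.append(candidate)
            best = score
    return selected, best
```

A static ordering would correspond to fixing the candidate order once before the loop instead of recomputing `gain` in every round. Because a feature is kept only when it improves the score, the selected set can be much smaller than the full candidate set.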
Morphological and Syntactic Case in Statistical Dependency Parsing
Most morphologically rich languages with free word order use case systems to mark the grammatical function of nominal elements, especially for the core argument functions of a verb. The standard pipeline approach in syntactic dependency parsing assumes a complete disambiguation of morphological (case) information prior to automatic syntactic analysis. Parsing experiments on Czech, German, and Hungarian show that this approach is susceptible to propagating morphological annotation errors when parsing languages displaying syncretism in their morphological case paradigms. We develop a different architecture where we use case as a possibly underspecified filtering device restricting the options for syntactic analysis. Carefully designed morpho-syntactic constraints can delimit the search space of a statistical dependency parser and exclude solutions that would violate the restrictions overtly marked in the morphology of the words in a given sentence. The constrained system outperforms a state-of-the-art data-driven pipeline architecture, as we show experimentally, and, in addition, the parser output comes with guarantees about local and global morpho-syntactic well-formedness, which can be useful for downstream applications.
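
The filtering idea can be pictured with a small sketch in which each word carries a set of possible case values rather than a single disambiguated one, and candidate dependency arcs are discarded when their label is incompatible with every possible case. This is only an illustration under stated assumptions: the case-to-label table `LICENSED`, the arc tuple layout, and the function names are hypothetical, and the abstract does not describe how the actual constraints are formulated or enforced inside the parser.

```python
# Hypothetical mapping from case values to the dependency labels they license.
LICENSED = {
    "nom": {"subj", "pred"},
    "acc": {"obj"},
    "dat": {"iobj"},
    "gen": {"nmod"},
}


def allowed_labels(case_options):
    """Labels compatible with at least one possible case reading of a word."""
    labels = set()
    for case in case_options:
        labels |= LICENSED.get(case, set())
    return labels


def filter_arcs(candidate_arcs, case_options_by_word):
    """Drop candidate arcs whose label contradicts the dependent's case marking.

    candidate_arcs:       iterable of (head, dependent, label, score) tuples
    case_options_by_word: dict from dependent index to its set of possible
                          cases; words absent from the dict are unconstrained
    """
    kept = []
    for head, dep, label, score in candidate_arcs:
        options = case_options_by_word.get(dep)
        if options is None or label in allowed_labels(options):
            kept.append((head, dep, label, score))
    return kept
```

For a syncretic form whose options are `{"nom", "acc"}`, both the `subj` and `obj` arcs survive the filter and the statistical model chooses between them, while an `iobj` arc is excluded outright. Leaving the case set underspecified in this way avoids committing to a single, possibly wrong, case value before parsing.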