Statistical parsing of morphologically rich languages (SPMRL): what, how and whither
The term Morphologically Rich Languages (MRLs) refers to languages in which significant information concerning syntactic units and relations is expressed at the word level. There is ample evidence that the application of readily available statistical parsing models to such languages is susceptible to serious performance degradation. The first workshop on statistical parsing of MRLs hosts a variety of contributions which show that, despite language-specific idiosyncrasies, the problems associated with parsing MRLs cut across languages and parsing frameworks. In this paper we review the current state of affairs with respect to parsing MRLs and point out central challenges. We synthesize the contributions of researchers working on parsing Arabic, Basque, French, German, Hebrew, Hindi and Korean to point out shared solutions across languages. The overarching analysis suggests itself as a source of directions for future investigations.
Lemmatization and lexicalized statistical parsing of morphologically rich languages: the case of French
This paper shows that training a lexicalized parser on a lemmatized morphologically-rich treebank such as the French Treebank slightly improves parsing results. We also show that lemmatizing a similarly sized subset of the English Penn Treebank has almost no effect on parsing performance with gold lemmas, and leads to a small drop in performance when automatically assigned lemmas and POS tags are used. This highlights two facts: (i) lemmatization helps to reduce lexicon data-sparseness issues for French; (ii) it also makes the parsing process sensitive to the correct assignment of POS tags to unknown words.
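The sparsity-reduction effect described above can be illustrated with a minimal sketch. The tiny French lemma table and the `lemmatize` helper below are hypothetical stand-ins for a real morphological lexicon, not the paper's code; the point is only that collapsing inflected forms onto lemmas shrinks the lexicon a statistical parser must estimate.

```python
# Hypothetical lemma table standing in for a real morphological lexicon.
LEMMAS = {
    "mange": "manger", "mangent": "manger",
    "chats": "chat", "chattes": "chat",
}

def lemmatize(tokens):
    """Map each token to its lemma, leaving unknown tokens unchanged."""
    return [LEMMAS.get(tok, tok) for tok in tokens]

corpus = ["les", "chats", "mangent", "les", "chattes", "mange"]
print(len(set(corpus)))             # word-form lexicon size: 5
print(len(set(lemmatize(corpus))))  # lemma lexicon size: 3
```

With gold lemmas this collapse is lossless for the lexicon statistics; with automatically assigned lemmas, errors propagate, which is consistent with the small drop the paper reports.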
Integrated speech and morphological processing in a connectionist continuous speech understanding for Korean
A new tightly coupled speech and natural language integration model is presented for a TDNN-based continuous, possibly large-vocabulary speech recognition system for Korean. Unlike popular n-best techniques developed for integrating mainly HMM-based speech recognition and natural language processing at the word level, which are obviously inadequate for morphologically complex agglutinative languages, our model constructs a spoken language system based on morpheme-level speech and language integration. With this integration scheme, the spoken Korean processing engine (SKOPE) is designed and implemented using a TDNN-based diphone recognition module integrated with Viterbi-based lexical decoding and symbolic phonological/morphological co-analysis. Our experimental results show that speaker-dependent continuous eojeol (Korean word) recognition and integrated morphological analysis can be achieved with over an 80.6% success rate directly from speech input for middle-level vocabularies.
Comment: LaTeX source with a4 style, 15 pages, to be published in the computer processing of oriental language journal
Parsing as Reduction
We reduce phrase-representation parsing to dependency parsing. Our reduction is grounded on a new intermediate representation, "head-ordered dependency trees", shown to be isomorphic to constituent trees. By encoding order information in the dependency labels, we show that any off-the-shelf, trainable dependency parser can be used to produce constituents. When this parser is non-projective, we can perform discontinuous parsing in a very natural manner. Despite the simplicity of our approach, experiments show that the resulting parsers are on par with strong baselines, such as the Berkeley parser for English and the best single system in the SPMRL-2014 shared task. Results are particularly striking for discontinuous parsing of German, where we surpass the current state of the art by a wide margin.
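The core trick of encoding order information in dependency labels can be sketched minimally. The `encode`/`decode` pair below is a simplified illustration of the idea, under assumed interfaces, and is not the paper's exact head-ordered encoding: each dependent's label is augmented with an attachment-order index so that the order, and hence the nested constituent structure, can be recovered from an off-the-shelf dependency parser's labeled output.

```python
# Simplified illustration (not the paper's exact encoding): augment each
# dependency label with the order in which the dependent attaches to its
# head, so constituent nesting can be decoded from labels alone.

def encode(dependents):
    """dependents: list of (child, label) pairs in attachment order
    (closest to the head first). Returns order-augmented labels."""
    return [(child, f"{label}#{i}")
            for i, (child, label) in enumerate(dependents, 1)]

def decode(encoded):
    """Recover the attachment order from the augmented labels."""
    ordered = sorted(encoded, key=lambda cl: int(cl[1].split("#")[1]))
    return [(child, label.split("#")[0]) for child, label in ordered]

deps = [("quickly", "advmod"), ("ball", "obj"), ("John", "nsubj")]
enc = encode(deps)
print(enc)                   # labels carry attachment-order indices
print(decode(enc) == deps)   # round-trip recovers the original order
```

Because the extra information lives entirely in the label alphabet, any trainable dependency parser can learn it without modification, which is what makes the reduction work with off-the-shelf tools.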
Morphologically complex words in L1 and L2 processing: Evidence from masked priming experiments in English
This paper reports results from masked priming experiments investigating regular past-tense forms and deadjectival nominalizations with -ness and -ity in adult native (L1) speakers of English and in different groups of advanced adult second language (L2) learners of English. While the L1 group showed efficient priming for both inflected and derived word forms, the L2 learners demonstrated repetition-priming effects (like the L1 group), but no priming for inflected and reduced priming for derived word forms. We argue that this striking contrast between L1 and L2 processing supports the view that adult L2 learners rely more on lexical storage and less on combinatorial processing of morphologically complex words than native speakers.
Decreasing lexical data sparsity in statistical syntactic parsing - experiments with named entities
In this paper we present preliminary experiments that aim to reduce lexical data sparsity in statistical parsing by exploiting information about named entities. Words in the WSJ corpus are mapped to named entity clusters, and a latent variable constituency parser is trained and tested on the transformed corpus. We explore two different methods for mapping words to entities, and look at the effect of mapping various subsets of named entity types. Thus far, results show no improvement in parsing accuracy over the best baseline score; we identify possible problems and outline suggestions for future directions.
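The corpus transformation described above can be sketched as follows. The NER output is mocked with a small lookup table and the `__TYPE__` cluster symbols are hypothetical, not the authors' scheme; the sketch only shows the mechanism by which distinct entity words come to share one parser lexicon entry.

```python
# Sketch (assumed interface, not the authors' code): replace recognised
# named-entity tokens with an entity-type cluster symbol, so e.g.
# "London" and "Paris" share a single lexicon entry for the parser.

NE_TAGS = {"London": "LOC", "Paris": "LOC", "IBM": "ORG"}  # mocked NER output

def cluster_entities(tokens):
    """Map each recognised entity token to its type's cluster symbol."""
    return [f"__{NE_TAGS[t]}__" if t in NE_TAGS else t for t in tokens]

sent = ["IBM", "opened", "offices", "in", "London", "and", "Paris"]
print(cluster_entities(sent))
# ['__ORG__', 'opened', 'offices', 'in', '__LOC__', 'and', '__LOC__']
```

Restricting `NE_TAGS` to a subset of entity types corresponds to the paper's experiments with mapping only some named entity categories.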