Semantic Mapping for Lexical Sparseness Reduction in Parsing
Bilexical information is known to be helpful in parse disambiguation, but the benefit is limited because of lexical sparseness. An approach using word classes can reduce sparseness and potentially lead to more accurate parsing. Firstly, we describe a method identifying the dependency types of the Alpino parser for Dutch to which we would like to apply generalization. These are the types which are most likely to reduce the sparseness and positively affect parsing at the same time. Secondly, we provide preliminary results for enhancement of dependency types with semantic classes derived from a WordNet-like inventory for Dutch. Classes of varying degrees of generality are applied to three dependency types: nominal conjunction, modification of adjective and modification of noun. We observe improvements in some concrete cases, whereas the overall parsing accuracy either remains unchanged or decreases. We identify drawbacks of human-built sense inventories, which provides motivation for a distributional semantic approach.
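The core idea of the abstract, pooling sparse bilexical counts through coarser semantic classes, can be sketched as a simple backoff. The Dutch words, the word-to-class mapping, and the toy counts below are invented for illustration; they are not the paper's actual inventory or data.

```python
from collections import Counter

# Hypothetical (head, dependency_type, dependent) triples from a parsed corpus.
triples = [
    ("eet", "obj", "appel"),
    ("eet", "obj", "peer"),
    ("eet", "obj", "banaan"),
    ("leest", "obj", "boek"),
]

# A WordNet-like inventory collapsed to coarse semantic classes (assumed mapping).
word_class = {"appel": "FRUIT", "peer": "FRUIT", "banaan": "FRUIT", "boek": "ARTIFACT"}

bilexical = Counter(triples)
class_based = Counter(
    (head, dep, word_class.get(word, word)) for head, dep, word in triples
)

def score(head, dep, word):
    """Back off from a sparse bilexical count to a pooled class-based count."""
    count = bilexical[(head, dep, word)]
    if count > 0:
        return count
    return class_based[(head, dep, word_class.get(word, word))]

# "kers" (cherry) was never observed as an object of "eet", but its class
# FRUIT was, so the class-based count supplies a non-zero score.
word_class["kers"] = "FRUIT"
print(score("eet", "obj", "kers"))  # 3
```

The trade-off the abstract reports follows directly from the class granularity: a very general class pools more counts but also conflates words whose attachment behaviour differs, which is one way the overall accuracy can fail to improve.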
Empirical studies on word representations
One of the most fundamental tasks in natural language processing is representing words with mathematical objects (such as vectors). These representations, most often estimated from data, capture the meaning of words: they enable comparing words according to their semantic similarity, and have been shown to work extremely well when included in complex real-world applications. A large part of our work deals with ways of estimating word representations directly from large quantities of text. Our methods exploit the idea that words which occur in similar contexts have similar meanings. How we define the context is an important focus of our thesis. The context can consist of a number of words to the left and to the right of the word in question, but, as we show, obtaining context words via syntactic links (such as the link between a verb and its subject) often works better. We furthermore investigate word representations that accurately capture multiple meanings of a single word, and show that the translation of a word in context contains information that can be used to disambiguate the meaning of that word.
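The window-based variant of this idea can be illustrated in a few lines: build co-occurrence vectors from a symmetric one-word window and compare them with cosine similarity. The toy corpus is invented for this sketch and the counts are raw (no weighting such as PMI), which real systems would add.

```python
import math
from collections import Counter, defaultdict

corpus = [
    "a cat chased a mouse",
    "a dog chased a cat",
    "a mouse ate some cheese",
    "a dog ate some cheese",
]

# Co-occurrence vectors from a symmetric window of 1 word on each side.
vectors = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for i, w in enumerate(words):
        for j in (i - 1, i + 1):
            if 0 <= j < len(words):
                vectors[w][words[j]] += 1

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv)

# "cat" and "mouse" share contexts (both follow "a" and are chased),
# so their vectors are closer than those of "cat" and "cheese".
print(cosine(vectors["cat"], vectors["mouse"]) >
      cosine(vectors["cat"], vectors["cheese"]))  # True
```

Swapping the window for syntactic links, as the thesis advocates, would only change how the context pairs are collected; the vector comparison stays the same.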
Parser lexicalisation through self-learning
We describe a new self-learning framework for parser lexicalisation that requires only a plain-text corpus of in-domain text. The method first creates augmented versions of dependency graphs by applying a series of modifications designed to directly capture higher-order lexical path dependencies. Scores are assigned to each edge in the graph using statistics from an automatically parsed background corpus. As bilexical dependencies are sparse, a novel directed distributional word similarity measure is used to smooth edge score estimates. Edge scores are then combined into graph scores and used for reranking the top-n analyses found by the unlexicalised parser. The approach achieves significant improvements on WSJ and biomedical text over the unlexicalised baseline parser, which is originally trained on a subset of the Brown corpus.
Structural Ambiguity and its Disambiguation in Language Model Based Parsers: the Case of Dutch Clause Relativization
This paper addresses structural ambiguity in Dutch relative clauses. By investigating the task of disambiguation by grounding, we study how the presence of a prior sentence can resolve relative clause ambiguities. We apply this method to two parsing architectures in an attempt to demystify the parsing and language model components of two present-day neural parsers. Results show that a neurosymbolic parser, based on proof nets, is more open to data bias correction than an approach based on universal dependencies, although both setups suffer from a comparable initial data bias.