3,019 research outputs found
Combining Formal Concept Analysis and Translation to Assign Frames and Thematic Role Sets to French Verbs
International audienceWe present an application of Formal Concept Analysis in the domain of Natural Language Processing: We give a general overview of the framework, describe its goals, the data it is based on, the way it works and we illustrate the kind of data we expect as a result. More specifically, we examine the ability of the stability, separation and probability indices to select the most relevant concepts with respect to our FCA application. We show that the sum of stability and separation gives results close to those obtained when using the entire lattice
Lexical typology : a programmatic sketch
The present paper is an attempt to lay the foundation for Lexical Typology as a new kind of linguistic typology.1 The goal of Lexical Typology is to investigate crosslinguistically significant patterns of interaction between lexicon and grammar
Predicate Matrix: an interoperable lexical knowledge base for predicates
183 p.La Matriz de Predicados (Predicate Matrix en inglés) es un nuevo recurso léxico-semántico resultado de la integración de múltiples fuentes de conocimiento, entre las cuales se encuentran FrameNet, VerbNet, PropBank y WordNet. La Matriz de Predicados proporciona un léxico extenso y robusto que permite mejorar la interoperabilidad entre los recursos semánticos mencionados anteriormente. La creación de la Matriz de Predicados se basa en la integración de Semlink y nuevos mappings obtenidos utilizando métodos automáticos que enlazan el conocimiento semántico a nivel léxico y de roles. Asimismo, hemos ampliado la Predicate Matrix para cubrir los predicados nominales (inglés, español) y predicados en otros idiomas (castellano, catalán y vasco). Como resultado, la Matriz de predicados proporciona un léxico multilingüe que permite el análisis semántico interoperable en múltiples idiomas
Automated Semantic Classification of French Verbs
The aim of this work is to explore (semi-)automatic means to create a Levin-type classification of French verbs, suitable for Natural Language Processing. For English, a classification based on Levin's method is VerbNet (Kipper 2005). VerbNet is an extensive digital verb lexicon which systematically extends Levin's classes while ensuring that class members have a common semantics and share a common set of syntactic frames and thematic roles. In this work we reorganise the verbs from three French syntax lexicons, namely Volem, the Grammar-Lexicon (Ladl) and Dicovalence, into VerbNet-like verb classes using the technique of Formal Concept Analysis. We automatically acquire syntactic-semantic verb class and diathesis alternation information. We create large scale verb classes and compare their verb and frame distributions to those of VerbNet. We discuss possible evaluation schemes and finally focus on an evaluation methodology with respect to VerbNet, of which we present the theoretical motivation and analyse the feasibility on a small hand-built example
D6.1: Technologies and Tools for Lexical Acquisition
This report describes the technologies and tools to be used for Lexical Acquisition in PANACEA. It includes descriptions of existing technologies and tools which can be built on and improved within PANACEA, as well as of new technologies and tools to be developed and integrated in PANACEA platform. The report also specifies the Lexical Resources to be produced. Four main areas of lexical acquisition are included: Subcategorization frames (SCFs), Selectional Preferences (SPs), Lexical-semantic Classes (LCs), for both nouns and verbs, and Multi-Word Expressions (MWEs)
Proceedings
Proceedings of the Ninth International Workshop
on Treebanks and Linguistic Theories.
Editors: Markus Dickinson, Kaili Müürisep and Marco Passarotti.
NEALT Proceedings Series, Vol. 9 (2010), 268 pages.
© 2010 The editors and contributors.
Published by
Northern European Association for Language
Technology (NEALT)
http://omilia.uio.no/nealt .
Electronically published at
Tartu University Library (Estonia)
http://hdl.handle.net/10062/15891
Recommended from our members
Acquiring and Harnessing Verb Knowledge for Multilingual Natural Language Processing
Advances in representation learning have enabled natural language processing models to derive non-negligible linguistic information directly from text corpora in an unsupervised fashion. However, this signal is underused in downstream tasks, where they tend to fall back on superficial cues and heuristics to solve the problem at hand. Further progress relies on identifying and filling the gaps in linguistic knowledge captured in their parameters. The objective of this thesis is to address these challenges focusing on the issues of resource scarcity, interpretability, and lexical knowledge injection, with an emphasis on the category of verbs.
To this end, I propose a novel paradigm for efficient acquisition of lexical knowledge leveraging native speakers’ intuitions about verb meaning to support development and downstream performance of NLP models across languages. First, I investigate the potential of acquiring semantic verb classes from non-experts through manual clustering. This subsequently informs the development of a two-phase semantic dataset creation methodology, which combines semantic clustering with fine-grained semantic similarity judgments collected through spatial arrangements of lexical stimuli. The method is tested on English and then applied to a typologically diverse sample of languages to produce the first large-scale multilingual verb dataset of this kind. I demonstrate its utility as a diagnostic tool by carrying out a comprehensive evaluation of state-of-the-art NLP models, probing representation quality across languages and domains of verb meaning, and shedding light on their deficiencies. Subsequently, I directly address these shortcomings by injecting lexical knowledge into large pretrained language models. I demonstrate that external manually curated information about verbs’ lexical properties can support data-driven models in tasks where accurate verb processing is key. Moreover, I examine the potential of extending these benefits from resource-rich to resource-poor languages through translation-based transfer. The results emphasise the usefulness of human-generated lexical knowledge in supporting NLP models and suggest that time-efficient construction of lexicons similar to those developed in this work, especially in under-resourced languages, can play an important role in boosting their linguistic capacity.ESRC Doctoral Fellowship [ES/J500033/1], ERC Consolidator Grant LEXICAL [648909
- …