86 research outputs found

Corpus annotation within the French FrameNet: a domain-by-domain methodology

Author: Candito Marie
Djemaa Marianne
Muller Philippe
Vieu Laure
Publication venue: HAL CCSD
Publication date: 01/01/2016

This paper reports on the development of a French FrameNet, within the ASFALDA project. While the first phase of the project focused on the development of a French set of frames and corresponding lexicon (Candito et al., 2014), this paper concentrates on the subsequent corpus annotation phase, which focused on four notional domains (commercial transactions, cognitive stances, causality and verbal communication). Given that full coverage is not reachable for a relatively "new" FrameNet project, we argue that focusing on specific notional domains allowed us to obtain full lexical coverage for the frames of these domains, while partially reflecting word sense ambiguities. Furthermore, as frames and roles were annotated on two French Treebanks (the French Treebank (Abeillé and Barrier, 2004) and the Sequoia Treebank (Candito and Seddah, 2012)), we were able to extract a syntactico-semantic lexicon from the annotated frames. In the resource's current status, there are 98 frames, 662 frame-evoking words, 872 senses, and about 13000 annotated frames, with their semantic roles assigned to portions of text. The French FrameNet is freely available at alpage.inria.fr/asfalda

Scientific Publications of the University of Toulouse II Le Mirail

INRIA a CCSD electronic archive server

Open Archive Toulouse Archive Ouverte

Hal-Diderot

Developing a French FrameNet: Methodology and First results

Author: Amsili Pascal
Barque Lucie
Benamara Farah
Candito Marie
De Chalendar Gaël
Djemaa Marianne
Haas Pauline
Huyghe Richard
Mathieu Yvette Yannick
Muller Philippe
Sagot Benoît
Vieu Laure
Publication venue: HAL CCSD
Publication date: 01/05/2014

The Asfalda project aims to develop a French corpus with frame-based semantic annotations and automatic tools for shallow semantic analysis. We present the first part of the project: focusing on a set of notional domains, we delimited a subset of English frames, adapted them to French data when necessary, and developed the corresponding French lexicon. We believe that working domain by domain helped us to enforce the coherence of the resulting resource, and also has the advantage that, though the number of frames is limited (around a hundred), we obtain full coverage within a given domain.

Scientific Publications of the University of Toulouse II Le Mirail

INRIA a CCSD electronic archive server

Open Archive Toulouse Archive Ouverte

HAL-CEA

Hal-Diderot

Predicate Matrix: an interoperable lexical knowledge base for predicates

Author: López de Lacalle Maddalen
Publication date: 10/07/2023

183 p. The Predicate Matrix is a new lexical-semantic resource resulting from the integration of multiple knowledge sources, including FrameNet, VerbNet, PropBank and WordNet. The Predicate Matrix provides an extensive and robust lexicon that improves interoperability among these semantic resources. Its creation is based on the integration of Semlink and new mappings obtained using automatic methods that link semantic knowledge at the lexical and role levels. We have also extended the Predicate Matrix to cover nominal predicates (English, Spanish) and predicates in other languages (Spanish, Catalan and Basque). As a result, the Predicate Matrix provides a multilingual lexicon that enables interoperable semantic analysis across multiple languages.

Archivo Digital para la Docencia y la Investigación

Developing a large scale FrameNet for Italian - The IFrameNet experience

Author: Brambilla Silvia <1993>
Publication venue: Alma Mater Studiorum - Università di Bologna
Publication date: 05/07/2022

In this thesis we present the development and the current status of the IFrameNet project, aimed at the construction of a large-scale lexical semantic resource for the Italian language based on Frame Semantics theories. We will begin by contextualizing our work in the wider context of Frame Semantics and of the FrameNet project, which, since 1997, has attempted to apply these theories to lexicography. We will then analyse and discuss the applicability of the structure of the American resource to Italian and more specifically we will focus on the domain of fear, worry, and anxiety. We will finally propose some modifications aimed at improving this domain of the resource in relation to its coherence, its ability to accurately represent the linguistic reality and in particular in order to make it possible to apply it to Italian

AMS Tesi di Dottorato

Acquiring and Harnessing Verb Knowledge for Multilingual Natural Language Processing

Author: Majewska Olga
Publication venue: University of Cambridge
Publication date: 01/02/2021

Advances in representation learning have enabled natural language processing models to derive non-negligible linguistic information directly from text corpora in an unsupervised fashion. However, this signal is underused in downstream tasks, where they tend to fall back on superficial cues and heuristics to solve the problem at hand. Further progress relies on identifying and filling the gaps in linguistic knowledge captured in their parameters. The objective of this thesis is to address these challenges focusing on the issues of resource scarcity, interpretability, and lexical knowledge injection, with an emphasis on the category of verbs. To this end, I propose a novel paradigm for efficient acquisition of lexical knowledge leveraging native speakers’ intuitions about verb meaning to support development and downstream performance of NLP models across languages. First, I investigate the potential of acquiring semantic verb classes from non-experts through manual clustering. This subsequently informs the development of a two-phase semantic dataset creation methodology, which combines semantic clustering with fine-grained semantic similarity judgments collected through spatial arrangements of lexical stimuli. The method is tested on English and then applied to a typologically diverse sample of languages to produce the first large-scale multilingual verb dataset of this kind. I demonstrate its utility as a diagnostic tool by carrying out a comprehensive evaluation of state-of-the-art NLP models, probing representation quality across languages and domains of verb meaning, and shedding light on their deficiencies. Subsequently, I directly address these shortcomings by injecting lexical knowledge into large pretrained language models. I demonstrate that external manually curated information about verbs’ lexical properties can support data-driven models in tasks where accurate verb processing is key. 
Moreover, I examine the potential of extending these benefits from resource-rich to resource-poor languages through translation-based transfer. The results emphasise the usefulness of human-generated lexical knowledge in supporting NLP models and suggest that time-efficient construction of lexicons similar to those developed in this work, especially in under-resourced languages, can play an important role in boosting their linguistic capacity. ESRC Doctoral Fellowship [ES/J500033/1], ERC Consolidator Grant LEXICAL [648909

Apollo (Cambridge)

Using Semantic Frames for Measuring and Identifying Semantic Relationships in Software Descriptions

Author: Alhoshan Waad
Publication date: 01/08/2020

The University of Manchester - Institutional Repository

Metaphor and Senses

Author: Zawislawska Magdalena
Publication venue: Peter Lang, International Academic Publishers
Publication date: 15/07/2021

The book deals with the synesthetic metaphors in Synamet – a semantically and grammatically annotated corpus. The texts included in the corpus are excerpted from blogs devoted to, among others, perfume, wine, beer, music, art, massage and wellness. The thesis presents a Conceptual Metaphor Theory (CMT) and frame-based analysis of synesthetic metaphors in Polish. Using data from the corpus, the book provides ample empirical support for embodiment in metaphor and internal logic of mappings between frames. The study proposes new models of verbal synesthesia in the corpus and calls into question a universality of hierarchy of senses. This book should be of interest to researchers working within cognitive linguistics, in particular metaphor theory, frame semantics, corpus linguistics, and sensory science

Directory of Open Access Books (DOAB)

The Usability of Language Technology Methods and Parallel Corpora in Bilingual Lexicography. Quantifying Translational Equivalence

Author: Héja Enikő
Publication date: 01/01/2015

ELTE Digital Institutional Repository (EDIT)

Empirical studies on word representations

Author: Suster Simon
Publication venue: Rijksuniversiteit Groningen
Publication date: 01/01/2016

One of the most fundamental tasks in natural language processing is representing words with mathematical objects (such as vectors). The word representations, which are most often estimated from data, allow capturing the meaning of words. They enable comparing words according to their semantic similarity, and have been shown to work extremely well when included in complex real-world applications. A large part of our work deals with ways of estimating word representations directly from large quantities of text. Our methods exploit the idea that words which occur in similar contexts have a similar meaning. How we define the context is an important focus of our thesis. The context can consist of a number of words to the left and to the right of the word in question, but, as we show, obtaining context words via syntactic links (such as the link between the verb and its subject) often works better. We furthermore investigate word representations that accurately capture multiple meanings of a single word. We show that translation of a word in context contains information that can be used to disambiguate the meaning of that word

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Institutional Repository Universiteit Antwerpen

Dissertations of the University of Groningen
