Search CORE

158 research outputs found

Cross-Lingual Induction and Transfer of Verb Classes Based on Word Vector Space Specialisation

Author: Korhonen Anna
Mrkšić Nikola
Vulić Ivan
Publication venue
Publication date: 01/01/2017
Field of study

Existing approaches to automatic VerbNet-style verb classification are heavily dependent on feature engineering and therefore limited to languages with mature NLP pipelines. In this work, we propose a novel cross-lingual transfer method for inducing VerbNets for multiple languages. To the best of our knowledge, this is the first study which demonstrates how the architectures for learning word embeddings can be applied to this challenging syntactic-semantic task. Our method uses cross-lingual translation pairs to tie each of the six target languages into a bilingual vector space with English, jointly specialising the representations to encode the relational information from English VerbNet. A standard clustering algorithm is then run on top of the VerbNet-specialised representations, using vector dimensions as features for learning verb classes. Our results show that the proposed cross-lingual transfer approach sets new state-of-the-art verb classification performance across all six target languages explored in this work.Comment: EMNLP 2017 (long paper

arXiv.org e-Print Archive

Crossref

Investigating the cross-lingual translatability of VerbNet-style classification.

Author: Huang Yan
Korhonen Anna
Laippala Veronika
Majewska Olga
McCarthy Diana
Murakami Akira
Vulić Ivan
Publication venue: Lang Resour Eval
Publication date: 20/10/2017
Field of study

VerbNet-the most extensive online verb lexicon currently available for English-has proved useful in supporting a variety of NLP tasks. However, its exploitation in multilingual NLP has been limited by the fact that such classifications are available for few languages only. Since manual development of VerbNet is a major undertaking, researchers have recently translated VerbNet classes from English to other languages. However, no systematic investigation has been conducted into the applicability and accuracy of such a translation approach across different, typologically diverse languages. Our study is aimed at filling this gap. We develop a systematic method for translation of VerbNet classes from English to other languages which we first apply to Polish and subsequently to Croatian, Mandarin, Japanese, Italian, and Finnish. Our results on Polish demonstrate high translatability with all the classes (96% of English member verbs successfully translated into Polish) and strong inter-annotator agreement, revealing a promising degree of overlap in the resultant classifications. The results on other languages are equally promising. This demonstrates that VerbNet classes have strong cross-lingual potential and the proposed method could be applied to obtain gold standards for automatic verb classification in different languages. We make our annotation guidelines and the six language-specific verb classifications available with this paper

University of Birmingham Research Portal

Apollo (Cambridge)

Italian VerbNet: A Construction based Approach to Italian Verb Classification

Author: Busso Lucia
Lenci Alessandro
Publication venue: European Language Resources Association (ELRA)
Publication date: 01/01/2016
Field of study

This paper proposes a new method for Italian verb classification -and a preliminary example of resulting classes- inspired by Levin (1993) and VerbNet (Kipper-Schuler, 2005), yet partially independent from these resources; we achieved such a result by integrating Levin and VerbNet’s models of classification with other theoretic frameworks and resources. The classification is rooted in the constructionist framework (Goldberg, 1995; 2006) and is distribution-based. It is also semantically characterized by a link to FrameNet’ssemanticframesto represent the event expressed by a class. However, the new Italian classes maintain the hierarchic “tree” structure and monotonic nature of VerbNet’s classes, and, where possible, the original names (e.g.: Verbs of Killing, Verbs of Putting, etc.). We therefore propose here a taxonomy compatible with VerbNet but at the same time adapted to Italian syntax and semantics. It also addresses a number of problems intrinsic to the original classifications, such as the role of argument alternations, here regarded simply as epiphenomena, consistently with the constructionist approach

Archivio della Ricerca - Università di Pisa

Aston Publications Explorer

Investigating the cross-lingual translatability of VerbNet-style classification

Author: Huang Y.
Korhonen A.
Laippala V.
Majewska O.
McCarthy D.
Murakami A.
Vulić I.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 27/10/2022
Field of study

VerbNet—the most extensive online verb lexicon currently available for English—has proved useful in supporting a variety of NLP tasks. However, its exploitation in multilingual NLP has been limited by the fact that such classifications are available for few languages only. Since manual development of VerbNet is a major undertaking, researchers have recently translated VerbNet classes from English to other languages. However, no systematic investigation has been conducted into the applicability and accuracy of such a translation approach across different, typologically diverse languages. Our study is aimed at filling this gap. We develop a systematic method for translation of VerbNet classes from English to other languages which we first apply to Polish and subsequently to Croatian, Mandarin, Japanese, Italian, and Finnish. Our results on Polish demonstrate high translatability with all the classes (96% of English member verbs successfully translated into Polish) and strong inter-annotator agreement, revealing a promising degree of overlap in the resultant classifications. The results on other languages are equally promising. This demonstrates that VerbNet classes have strong cross-lingual potential and the proposed method could be applied to obtain gold standards for automatic verb classification in different languages. We make our annotation guidelines and the six language-specific verb classifications available with this paper. © 2017 The Author(s)</p

UTUPub

Recommended from our members

Acquiring and Harnessing Verb Knowledge for Multilingual Natural Language Processing

Author: Majewska Olga
Publication venue: University of Cambridge
Publication date: 01/02/2021
Field of study

Advances in representation learning have enabled natural language processing models to derive non-negligible linguistic information directly from text corpora in an unsupervised fashion. However, this signal is underused in downstream tasks, where they tend to fall back on superficial cues and heuristics to solve the problem at hand. Further progress relies on identifying and filling the gaps in linguistic knowledge captured in their parameters. The objective of this thesis is to address these challenges focusing on the issues of resource scarcity, interpretability, and lexical knowledge injection, with an emphasis on the category of verbs. To this end, I propose a novel paradigm for efficient acquisition of lexical knowledge leveraging native speakers’ intuitions about verb meaning to support development and downstream performance of NLP models across languages. First, I investigate the potential of acquiring semantic verb classes from non-experts through manual clustering. This subsequently informs the development of a two-phase semantic dataset creation methodology, which combines semantic clustering with fine-grained semantic similarity judgments collected through spatial arrangements of lexical stimuli. The method is tested on English and then applied to a typologically diverse sample of languages to produce the first large-scale multilingual verb dataset of this kind. I demonstrate its utility as a diagnostic tool by carrying out a comprehensive evaluation of state-of-the-art NLP models, probing representation quality across languages and domains of verb meaning, and shedding light on their deficiencies. Subsequently, I directly address these shortcomings by injecting lexical knowledge into large pretrained language models. I demonstrate that external manually curated information about verbs’ lexical properties can support data-driven models in tasks where accurate verb processing is key. Moreover, I examine the potential of extending these benefits from resource-rich to resource-poor languages through translation-based transfer. The results emphasise the usefulness of human-generated lexical knowledge in supporting NLP models and suggest that time-efficient construction of lexicons similar to those developed in this work, especially in under-resourced languages, can play an important role in boosting their linguistic capacity.ESRC Doctoral Fellowship [ES/J500033/1], ERC Consolidator Grant LEXICAL [648909

Apollo (Cambridge)

Vers la création d'un Verbnet du français

Author: Danlos Laurence
Nakamura Takuya
Pradet Quentin
Publication venue: HAL CCSD
Publication date: 01/07/2014
Field of study

International audienceVerbNet est une ressource lexicale pour les verbes anglais qui est bien utile pour le TAL grâce à sa large couverture et sa classification cohérente. Une telle ressource n'existe pas pour le français malgré quelques tentatives. Nous montrons comment adapter semi-automatiquement VerbNet en utilisant deux ressources lexicales existantes, le LVF (Les Verbes Français) et le LG (Lexique-Grammaire). Abstract. VerbNet is an English lexical resource that has proven useful for NLP due to its high coverage and coherent classification. Such a resource doesn't exist for French, despite some (mostly automatic and unsupervised) at-tempts. We show how to semi-automatically adapt VerbNet using existing lexical resources, namely LVF (Les Verbes Français) and LG (Lexique-Grammaire). Mots-clés : VerbNet, cadres de sous-catégorisations, rôles sémantiques

Hal-Diderot

Recommended from our members

Acquiring verb classes through bottom-up semantic verb clustering

Author: Korhonen A
Majewska O
McCarthy D
Vulić I
Publication venue: LREC 2018 - 11th International Conference on Language Resources and Evaluation
Publication date: 01/01/2018
Field of study

In this paper, we present the first analysis of bottom-up manual semantic clustering of verbs in three languages, English, Polish and Croatian. Verb classes including syntactic and semantic information have been shown to support many NLP tasks by allowing abstraction from individual words and thereby alleviating data sparseness. The availability of such classifications is however still non-existent or limited in most languages. While a range of automatic verb classification approaches have been proposed, high-quality resources and gold standards are needed for evaluation and to improve the performance of NLP systems. We investigate whether semantic verb classes in three different languages can be reliably obtained from native speakers without linguistics training. The analysis of inter-annotator agreement shows an encouraging degree of overlap in the classifications produced for each language individually, as well as across all three languages. Comparative examination of the resultant classifications provides interesting insights into cross-linguistic semantic commonalities and patterns of ambiguity

Apollo (Cambridge)

Adapting VerbNet to French using existing resources

Author: Danlos Laurence
De Chalendar Gaël
Pradet Quentin
Publication venue: HAL CCSD
Publication date: 28/05/2014
Field of study

International audienceVerbNet is an English lexical resource for verbs that has proven useful for English NLP due to its high coverage and coherent classification. Such a resource doesn’t exist for other languages, despite some (mostly automatic and unsupervised) attempts. We show how to semi-automatically adapt VerbNet using existing resources designed for different purposes. This study focuses on French and uses two French resources: a semantic lexicon (Les Verbes Français) and a syntactic lexicon (Lexique-Grammaire)

INRIA a CCSD electronic archive server

HAL-CEA

Hal-Diderot

Predicate Matrix: an interoperable lexical knowledge base for predicates

Author: López de Lacalle Maddalen
Publication venue
Publication date: 10/07/2023
Field of study

183 p.La Matriz de Predicados (Predicate Matrix en inglés) es un nuevo recurso léxico-semántico resultado de la integración de múltiples fuentes de conocimiento, entre las cuales se encuentran FrameNet, VerbNet, PropBank y WordNet. La Matriz de Predicados proporciona un léxico extenso y robusto que permite mejorar la interoperabilidad entre los recursos semánticos mencionados anteriormente. La creación de la Matriz de Predicados se basa en la integración de Semlink y nuevos mappings obtenidos utilizando métodos automáticos que enlazan el conocimiento semántico a nivel léxico y de roles. Asimismo, hemos ampliado la Predicate Matrix para cubrir los predicados nominales (inglés, español) y predicados en otros idiomas (castellano, catalán y vasco). Como resultado, la Matriz de predicados proporciona un léxico multilingüe que permite el análisis semántico interoperable en múltiples idiomas

Archivo Digital para la Docencia y la Investigación