On Singles, Couples and Extended Families. Measuring Overlapping between Latin Vallex and Latin WordNet
Different lexical resources may take different views of lexical meaning. However, all of them treat lexical items as their common basic components, which are described according to criteria that can vary from one resource to another. In this paper, we present a method for measuring the degree of similarity between a valency-based lexical resource and a WordNet. This is motivated by both theoretical and practical reasons. On the theoretical side, we ask whether there are lexical classes that "impose" themselves regardless of whether they are explicitly recorded as such in the source lexical resources. On the practical side, our work aims to contribute to research on merging lexical resources. To apply and evaluate our method, we propose a normalized overlap coefficient that measures the rate of overlap between a valency lexicon and a WordNet. In particular, in the context of exploiting the linguistic resources for ancient languages built over the last decade, we compute and evaluate the overlap between a selection of homogeneous lexical subsets extracted from two lexical resources for Latin.
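The abstract does not give the formula for the normalized overlap coefficient. As a minimal sketch, assuming the standard Szymkiewicz-Simpson overlap coefficient |A ∩ B| / min(|A|, |B|) as a stand-in (not necessarily the authors' coefficient), the code below compares each valency-based verb subset with every WordNet verb set and keeps the best match. The function names, class labels, synset labels, and verb memberships are hypothetical toy data, not taken from Latin Vallex or Latin WordNet.

```python
# Sketch: overlap between valency-based verb subsets and WordNet verb sets.
# ASSUMPTION: the Szymkiewicz-Simpson coefficient stands in for the paper's
# normalized coefficient, whose exact definition is not given in the abstract.


def overlap_coefficient(a: set[str], b: set[str]) -> float:
    """|A & B| / min(|A|, |B|): 1.0 when one set is contained in the other."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def best_matches(
    valency_classes: dict[str, set[str]],
    synsets: dict[str, set[str]],
) -> dict[str, tuple[str, float]]:
    """Map each valency class to its best-overlapping synset and that score."""
    result: dict[str, tuple[str, float]] = {}
    for name, verbs in valency_classes.items():
        scored = [(sid, overlap_coefficient(verbs, members))
                  for sid, members in synsets.items()]
        result[name] = max(scored, key=lambda pair: pair[1])
    return result


if __name__ == "__main__":
    # Hypothetical toy data: labels and memberships are illustrative only.
    valency_classes = {
        "frame_ACT_PAT_ADDR": {"do", "dono", "tribuo", "reddo"},
        "frame_ACT_PAT": {"dico", "narro", "video"},
    }
    synsets = {
        "synset_give": {"do", "dono", "tribuo"},
        "synset_say": {"dico", "narro", "loquor", "aio"},
    }
    for name, (sid, score) in best_matches(valency_classes, synsets).items():
        print(f"{name} -> {sid} ({score:.2f})")
```

Dividing by the smaller set means a small subset fully contained in a larger one scores 1.0; a Jaccard-style denominator (the union) would instead penalize the size difference, so the choice of normalization changes which pairs count as "overlapping".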
Acquiring and Harnessing Verb Knowledge for Multilingual Natural Language Processing
Advances in representation learning have enabled natural language processing models to derive non-negligible linguistic information directly from text corpora in an unsupervised fashion. However, this signal is underused in downstream tasks, where models tend to fall back on superficial cues and heuristics to solve the problem at hand. Further progress relies on identifying and filling the gaps in the linguistic knowledge captured in their parameters. The objective of this thesis is to address these challenges, focusing on the issues of resource scarcity, interpretability, and lexical knowledge injection, with an emphasis on the category of verbs.
To this end, I propose a novel paradigm for efficient acquisition of lexical knowledge leveraging native speakers’ intuitions about verb meaning to support development and downstream performance of NLP models across languages. First, I investigate the potential of acquiring semantic verb classes from non-experts through manual clustering. This subsequently informs the development of a two-phase semantic dataset creation methodology, which combines semantic clustering with fine-grained semantic similarity judgments collected through spatial arrangements of lexical stimuli. The method is tested on English and then applied to a typologically diverse sample of languages to produce the first large-scale multilingual verb dataset of this kind. I demonstrate its utility as a diagnostic tool by carrying out a comprehensive evaluation of state-of-the-art NLP models, probing representation quality across languages and domains of verb meaning, and shedding light on their deficiencies. Subsequently, I directly address these shortcomings by injecting lexical knowledge into large pretrained language models. I demonstrate that external manually curated information about verbs’ lexical properties can support data-driven models in tasks where accurate verb processing is key. Moreover, I examine the potential of extending these benefits from resource-rich to resource-poor languages through translation-based transfer. The results emphasise the usefulness of human-generated lexical knowledge in supporting NLP models and suggest that time-efficient construction of lexicons similar to those developed in this work, especially in under-resourced languages, can play an important role in boosting their linguistic capacity.
ESRC Doctoral Fellowship [ES/J500033/1], ERC Consolidator Grant LEXICAL [648909]
Důležitá slova. Podklady ke kolokačnímu švédsko-českému slovníku základních sloves (Important Words: Groundwork for a Swedish-Czech Collocation Dictionary of Basic Verbs)
Basic verbs, i.e. very common verbs that typically denote physical movements, locations, states or actions, undergo various semantic shifts and acquire different secondary uses. In extreme cases, the distribution of secondary uses becomes so general that they are regarded as auxiliary verbs (go and to be going to), phase verbs (turn, grow), etc. These uses are usually well documented by grammars and language textbooks, and so are idiomatic expressions (phraseologisms) in dictionaries. There is, however, a grey area in between, which is extremely difficult for non-native speakers to learn. This consists of secondary uses with limited collocability, in particular light verb constructions, and secondary meanings that only get activated under particular morphosyntactic conditions. The basic-verb secondary uses and constructions are usually semantically transparent, so they do not pose comprehension problems, but they are generally unpredictable and language-specific, so they easily become an issue in non-native text production. In this thesis, Swedish basic verbs are approached from the contrastive point of view of an advanced Czech learner of Swedish. A selection of Swedish constructions with basic verbs is explored. The observations result in a proposal for the structure of a machine-readable Swedish-Czech...
Basic verbs, i.e. frequent lexical verbs that typically describe physical movement, location, state, or action, undergo a series of semantic shifts that allow them to express secondary, figurative meanings. In extreme cases, the verb becomes an auxiliary, modal, or phase verb, and the collocational restrictions that apply to the verb in its primary (i.e. literal) meaning no longer hold. These uses are mostly well documented in grammars and textbooks, just as good dictionaries give detailed information on the use of these verbs in set phraseological expressions.
Between fully grammaticalized use on the one hand and idiomatic, phraseological use on the other, however, there is a whole range of figurative uses of basic verbs that is considerably difficult for non-native speakers to master: figurative uses with limited collocability. These are above all verb-noun constructions, sometimes called analytical predicates (light verb constructions), but also uses that, under certain restricted morphosyntactic conditions (e.g. only in negation), activate abstract semantic features in other predicates, for example intensifying the meaning or implying that the action has already been going on for a long time, and so on. These secondary uses of lexical verbs...
Institute of Germanic Studies, Faculty of Arts
Proceedings of the Sixth International Conference Formal Approaches to South Slavic and Balkan Languages
Proceedings of the Sixth International Conference Formal Approaches to South Slavic and Balkan Languages contains 22 papers that were presented at the conference held in Dubrovnik, Croatia, 25-28 September 2008.
A computational approach to Latin verbs: new resources and methods
This thesis presents the application of computational methods to the study of Latin verbs. In particular, we describe the creation of a subcategorization lexicon automatically extracted from annotated corpora; we also present a probabilistic model for acquiring selectional preferences from annotated corpora and an ontology (Latin WordNet). Finally, we describe the results of a diachronic, quantitative study of Latin spatial preverbs.
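The abstract does not describe the probabilistic model itself. To illustrate how an ontology can generalize corpus counts into selectional preferences, here is a sketch of a simplified variant of a different, well-known technique, Resnik's selectional association; it is explicitly not the model from the thesis. It assumes verb-object pairs extracted from an annotated corpus and a hypothetical mapping from each noun to the ontology classes that subsume it, with each occurrence split evenly across those classes. The class name, class labels, and toy data are hypothetical.

```python
# Sketch of a simplified variant of Resnik's selectional association, used
# here as a STAND-IN technique; it is not the thesis's probabilistic model.
import math
from collections import defaultdict


class SelectionalAssociation:
    def __init__(self, pairs: list[tuple[str, str]], classes_of: dict[str, list[str]]):
        # freq[verb][cls]: each (verb, noun) occurrence is split evenly across the
        # ontology classes that subsume the noun; nouns with no classes are skipped.
        self.freq: dict[str, dict[str, float]] = defaultdict(lambda: defaultdict(float))
        for verb, noun in pairs:
            classes = classes_of.get(noun, [])
            for cls in classes:
                self.freq[verb][cls] += 1.0 / len(classes)
        # Prior P(c): class distribution pooled over all verbs.
        pooled: dict[str, float] = defaultdict(float)
        for by_class in self.freq.values():
            for cls, f in by_class.items():
                pooled[cls] += f
        total = sum(pooled.values())
        self.prior = {cls: f / total for cls, f in pooled.items()}

    def posterior(self, verb: str) -> dict[str, float]:
        """P(c | verb): class distribution of the verb's arguments."""
        by_class = self.freq.get(verb, {})
        total = sum(by_class.values())
        return {cls: f / total for cls, f in by_class.items()} if total else {}

    def strength(self, verb: str) -> float:
        """Selectional preference strength: KL divergence of P(c | verb) from P(c)."""
        return sum(p * math.log(p / self.prior[cls])
                   for cls, p in self.posterior(verb).items() if p > 0)

    def association(self, verb: str, cls: str) -> float:
        """Share of the verb's strength due to one class (negative if dispreferred)."""
        s = self.strength(verb)
        p = self.posterior(verb).get(cls, 0.0)
        if s == 0 or p == 0:
            return 0.0
        return p * math.log(p / self.prior[cls]) / s


if __name__ == "__main__":
    # Hypothetical toy data: nouns, classes and counts are illustrative only.
    classes_of = {
        "vinum": ["liquid", "substance"],
        "aqua": ["liquid", "substance"],
        "panis": ["food", "substance"],
        "caro": ["food", "substance"],
    }
    pairs = [("bibo", "vinum"), ("bibo", "vinum"), ("bibo", "aqua"),
             ("edo", "panis"), ("edo", "panis"), ("edo", "caro")]
    model = SelectionalAssociation(pairs, classes_of)
    for cls in ("liquid", "food", "substance"):
        print("bibo", cls, round(model.association("bibo", cls), 3))
```

In this toy setting the general class "substance" covers every object of both verbs, so it carries no preference information, while "liquid" accounts for all of the preference strength of "bibo". This is the kind of generalization an ontology such as Latin WordNet makes possible beyond the individual nouns attested in a corpus.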