304 research outputs found

    Learning Compositionality Functions on Word Embeddings for Modelling Attribute Meaning in Adjective-Noun Phrases

    Get PDF
    Hartung M, Kaupmann F, Jebbara S, Cimiano P. Learning Compositionality Functions on Word Embeddings for Modelling Attribute Meaning in Adjective-Noun Phrases. In: Proceedings of the 15th Meeting of the European Chapter of the Association for Computational Linguistics (EACL). Vol Vol. 1 (Long Papers). 2017: 54-64.Word embeddings have been shown to be highly effective in a variety of lexical semantic tasks. They tend to capture meaningful relational similarities between individual words, at the expense of lacking the capabilty of making the underlying semantic relation explicit. In this paper, we investigate the attribute relation that often holds between the constituents of adjective-noun phrases. We use CBOW word embeddings to represent word meaning and learn a compositionality function that combines the individual constituents into a phrase representation, thus capturing the compositional attribute meaning. The resulting embedding model, while being fully interpretable, outperforms count- based distributional vector space models that are tailored to attribute meaning in the two tasks of attribute selection and phrase similarity prediction. Moreover, as the model captures a generalized layer of attribute meaning, it bears the potential to be used for predictions over various attribute inventories without re-training

    Does CLIP Bind Concepts? Probing Compositionality in Large Image Models

    Full text link
    Large-scale neural network models combining text and images have made incredible progress in recent years. However, it remains an open question to what extent such models encode compositional representations of the concepts over which they operate, such as correctly identifying ''red cube'' by reasoning over the constituents ''red'' and ''cube''. In this work, we focus on the ability of a large pretrained vision and language model (CLIP) to encode compositional concepts and to bind variables in a structure-sensitive way (e.g., differentiating ''cube behind sphere'' from ''sphere behind cube''). In order to inspect the performance of CLIP, we compare several architectures from research on compositional distributional semantics models (CDSMs), a line of research that attempts to implement traditional compositional linguistic structures within embedding spaces. We find that CLIP can compose concepts in a single-object setting, but in situations where concept binding is needed, performance drops dramatically. At the same time, CDSMs also perform poorly, with best performance at chance level

    Nouns in the Conceptual Framework "Node of Knowledge"

    Get PDF
    The "Node of Knowledge" method is one of the elements of the conceptual framework "Node of Knowledge (NOK)". It enables knowledge representation in graphical and formalized (textual) form and makes it possible to store formalized records of natural language sentences in a relational database. To enable correct transformation of all words from natural language sentences to formalized records, it is necessary to design a metamodel of a language, i.e. to analyse all word types of each particular natural language and define rules for transformation of sentences in natural language into formalized records. This paper analyses nouns in Croatian and English language. It presents rules for transformation of nouns and noun phrase structures into formalized records, with examples in both languages. The system is preliminary tested with a small set of sentences (used as input knowledge) and questions. Testing results are presented and discussed

    Discovering multiword expressions

    Get PDF
    In this paper, we provide an overview of research on multiword expressions (MWEs), from a natural lan- guage processing perspective. We examine methods developed for modelling MWEs that capture some of their linguistic properties, discussing their use for MWE discovery and for idiomaticity detection. We con- centrate on their collocational and contextual preferences, along with their fixedness in terms of canonical forms and their lack of word-for-word translatatibility. We also discuss a sample of the MWE resources that have been used in intrinsic evaluation setups for these methods

    Eesti keele ühendverbide automaattuvastus lingvistiliste ja statistiliste meetoditega

    Get PDF
    Tänapäeval on inimkeeli (kaasa arvatud eesti keelt) töötlevad tehnoloogiaseadmed igapäevaelu osa, kuid arvutite „keeleoskus“ pole kaugeltki täiuslik. Keele automaattöötluse kõige rohkem kasutust leidev rakendus on ilmselt masintõlge. Ikka ja jälle jagatakse sotsiaalmeedias, kuidas tuntud süsteemid (näiteks Google Translate) midagi valesti tõlgivad. Enamasti tekitavad absurdse olukorra mitmest sõnast koosnevad fraasid või laused. Näiteks ei suuda tõlkesüsteemid tabada lauses „Ta läks lepinguga alt“ ühendi alt minema tähendust petta saama, sest õige tähenduse edastamiseks ei saa selle ühendi komponente sõna-sõnalt tõlkida ja seetõttu satubki arvuti hätta. Selleks et nii masintõlkesüsteemide kui ka teiste kasulike rakenduste nagu libauudiste tuvastuse või küsimus-vastus süsteemide kvaliteet paraneks, on oluline, et arvuti oskaks tuvastada mitmesõnalisi üksuseid ja nende eri tähendusi, mida inimesed konteksti põhjal üpriski lihtalt teha suudavad. Püsiühendite (tähenduse) automaattuvastus on oluline kõikides keeltes ja on seetõttu pälvinud arvutilingvistikas rohkelt tähelepanu. Seega on eriti inglise keele põhjal välja pakutud terve hulk meetodeid, mida pole siiamaani eesti keele püsiühendite tuvastamiseks rakendatud. Doktoritöös kasutataksegi masinõppe meetodeid, mis on teiste keelte püsiühendite tuvastamisel edukad olnud, üht liiki eesti keele püsiühendi – ühendverbi – automaatseks tuvastamiseks. Töös demonstreeritakse suurte tekstiandmete põhjal, et seni eesti keele traditsioonilises käsitluses esitatud eesti keele ühendverbide jaotus ainukordseteks (ühendi komponentide koosesinemisel tekib uus tähendus) ja korrapärasteks (ühendi tähendus on tema komponentide summa) ei ole piisavalt põhjalik. Nimelt kinnitab töö arvutilingvistilistes uurimustes laialt levinud arusaama, et püsiühendid (k.a ühendverbid) jaotuvad skaalale, mille ühes otsas on ühendid, mille tähendus on selgelt komponentide tähenduste summa. ja teises need ühendid, mis saavad uue tähenduse. Uurimus näitab, et lisaks kontekstile aitavad arvutil tuvastada ühendverbi õiget tähendust mitmed teised tunnuseid, näiteks subjekti ja objekti elusus ja käänded. Doktoritöö raames valminud andmestikud ja vektoresitused on vajalikud uued ressursid, mis on avalikud edaspidisteks uurimusteks.Nowadays, applications that process human languages (including Estonian) are part of everyday life. However, computers are not yet able to understand every nuance of language. Machine translation is probably the most well-known application of natural language processing. Occasionally, the worst failures of machine translation systems (e.g. Google Translate) are shared on social media. Most of such cases happen when sequences longer than words are translated. For example, translation systems are not able to catch the correct meaning of the particle verb alt (‘from under’) minema (‘to go’) (‘to get deceived’) in the sentence Ta läks lepinguga alt because the literal translation of the components of the expression is not correct. In order to improve the quality of machine translation systems and other useful applications, e.g. spam detection or question answering systems, such (idiomatic) multi-word expressions and their meanings must be well detected. The detection of multi-word expressions and their meaning is important in all languages and therefore much research has been done in the field, especially in English. However, the suggested methods have not been applied to the detection of Estonian multi-word expressions before. The dissertation fills that gap and applies well-known machine learning methods to detect one type of Estonian multi-word expressions – the particle verbs. Based on large textual data, the thesis demonstrates that the traditional binary division of Estonian particle verbs to non-compositional (ainukordne, meaning is not predictable from the meaning of its components) and compositional (korrapärane, meaning is predictable from the meaning of its components) is not comprehensive enough. The research confirms the widely adopted view in computational linguistics that the multi-word expressions form a continuum between the compositional and non-compositional units. Moreover, it is shown that in addition to context, there are some linguistic features, e.g. the animacy and cases of subject and object that help computers to predict whether the meaning of a particle verb in a sentence is compositional or non-compositional. In addition, the research introduces novel resources for Estonian language – trained embeddings and created compositionality datasets are available for the future research.https://www.ester.ee/record=b5252157~S
    corecore