Human Associations Help to Detect Conventionalized Multiword Expressions
In this paper we show that human evidence about the conventionalization of a
phrase can be obtained by asking native speakers for their associations with
the phrase and with its component words. If the component words of a phrase
have each other as frequent associations, the phrase can be considered
conventionalized. Another type of conventionalized phrase can be revealed
using two factors: low entropy of the phrase associations and low intersection
between the component-word associations and the phrase associations. The
association experiments were performed for the Russian language.
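The two factors named in this abstract, entropy of the phrase associations and intersection between word and phrase associations, can be illustrated with a minimal sketch. The abstract does not state the exact formulas, so this assumes Shannon entropy over the response distribution and Jaccard overlap between sets of distinct responses; the function names and the toy responses are hypothetical and are not data from the experiments.

```python
import math
from collections import Counter


def association_entropy(responses):
    """Shannon entropy (in bits) of the distribution of association responses.

    Assumption: entropy is taken over relative response frequencies.
    Low entropy means most speakers gave the same few associations.
    """
    counts = Counter(responses)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def association_overlap(phrase_responses, word_responses):
    """Jaccard overlap between distinct phrase and component-word associations.

    Assumption: "intersection" is measured on sets of distinct responses.
    """
    phrase_set, word_set = set(phrase_responses), set(word_responses)
    if not phrase_set and not word_set:
        return 0.0
    return len(phrase_set & word_set) / len(phrase_set | word_set)


# Made-up toy responses for an illustrative phrase and one of its words.
phrase_responses = ["party", "party", "party", "celebration"]
word_responses = ["cold", "ice", "snow", "party"]

print(association_entropy(phrase_responses))
print(association_overlap(phrase_responses, word_responses))
```

Under these assumptions, a phrase whose responses show low entropy and low overlap with the responses to its component words would be a candidate for the second type of conventionalized phrase described above. Any cut-off values would have to be chosen empirically.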
Discovering multiword expressions
In this paper, we provide an overview of research on multiword expressions (MWEs), from a natural language processing perspective. We examine methods developed for modelling MWEs that capture some of their linguistic properties, discussing their use for MWE discovery and for idiomaticity detection. We concentrate on their collocational and contextual preferences, along with their fixedness in terms of canonical forms and their lack of word-for-word translatability. We also discuss a sample of the MWE resources that have been used in intrinsic evaluation setups for these methods.
An Examination of the Compositionality of Large Generative Vision-Language Models
With the success of Large Language Models (LLMs), a surge of Generative
Vision-Language Models (GVLMs) have been constructed via multimodal instruction
tuning. The tuning recipe substantially deviates from the common contrastive
vision-language learning. However, the performance of GVLMs in multimodal
compositional reasoning remains largely unexplored, as existing evaluation
metrics and benchmarks focus predominantly on assessing contrastive models like
CLIP. In this paper, we examine potential evaluation metrics for assessing
GVLMs and hypothesize that generative score methods are suitable for evaluating
compositionality. In addition, current benchmarks tend to prioritize syntactic
correctness over semantics. The presence of morphological bias in these
benchmarks can be exploited by GVLMs, leading to ineffective evaluations. To
combat this, we define a MorphoBias Score to quantify the morphological bias
and propose a novel LLM-based strategy to calibrate the bias. Moreover, a
challenging task is added to evaluate the robustness of GVLMs against inherent
inclination toward syntactic correctness. We include the calibrated dataset and
the task in a new benchmark, namely the MOrphological De-biased Benchmark
(MODE). Our study provides the first unbiased benchmark for the
compositionality of GVLMs, facilitating future research in this direction. We
will release our code and datasets.
Unsupervised compositionality prediction of nominal compounds
Nominal compounds such as red wine and nut case display a continuum of compositionality, with varying contributions from the components of the compound to its semantics. This article proposes a framework for compound compositionality prediction using distributional semantic models, evaluating to what extent they capture idiomaticity compared to human judgments. For evaluation, we introduce data sets containing human judgments in three languages: English, French, and Portuguese. The results reveal a high agreement between the model predictions and human judgments, suggesting that the models are able to incorporate information about idiomaticity. We also present an in-depth evaluation of various factors that can affect prediction, such as model and corpus parameters and compositionality operations. General crosslingual analyses reveal the impact of morphological variation and corpus size on the ability of the model to predict compositionality, and show that a uniform combination of the components gives the best results.
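One common way to turn distributional vectors into a compositionality score, sketched here as an illustration rather than as the article's exact procedure, is to compare the vector of the whole compound with a weighted sum of the normalized component vectors. The weight `beta`, the function names, and the toy vectors are assumptions; `beta = 0.5` corresponds to a uniform combination of the two components.

```python
import math


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def compositionality_score(compound_vec, head_vec, modifier_vec, beta=0.5):
    """Cosine between the compound vector and a combination of its parts.

    Assumption: the combination is beta * head + (1 - beta) * modifier over
    unit-normalized vectors; beta = 0.5 weights both components uniformly.
    """
    head, modifier = normalize(head_vec), normalize(modifier_vec)
    combined = [beta * h + (1 - beta) * m for h, m in zip(head, modifier)]
    return cosine(compound_vec, combined)


# Made-up 2-dimensional toy vectors, not taken from any corpus.
compositional = compositionality_score([1.0, 1.0], [1.0, 0.0], [0.0, 1.0])
idiomatic = compositionality_score([1.0, 0.0], [0.0, 1.0], [0.0, 1.0])
print(compositional, idiomatic)
```

In this toy setting, a compound whose vector lies close to the combination of its parts scores near 1, while a compound whose vector is unrelated to its parts scores near 0. Such scores would then be compared against human compositionality judgments.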