Distributed Smoothed Tree Kernel
In this paper we explore the possibility of merging the world of Compositional Distributional Semantic Models (CDSM) with Tree Kernels (TK). In particular, we introduce a specific tree kernel (the smoothed tree kernel, or STK) and then show that it is possible to approximate this kernel with the dot product of two vectors obtained compositionally from the sentences, thereby creating a new CDSM.
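The core idea, approximating a kernel with the dot product of explicit compositional vectors, can be sketched in a few lines. This is a toy illustration of the general principle (an explicit feature map phi so that K(x, y) ≈ phi(x) · phi(y)), not the paper's actual STK construction; the averaging composition function and random word vectors are assumptions for the sake of the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def phi(sentence_vectors):
    """Hypothetical compositional map: combine the word vectors of a
    sentence into a single vector (a simple average as a stand-in for
    the paper's compositional construction)."""
    return np.mean(sentence_vectors, axis=0)

# Two "sentences", each a matrix of 50-dimensional word vectors.
s1 = rng.normal(size=(4, 50))
s2 = rng.normal(size=(3, 50))

# The approximated kernel value is just a dot product of the two
# compositionally-built sentence vectors.
k_approx = float(phi(s1) @ phi(s2))
print(k_approx)
```

Because the kernel is computed as an explicit dot product, it inherits the dot product's properties, e.g. symmetry and non-negative self-similarity.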
Is Structure Necessary for Modeling Argument Expectations in Distributional Semantics?
Despite the number of NLP studies dedicated to thematic fit estimation, little attention has been paid to the related task of composing and updating verb argument expectations. The few exceptions have mostly modeled this phenomenon with structured distributional models, implicitly assuming a similarly structured representation of events. Recent experimental evidence, however, suggests that the human processing system could also exploit an unstructured "bag-of-arguments" type of event representation to predict upcoming input. In this paper, we re-implement a traditional structured model and adapt it to compare the different hypotheses concerning the degree of structure in our event knowledge, evaluating their relative performance on the task of argument expectation update.
Comment: conference paper, IWC
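The contrast between the two hypotheses can be sketched as follows. This is a schematic illustration, not the paper's model: the vectors are random stand-ins, and the scoring functions are the simplest possible instantiations of "role-specific prototype" versus "unordered pool of arguments".

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented prototype vectors for two syntactic roles of some verb,
# plus a candidate filler to be scored.
subj_proto = rng.normal(size=50)
obj_proto = rng.normal(size=50)
candidate = rng.normal(size=50)

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Structured hypothesis: score the candidate against the prototype of
# its specific role (here, the object slot).
structured_score = cos(candidate, obj_proto)

# Bag-of-arguments hypothesis: score the candidate against the pooled
# representation of all arguments, ignoring syntactic roles.
bag_score = cos(candidate, (subj_proto + obj_proto) / 2)

print(structured_score, bag_score)
```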
Indra: A word embedding and semantic relatedness server
In recent years, word embedding/distributional semantic models have evolved into a fundamental component in many natural language processing (NLP) architectures due to their ability to capture and quantify semantic associations at scale. Word embedding models can be used to satisfy recurrent tasks in NLP such as lexical and semantic generalisation in machine learning, finding similar or related words, and computing the semantic relatedness of terms. However, building and consuming specific word embedding models require setting a large number of configuration options, such as corpus-dependent parameters and distance measures, as well as compositional models. Despite their increasing relevance as a component in NLP architectures, existing frameworks provide limited options for systematically building, parametrising, comparing and evaluating different models. To address this demand, this paper describes INDRA, a multilingual word embedding/distributional semantics framework which supports the creation, use and evaluation of word embedding models. In addition to the tool, INDRA also shares more than 65 pre-computed models in 14 languages.
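At its core, the semantic relatedness computation that such a server exposes reduces to cosine similarity between word vectors. A minimal local sketch, with an invented three-dimensional embedding table standing in for a real pre-computed model (a real INDRA deployment serves models over HTTP):

```python
import numpy as np

# Hypothetical miniature embedding table; real models would have
# hundreds of dimensions and a large vocabulary.
embeddings = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.85, 0.9, 0.15]),
    "apple": np.array([0.1, 0.2, 0.95]),
}

def relatedness(w1, w2):
    """Cosine similarity between the two words' vectors."""
    v1, v2 = embeddings[w1], embeddings[w2]
    return float(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)))

print(relatedness("king", "queen"))  # high: similar contexts of use
print(relatedness("king", "apple"))  # much lower
```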
Vector space models of ancient Greek word meaning, and a case study on Homer
Our paper describes the creation and evaluation of a Distributional Semantics model of ancient Greek. We developed a vector space model where every word is represented by a vector which encodes information about its linguistic context(s). We validate different vector space models by testing their output against benchmarks obtained from scholarship from the ancient world, modern lexicography, and an NLP resource. Finally, to show how the model can be applied to a research task, we provide the example of a small-scale study of semantic variation in epic formulae, recurring units with limited linguistic flexibility.
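The kind of count-based vector space model the abstract describes, where each word is represented by its co-occurrence counts within a context window, can be sketched in a few lines. The transliterated toy corpus and window size below are illustrative assumptions, not the paper's setup.

```python
from collections import Counter
import math

# Toy corpus of transliterated epic-style lines (invented for the sketch).
corpus = [
    "menin aeide thea".split(),
    "andra moi ennepe mousa".split(),
    "thea aeide mousa".split(),
]

WINDOW = 2  # context words counted on each side of the target

def build_vectors(sentences):
    """Map each word to a Counter of its context-word co-occurrences."""
    vectors = {}
    for sent in sentences:
        for i, word in enumerate(sent):
            ctx = sent[max(0, i - WINDOW):i] + sent[i + 1:i + 1 + WINDOW]
            vectors.setdefault(word, Counter()).update(ctx)
    return vectors

def cosine(c1, c2):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(c1[w] * c2[w] for w in set(c1) & set(c2))
    n1 = math.sqrt(sum(v * v for v in c1.values()))
    n2 = math.sqrt(sum(v * v for v in c2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

vecs = build_vectors(corpus)
print(cosine(vecs["aeide"], vecs["ennepe"]))
```

Validation against external benchmarks, as in the paper, would then compare such similarity scores to lexicographic or scholarly judgments.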
Panta rei: Tracking Semantic Change with Distributional Semantics in Ancient Greek
We present a method to explore semantic change as a function of variation in distributional semantic spaces. In this paper, we apply this approach to automatically identify the areas of semantic change in the lexicon of Ancient Greek between the pre-Christian and Christian era. Distributional Semantic Models are used to identify meaningful clusters and patterns of semantic shift within a set of target words, defined through a purely data-driven approach. The results emphasize the role played by the diffusion of Christianity and by technical languages in determining semantic change in Ancient Greek, and show the potential of distributional models in diachronic semantics.
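One simple way to quantify change of the kind the abstract describes is to represent a target word by its co-occurrence counts over a shared context vocabulary in each period, then take the cosine distance between the two period representations. The context words and counts below are invented for illustration; they are not from the paper's corpora.

```python
import math

# Hypothetical co-occurrence counts for one target word over a shared
# context vocabulary, in pre-Christian vs. Christian era subcorpora.
context_vocab = ["god", "bread", "spirit", "market"]
counts_early = [2, 9, 1, 8]
counts_late = [9, 2, 8, 1]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# A low cross-period similarity (high distance) flags the word as a
# candidate area of semantic change.
change_score = 1.0 - cosine(counts_early, counts_late)
print(round(change_score, 3))
```

Ranking the whole target vocabulary by this score gives a data-driven shortlist of shifted words, which can then be inspected qualitatively.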
A distributional semantic approach to the periodization of change in the productivity of constructions
This paper describes a method to automatically identify stages of language change in diachronic corpus data, combining variability-based neighbour clustering, which offers objective and reproducible criteria for periodization, and distributional semantics as a representation of lexical meaning. This method partitions the history of a grammatical construction according to qualitative stages of productivity corresponding to different semantic sets of lexical items attested in it. Two case studies are presented. The first case study on the hell-construction (“Verb the hell out of NP”) shows that the semantic development of a construction does not always match that of its quantitative aspects, like token or type frequency. The second case study on the way-construction compares the results of the present method with those of collostructional analysis. It is shown that the former measures semantic changes and their chronology with greater precision. In sum, this method offers a promising approach to exploring semantic variation in the lexical fillers of constructions and to modelling constructional change.</jats:p