2 research outputs found
Unsupervised word sense disambiguation in dynamic semantic spaces
In this paper, we are mainly concerned with the ability to quickly and
automatically distinguish word senses in dynamic semantic spaces in which new
terms and new senses appear frequently. Such spaces are built "on the fly"
from constantly evolving data sets such as Wikipedia, repositories of patent
grants and applications, or large sets of legal documents for Technology
Assisted Review and e-discovery. This immediacy rules out supervision as well
as the use of a priori training sets. We show that the various senses of a term
can be automatically made apparent with a simple clustering algorithm, each
sense being a vector in the semantic space. While we only consider here
semantic spaces built by using random vectors, this algorithm should work with
any kind of embedding, provided meaningful similarities between terms can be
computed and fulfill at least two basic conditions: terms with close
meanings have high similarities, and terms with unrelated meanings have
near-zero similarities.
Comment: 7 pages, 1 table, 5 examples
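As a rough illustration of the idea that each sense can be represented by a vector obtained through simple clustering, the sketch below groups the neighbours of a term by cosine similarity. It is not the algorithm of the paper: the greedy single-pass strategy, the `threshold` value, the function names and the toy 4-dimensional vectors are all assumptions made for this example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def cluster_senses(neighbours, threshold=0.5):
    """Greedy one-pass clustering (an assumed strategy, not the paper's).

    `neighbours` maps each neighbouring term to its vector. A term joins the
    most similar existing cluster if the similarity reaches `threshold`;
    otherwise it starts a new cluster. Each cluster's summed vector plays
    the role of one sense vector.
    """
    clusters = []
    for term, vec in neighbours.items():
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = cosine(vec, cluster["vector"])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append({"members": [term], "vector": list(vec)})
        else:
            best["members"].append(term)
            best["vector"] = [a + b for a, b in zip(best["vector"], vec)]
    return clusters

# Hypothetical neighbours of a polysemous term, as toy 4-dimensional vectors.
neighbours = {
    "river":  [1.0, 0.1, 0.0, 0.0],
    "jungle": [0.9, 0.2, 0.0, 0.0],
    "retail": [0.0, 0.0, 1.0, 0.1],
    "kindle": [0.0, 0.1, 0.9, 0.2],
}
senses = cluster_senses(neighbours)
for sense in senses:
    print(sense["members"])
```

The sketch relies on the two conditions stated in the abstract: related neighbours score well above the threshold, while unrelated ones stay near zero, so the choice of threshold is not delicate on data of this kind.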
Unsupervised detection of diachronic word sense evolution
Most words have several senses and connotations which evolve in time due to
semantic shift, so that closely related words may gain different or even
opposite meanings over the years. This evolution is very relevant to the study
of language and of cultural changes, but the tools currently available for
diachronic semantic analysis have significant, inherent limitations and are not
suitable for real-time analysis. In this article, we demonstrate how the
linearity of random vector techniques enables building time series of
congruent word embeddings (or semantic spaces) which can then be compared and
combined linearly without loss of precision over any time period to detect
diachronic semantic shifts. We show how this approach yields time trajectories
of polysemous words such as amazon or apple, enables following semantic drifts
and gender bias across time, and reveals the shifting instantiations of stable
concepts such as hurricane or president. This very fast, linear approach can
easily be distributed over many processors to follow in real time streams of
social media such as Twitter or Facebook; the resulting, time-dependent
semantic spaces can then be combined at will by simple additions or
subtractions.
Comment: 10 pages, 1 figure, 10 tables
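The linearity claim can be illustrated with a small random-indexing sketch. This is a simplified stand-in rather than the paper's implementation: the dimension `DIM`, the sparse ternary index vectors, the context window and the toy corpora are all assumptions. The point is only that spaces built per period from the same index vectors add up exactly to the space built from the whole period.

```python
import random
from collections import defaultdict

DIM = 64  # assumed dimension; real spaces would use far more

def index_vector(word, dim=DIM, nonzero=8):
    """Sparse random +1/-1 vector, seeded by the word itself so that every
    time period uses the same index vector (this keeps the spaces congruent)."""
    rng = random.Random(word)
    vec = [0] * dim
    for pos in rng.sample(range(dim), nonzero):
        vec[pos] = rng.choice((-1, 1))
    return vec

def build_space(sentences, window=2):
    """Each word's embedding is the sum of the index vectors of the words
    occurring within `window` positions of it."""
    space = defaultdict(lambda: [0] * DIM)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    ctx = index_vector(tokens[j])
                    space[word] = [a + b for a, b in zip(space[word], ctx)]
    return space

def combine(space_a, space_b, sign=1):
    """Add (sign=1) or subtract (sign=-1) two spaces term by term."""
    zero = [0] * DIM
    return {
        word: [x + sign * y
               for x, y in zip(space_a.get(word, zero), space_b.get(word, zero))]
        for word in set(space_a) | set(space_b)
    }

# Hypothetical toy corpora for two time periods.
period_1 = [["amazon", "river", "rainforest"], ["apple", "fruit", "orchard"]]
period_2 = [["amazon", "online", "retail"], ["apple", "iphone", "launch"]]

space_1 = build_space(period_1)
space_2 = build_space(period_2)
merged = combine(space_1, space_2)            # sum of the two period spaces
whole = build_space(period_1 + period_2)      # built directly from all data
recovered = combine(whole, space_1, sign=-1)  # subtract period 1 again
```

Because the vectors hold integer counts, `merged` equals `whole` exactly, and subtracting `space_1` from `whole` gives back the period-2 vectors; independent workers could therefore each build one period and sum the results afterwards.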