Vector space models of ancient Greek word meaning, and a case study on Homer
Our paper describes the creation and evaluation of a Distributional Semantics model of ancient Greek. We developed a vector space model in which every word is represented by a vector that encodes information about its linguistic context(s). We validate different vector space models by testing their output against benchmarks drawn from ancient scholarship, modern lexicography, and an NLP resource. Finally, to show how the model can be applied to a research task, we provide the example of a small-scale study of semantic variation in epic formulae, i.e. recurring units with limited linguistic flexibility.
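To illustrate the general idea of a vector that encodes a word's linguistic contexts, the sketch below builds simple count-based co-occurrence vectors over a fixed window and compares words by cosine similarity. This is not the paper's model. The window size, the transliterated toy corpus, and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words appearing within `window` tokens of it."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, target in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:  # a word is not its own context
                    vectors[target][tokens[j]] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(count * v[key] for key, count in u.items())  # missing keys count as 0
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical toy corpus of transliterated, lemmatised formula-like phrases.
corpus = [
    ["polumetis", "odusseus", "eipe"],
    ["podarkes", "achilleus", "eipe"],
    ["dios", "odusseus", "eipe"],
    ["dios", "achilleus", "eipe"],
]
vecs = cooccurrence_vectors(corpus, window=2)
print(round(cosine(vecs["odusseus"], vecs["achilleus"]), 3))
```

Words that share formulaic contexts (here a shared epithet and verb) end up with similar vectors. Real models typically add weighting such as PPMI or dimensionality reduction, which this sketch omits.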
Accommodating (global-glocal) paradoxes across event planning
The aim of this research note is threefold: 1) to introduce the concept of paradox and its numerous applications to the study of, and the management challenges associated with, the planning and delivery of events, looking specifically at large-scale events such as the Olympics as an extreme case; 2) to present a new paradox, entitled the Global–Glocal Paradox, which interrogates how inherent global and local stakeholder interests, and the tensions between them, are managed; and 3) to present a series of conceptual and practical ways in which events can accommodate, rather than resolve, this paradox, helping to balance stakeholder interests instead of pitting one against the other.
A distributional semantic methodology for enhanced search in historical records: A case study on smell
In this paper we present a methodology based on distributional semantic models that can be flexibly adapted to the specific challenges posed by historical texts and that allow users to retrieve semantically relevant text without the need to close-read the documents. We focus on a case study concerned with detecting smell-related sentences in historical medical reports. We demonstrate a process for moving from generic domain label input to a more nuanced evaluation of the semantics of smell in a set of sentences extracted from this corpus, and then develop a machine learning technique for compounding scores on a variety of modelling parameters into more effective classifications. This work was supported by the Chist-ERA Atlantis project and by The Alan Turing Institute under the EPSRC grant EP/N510129/1.
Room to Glo: A systematic comparison of semantic change detection approaches with word embeddings
Word embeddings are increasingly used for the automatic detection of semantic change; yet a robust evaluation and systematic comparison of the choices involved has been lacking. We propose a new evaluation framework for semantic change detection and find that (i) using the whole time series is preferable to comparing only the first and last time points; (ii) independently trained and aligned embeddings perform better than continuously trained embeddings for long time periods; and (iii) the reference point for comparison matters. We also present an analysis of the changes detected on a large Twitter dataset spanning 5.5 years. This work was supported by The Alan Turing Institute under the EPSRC grant EP/N510129/1.
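The contrast between comparing only the first and last time points and using the whole time series can be illustrated with a small sketch. It assumes that one word's embeddings for each time point have already been aligned into a common space. The scoring rule (mean cosine distance of every time point from a chosen reference point), the function names, and the toy vectors are illustrative assumptions, not the paper's exact measures.

```python
import math

def cosine_distance(u, v):
    """1 minus the cosine similarity of two dense, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm

def first_last_score(series):
    """Change score that compares only the first and last time points."""
    return cosine_distance(series[0], series[-1])

def time_series_score(series, reference=0):
    """Mean cosine distance of every other time point from a chosen reference point."""
    ref_index = reference % len(series)  # allow negative indices such as -1 (last point)
    ref = series[ref_index]
    distances = [cosine_distance(ref, vec)
                 for i, vec in enumerate(series) if i != ref_index]
    return sum(distances) / len(distances)

# Hypothetical aligned 2-d vectors for one word at four time points:
# its usage drifts away and then returns to where it started.
series = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
print(first_last_score(series))   # the temporary shift is invisible here
print(time_series_score(series))  # the full series picks it up
```

The `reference` parameter controls which time point the others are measured against (for example `-1` for the most recent point), which is one simple way to experiment with the choice of reference point.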
Mining the UK web archive for semantic change detection
Semantic change detection (i.e., identifying words whose meaning has changed over time) has emerged as a growing area of research over the past decade, with important downstream applications in natural language processing, historical linguistics and computational social science. However, several obstacles make progress in the domain slow and difficult. These pertain primarily to the lack of well-established gold standard datasets, of resources to study the problem at a fine-grained temporal resolution, and of quantitative evaluation approaches. In this work, we aim to mitigate these issues by (a) releasing a new labelled dataset of more than 47K word vectors trained on the UK Web Archive over a short time-frame (2000–2013); (b) proposing a variant of Procrustes alignment to detect words that have undergone semantic shift; and (c) introducing a rank-based approach for evaluation purposes. Through extensive numerical experiments and validation, we illustrate the effectiveness of our approach against competitive baselines. Finally, we make our resources publicly available to further enable research in the domain. This work was supported by The Alan Turing Institute under the EPSRC grant EP/N510129/1 and the seed funding grant SF099.
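The variant of Procrustes alignment proposed in the paper is not reproduced here. As background, the sketch below shows standard orthogonal Procrustes alignment between two periods' embeddings (solved in closed form with an SVD), then ranks words by cosine distance after alignment. The ranking-by-distance step, the function names, and the synthetic data are illustrative assumptions.

```python
import numpy as np

def procrustes_rotation(source, target):
    """Orthogonal matrix R minimising the Frobenius norm ||source @ R - target||.

    Rows of `source` and `target` are vectors for the same words in two periods.
    """
    # Closed-form solution: if source.T @ target = U S V^T, then R = U V^T.
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt

def shift_scores(source, target):
    """Cosine distance between each word's aligned earlier vector and its later vector."""
    aligned = source @ procrustes_rotation(source, target)
    dots = np.sum(aligned * target, axis=1)
    norms = np.linalg.norm(aligned, axis=1) * np.linalg.norm(target, axis=1)
    return 1.0 - dots / norms

# Synthetic data: the later space is a random rotation of the earlier one,
# except that word 0 is given the opposite vector to simulate a meaning shift.
rng = np.random.default_rng(0)
earlier = rng.normal(size=(50, 5))
rotation, _ = np.linalg.qr(rng.normal(size=(5, 5)))
later = earlier @ rotation
later[0] = -later[0]

scores = shift_scores(earlier, later)
ranking = np.argsort(-scores)  # word indices ordered from most to least shifted
print(ranking[:3])
```

The rotation is restricted to be orthogonal so that alignment preserves distances and angles within each period's space, and only the relative orientation of the two spaces is corrected.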
DUKweb, diachronic word representations from the UK Web Archive corpus
Lexical semantic change (detecting shifts in the meaning and usage of words) is an important task for social and cultural studies as well as for Natural Language Processing applications. Diachronic word embeddings (time-sensitive vector representations of words that preserve their meaning) have become the standard resource for this task. However, given the significant computational resources needed for their generation, very few resources exist that make diachronic word embeddings available to the scientific community. In this paper we present DUKweb, a set of large-scale resources designed for the diachronic analysis of contemporary English. DUKweb was created from the JISC UK Web Domain Dataset (1996–2013), a very large archive which collects resources from the Internet Archive that were hosted on domains ending in ‘.uk’. DUKweb consists of a series of word co-occurrence matrices and two types of word embeddings for each year in the JISC UK Web Domain Dataset. We show the reuse potential of DUKweb and its quality standards via a case study on word meaning change detection.