
    Vector space models of ancient Greek word meaning, and a case study on Homer

    Our paper describes the creation and evaluation of a distributional semantics model of ancient Greek. We developed a vector space model in which every word is represented by a vector that encodes information about its linguistic contexts. We validate different vector space models by testing their output against benchmarks obtained from ancient scholarship, modern lexicography, and an NLP resource. Finally, to show how the model can be applied to a research task, we provide a small-scale study of semantic variation in epic formulae, recurring units with limited linguistic flexibility.
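    The sketch below (not the authors' code; the function name and the toy transliterated corpus are illustrative assumptions) shows the basic idea of such a count-based vector space model: each word is represented by its co-occurrence counts with context words inside a symmetric window.

        # Minimal count-based distributional model: one vector of context
        # co-occurrence counts per word, using a symmetric window.
        from collections import Counter, defaultdict

        def build_cooccurrence_vectors(sentences, window=5):
            """Map each word to a Counter of context-word co-occurrence counts."""
            vectors = defaultdict(Counter)
            for tokens in sentences:
                for i, target in enumerate(tokens):
                    lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
                    for j in range(lo, hi):
                        if j != i:
                            vectors[target][tokens[j]] += 1
            return vectors

        # Toy input; a real model would use a lemmatised ancient Greek corpus.
        corpus = [["menin", "aeide", "thea"], ["andra", "moi", "ennepe", "mousa"]]
        vectors = build_cooccurrence_vectors(corpus, window=2)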

    Accommodating (global-glocal) paradoxes across event planning

    The aim of this research note is threefold: 1) to introduce the concept of paradox and its numerous applications to the study and management challenges associated with the planning and delivery of events, looking specifically at large-scale events such as the Olympics as an extreme case; 2) to present a new paradox, termed the Global–Glocal Paradox, which interrogates how inherent global and local stakeholder interests and tensions are managed; and 3) to present a series of conceptual and practical ways in which events can accommodate, rather than resolve, this paradox so as to balance stakeholder interests instead of pitting one against the other.

    Room to Glo: A systematic comparison of semantic change detection approaches with word embeddings

    Word embeddings are increasingly used for the automatic detection of semantic change; yet, a robust evaluation and systematic comparison of the choices involved has been lacking. We propose a new evaluation framework for semantic change detection and find that (i) using the whole time series is preferable to comparing only the first and last time points; (ii) independently trained and aligned embeddings perform better than continuously trained embeddings for long time periods; and (iii) the reference point used for comparison matters. We also present an analysis of the changes detected on a large Twitter dataset spanning 5.5 years. This work was supported by The Alan Turing Institute under the EPSRC grant EP/N510129/1.
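    The following sketch (an illustration under an assumed data format, not the released code) shows the kind of comparison discussed above: scoring a word's change by measuring its distance from a fixed reference time point across the whole series, rather than comparing only the first and last points.

        # Score change for a word as its mean cosine distance from a fixed
        # reference year, averaged over every other year in the series.
        import numpy as np

        def cosine(u, v):
            return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

        def change_score(word, embeddings_by_year, reference_year):
            """embeddings_by_year: {year: {word: np.ndarray}} (assumed format)."""
            ref = embeddings_by_year[reference_year][word]
            distances = [1.0 - cosine(vecs[word], ref)
                         for year, vecs in sorted(embeddings_by_year.items())
                         if year != reference_year and word in vecs]
            return sum(distances) / len(distances) if distances else 0.0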

    Mining the UK web archive for semantic change detection

    Semantic change detection (i.e., identifying words whose meaning has changed over time) started emerging as a growing area of research over the past decade, with important downstream applications in natural language processing, historical linguistics and computational social science. However, several obstacles make progress in the domain slow and difficult. These pertain primarily to the lack of well-established gold standard datasets, resources to study the problem at a fine-grained temporal resolution, and quantitative evaluation approaches. In this work, we aim to mitigate these issues by (a) releasing a new labelled dataset of more than 47K word vectors trained on the UK Web Archive over a short time-frame (2000–2013); (b) proposing a variant of Procrustes alignment to detect words that have undergone semantic shift; and (c) introducing a rank-based approach for evaluation purposes. Through extensive numerical experiments and validation, we illustrate the effectiveness of our approach against competitive baselines. Finally, we also make our resources publicly available to further enable research in the domain. This work was supported by The Alan Turing Institute under the EPSRC grant EP/N510129/1 and the seed funding grant SF099.
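    As a minimal sketch of the alignment step (standard orthogonal Procrustes, not necessarily the paper's exact variant; function and variable names are assumptions), one can align embeddings from two periods over their shared vocabulary and rank words by the cosine distance between their aligned vectors.

        # Orthogonal Procrustes: find the rotation R minimising ||XR - Y||_F,
        # then rank shared-vocabulary words by cosine distance after alignment.
        import numpy as np

        def procrustes_align(X, Y):
            U, _, Vt = np.linalg.svd(X.T @ Y)
            return U @ Vt

        def rank_by_shift(emb_t1, emb_t2):
            """emb_t1, emb_t2: {word: np.ndarray} for two time periods (assumed)."""
            shared = sorted(set(emb_t1) & set(emb_t2))
            X = np.stack([emb_t1[w] for w in shared])
            Y = np.stack([emb_t2[w] for w in shared])
            Xa = X @ procrustes_align(X, Y)
            cos = np.sum(Xa * Y, axis=1) / (
                np.linalg.norm(Xa, axis=1) * np.linalg.norm(Y, axis=1))
            # Largest distances first: top-ranked words are shift candidates.
            return sorted(zip(shared, 1.0 - cos), key=lambda p: p[1], reverse=True)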

    DUKweb, diachronic word representations from the UK Web Archive corpus

    Lexical semantic change (detecting shifts in the meaning and usage of words) is an important task for social and cultural studies as well as for Natural Language Processing applications. Diachronic word embeddings (time-sensitive vector representations of words that preserve their meaning) have become the standard resource for this task. However, given the significant computational resources needed for their generation, very few resources exist that make diachronic word embeddings available to the scientific community. In this paper we present DUKweb, a set of large-scale resources designed for the diachronic analysis of contemporary English. DUKweb was created from the JISC UK Web Domain Dataset (1996–2013), a very large archive which collects resources from the Internet Archive that were hosted on domains ending in ‘.uk’. DUKweb consists of a series of word co-occurrence matrices and two types of word embeddings for each year in the JISC UK Web Domain dataset. We show the reuse potential and quality of DUKweb via a case study on word meaning change detection.
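    As a hedged illustration of how yearly co-occurrence matrices like these are commonly reused (an assumption about typical downstream usage, not a description of the DUKweb pipeline), the sketch below turns raw co-occurrence counts into PPMI vectors, which can then be compared across years; the dense-matrix layout is chosen only for brevity.

        # Positive pointwise mutual information (PPMI) for a word-by-context
        # count matrix; zero counts and empty rows map to a PPMI of 0.
        import numpy as np

        def ppmi(counts):
            total = counts.sum()
            row = counts.sum(axis=1, keepdims=True)
            col = counts.sum(axis=0, keepdims=True)
            with np.errstate(divide="ignore", invalid="ignore"):
                pmi = np.log((counts * total) / (row * col))
            pmi[~np.isfinite(pmi)] = 0.0
            return np.maximum(pmi, 0.0)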