
    Dynamic Contextualized Word Embeddings

    Static word embeddings that represent words by a single vector cannot capture the variability of word meaning in different linguistic and extralinguistic contexts. Building on prior work on contextualized and dynamic word embeddings, we introduce dynamic contextualized word embeddings that represent words as a function of both linguistic and extralinguistic context. Based on a pretrained language model (PLM), dynamic contextualized word embeddings model time and social space jointly, which makes them attractive for a range of NLP tasks involving semantic variability. We highlight potential application scenarios by means of qualitative and quantitative analyses on four English datasets.
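
    To make the idea concrete, here is a minimal sketch of how a token vector from a frozen PLM could be shifted by a learned offset conditioned on time and social context. This is not the paper's actual architecture; the module names, dimensions, and the additive-offset design are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class DynamicContextualEmbedding(nn.Module):
    """Illustrative: adds a (time, community)-conditioned offset to a PLM token vector."""
    def __init__(self, hidden_dim=768, n_times=10, n_communities=50, cond_dim=32):
        super().__init__()
        self.time_emb = nn.Embedding(n_times, cond_dim)          # extralinguistic: time
        self.social_emb = nn.Embedding(n_communities, cond_dim)  # extralinguistic: social space
        # Maps the (time, social) condition to an additive offset in PLM space.
        self.offset = nn.Sequential(
            nn.Linear(2 * cond_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, hidden_dim),
        )

    def forward(self, contextual_vec, time_idx, community_idx):
        # contextual_vec: (batch, hidden_dim) token vector from a frozen PLM
        cond = torch.cat([self.time_emb(time_idx), self.social_emb(community_idx)], dim=-1)
        return contextual_vec + self.offset(cond)

# Toy usage with a random stand-in for a BERT token vector, time slice 3, community 7.
model = DynamicContextualEmbedding()
out = model(torch.randn(1, 768), torch.tensor([3]), torch.tensor([7]))
print(out.shape)  # torch.Size([1, 768])
```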

    Deep Temporal-Recurrent-Replicated-Softmax for Topical Trends over Time

    Dynamic topic modeling facilitates the identification of topical trends over time in temporal collections of unstructured documents. We introduce a novel unsupervised neural dynamic topic model, the Recurrent Neural Network-Replicated Softmax Model (RNN-RSM), in which the topics discovered at each time step influence topic discovery in subsequent time steps. We account for the temporal ordering of documents by explicitly modeling a joint distribution of latent topical dependencies over time, using distributional estimators with temporal recurrent connections. Applying RNN-RSM to 19 years of articles on NLP research, we demonstrate that, compared to state-of-the-art topic models, RNN-RSM shows better generalization, topic interpretation, evolution, and trends. We also introduce a metric, SPAN, to quantify the capability of a dynamic topic model to capture word evolution in topics over time. (In Proceedings of the 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2018.)
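
    The core mechanism, as described, is that the topic model's parameters at time t are conditioned on a recurrent summary of the earlier time slices. A bare skeleton of that recurrence might look like the following; the shapes and layer names are assumptions, and the actual RSM training (contrastive divergence on each slice) is omitted entirely.

```python
import torch
import torch.nn as nn

V, H, R = 5000, 50, 30  # vocab size, topic (hidden) units, RNN state size -- illustrative

rnn = nn.RNNCell(input_size=V, hidden_size=R)
to_hidden_bias = nn.Linear(R, H)    # conditions topic-unit biases on corpus history
to_visible_bias = nn.Linear(R, V)   # conditions word biases on corpus history

state = torch.zeros(1, R)
for t in range(19):                  # e.g., 19 yearly slices of an NLP corpus
    counts = torch.rand(1, V)        # stand-in for the slice's word-count vector
    b_h = to_hidden_bias(state)      # biases of the slice-t topic model ...
    b_v = to_visible_bias(state)     # ... so topics at t depend on t-1, t-2, ...
    # (training the slice-t RSM with biases b_h, b_v would happen here)
    state = rnn(counts, state)       # carry topical history forward
```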

    „Duboka leksikografija" ("Deep Lexicography") – Fad or Opportunity?

    In recent years, we have witnessed staggering improvements in various semantic data processing tasks due to developments in deep learning, ranging from image and video processing to speech processing and natural language understanding. In this paper, we discuss the opportunities and challenges that these developments pose for electronic lexicography. We focus primarily on representation learning for the basic elements of language, namely words, and the applicability of these word representations to lexicography. We first discuss well-known approaches to learning static representations of words, the so-called word embeddings, and their use in lexicography-related tasks such as semantic shift detection and cross-lingual prediction of lexical features such as concreteness and imageability. We wrap up the paper with the most recent developments in word representation learning, namely dynamic, context-aware representations of words, showcasing some dynamic word embedding examples and discussing improvements on the lexicography-relevant tasks of word sense disambiguation and word sense induction.
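
    For the semantic shift detection task mentioned above, a widely used recipe with static embeddings is to align two time slices with orthogonal Procrustes and rank shared words by cosine distance. A self-contained sketch of that recipe follows, with random matrices standing in for real embeddings; it is one common baseline, not the specific method of the paper.

```python
import numpy as np

def align(A, B):
    # Orthogonal Procrustes: rotation W minimizing ||A @ W - B||_F,
    # where rows of A and B are the same words in two time periods.
    U, _, Vt = np.linalg.svd(A.T @ B)
    return A @ (U @ Vt)

def shift_scores(A, B):
    A = align(A, B)
    num = (A * B).sum(axis=1)
    den = np.linalg.norm(A, axis=1) * np.linalg.norm(B, axis=1)
    return 1.0 - num / den  # cosine distance per word: higher = more shift

rng = np.random.default_rng(0)
emb_1990 = rng.normal(size=(1000, 100))  # 1000 shared words, 100-dim vectors
emb_2020 = rng.normal(size=(1000, 100))
print(shift_scores(emb_1990, emb_2020)[:5])
```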

    A comparative study of approaches for the diachronic analysis of the Italian language

    In recent years, there has been a significant increase in interest in lexical semantic change detection, with many existing approaches, datasets, and evaluation strategies for detecting semantic drift. Most of these approaches rely on diachronic word embeddings. Some are created by post-processing static word embeddings, while others produce dynamic word embeddings in which vectors share the same geometric space across all time slices. The large majority of methods use English as the target language for diachronic analysis, while other languages remain under-explored. In this work, we compare state-of-the-art approaches in computational historical linguistics to evaluate the pros and cons of each model, and we present the results of an in-depth analysis conducted on an Italian diachronic corpus. Specifically, several approaches based on both static and dynamic embeddings are implemented and evaluated on the Kronos-It dataset, with all word embeddings trained on the Italian Google n-gram corpus. The main result of the evaluation is that all approaches fail to significantly reduce the number of false-positive change points, which confirms that lexical semantic change detection is still a challenging task.
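
    The false-positive problem is easiest to see in the change-point step itself: a word is typically flagged when its self-similarity between consecutive (aligned) time slices drops past a threshold, and the threshold choice is exactly where spurious detections enter. Below is a crude z-score variant of such a rule, on random stand-in data; it illustrates the general scheme, not the specific detectors evaluated in the paper.

```python
import numpy as np

def change_points(vectors, z=2.0):
    # vectors: (T, d) one aligned embedding per time slice for a single word
    sims = np.array([
        v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2))
        for v1, v2 in zip(vectors[:-1], vectors[1:])
    ])
    drops = 1.0 - sims                              # per-step cosine distance
    flagged = drops > drops.mean() + z * drops.std()  # crude z-score threshold
    return np.flatnonzero(flagged) + 1              # slice indices flagged as changes

rng = np.random.default_rng(1)
word_history = rng.normal(size=(20, 100))  # 20 time slices, 100-dim vectors
print(change_points(word_history))
```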