Analysing Lexical Semantic Change with Contextualised Word Representations
This paper presents the first unsupervised approach to lexical semantic
change that makes use of contextualised word representations. We propose a
novel method that exploits the BERT neural language model to obtain
representations of word usages, clusters these representations into usage
types, and measures change along time with three proposed metrics. We create a
new evaluation dataset and show that the model representations and the detected
semantic shifts are positively correlated with human judgements. Our extensive
qualitative analysis demonstrates that our method captures a variety of
synchronic and diachronic linguistic phenomena. We expect our work to inspire
further research in this direction.
Comment: To appear in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL-2020).
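The pipeline described in the abstract (collect contextualised usage vectors, cluster them into usage types, compare the usage-type distributions across time) can be sketched as follows. This is a minimal illustration, not the paper's implementation: random vectors stand in for BERT usage embeddings, the cluster count `k` is arbitrary, and Jensen-Shannon distance is used as one plausible example of a change metric.

```python
import numpy as np
from sklearn.cluster import KMeans
from scipy.spatial.distance import jensenshannon

rng = np.random.default_rng(0)

# Stand-ins for contextualised usage vectors of one target word,
# e.g. embeddings of each occurrence in two time periods.
usages_t1 = rng.normal(0.0, 1.0, size=(100, 16))
usages_t2 = rng.normal(0.5, 1.0, size=(100, 16))

# 1. Cluster all usages of the word into "usage types".
all_usages = np.vstack([usages_t1, usages_t2])
k = 4  # hypothetical number of usage types
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(all_usages)

# 2. Per-period distribution over usage types.
def type_distribution(period_labels, k):
    counts = np.bincount(period_labels, minlength=k)
    return counts / counts.sum()

p1 = type_distribution(labels[:100], k)
p2 = type_distribution(labels[100:], k)

# 3. One possible change metric: Jensen-Shannon distance between
#    the two usage-type distributions (0 = no change, 1 = maximal).
change_score = float(jensenshannon(p1, p2, base=2))
print(round(change_score, 3))
```

A larger distance indicates that the word's usages redistribute across types between the two periods, which is the intuition behind distribution-based change metrics.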
Neural models of language use: Studies of language comprehension and production in context
Artificial neural network models of language are mostly known and appreciated today for providing a backbone for formidable AI technologies. This thesis takes a different perspective. Through a series of studies on language comprehension and production, it investigates whether artificial neural networks, beyond being useful in countless AI applications, can serve as accurate computational simulations of human language use, and thus as a new core methodology for the language sciences.
UiO-UvA at SemEval-2020 Task 1: Contextualised Embeddings for Lexical Semantic Change Detection
We apply contextualised word embeddings to lexical semantic change detection
in the SemEval-2020 Shared Task 1. This paper focuses on Subtask 2, ranking
words by the degree of their semantic drift over time. We analyse the
performance of two contextualising architectures (BERT and ELMo) and three
change detection algorithms. We find that the most effective algorithms rely on
the cosine similarity between averaged token embeddings and the pairwise
distances between token embeddings. They outperform strong baselines by a large
margin (in the post-evaluation phase, we have the best Subtask 2 submission for
SemEval-2020 Task 1), but interestingly, the choice of a particular algorithm
depends on the distribution of gold scores in the test set.
Comment: To appear in Proceedings of the 14th International Workshop on Semantic Evaluation (SemEval-2020).
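The two algorithm families the abstract names can be illustrated in a few lines: cosine similarity between averaged token embeddings (here expressed as a cosine distance, so that higher means more change) and the mean pairwise cosine distance between cross-period token embeddings. Random vectors stand in for the ELMo/BERT token embeddings; this is a sketch of the scoring step only, not the shared-task system.

```python
import numpy as np
from scipy.spatial.distance import cdist, cosine

rng = np.random.default_rng(1)

# Stand-ins for token embeddings of one word in two time-period corpora.
emb_old = rng.normal(0.0, 1.0, size=(50, 32))
emb_new = rng.normal(0.3, 1.0, size=(60, 32))

# Averaging-based score: cosine distance between the two periods'
# averaged token embeddings (higher = more semantic change).
avg_score = cosine(emb_old.mean(axis=0), emb_new.mean(axis=0))

# Pairwise-based score: mean cosine distance over all cross-period
# pairs of token embeddings.
pairwise_score = cdist(emb_old, emb_new, metric="cosine").mean()

print(round(avg_score, 3), round(pairwise_score, 3))
```

Ranking the target words by either score yields the Subtask 2 ranking; as the abstract notes, which of the two scores ranks better depends on how the gold change scores are distributed in the test set.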
Construction Repetition Reduces Information Rate in Dialogue
Speakers repeat constructions frequently in dialogue. Due to their peculiar information-theoretic properties, repetitions can be thought of as a strategy for cost-effective communication. In this study, we focus on the repetition of lexicalised constructions (i.e., recurring multi-word units) in English open-domain spoken dialogues. We hypothesise that speakers use construction repetition to mitigate information rate, leading to an overall decrease in utterance information content over the course of a dialogue. We conduct a quantitative analysis, measuring the information content of constructions and that of their containing utterances, estimating information content with an adaptive neural language model. We observe that construction usage lowers the information content of utterances. This facilitating effect (i) increases throughout dialogues, (ii) is boosted by repetition, (iii) grows as a function of repetition frequency and density, and (iv) is stronger for repetitions of referential constructions.
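The underlying measure, information content as negative log-probability under a language model, can be shown with a toy example. The study uses an adaptive neural language model; here an add-one-smoothed unigram model stands in purely to make the quantity concrete, and the corpus and example utterances are invented.

```python
import math
from collections import Counter

# Tiny invented dialogue corpus; a frequent construction ("you know")
# appears several times.
corpus = "you know what i mean you know right you know".split()
counts = Counter(corpus)
vocab = len(counts)
total = len(corpus)

def surprisal(word):
    # Information content of a word in bits: -log2 P(word),
    # with add-one smoothing over the unigram counts.
    p = (counts[word] + 1) / (total + vocab)
    return -math.log2(p)

def utterance_information(utterance):
    # Mean per-word information content of an utterance.
    words = utterance.split()
    return sum(surprisal(w) for w in words) / len(words)

# Under this model, a frequently repeated construction carries
# less information than novel material.
repeated = utterance_information("you know")
novel = utterance_information("quantum dynamics")
print(repeated < novel)  # → True
```

This is the sense in which repeating a construction lowers the information content of the utterance containing it: repeated material is more predictable, hence cheaper in bits.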
Is Information Density Uniform in Task-Oriented Dialogues?
Acknowledgements: We would like to thank Jaap Jumelet for a helpful discussion on neural language models, the anonymous EMNLP-2021 reviewers for their valuable comments, as well as the anonymous ACL-2021 reviewers for feedback that led to a considerable improvement of the first version of this paper. This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 819455).
- …