Inferring the location of authors from words in their texts

Bergren, Max; Karlgren, Jussi; Parkvall, Mikael; Östling, Robert

Publication date: 1 January 2015

Abstract

For the purposes of computational dialectology or other geographically bound text analysis tasks, texts must be annotated with their or their authors’ location. Many texts are locatable but most have no explicit annotation of place. This paper describes a series of experiments to determine how positionally annotated microblog posts can be used to learn location indicating words which then can be used to locate blog texts and their authors. A Gaussian distribution is used to model the locational qualities of words. We introduce the notion of placeness to describe how locational words are. We find that modelling word distributions to account for several locations and thus several Gaussian distributions per word, defining a filter which picks out words with high placeness based on their local distributional context, and aggregating locational information in a centroid for each text gives the most useful results. The results are applied to data in the Swedish language.

QC 20150618 SINUS (Spridning av innovationer i nutida svenska
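The pipeline outlined in the abstract (per-word Gaussians, a placeness filter, a per-text centroid) can be illustrated with a minimal sketch. It is not the authors' implementation, and it simplifies in three ways that are assumptions of this example: each word gets a single isotropic Gaussian rather than several, latitude and longitude are treated as planar coordinates in degrees, and low spatial variance stands in for "placeness", whose actual definition is not given in the abstract. The function names, thresholds, and toy data are all hypothetical.

```python
from collections import defaultdict


def fit_word_gaussians(posts):
    """Fit one isotropic Gaussian per word from geotagged posts.

    posts: iterable of (tokens, (lat, lon)).
    Returns word -> (mean_lat, mean_lon, variance, count).
    """
    coords = defaultdict(list)
    for tokens, (lat, lon) in posts:
        for word in set(tokens):  # count each word once per post
            coords[word].append((lat, lon))
    models = {}
    for word, pts in coords.items():
        n = len(pts)
        mean_lat = sum(p[0] for p in pts) / n
        mean_lon = sum(p[1] for p in pts) / n
        # Mean squared distance from the mean, in squared degrees.
        var = sum((p[0] - mean_lat) ** 2 + (p[1] - mean_lon) ** 2 for p in pts) / n
        models[word] = (mean_lat, mean_lon, var, n)
    return models


def locate_text(tokens, models, max_var=1.0, min_count=2, eps=1e-6):
    """Estimate a text's location as a weighted centroid of word means.

    Assumption: a word passes the "placeness" filter when its variance is
    at most max_var and it was seen in at least min_count posts. Words are
    weighted by inverse variance. Returns (lat, lon), or None if no word
    in the text passes the filter.
    """
    total = lat_sum = lon_sum = 0.0
    for word in set(tokens):
        if word not in models:
            continue
        mean_lat, mean_lon, var, n = models[word]
        if var > max_var or n < min_count:
            continue
        weight = 1.0 / (var + eps)
        total += weight
        lat_sum += weight * mean_lat
        lon_sum += weight * mean_lon
    if total == 0.0:
        return None
    return (lat_sum / total, lon_sum / total)


# Toy data (invented for illustration): two regional words, two common words.
toy_posts = [
    (["hello", "west_word"], (57.7, 12.0)),
    (["west_word", "and"], (57.7, 11.9)),
    (["hello", "east_word"], (59.3, 18.1)),
    (["east_word", "and"], (59.3, 18.0)),
]
toy_models = fit_word_gaussians(toy_posts)
estimate = locate_text(["hello", "west_word", "and"], toy_models)
```

In the toy data, `hello` and `and` occur in both regions, so their variance is large and the filter drops them; the estimate is driven by `west_word` alone. Extending the sketch to several Gaussians per word, as the abstract describes, would require a mixture model in place of the single mean and variance.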

Available Versions

Digitala Vetenskapliga Arkivet - Academic Archive On-line

oai:DiVA.org:kth-169619

Last time updated on 25/05/2016