Inferring the location of authors from words in their texts

Abstract

For the purposes of computational dialec- tology or other geographically bound text analysis tasks, texts must be annotated with their or their authors’ location. Many texts are locatable but most have no ex- plicit annotation of place. This paper describes a series of experiments to de- termine how positionally annotated mi- croblog posts can be used to learn loca- tion indicating words which then can be used to locate blog texts and their authors. A Gaussian distribution is used to model the locational qualities of words. We in- troduce the notion of placeness to describe how locational words are. We find that modelling word distributions to account for several locations and thus several Gaussian distributions per word, defining a filter which picks out words with high placeness based on their local distributional context, and aggregating lo- cational information in a centroid for each text gives the most useful results. The re- sults are applied to data in the Swedish language. Qc 20150618SINUS (Spridning av innovationer i nutida svenska

    Similar works