4,557 research outputs found

    A Local-Global LDA Model for Discovering Geographical Topics from Social Media

    Micro-blogging services can track users' geo-locations when users check in at places or use geo-tagging, which implicitly reveals their locations. Such "geo-tracking" can help find topics triggered by events in particular regions. However, discovering these topics is challenging because of the large volume of noisy messages (e.g., daily conversations). This paper proposes a method for modeling geographical topics that filters out irrelevant words by weighting them differently in local and global contexts. The method is based on the Latent Dirichlet Allocation (LDA) model, but each word is generated from either a local or a global topic distribution according to its generation probabilities. We evaluated the model on data collected from Weibo, currently the most popular micro-blogging service among Chinese users. The results show that our method outperforms baseline methods on several metrics, including model perplexity, two kinds of entropy, and the KL divergence of discovered topics.
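    The local/global idea in the abstract above can be pictured as an LDA-style generative process with a per-word switch. The sketch below is a hypothetical illustration, not the paper's model: it assumes a single fixed switch probability (`local_prob`), symmetric Dirichlet priors with parameter `alpha`, and word distributions over a shared vocabulary indexed by integers. All function names are invented for this example.

```python
import random


def sample_categorical(probs, rng):
    """Draw index i with probability probs[i]."""
    r = rng.random()
    cumulative = 0.0
    for i, p in enumerate(probs):
        cumulative += p
        if r < cumulative:
            return i
    return len(probs) - 1  # guard against floating-point round-off


def sample_dirichlet(alpha, k, rng):
    """Symmetric Dirichlet(alpha) sample via normalized Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(n_words, local_topics, global_topics, local_prob, alpha, rng):
    """Hypothetical local-global generative process (assumption, not the paper's).

    local_topics / global_topics: lists of word distributions over one vocabulary.
    local_prob: assumed probability that a word is drawn from the local side.
    Returns the word ids and, per word, whether it came from a local topic.
    """
    theta_local = sample_dirichlet(alpha, len(local_topics), rng)
    theta_global = sample_dirichlet(alpha, len(global_topics), rng)
    words, switches = [], []
    for _ in range(n_words):
        is_local = rng.random() < local_prob  # switch: local vs. global context
        theta, topics = (
            (theta_local, local_topics) if is_local else (theta_global, global_topics)
        )
        z = sample_categorical(theta, rng)      # pick a topic
        w = sample_categorical(topics[z], rng)  # pick a word from that topic
        words.append(w)
        switches.append(is_local)
    return words, switches


if __name__ == "__main__":
    rng = random.Random(0)
    local_topics = [[0.7, 0.2, 0.1, 0.0], [0.0, 0.1, 0.2, 0.7]]  # region-specific
    global_topics = [[0.25, 0.25, 0.25, 0.25]]                    # background chatter
    words, switches = generate_document(20, local_topics, global_topics, 0.6, 1.0, rng)
    print(words)
    print(switches)
```

    Under this reading, inference would run the process in reverse: words whose posterior switch favors the global side behave like background noise (e.g., daily conversation) and contribute little to the geographical topics.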

    Measuring, Predicting and Visualizing Short-Term Change in Word Representation and Usage in VKontakte Social Network

    Language in social media is extremely dynamic: new words emerge, trend, and disappear, while the meanings of existing words can fluctuate over time. Such dynamics are especially notable during periods of crisis. This work addresses the tasks of measuring, visualizing, and predicting short-term text representation shift, i.e., the change in a word's contextual semantics, and of contrasting this shift with surface-level word dynamics, or concept drift, observed in social media streams. Unlike previous approaches to learning word representations from text, we study the relationship between short-term concept drift and representation shift on a large social media corpus: VKontakte posts in Russian collected during the Russia-Ukraine crisis in 2014-2015. Our novel contributions include quantitative and qualitative approaches to (1) measure short-term representation shift and contrast it with surface-level concept drift; (2) build predictive models that forecast short-term shifts in meaning from previous meaning as well as from concept drift; and (3) visualize short-term representation shift for example keywords, demonstrating the practical use of our approach to discover and track the meaning of newly emerging terms in social media. We show that short-term representation shift can be accurately predicted up to several weeks in advance. Our approach to modeling and visualizing word representation shifts in social media can be used to explore and characterize specific aspects of a streaming corpus during crisis events and could potentially improve other downstream classification tasks, including real-time event detection.
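    To make the contrast between representation shift and concept drift concrete, the sketch below computes both for one word across consecutive time slices. It is a hypothetical illustration, not the authors' method: it assumes that per-slice word vectors already live in a comparable coordinate space (e.g., after alignment), measures representation shift as cosine distance, and approximates surface-level concept drift as the change in relative frequency. The function names are invented for this example.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_u * norm_v)


def representation_shift(vectors_by_slice, word):
    """Assumed measure: cosine distance of `word` between consecutive slices.

    vectors_by_slice: list of {word: vector} dicts, one per time slice,
    assumed to share a coordinate space. Missing words yield None.
    """
    shifts = []
    for prev, curr in zip(vectors_by_slice, vectors_by_slice[1:]):
        if word in prev and word in curr:
            shifts.append(1.0 - cosine_similarity(prev[word], curr[word]))
        else:
            shifts.append(None)
    return shifts


def frequency_drift(counts_by_slice, word):
    """Assumed proxy for concept drift: change in relative frequency of `word`."""
    drifts = []
    for prev, curr in zip(counts_by_slice, counts_by_slice[1:]):
        p = prev.get(word, 0) / max(sum(prev.values()), 1)
        c = curr.get(word, 0) / max(sum(curr.values()), 1)
        drifts.append(c - p)
    return drifts


if __name__ == "__main__":
    vectors = [{"w": [1.0, 0.0]}, {"w": [0.0, 1.0]}, {"w": [0.0, 2.0]}]
    counts = [{"w": 1, "x": 1}, {"w": 3, "x": 1}]
    print(representation_shift(vectors, "w"))
    print(frequency_drift(counts, "w"))
```

    The toy vectors show why the two signals can diverge: scaling a vector (slice 2 to slice 3) changes nothing under cosine distance, while a word can gain frequency without any change in its contextual meaning. Series like these could then serve as features for the kind of forecasting the abstract describes.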