323 research outputs found

    Entity linking on microblogs with spatial and temporal signals

    Get PDF
    Invited for oral presentation at Conference on Empirical Methods on Natural Language Processing EMNLP 2014, October 25-29, Doha, Qatar</p

    A Survey of Location Prediction on Twitter

    Full text link
    Locations, e.g., countries, states, cities, and point-of-interests, are central to news, emergency events, and people's daily lives. Automatic identification of locations associated with or mentioned in documents has been explored for decades. As one of the most popular online social network platforms, Twitter has attracted a large number of users who send millions of tweets on daily basis. Due to the world-wide coverage of its users and real-time freshness of tweets, location prediction on Twitter has gained significant attention in recent years. Research efforts are spent on dealing with new challenges and opportunities brought by the noisy, short, and context-rich nature of tweets. In this survey, we aim at offering an overall picture of location prediction on Twitter. Specifically, we concentrate on the prediction of user home locations, tweet locations, and mentioned locations. We first define the three tasks and review the evaluation metrics. By summarizing Twitter network, tweet content, and tweet context as potential inputs, we then structurally highlight how the problems depend on these inputs. Each dependency is illustrated by a comprehensive review of the corresponding strategies adopted in state-of-the-art approaches. In addition, we also briefly review two related problems, i.e., semantic location prediction and point-of-interest recommendation. Finally, we list future research directions.Comment: Accepted to TKDE. 30 pages, 1 figur

    Localized Events in Social Media Streams: Detection, Tracking, and Recommendation

    Get PDF
    From the recent proliferation of social media channels to the immense amount of user-generated content, an increasing interest in social media mining is currently being witnessed. Messages continuously posted via these channels report a broad range of topics from daily life to global and local events. As a consequence, this has opened new opportunities for mining event information crucial in many application domains, especially in increasing the situational awareness in critical scenarios. Interestingly, many of these messages are enriched with location information, due to the wide- spread of mobile devices and the recent advancements of today’s location acquisition techniques. This enables location-aware event mining, i.e., the detection and tracking of localized events. In this thesis, we propose novel frameworks and models that digest social media content for localized event detection, tracking, and recommendation. We first develop KeyPicker, a framework to extract and score event-related keywords in an online fashion, accounting for high levels of noise, temporal heterogeneity and outliers in the data. Then, LocEvent is proposed to incrementally detect and track events using a 4-stage procedure. That is, LocEvent receives the keywords extracted by KeyPicker, identifies local keywords, spatially clusters them, and finally scores the generated clusters. For each detected event, a set of descriptive keywords, a location, and a time interval are estimated at a fine-grained resolution. In addition to the sparsity of geo-tagged messages, people sometimes post about events far away from an event’s location. Such spatial problems are handled by novel spatial regularization techniques, namely, graph- and gazetteer-based regularization. To ensure scalability, we utilize a hierarchical spatial index in addition to a multi-stage filtering procedure that gradually suppresses noisy words and considers only event-related ones for complex spatial computations. As for recommendation applications, we propose an event recommender system built upon model-based collaborative filtering. Our model is able to suggest events to users, taking into account a number of contextual features including the social links between users, the topical similarities of events, and the spatio-temporal proximity between users and events. To realize this model, we employ and adapt matrix factorization, which allows for uncovering latent user-event patterns. Our proposed features contribute to directing the learning process towards recommendations that better suit the taste of users, in particular when new users have very sparse (or even no) event attendance history. To evaluate the effectiveness and efficiency of our proposed approaches, extensive comparative experiments are conducted using datasets collected from social media channels. Our analysis of the experimental results reveals the superiority and advantages of our frameworks over existing methods in terms of the relevancy and precision of the obtained results

    Inferring Missing Entity Type Instances for Knowledge Base Completion: New Dataset and Methods

    Full text link
    Most of previous work in knowledge base (KB) completion has focused on the problem of relation extraction. In this work, we focus on the task of inferring missing entity type instances in a KB, a fundamental task for KB competition yet receives little attention. Due to the novelty of this task, we construct a large-scale dataset and design an automatic evaluation methodology. Our knowledge base completion method uses information within the existing KB and external information from Wikipedia. We show that individual methods trained with a global objective that considers unobserved cells from both the entity and the type side gives consistently higher quality predictions compared to baseline methods. We also perform manual evaluation on a small subset of the data to verify the effectiveness of our knowledge base completion methods and the correctness of our proposed automatic evaluation method.Comment: North American Chapter of the Association for Computational Linguistics- Human Language Technologies, 201

    Temporal Information Models for Real-Time Microblog Search

    Get PDF
    Real-time search in Twitter and other social media services is often biased towards the most recent results due to the “in the moment” nature of topic trends and their ephemeral relevance to users and media in general. However, “in the moment”, it is often difficult to look at all emerging topics and single-out the important ones from the rest of the social media chatter. This thesis proposes to leverage on external sources to estimate the duration and burstiness of live Twitter topics. It extends preliminary research where itwas shown that temporal re-ranking using external sources could indeed improve the accuracy of results. To further explore this topic we pursued three significant novel approaches: (1) multi-source information analysis that explores behavioral dynamics of users, such as Wikipedia live edits and page view streams, to detect topic trends and estimate the topic interest over time; (2) efficient methods for federated query expansion towards the improvement of query meaning; and (3) exploiting multiple sources towards the detection of temporal query intent. It differs from past approaches in the sense that it will work over real-time queries, leveraging on live user-generated content. This approach contrasts with previous methods that require an offline preprocessing step
    • …
    corecore