5 research outputs found

    On the Impact of Entity Linking in Microblog Real-Time Filtering

    Microblogging is a model of content sharing in which the temporal locality of posts with respect to important events, whether foreseeable or unforeseeable, makes applications of real-time filtering of great practical interest. We propose the use of Entity Linking (EL) to improve retrieval effectiveness by enriching the representation of microblog posts and filtering queries. EL is the process of recognizing, in unstructured text, mentions of relevant entities described in a knowledge base. EL of short pieces of text is a difficult task, but it is also a scenario in which the information EL adds to the text can have a substantial impact on the retrieval process. We implement a state-of-the-art filtering method, based on the best systems from the TREC Microblog track real-time ad hoc retrieval and filtering tasks, and extend it with a Wikipedia-based EL method. Results show that the use of EL significantly improves over non-EL-based versions of the filtering methods. Comment: 6 pages, 1 figure, 1 table. SAC 2015, Salamanca, Spain - April 13 - 17, 201

    Knowledge-based Query Expansion in Real-Time Microblog Search

    Since the length of microblog texts, such as tweets, is strictly limited to 140 characters, traditional Information Retrieval techniques suffer severely from the vocabulary mismatch problem and cannot yield good performance in the microblogosphere. To address this critical challenge, we propose a new language modeling approach for microblog retrieval that infers various types of context information. In particular, we expand the query using knowledge terms derived from Freebase so that the expanded query better reflects users' search intent. In addition, to further satisfy users' real-time information needs, we incorporate temporal evidence into the expansion method, which boosts recent tweets in the retrieval results for a given topic. Experimental results on two official TREC Twitter corpora demonstrate the significant superiority of our approach over baseline methods. Comment: 9 pages, 9 figure
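
One way to picture how temporal evidence can boost recent tweets is a recency weight applied to each relevance score. The sketch below is an illustration only, not the paper's language-modeling formulation: the half-life decay, the function names `recency_weight` and `rerank`, and the sample scores are all assumptions.

```python
def recency_weight(age_hours: float, half_life_hours: float = 24.0) -> float:
    """Return a weight in (0, 1] that halves every `half_life_hours` (assumed decay)."""
    if age_hours < 0:
        raise ValueError("age_hours must be non-negative")
    return 0.5 ** (age_hours / half_life_hours)


def rerank(results, query_time_hours: float, half_life_hours: float = 24.0):
    """Re-rank (tweet_id, relevance_score, timestamp_hours) tuples by recency-weighted score."""
    scored = []
    for tweet_id, score, timestamp_hours in results:
        # Tweets posted after the query time are treated as age 0.
        age = max(0.0, query_time_hours - timestamp_hours)
        scored.append((tweet_id, score * recency_weight(age, half_life_hours)))
    # Highest weighted score first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical results: an older tweet with a higher raw score, a newer one with a lower score.
ranked = rerank([("old", 1.0, 0.0), ("new", 0.8, 48.0)], query_time_hours=48.0)
```

With a 24-hour half-life, the older tweet's score is scaled down by its 48-hour age, so the newer tweet ranks first despite its lower raw score.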

    Data Quality Challenges in Twitter Content Analysis for Informing Policy Making in Health Care

    Social media platforms and microblogs have become popular fora where the general public expresses opinions and concerns on a variety of matters. As a result, private and public organizations have been looking into ways of finding, understanding and communicating insights extracted from this massive amount of text-based interconnected data. There are, however, important difficulties associated with the noisiness and reliability of the content that hinder the analysis of the data. This paper reports the main challenges found in a real-world experience with social media used as a source of data to support policy making and assessment. We also propose a set of strategies for the precise retrieval of data, the profiling of social media users, and the involvement of policy makers in the analytical process.

    Adaptive Method for Following Dynamic Topics on Twitter

    Many social research studies of public response on social media require following (i.e., tracking) topics on Twitter for long periods of time. Current approaches rely on streaming tweets that match certain hashtags or keywords, or on following particular Twitter accounts. Such approaches lead to limited coverage of on-topic tweets. In this paper, we introduce a novel technique for following such topics more effectively. A topic is defined as a set of well-prepared queries that cover the static side of the topic. We propose an automatic approach that adapts to emerging aspects of a tracked broad topic over time. We tested our tracking approach on three broad dynamic topics that are hot in different categories: Egyptian politics, the Syrian conflict, and international sports. We measured the effectiveness of our approach over four full days spanning a period of four months to ensure consistency in effectiveness. Experimental results showed that, on average, our approach achieved an increase of over 100% in recall relative to the baseline Boolean approach, while maintaining an acceptable precision of 83%.
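
To make the "over 100% increase in recall relative to the baseline" claim concrete, the sketch below shows how a relative increase is computed. The tweet counts are hypothetical and are not taken from the paper.

```python
def recall(retrieved_relevant: int, total_relevant: int) -> float:
    """Fraction of all on-topic tweets that the tracker actually captured."""
    if total_relevant <= 0:
        raise ValueError("total_relevant must be positive")
    return retrieved_relevant / total_relevant


def relative_increase_percent(new: float, baseline: float) -> float:
    """Relative improvement of `new` over `baseline`, as a percentage."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (new - baseline) / baseline * 100.0


# Hypothetical counts: 1000 on-topic tweets exist in the stream.
baseline_recall = recall(250, 1000)   # Boolean keyword baseline
adaptive_recall = recall(750, 1000)   # adaptive tracking approach
gain = relative_increase_percent(adaptive_recall, baseline_recall)
```

An increase of over 100% means the approach captures more than twice as many on-topic tweets as the baseline; in this made-up example recall triples, which is a 200% relative increase.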

    On Relevance Filtering for Real-Time Tweet Summarization

    Real-time tweet summarization (RTS) systems require mechanisms for capturing relevant tweets, identifying novel tweets, and capturing timely tweets. In this thesis, we tackle the RTS problem with a main focus on relevance filtering. We experimented with different traditional retrieval models. Additionally, we propose two extensions to alleviate the sparsity and topic drift challenges that affect relevance filtering. For sparsity, we propose leveraging word embeddings in Vector Space Model (VSM) term weighting so that the system can use semantic similarity alongside lexical matching. To mitigate the effect of topic drift, we exploit explicit relevance feedback to enhance the profile representation so that it copes with the topic's development in the stream over time. We conducted extensive experiments over three standard English TREC test collections that were built specifically for RTS. Although the extensions do not generally exhibit better performance, they are comparable to the baselines used. Moreover, we extended an Arabic tweet test collection for event detection, called EveTAR, to support tasks that require novelty in the system's output. We collected novelty judgments using in-house annotators and used the collection to test our RTS system. We report preliminary results on EveTAR using different models of the RTS system. This work was made possible by NPRP grants # NPRP 7-1313-1-245 and # NPRP 7-1330-2-483 from the Qatar National Research Fund (a member of Qatar Foundation).
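
The idea of using semantic similarity alongside lexical matching can be sketched as a "soft" term match: an exact match counts fully, and otherwise the best embedding similarity above a threshold is used. This is a minimal illustration under stated assumptions, not the thesis's actual weighting scheme; the function names, the 0.7 threshold, and the two-dimensional toy embeddings are all hypothetical.

```python
import math


def cosine(u, v) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def soft_match_score(profile_terms, tweet_terms, embeddings, threshold=0.7) -> float:
    """Average, over profile terms, of the best lexical or semantic match in the tweet."""
    if not profile_terms:
        return 0.0
    total = 0.0
    for p in profile_terms:
        best = 0.0
        for t in tweet_terms:
            if p == t:
                best = 1.0  # exact lexical match counts fully
                break
            if p in embeddings and t in embeddings:
                sim = cosine(embeddings[p], embeddings[t])
                if sim >= threshold:  # ignore weak semantic matches
                    best = max(best, sim)
        total += best
    return total / len(profile_terms)


# Toy embeddings (hypothetical values chosen only for illustration).
toy = {"earthquake": [0.9, 0.1], "quake": [1.0, 0.0], "football": [0.0, 1.0]}
related = soft_match_score(["earthquake"], ["quake", "today"], toy)
unrelated = soft_match_score(["earthquake"], ["football"], toy)
```

Here a tweet mentioning "quake" still scores highly against a profile containing "earthquake", while an unrelated tweet scores zero, which is how embeddings can ease the sparsity of short texts.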