
    IRIT at TREC Microblog Track 2013

    This paper describes the participation of the IRIT lab, University of Toulouse, France, in the Microblog Track of TREC 2013. Our team experimented with two approaches for the real-time ad-hoc search task: (i) a Bayesian network retrieval model for tweet search and (ii) a document and query expansion model for microblog search.

    IRIT at TREC Microblog 2015

    This paper presents the participation of the IRIT laboratory (University of Toulouse) in the Microblog Track of TREC 2015. This track consists of a real-time filtering task that monitors a stream of social media posts according to a user's interest profile. In this context, our team proposes three approaches: (a) a novel selective summarization approach that decides whether to select or ignore each tweet without using external knowledge, relying on novelty and redundancy factors; (b) a processing workflow that indexes tweets in real time, enhanced by a notification and digest method guided by diversity and user personalization; and (c) a step-by-step stream selection method that focuses on speed and takes into account tweet similarity as well as several features, including content, entities, and user-related aspects. For all these approaches, we discuss the results obtained during the experimental evaluation.
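
    The select-or-ignore decision in approach (a) can be pictured with a minimal sketch. The abstract does not give the actual factors, so everything here is an assumption made for illustration: relevance is the share of profile terms a tweet covers, redundancy is Jaccard overlap with already selected tweets, and the names `select_tweets`, `relevance_min`, and `redundancy_max` and the threshold values are hypothetical.

```python
def tokenize(text):
    """Lowercase whitespace tokenization (a simplifying assumption)."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard overlap between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def select_tweets(stream, profile, relevance_min=0.5, redundancy_max=0.5):
    """Keep a tweet only if it covers enough of the profile and is not
    too similar to a tweet that was already selected.
    Thresholds are hypothetical, not taken from the paper."""
    profile_tokens = tokenize(profile)
    selected = []
    for tweet in stream:
        tokens = tokenize(tweet)
        # Relevance: fraction of profile terms present in the tweet.
        coverage = len(tokens & profile_tokens) / len(profile_tokens)
        if coverage < relevance_min:
            continue
        # Redundancy: ignore near-duplicates of earlier selections.
        if any(jaccard(tokens, tokenize(s)) > redundancy_max for s in selected):
            continue
        selected.append(tweet)
    return selected


stream = [
    "earthquake hits chile coast",
    "earthquake hits chile coast today",
    "football scores tonight",
    "chile earthquake tsunami warning issued",
]
print(select_tweets(stream, "chile earthquake"))
```

    The second tweet is dropped as redundant and the third as irrelevant, so only tweets that add new information about the profile are kept.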

    Query Expansion for Survey Question Retrieval in the Social Sciences

    In recent years, the importance of research data and the need to archive and share it in the scientific community have increased enormously. This introduces a whole new set of challenges for digital libraries. In the social sciences, typical research data sets consist of surveys and questionnaires. In this paper we focus on the use case of social science survey question reuse and on mechanisms to support users in formulating queries for data sets. We describe and evaluate thesaurus-based and co-occurrence-based approaches to query expansion that aim to improve retrieval quality in digital libraries and research data archives. The challenge here is to translate the information need and the underlying sociological phenomena into proper queries. We show that retrieval quality can be improved by adding related terms to the queries. In a direct comparison, automatically expanded queries using extracted co-occurring terms can provide better results than queries manually reformulated by a domain expert and better results than a keyword-based BM25 baseline. Comment: to appear in the Proceedings of the 19th International Conference on Theory and Practice of Digital Libraries 2015 (TPDL 2015).
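
    As a rough illustration of co-occurrence-based query expansion, the sketch below counts how often terms appear together in the same document and appends the most frequent co-occurring terms to the query. It is not the authors' method: the document-level co-occurrence window, the raw counts, the toy survey questions, and the names `build_cooccurrence` and `expand_query` are all assumptions.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence(docs):
    """Count, for each term, the terms sharing a document with it."""
    cooc = {}
    for doc in docs:
        terms = sorted(set(doc.lower().split()))
        for a, b in combinations(terms, 2):
            cooc.setdefault(a, Counter())[b] += 1
            cooc.setdefault(b, Counter())[a] += 1
    return cooc


def expand_query(query, cooc, k=2):
    """Append the k terms that co-occur most often with the query terms."""
    query_terms = query.lower().split()
    candidates = Counter()
    for term in query_terms:
        candidates.update(cooc.get(term, Counter()))
    # Never propose a term that is already part of the query.
    for term in query_terms:
        candidates.pop(term, None)
    return query_terms + [term for term, _ in candidates.most_common(k)]


# Hypothetical survey-question snippets, stopwords already removed.
docs = [
    "trust parliament politics",
    "trust government politics",
    "trust government institutions",
    "household income",
]
cooc = build_cooccurrence(docs)
print(expand_query("trust", cooc, k=2))
```

    In practice the raw counts would usually be replaced by an association measure so that very frequent terms do not dominate the expansion.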

    Recherche d'information dans les microblogs : que manque-t-il aux approches classiques ? (Information retrieval in microblogs: what are classical approaches missing?)

    In this article we study information retrieval in microblogs. Classical IR models, designed for texts longer than the 140 characters of a microblog, are not necessarily suited to them. An analysis of their results allowed us to identify the vocabulary mismatch between microblogs and the query as the main reason for their poor performance. To improve search quality, we propose expanding microblogs with the text of the URLs they contain, and also expanding queries with WordNet or with news articles. The results show the benefit of tweet expansion, while the benefit of query expansion remains to be proven.

    Report on the Second International Workshop on the Evaluation on Collaborative Information Seeking and Retrieval (ECol'2017 @ CHIIR)

    The 2nd workshop on the evaluation of collaborative information retrieval and seeking (ECol) was held in conjunction with the ACM SIGIR Conference on Human Information Interaction & Retrieval (CHIIR) in Oslo, Norway. The workshop focused on discussing the challenges and difficulties of researching and studying collaborative information retrieval and seeking (CIS/CIR). After an introductory, scene-setting overview of developments in CIR/CIS, participants were challenged to devise a range of possible CIR/CIS tasks that could be used for evaluation purposes. Through the brainstorming and discussions, valuable insights regarding the evaluation of CIR/CIS tasks became apparent: for particular tasks, efficiency and/or effectiveness is most important; however, for the majority of tasks, the success and quality of outcomes, along with knowledge sharing and sense-making, were most important, and these latter attributes are much more difficult to measure and evaluate. Thus the major challenge for CIR/CIS research is to develop methods, measures, and methodologies to evaluate these higher-order attributes.

    iAggregator: Multidimensional Relevance Aggregation Based on a Fuzzy Operator

    Recently, an increasing number of information retrieval studies have triggered a resurgence of interest in redefining the algorithmic estimation of relevance, which implies a shift from topical to multidimensional relevance assessment. A key underlying aspect that emerged when addressing this concept is the aggregation of the relevance assessments related to each of the considered dimensions. The most commonly adopted forms of aggregation are based on classical weighted means and linear combination schemes to address this issue. Although some initiatives were recently proposed, none was concerned with considering the inherent dependencies and interactions existing among the relevance criteria, as is the case in many real-life applications. In this article, we present a new fuzzy-based operator, called iAggregator, for multidimensional relevance aggregation. Its main originality, beyond its ability to model interactions between different relevance criteria, lies in its generalization of many classical aggregation functions. To validate our proposal, we apply our operator within a tweet search task. Experiments using a standard benchmark, namely, Text REtrieval Conference Microblog, emphasize the relevance of our contribution when compared with traditional aggregation schemes. In addition, it outperforms state-of-the-art aggregation operators such as the Scoring and the And prioritized operators as well as some representative learning-to-rank algorithms.
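
    The abstract does not say which fuzzy operator iAggregator uses. As an illustration only, the sketch below uses the discrete Choquet integral, a well-known fuzzy-measure-based operator that can model interactions between criteria and reduces to a weighted mean when the measure is additive. Treating it as a stand-in here is an assumption, and the criteria names and capacity values are hypothetical.

```python
def choquet(scores, capacity):
    """Discrete Choquet integral.

    scores:   dict mapping each criterion to a score in [0, 1]
    capacity: dict mapping frozensets of criteria to a weight in [0, 1],
              monotone, with the full set mapped to 1
    """
    ordered = sorted(scores, key=scores.get)  # criteria by ascending score
    total = 0.0
    previous = 0.0
    for i, criterion in enumerate(ordered):
        # Coalition of criteria whose score is at least the current one.
        coalition = frozenset(ordered[i:])
        total += (scores[criterion] - previous) * capacity[coalition]
        previous = scores[criterion]
    return total


scores = {"topical": 0.8, "recency": 0.5}  # hypothetical criteria

# Additive capacity: equivalent to a weighted mean with weights 0.6 and 0.4.
additive = {
    frozenset({"topical"}): 0.6,
    frozenset({"recency"}): 0.4,
    frozenset({"topical", "recency"}): 1.0,
}

# Non-additive capacity: each criterion alone counts for little, so a
# document must do well on both to receive a high aggregate score.
synergy = {
    frozenset({"topical"}): 0.3,
    frozenset({"recency"}): 0.2,
    frozenset({"topical", "recency"}): 1.0,
}

print(choquet(scores, additive))
print(choquet(scores, synergy))
```

    With the additive capacity the result equals the weighted mean of the two scores; with the non-additive capacity the aggregate moves toward the weaker criterion, which is the kind of interaction a linear combination cannot express.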

    On Relevance Filtering for Real-Time Tweet Summarization

    Real-time tweet summarization (RTS) systems require mechanisms for capturing relevant tweets, identifying novel tweets, and capturing timely tweets. In this thesis, we tackle the RTS problem with a main focus on relevance filtering. We experimented with different traditional retrieval models. Additionally, we propose two extensions to alleviate the sparsity and topic drift challenges that affect relevance filtering. For sparsity, we propose leveraging word embeddings in Vector Space Model (VSM) term weighting so that the system can use semantic similarity alongside lexical matching. To mitigate the effect of topic drift, we exploit explicit relevance feedback to enhance the profile representation so that it can follow the topic's development in the stream over time. We conducted extensive experiments over three standard English TREC test collections that were built specifically for RTS. Although the extensions do not generally exhibit better performance, they are comparable to the baselines used. Moreover, we extended EveTAR, a test collection for event detection in Arabic tweets, to support tasks that require novelty in the system's output. We collected novelty judgments using in-house annotators and used the collection to test our RTS system. We report preliminary results on EveTAR using different models of the RTS system. This work was made possible by NPRP grants # NPRP 7-1313-1-245 and # NPRP 7-1330-2-483 from the Qatar National Research Fund (a member of Qatar Foundation).
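
    One way to picture semantic similarity working alongside lexical matching is the sketch below, where a profile term that is absent from a tweet still earns partial credit from its closest tweet term in embedding space. This is not the thesis's weighting scheme: the scoring rule, the function name `soft_match_score`, and the two-dimensional toy embeddings are assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def soft_match_score(profile_terms, tweet_terms, embeddings):
    """Average, over profile terms, of the best match found in the tweet:
    1.0 for an exact lexical match, otherwise the highest embedding
    similarity (0.0 when no embedding is available)."""
    if not profile_terms:
        return 0.0
    score = 0.0
    for p in profile_terms:
        best = 0.0
        for t in tweet_terms:
            if p == t:
                best = 1.0
                break
            if p in embeddings and t in embeddings:
                best = max(best, cosine(embeddings[p], embeddings[t]))
        score += best
    return score / len(profile_terms)


# Toy two-dimensional embeddings, invented for this example.
embeddings = {
    "earthquake": [0.9, 0.1],
    "quake": [1.0, 0.0],
    "football": [0.0, 1.0],
}

print(soft_match_score(["earthquake"], ["quake", "football"], embeddings))
print(soft_match_score(["earthquake"], ["quake", "football"], {}))
```

    Without embeddings the tweet scores zero because no term matches lexically; with them, the near-synonym raises the score, which is how such a scheme addresses sparsity in short texts.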