7,670 research outputs found

    Synchronous collaborative information retrieval: techniques and evaluation

    Get PDF
    Synchronous Collaborative Information Retrieval refers to systems that support multiple users searching together at the same time in order to satisfy a shared information need. To date most SCIR systems have focussed on providing various awareness tools in order to enable collaborating users to coordinate the search task. However, requiring users to both search and coordinate the group activity may prove too demanding. On the other hand without effective coordination policies the group search may not be effective. In this paper we propose and evaluate novel system-mediated techniques for coordinating a group search. These techniques allow for an effective division of labour across the group whereby each group member can explore a subset of the search space.We also propose and evaluate techniques to support automated sharing of knowledge across searchers in SCIR, through novel collaborative and complementary relevance feedback techniques. In order to evaluate these techniques, we propose a framework for SCIR evaluation based on simulations. To populate these simulations we extract data from TREC interactive search logs. This work represent the first simulations of SCIR to date and the first such use of this TREC data

    EveTAR: Building a Large-Scale Multi-Task Test Collection over Arabic Tweets

    Full text link
    This article introduces a new language-independent approach for creating a large-scale high-quality test collection of tweets that supports multiple information retrieval (IR) tasks without running a shared-task campaign. The adopted approach (demonstrated over Arabic tweets) designs the collection around significant (i.e., popular) events, which enables the development of topics that represent frequent information needs of Twitter users for which rich content exists. That inherently facilitates the support of multiple tasks that generally revolve around events, namely event detection, ad-hoc search, timeline generation, and real-time summarization. The key highlights of the approach include diversifying the judgment pool via interactive search and multiple manually-crafted queries per topic, collecting high-quality annotations via crowd-workers for relevancy and in-house annotators for novelty, filtering out low-agreement topics and inaccessible tweets, and providing multiple subsets of the collection for better availability. Applying our methodology on Arabic tweets resulted in EveTAR , the first freely-available tweet test collection for multiple IR tasks. EveTAR includes a crawl of 355M Arabic tweets and covers 50 significant events for which about 62K tweets were judged with substantial average inter-annotator agreement (Kappa value of 0.71). We demonstrate the usability of EveTAR by evaluating existing algorithms in the respective tasks. Results indicate that the new collection can support reliable ranking of IR systems that is comparable to similar TREC collections, while providing strong baseline results for future studies over Arabic tweets

    On the use of clustering and the MeSH controlled vocabulary to improve MEDLINE abstract search

    Get PDF
    Databases of genomic documents contain substantial amounts of structured information in addition to the texts of titles and abstracts. Unstructured information retrieval techniques fail to take advantage of the structured information available. This paper describes a technique to improve upon traditional retrieval methods by clustering the retrieval result set into two distinct clusters using additional structural information. Our hypothesis is that the relevant documents are to be found in the tightest cluster of the two, as suggested by van Rijsbergen's cluster hypothesis. We present an experimental evaluation of these ideas based on the relevance judgments of the 2004 TREC workshop Genomics track, and the CLUTO software clustering package

    Division of labour and sharing of knowledge for synchronous collaborative information retrieval

    Get PDF
    Synchronous collaborative information retrieval (SCIR) is concerned with supporting two or more users who search together at the same time in order to satisfy a shared information need. SCIR systems represent a paradigmatic shift in the way we view information retrieval, moving from an individual to a group process and as such the development of novel IR techniques is needed to support this. In this article we present what we believe are two key concepts for the development of effective SCIR namely division of labour (DoL) and sharing of knowledge (SoK). Together these concepts enable coordinated SCIR such that redundancy across group members is reduced whilst enabling each group member to benefit from the discoveries of their collaborators. In this article we outline techniques from state-of-the-art SCIR systems which support these two concepts, primarily through the provision of awareness widgets. We then outline some of our own work into system-mediated techniques for division of labour and sharing of knowledge in SCIR. Finally we conclude with a discussion on some possible future trends for these two coordination techniques

    Creating a test collection to evaluate diversity in image retrieval

    Get PDF
    This paper describes the adaptation of an existing test collection for image retrieval to enable diversity in the results set to be measured. Previous research has shown that a more diverse set of results often satisfies the needs of more users better than standard document rankings. To enable diversity to be quantified, it is necessary to classify images relevant to a given theme to one or more sub-topics or clusters. We describe the challenges in building (as far as we are aware) the first test collection for evaluating diversity in image retrieval. This includes selecting appropriate topics, creating sub-topics, and quantifying the overall effectiveness of a retrieval system. A total of 39 topics were augmented for cluster-based relevance and we also provide an initial analysis of assessor agreement for grouping relevant images into sub-topics or clusters

    Improving Entity Retrieval on Structured Data

    Full text link
    The increasing amount of data on the Web, in particular of Linked Data, has led to a diverse landscape of datasets, which make entity retrieval a challenging task. Explicit cross-dataset links, for instance to indicate co-references or related entities can significantly improve entity retrieval. However, only a small fraction of entities are interlinked through explicit statements. In this paper, we propose a two-fold entity retrieval approach. In a first, offline preprocessing step, we cluster entities based on the \emph{x--means} and \emph{spectral} clustering algorithms. In the second step, we propose an optimized retrieval model which takes advantage of our precomputed clusters. For a given set of entities retrieved by the BM25F retrieval approach and a given user query, we further expand the result set with relevant entities by considering features of the queries, entities and the precomputed clusters. Finally, we re-rank the expanded result set with respect to the relevance to the query. We perform a thorough experimental evaluation on the Billions Triple Challenge (BTC12) dataset. The proposed approach shows significant improvements compared to the baseline and state of the art approaches

    The relationship between IR and multimedia databases

    Get PDF
    Modern extensible database systems support multimedia data through ADTs. However, because of the problems with multimedia query formulation, this support is not sufficient.\ud \ud Multimedia querying requires an iterative search process involving many different representations of the objects in the database. The support that is needed is very similar to the processes in information retrieval.\ud \ud Based on this observation, we develop the miRRor architecture for multimedia query processing. We design a layered framework based on information retrieval techniques, to provide a usable query interface to the multimedia database.\ud \ud First, we introduce a concept layer to enable reasoning over low-level concepts in the database.\ud \ud Second, we add an evidential reasoning layer as an intermediate between the user and the concept layer.\ud \ud Third, we add the functionality to process the users' relevance feedback.\ud \ud We then adapt the inference network model from text retrieval to an evidential reasoning model for multimedia query processing.\ud \ud We conclude with an outline for implementation of miRRor on top of the Monet extensible database system
    corecore