
    NTCIR Lifelog: The First Test Collection for Lifelog Research

    Test collections have a long history of supporting repeatable and comparable evaluation in Information Retrieval (IR). However, thus far, no shared test collection exists for IR systems that are designed to index and retrieve multimodal lifelog data. In this paper we introduce the first test collection for personal lifelog data. The requirements for such a test collection are motivated, the process of creating it is described, an overview of the resulting collection is given, and finally possible applications are suggested. The collection has been employed for the NTCIR12-Lifelog task.

    Creating a test collection to evaluate diversity in image retrieval

    This paper describes the adaptation of an existing test collection for image retrieval to enable the diversity of the result set to be measured. Previous research has shown that a more diverse set of results often satisfies the needs of a broader range of users than standard document rankings do. To enable diversity to be quantified, it is necessary to classify images relevant to a given theme into one or more sub-topics or clusters. We describe the challenges in building (as far as we are aware) the first test collection for evaluating diversity in image retrieval, including selecting appropriate topics, creating sub-topics, and quantifying the overall effectiveness of a retrieval system. A total of 39 topics were augmented with cluster-based relevance assessments, and we also provide an initial analysis of assessor agreement when grouping relevant images into sub-topics or clusters.
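    The abstract does not say which diversity measure the authors adopt; a common choice for cluster-based relevance judgments is sub-topic (cluster) recall at rank k. The sketch below is an illustrative implementation under that assumption; the function name and data structures are hypothetical, not taken from the paper.

```python
from typing import Dict, List, Set

def subtopic_recall_at_k(ranking: List[str],
                         image_subtopics: Dict[str, Set[str]],
                         all_subtopics: Set[str],
                         k: int) -> float:
    """Fraction of a topic's sub-topics (clusters) covered by the top-k results."""
    covered: Set[str] = set()
    for image_id in ranking[:k]:
        covered |= image_subtopics.get(image_id, set())
    return len(covered & all_subtopics) / len(all_subtopics) if all_subtopics else 0.0

# Toy example: the top 5 results cover 3 of the topic's 4 clusters -> 0.75
ranking = ["img3", "img7", "img1", "img9", "img2"]
image_subtopics = {"img3": {"beach"}, "img7": {"beach"}, "img1": {"mountain"},
                   "img9": {"city"}, "img2": set()}
print(subtopic_recall_at_k(ranking, image_subtopics,
                           {"beach", "mountain", "city", "forest"}, k=5))
```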

    Improved metrics collection and correlation for the CERN cloud storage test framework

    Storage space is one of the most important resources that the European Organization for Nuclear Research (CERN) needs for its experiments and operations. Part of the Data & Storage Services (IT-DSS) group's work at CERN is focused on testing and evaluating the cloud storage system provided by the openlab partner Huawei, the Huawei Universal Disk Storage System (UDS). As a whole, the system consists of both software and hardware. The objective of the Huawei-CERN partnership is to investigate the performance of the cloud storage system. Among the interesting questions are the system's scalability, reliability and ability to store and retrieve files. During the tests, possible bugs and malfunctions can be discovered and corrected. Different versions of the storage software that runs inside the storage system can also be compared to each other. The nature of testing and benchmarking a storage system gives rise to several small tasks that can be done during a short summer internship. To test the storage system, a test framework developed by the DSS group is used. The framework consists of various types of file transfer tests, client and server monitoring programs, and log file analysis programs. Part of the work consisted of additions to the existing framework, and part of developing new tools. Metrics collection was the central theme. Metrics are to be understood as system statistics, such as memory consumption or processor usage. Memory usage and disk reads/writes were added to the existing client real-time monitoring framework. CPU and memory usage, network traffic (bytes received/sent) and the number of running processes are collected from a client computer before and after a daily test. Two other additions are a visualization for the storage system log files and a new monitoring tool for the storage system. This report is divided into sections, each describing a part of the framework that was improved or added, the problem addressed, and the final solution. A short description of the code and the architecture is also included.
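    The report itself is not reproduced here, so the following is only a minimal sketch of the kind of before/after client-side metrics snapshot the abstract describes (CPU, memory, network traffic, disk reads/writes, process count), assuming the third-party psutil library; the script structure and file name are illustrative, not the DSS group's actual tooling.

```python
import json
import time
import psutil  # third-party library, assumed available on the client machine

def snapshot() -> dict:
    """Capture the client-side metrics named in the abstract at one point in time."""
    mem = psutil.virtual_memory()
    net = psutil.net_io_counters()
    disk = psutil.disk_io_counters()
    return {
        "timestamp": time.time(),
        "cpu_percent": psutil.cpu_percent(interval=1),
        "memory_percent": mem.percent,
        "bytes_sent": net.bytes_sent,
        "bytes_recv": net.bytes_recv,
        "disk_read_bytes": disk.read_bytes,
        "disk_write_bytes": disk.write_bytes,
        "process_count": len(psutil.pids()),
    }

# Take one snapshot before and one after a daily transfer test, then store the pair.
before = snapshot()
# ... run the file-transfer test here ...
after = snapshot()
with open("client_metrics.json", "w") as fh:
    json.dump({"before": before, "after": after}, fh, indent=2)
```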

    EveTAR: Building a Large-Scale Multi-Task Test Collection over Arabic Tweets

    This article introduces a new language-independent approach for creating a large-scale, high-quality test collection of tweets that supports multiple information retrieval (IR) tasks without running a shared-task campaign. The adopted approach (demonstrated over Arabic tweets) designs the collection around significant (i.e., popular) events, which enables the development of topics that represent frequent information needs of Twitter users for which rich content exists. That inherently facilitates the support of multiple tasks that generally revolve around events, namely event detection, ad-hoc search, timeline generation, and real-time summarization. The key highlights of the approach include diversifying the judgment pool via interactive search and multiple manually-crafted queries per topic, collecting high-quality annotations via crowd-workers for relevancy and in-house annotators for novelty, filtering out low-agreement topics and inaccessible tweets, and providing multiple subsets of the collection for better availability. Applying our methodology to Arabic tweets resulted in EveTAR, the first freely-available tweet test collection for multiple IR tasks. EveTAR includes a crawl of 355M Arabic tweets and covers 50 significant events for which about 62K tweets were judged with substantial average inter-annotator agreement (Kappa value of 0.71). We demonstrate the usability of EveTAR by evaluating existing algorithms in the respective tasks. Results indicate that the new collection can support reliable ranking of IR systems that is comparable to similar TREC collections, while providing strong baseline results for future studies over Arabic tweets.
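    The abstract reports agreement as a Kappa of 0.71 without naming the exact variant; as a point of reference, the sketch below computes Cohen's kappa for two annotators giving binary relevance labels. It is an illustrative calculation only, and the paper may use a different kappa statistic or average it per topic.

```python
from typing import List

def cohens_kappa(labels_a: List[int], labels_b: List[int]) -> float:
    """Cohen's kappa for two annotators over the same binary relevance judgments."""
    assert labels_a and len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement under independence, from each annotator's label distribution.
    p_a1 = sum(labels_a) / n
    p_b1 = sum(labels_b) / n
    expected = p_a1 * p_b1 + (1 - p_a1) * (1 - p_b1)
    return (observed - expected) / (1 - expected) if expected < 1 else 1.0

# Toy example: two annotators judging six tweets for relevance (kappa ~= 0.67).
print(cohens_kappa([1, 1, 0, 1, 0, 0], [1, 0, 0, 1, 0, 0]))
```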

    The minimum test collection problem

    In this paper we consider an approach to solving the minimum test collection problem. The approach is based on an explicit reduction from the problem to the satisfiability problem.
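    The abstract does not spell out the authors' encoding, so the following is only a generic sketch of how the decision version ("is there a distinguishing sub-collection of at most k tests?") can be written as CNF clauses: one Boolean variable per test, a clause per item pair requiring some separating test to be selected, and a naive cardinality bound. The construction and names are illustrative, not the paper's reduction.

```python
from itertools import combinations
from typing import Dict, List, Set

def mtc_to_cnf(items: List[str], tests: Dict[str, Set[str]], k: int) -> List[List[int]]:
    """Encode 'is there a distinguishing collection of at most k tests?' as CNF clauses
    over one DIMACS-style Boolean variable per test (positive = test is selected)."""
    test_names = list(tests)
    var = {name: i + 1 for i, name in enumerate(test_names)}
    clauses: List[List[int]] = []
    # Coverage: every pair of items must be separated by at least one selected test,
    # i.e. a test containing exactly one of the two items.
    for a, b in combinations(items, 2):
        separating = [var[t] for t in test_names if (a in tests[t]) != (b in tests[t])]
        clauses.append(separating)  # an empty clause means the instance is unsatisfiable
    # Cardinality: at most k tests selected (naive binomial encoding, fine for small inputs).
    for subset in combinations(test_names, k + 1):
        clauses.append([-var[t] for t in subset])
    return clauses

# Toy instance: 4 items, 3 candidate tests, budget k = 2.
items = ["i1", "i2", "i3", "i4"]
tests = {"t1": {"i1", "i2"}, "t2": {"i1", "i3"}, "t3": {"i2", "i3"}}
for clause in mtc_to_cnf(items, tests, k=2):
    print(clause)  # feed these clauses to any SAT solver to answer the k-test question
```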