NTCIR Lifelog: The First Test Collection for Lifelog Research
Test collections have a long history of supporting repeatable
and comparable evaluation in Information Retrieval (IR).
However, thus far, no shared test collection exists for IR
systems that are designed to index and retrieve multimodal
lifelog data. In this paper we introduce the first test collection
for personal lifelog data. We motivate the requirements for such a
test collection, describe the process of creating it, give an
overview of the resulting collection, and finally suggest possible
applications of the test collection, which has been employed for the
NTCIR-12 Lifelog task.
Creating a test collection to evaluate diversity in image retrieval
This paper describes the adaptation of an existing test collection
for image retrieval to enable diversity in the results set to be
measured. Previous research has shown that a more diverse set of
results often satisfies the needs of more users better than standard
document rankings. To enable diversity to be quantified, it is
necessary to classify images relevant to a given theme into one or
more sub-topics or clusters. We describe the challenges in
building (as far as we are aware) the first test collection for
evaluating diversity in image retrieval. This includes selecting
appropriate topics, creating sub-topics, and quantifying the overall
effectiveness of a retrieval system. A total of 39 topics were
augmented with cluster-based relevance assessments, and we also
provide an initial analysis of assessor agreement when grouping
relevant images into sub-topics or clusters.
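The abstract does not reproduce the paper's diversity measure; a widely used way to quantify result-set diversity against such sub-topic clusters is subtopic (cluster) recall at rank k, i.e., the fraction of a topic's clusters covered by the top-k results. The sketch below illustrates that idea only; the metric choice, function name, and data layout are assumptions, not the paper's actual evaluation code.

```python
def subtopic_recall_at_k(ranked_image_ids, image_to_clusters, k, num_clusters):
    """Fraction of a topic's sub-topic clusters covered by the top-k results.

    ranked_image_ids: system ranking, best image first.
    image_to_clusters: maps each relevant image id to the set of cluster ids
                       (sub-topics) it was assigned to by the assessors.
    num_clusters: total number of sub-topics defined for the topic.
    """
    covered = set()
    for image_id in ranked_image_ids[:k]:
        covered |= image_to_clusters.get(image_id, set())
    return len(covered) / num_clusters if num_clusters else 0.0


# Example: a topic with 3 sub-topics; the top-3 results cover clusters 1 and 2.
ranking = ["img_7", "img_2", "img_9"]
clusters = {"img_7": {1}, "img_2": {1, 2}, "img_5": {3}}
print(subtopic_recall_at_k(ranking, clusters, k=3, num_clusters=3))  # ~0.67
```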
Set covering and set partitioning: a collection of test problems
It is now well established that set covering and set partitioning models play a central role
in many scheduling applications. There are many algorithms which solve these
problems. In order to test and validate such implementations we have collected a range
of test problems taken from different contexts. A brief description of these models,
their applications, and summary model data is supplied in this paper.
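For reference, the two models can be written in their standard integer-programming form (a generic textbook formulation rather than notation taken from the paper): row i of the 0-1 matrix A = (a_ij) is a requirement to be satisfied (e.g., a duty to be staffed), column j is a candidate subset (e.g., a feasible crew schedule) with cost c_j, and x_j = 1 selects that subset.

```latex
\begin{align*}
\text{Set covering:}\quad
  & \min \sum_{j} c_j x_j \quad \text{s.t.}\ \sum_{j} a_{ij} x_j \ge 1 \ \ \forall i,
    \quad x_j \in \{0,1\}, \\
\text{Set partitioning:}\quad
  & \min \sum_{j} c_j x_j \quad \text{s.t.}\ \sum_{j} a_{ij} x_j = 1 \ \ \forall i,
    \quad x_j \in \{0,1\}.
\end{align*}
```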
Improved metrics collection and correlation for the CERN cloud storage test framework
Storage space is one of the most important ingredients that the European Organization for Nuclear Research (CERN) needs for its experiments and operation. Part of the Data & Storage Services (IT-DSS) group’s work at CERN is focused on testing and evaluating the cloud storage system provided by the openlab partner Huawei: the Huawei Universal Disk Storage System (UDS). As a whole, the system consists of both software and hardware.
The objective of the Huawei-CERN partnership is to investigate the performance of the cloud storage system. Among the interesting questions are the system’s scalability, reliability and ability to store and retrieve files. During the tests, possible bugs and malfunctions can be discovered and corrected. Different versions of the storage software that runs inside the storage system can also be compared to each other.
The nature of testing and benchmarking a storage system gives rise to several small tasks that can be done during a short summer internship. In order to test the storage system, a test framework developed by the DSS group is used. The framework consists of various types of file transfer tests, client and server monitoring programs, and log file analysis programs. Part of the work consisted of additions to the existing framework, and part of it was the development of new tools. Metrics collection was the central theme. Metrics are to be understood as system statistics, such as memory consumption or processor usage.
Memory usage and disk reads/writes were added to the existing client real-time monitoring framework. CPU and memory usage, network traffic (bytes received/sent) and the number of running processes are collected from a client computer before and after a daily test. Two other additions are a visualization for the storage system log files and a new monitoring tool for the storage system. This report is organized by component: for each part of the framework that was improved or added, it describes the problem and the final solution. A short description of the code and the architecture is also included.
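The report describes these additions rather than reproducing their code; the snippet below is only a rough sketch of the kind of before/after client snapshot mentioned (CPU, memory, network bytes, process count), written against the psutil library rather than whatever tooling the DSS framework actually uses.

```python
import psutil


def snapshot_client_metrics():
    """Collect the client-side metrics mentioned in the report: CPU and
    memory usage, network traffic, and the number of running processes."""
    net = psutil.net_io_counters()
    return {
        "cpu_percent": psutil.cpu_percent(interval=1),
        "memory_percent": psutil.virtual_memory().percent,
        "bytes_sent": net.bytes_sent,
        "bytes_recv": net.bytes_recv,
        "num_processes": len(psutil.pids()),
    }


# Take one snapshot before and one after the daily test, then compare them.
before = snapshot_client_metrics()
# ... run the file transfer tests here ...
after = snapshot_client_metrics()
print({key: after[key] - before[key] for key in before})
```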
EveTAR: Building a Large-Scale Multi-Task Test Collection over Arabic Tweets
This article introduces a new language-independent approach for creating a
large-scale high-quality test collection of tweets that supports multiple
information retrieval (IR) tasks without running a shared-task campaign. The
adopted approach (demonstrated over Arabic tweets) designs the collection
around significant (i.e., popular) events, which enables the development of
topics that represent frequent information needs of Twitter users for which
rich content exists. That inherently facilitates the support of multiple tasks
that generally revolve around events, namely event detection, ad-hoc search,
timeline generation, and real-time summarization. The key highlights of the
approach include diversifying the judgment pool via interactive search and
multiple manually-crafted queries per topic, collecting high-quality
annotations via crowd-workers for relevancy and in-house annotators for
novelty, filtering out low-agreement topics and inaccessible tweets, and
providing multiple subsets of the collection for better availability. Applying
our methodology to Arabic tweets resulted in EveTAR, the first
freely available tweet test collection for multiple IR tasks. EveTAR includes a
crawl of 355M Arabic tweets and covers 50 significant events for which about
62K tweets were judged, with substantial average inter-annotator agreement
(a kappa value of 0.71). We demonstrate the usability of EveTAR by evaluating
existing algorithms in the respective tasks. Results indicate that the new
collection can support reliable ranking of IR systems that is comparable to
similar TREC collections, while providing strong baseline results for future
studies over Arabic tweets.
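As a reminder of how an agreement figure such as the reported kappa of 0.71 is obtained, the snippet below computes a plain two-annotator Cohen's kappa over binary relevance labels; the article does not state which kappa variant or aggregation it used, so treat this as an illustrative assumption.

```python
def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items:
    (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and
    p_e the agreement expected by chance from each label distribution."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    categories = set(labels_a) | set(labels_b)
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    p_e = sum((labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories)
    return 1.0 if p_e == 1 else (p_o - p_e) / (1 - p_e)


# Example: two annotators judging six tweets as relevant (1) or not (0).
print(cohens_kappa([1, 1, 0, 1, 0, 0], [1, 1, 0, 0, 0, 0]))  # ~0.67
```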
The minimum test collection problem
In this paper we consider an approach to solving the minimum test collection problem. The approach is based on an explicit reduction from the problem to the satisfiability problem.
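For context, the minimum test collection problem (also known as the minimum test set or test cover problem) can be stated as follows; this is the standard formulation, and the paper's specific satisfiability encoding is not reproduced here. Given a finite set S of items and a collection C of subsets ("tests") of S, find a smallest subcollection whose tests distinguish every pair of items:

```latex
\begin{align*}
& \min_{\mathcal{C}' \subseteq \mathcal{C}} \ |\mathcal{C}'| \\
& \text{s.t. for all distinct } x, y \in S \text{ there exists }
  T \in \mathcal{C}' \text{ with } |T \cap \{x, y\}| = 1.
\end{align*}
```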