
    A topical approach to retrievability bias estimation

    Retrievability is an independent evaluation measure that offers insights into an aspect of retrieval systems that performance and efficiency measures do not. Retrievability is often used to calculate the retrievability bias, an indication of how accessible a system makes all the documents in a collection. Generally, computing the retrievability bias of a system requires a colossal number of queries to be issued for the system to gain an accurate estimate of the bias. However, it is often the case that what matters is not the accuracy of the estimate but the relationship between the estimate of bias and performance when tuning a system's parameters. As such, reaching a stable estimation of bias for the system is more important than getting very accurate retrievability scores for individual documents. This work explores the idea of using topical subsets of the collection for query generation and bias estimation to form a local estimate of bias which correlates with the global estimate of retrievability bias. By using topical subsets, it would be possible to reduce the volume of queries required to reach an accurate estimate of retrievability bias, reducing the time and resources required to perform a retrievability analysis. Findings suggest that this is a viable approach to estimating retrievability bias and that the number of queries required can be reduced to less than a quarter of what was previously thought necessary.
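
    As a rough illustration of the quantities this abstract discusses, the sketch below computes a cumulative retrievability score per document (the number of queries that return it at or above a rank cutoff) and summarizes the inequality of those scores with a Gini coefficient. The abstract does not name its bias statistic, so the use of Gini here is an assumption, and the function names and toy rankings are hypothetical.

```python
from collections import defaultdict


def retrievability(rankings, cutoff):
    """Cumulative retrievability: for each document, count the queries
    that return it within the top `cutoff` results."""
    scores = defaultdict(int)
    for ranked_docs in rankings.values():
        for doc_id in ranked_docs[:cutoff]:
            scores[doc_id] += 1
    return scores


def gini(values):
    """Gini coefficient of non-negative values (0 means perfectly equal)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)


def retrievability_bias(rankings, collection, cutoff):
    """Assumed bias summary: Gini over the scores of every document in
    `collection`, including documents that were never retrieved."""
    scores = retrievability(rankings, cutoff)
    return gini([scores.get(doc_id, 0) for doc_id in collection])


# Hypothetical toy data: query id -> ranked list of document ids.
rankings = {
    "q1": ["d1", "d2"],
    "q2": ["d1", "d3"],
    "q3": ["d1", "d2"],
}
collection = ["d1", "d2", "d3", "d4"]
bias_at_1 = retrievability_bias(rankings, collection, cutoff=1)
bias_at_2 = retrievability_bias(rankings, collection, cutoff=2)
```

    Under these assumptions, a local estimate in the spirit of the abstract would pass only the documents of one topical subset as `collection`, with queries generated from that subset, and compare how the resulting value tracks the estimate for the whole collection across system settings.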

    Querylog-based assessment of retrievability bias in a large newspaper corpus

    Bias in the retrieval of documents can directly influence the information access of a digital library. In the worst case, systematic favoritism for a certain type of document can render other parts of the collection invisible to users. This potential bias can be evaluated by measuring the retrievability for all documents in a collection. Previous evaluations have been performed on TREC collections using simulated query sets. The question remains, however, how representative this approach is of more realistic settings. To address this question, we investigate the effectiveness of the retrievability measure using a large digitized newspaper corpus, featuring two characteristics that distinguish our experiments from previous studies: (1) compared to TREC collections, our collection contains noise originating from OCR processing, historical spelling and use of language; and (2) instead of simula

    Report on the Information Retrieval Festival (IRFest2017)

    The Information Retrieval Festival took place in April 2017 in Glasgow. The focus of the workshop was to bring together IR researchers from the various Scottish universities and beyond in order to facilitate more awareness, increased interaction and reflection on the status of the field and its future. The program included an industry session, research talks, demos and posters as well as two keynotes. The first keynote was delivered by Prof. Jaana Kekalenien, who provided a historical, critical reflection on realism in Interactive Information Retrieval Experimentation, while the second keynote was delivered by Prof. Maarten de Rijke, who argued for more Artificial Intelligence usage in IR solutions and deployments. The workshop was followed by a "Tour de Scotland" where delegates were taken from Glasgow to Aberdeen for the European Conference on Information Retrieval (ECIR 2017).

    Keyword-Based Delegable Proofs of Storage

    Cloud users (clients) with limited storage capacity at their end can outsource bulk data to the cloud storage server. A client can later access her data by downloading the required data files. However, a large fraction of the data files the client outsources to the server is often archival in nature: the client uses these files for backup purposes and accesses them less frequently. An untrusted server can thus delete some of these archival data files in order to save some space (and allocate the same to other clients) without being detected by the client (data owner). Proofs of storage enable the client to audit her data files uploaded to the server in order to ensure the integrity of those files. In this work, we introduce one type of (selective) proofs of storage that we call keyword-based delegable proofs of storage, where the client wants to audit all her data files containing a specific keyword (e.g., "important"). Moreover, it satisfies the notion of public verifiability where the client can delegate the auditing task to a third-party auditor who audits the set of files corresponding to the keyword on behalf of the client. We formally define the security of a keyword-based delegable proof-of-storage protocol. We construct such a protocol based on an existing proof-of-storage scheme and analyze the security of our protocol. We argue that the techniques we use can be applied atop any existing publicly verifiable proof-of-storage scheme for static data. Finally, we discuss the efficiency of our construction. Comment: A preliminary version of this work has been published in International Conference on Information Security Practice and Experience (ISPEC 2018).

    A study of query expansion methods for patent retrieval

    Patent retrieval is a recall-oriented search task where the objective is to find all possible relevant documents. Queries in patent retrieval are typically very long since they take the form of a patent claim or even a full patent application in the case of prior-art patent search. Nevertheless, there is generally a significant mismatch between the query and the relevant documents, often leading to low retrieval effectiveness. Some previous work has tried to address this mismatch through the application of query expansion (QE) techniques, which have generally shown effectiveness for many other retrieval tasks. However, results of QE on patent search have been found to be very disappointing. We present a review of previous investigations of QE in patent retrieval, and explore some of these techniques on a prior-art patent search task. In addition, a novel method for QE using automatically generated synonym sets is presented. While previous QE techniques fail to improve over baseline retrieval, our new approach shows statistically better retrieval precision over the baseline, although not for recall. In addition, it proves to be significantly more efficient than existing techniques. An extensive analysis of the results is presented which seeks to better understand situations where these QE techniques succeed or fail.

    Nuclear Waste Disposal in France: the Contribution of Economic Analysis.

    This article addresses the following question: How to deal with uncertainty, emergence of new information and irreversibility in the decision process of the long-term disposal of radioactive waste? Intuitively, one might think that measures taken today are more relevant when they are flexible. We show that theoretical economic insights supplement this intuition and, more precisely, we emphasize real options theory as one means of valuing flexible strategies in the disposal of highly radioactive waste. Moreover, we argue that the optional approach must involve a more complex utilization in the recently developed French project of reversible repository given the presence of multiple disposal stages. Keywords: radioactive waste, real options, reversibility.

    Reversibility and switching options values in the geological disposal of radioactive waste.

    This article offers some economic insights for the debate on the reversible geological disposal of radioactive waste. Irreversibility due to large sunk costs, an important degree of flexibility and several sources of uncertainty are taken into account in the decision process relative to radioactive waste disposal. We draw up a stochastic model in a continuous time framework to study the decision problem of a reversible repository project for radioactive waste, with multiple disposal stages. We consider that the value of reversibility, related to the radioactive waste packages, is jointly affected by economic and technological uncertainty. These uncertainties are modeled, first, by a 2-Dimensional Geometric Brownian Motion, and, second, by a Geometric Brownian Motion with a Poisson jump process. A numerical analysis and a sensitivity study of various parameters are also proposed. Keywords: radioactive waste, reversibility, switching, real option theory.
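
    To make the second stochastic specification concrete, the following is a minimal sketch of simulating one path of a single Geometric Brownian Motion with a Poisson jump component, dX/X = mu dt + sigma dW + (jump_size - 1) dN. It is not the article's model: the single state variable, the constant multiplicative jump, the Bernoulli approximation of jump arrivals, and every parameter name and value are assumptions chosen for illustration.

```python
import math
import random


def simulate_gbm_jump(x0, mu, sigma, jump_rate, jump_size, horizon, steps, rng):
    """Simulate one path of a GBM with jumps on an even time grid.

    Assumptions (not from the article): each jump multiplies the level by
    the constant `jump_size`, and at most one jump occurs per step, which
    is reasonable only when jump_rate * dt is small.
    """
    dt = horizon / steps
    x = x0
    path = [x]
    for _ in range(steps):
        z = rng.gauss(0.0, 1.0)
        # Exact log-normal update for the diffusion part over one step.
        x *= math.exp((mu - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * z)
        # Bernoulli approximation of a Poisson arrival within the step.
        if rng.random() < jump_rate * dt:
            x *= jump_size
        path.append(x)
    return path


# Hypothetical parameters, chosen only to exercise the function.
rng = random.Random(42)
path = simulate_gbm_jump(x0=100.0, mu=0.03, sigma=0.2, jump_rate=0.5,
                         jump_size=0.8, horizon=10.0, steps=1000, rng=rng)
```

    Averaging a payoff over many such paths is one common way to carry out the kind of numerical and sensitivity analysis the abstract mentions, although the abstract does not say which numerical method the authors use.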

    PRES: A score metric for evaluating recall-oriented information retrieval applications

    Information retrieval (IR) evaluation scores are generally designed to measure the effectiveness with which relevant documents are identified and retrieved. Many scores have been proposed for this purpose over the years. These have primarily focused on aspects of precision and recall, and while these are often discussed with equal importance, in practice most attention has been given to precision-focused metrics. Even for recall-oriented IR tasks of growing importance, such as patent retrieval, these precision-based scores remain the primary evaluation measures. Our study examines different evaluation measures for a recall-oriented patent retrieval task and demonstrates the limitations of the current scores in comparing different IR systems for this task. We introduce PRES, a novel evaluation metric for this type of application taking account of recall and the user’s search effort. The behaviour of PRES is demonstrated on 48 runs from the CLEF-IP 2009 patent retrieval track. A full analysis of the performance of PRES shows its suitability for measuring the retrieval effectiveness of systems from a recall-focused perspective taking into account the user’s expected search effort.
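
    The abstract gives no formula, so the sketch below follows the form in which PRES is usually cited: PRES = 1 - (sum(r_i) / n - (n + 1) / 2) / N_max, where n is the number of relevant documents, r_i their ranks, and N_max the maximum number of results the user is willing to check. Relevant documents not found within N_max are assumed to sit at the tail of the worst case. Treat this as an assumption to verify against the paper; the function and argument names are hypothetical.

```python
def pres(relevant_ranks, n_relevant, n_max):
    """Sketch of PRES in its commonly cited form (an assumption, see lead-in).

    relevant_ranks: 1-based ranks at which relevant documents were retrieved.
    n_relevant:     total number of relevant documents for the topic (n).
    n_max:          maximum number of results the user will check (N_max).
    """
    found = [r for r in relevant_ranks if r <= n_max]
    missing = n_relevant - len(found)
    # Missing relevant documents are assigned the last `missing` worst-case
    # ranks: n_max + n_relevant - missing + 1, ..., n_max + n_relevant.
    rank_sum = (sum(found)
                + missing * (n_max + n_relevant)
                - missing * (missing - 1) / 2)
    return 1 - (rank_sum / n_relevant - (n_relevant + 1) / 2) / n_max


best = pres([1, 2, 3], n_relevant=3, n_max=100)   # all relevant at the top
worst = pres([], n_relevant=3, n_max=100)         # nothing relevant found
partial = pres([1, 50], n_relevant=2, n_max=100)  # one early, one late hit
```

    In this form the score is 1 when all relevant documents occupy the top ranks and 0 when none is found within N_max, so it rewards both how many relevant documents are found and how early the user reaches them.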

    An empirical analysis of pruning techniques: performance, retrievability and bias

    Prior work on using retrievability measures in the evaluation of information retrieval (IR) systems has laid out the foundations for investigating the relation between retrieval performance and retrieval bias. While various factors influencing retrievability have been examined, showing how the retrieval model may influence bias, no prior work has examined the impact of the index (and how it is optimized) on retrieval bias. Intuitively, how the documents are represented, and what terms they contain, will influence whether they are retrievable or not. In this paper, we investigate how the retrieval bias of a system changes as the inverted index is optimized for efficiency through static index pruning. In our analysis, we consider four pruning methods and examine how they affect performance and bias on the TREC GOV2 Collection. Our results show that the relationship between these factors is varied and complex, and very much dependent on the pruning algorithm. We find that more pruning results in relatively little change or a slight decrease in bias up to a point, and then a dramatic increase. The increase in bias corresponds to a sharp decrease in early-precision measures such as NDCG@10 and is also indicative of a large decrease in MAP. The findings suggest that the impact of pruning algorithms can be quite varied, but retrieval bias could be used to guide the pruning process. Further work is required to determine precisely which documents are most affected and how this impacts upon performance.