563 research outputs found

    A retrieval evaluation methodology for incomplete relevance assessments

    Get PDF
    In this paper we a propose an extended methodology for laboratory based Information Retrieval evaluation under in complete relevance assessments. This new protocol aims to identify potential uncertainty during system comparison that may result from incompleteness. We demonstrate how this methodology can lead towards a finer grained analysis of systems. This is advantageous, because the detection of uncertainty during the evaluation process can guide and direct researchers when evaluating new systems over existing and future test collections

    Distributed Information Retrieval using Keyword Auctions

    Get PDF
    This report motivates the need for large-scale distributed approaches to information retrieval, and proposes solutions based on keyword auctions

    PFTijah: text search in an XML database system

    Get PDF
    This paper introduces the PFTijah system, a text search system that is integrated with an XML/XQuery database management system. We present examples of its use, we explain some of the system internals, and discuss plans for future work. PFTijah is part of the open source release of MonetDB/XQuery

    Automatic assignment of biomedical categories: toward a generic approach

    Get PDF
    Motivation: We report on the development of a generic text categorization system designed to automatically assign biomedical categories to any input text. Unlike usual automatic text categorization systems, which rely on data-intensive models extracted from large sets of training data, our categorizer is largely data-independent. Methods: In order to evaluate the robustness of our approach we test the system on two different biomedical terminologies: the Medical Subject Headings (MeSH) and the Gene Ontology (GO). Our lightweight categorizer, based on two ranking modules, combines a pattern matcher and a vector space retrieval engine, and uses both stems and linguistically-motivated indexing units. Results and Conclusion: Results show the effectiveness of phrase indexing for both GO and MeSH categorization, but we observe the categorization power of the tool depends on the controlled vocabulary: precision at high ranks ranges from above 90% for MeSH to <20% for GO, establishing a new baseline for categorizers based on retrieval methods. Contact: [email protected]

    Understanding Mobile Search Task Relevance and User Behaviour in Context

    Full text link
    Improvements in mobile technologies have led to a dramatic change in how and when people access and use information, and is having a profound impact on how users address their daily information needs. Smart phones are rapidly becoming our main method of accessing information and are frequently used to perform `on-the-go' search tasks. As research into information retrieval continues to evolve, evaluating search behaviour in context is relatively new. Previous research has studied the effects of context through either self-reported diary studies or quantitative log analysis; however, neither approach is able to accurately capture context of use at the time of searching. In this study, we aim to gain a better understanding of task relevance and search behaviour via a task-based user study (n=31) employing a bespoke Android app. The app allowed us to accurately capture the user's context when completing tasks at different times of the day over the period of a week. Through analysis of the collected data, we gain a better understanding of how using smart phones on the go impacts search behaviour, search performance and task relevance and whether or not the actual context is an important factor.Comment: To appear in CHIIR 2019 in Glasgow, U

    Deriving query suggestions for site search

    Get PDF
    Modern search engines have been moving away from simplistic interfaces that aimed at satisfying a user's need with a single-shot query. Interactive features are now integral parts of web search engines. However, generating good query modification suggestions remains a challenging issue. Query log analysis is one of the major strands of work in this direction. Although much research has been performed on query logs collected on the web as a whole, query log analysis to enhance search on smaller and more focused collections has attracted less attention, despite its increasing practical importance. In this article, we report on a systematic study of different query modification methods applied to a substantial query log collected on a local website that already uses an interactive search engine. We conducted experiments in which we asked users to assess the relevance of potential query modification suggestions that have been constructed using a range of log analysis methods and different baseline approaches. The experimental results demonstrate the usefulness of log analysis to extract query modification suggestions. Furthermore, our experiments demonstrate that a more fine-grained approach than grouping search requests into sessions allows for extraction of better refinement terms from query log files. © 2013 ASIS&T

    A Large-Scale Comparison of Historical Text Normalization Systems

    Get PDF
    There is no consensus on the state-of-the-art approach to historical text normalization. Many techniques have been proposed, including rule-based methods, distance metrics, character-based statistical machine translation, and neural encoder--decoder models, but studies have used different datasets, different evaluation methods, and have come to different conclusions. This paper presents the largest study of historical text normalization done so far. We critically survey the existing literature and report experiments on eight languages, comparing systems spanning all categories of proposed normalization techniques, analysing the effect of training data quantity, and using different evaluation methods. The datasets and scripts are made publicly available.Comment: Accepted at NAACL 201
    corecore