288 research outputs found

    Toward higher effectiveness for recall-oriented information retrieval: A patent retrieval case study

    Get PDF
    Research in information retrieval (IR) has largely been directed towards tasks requiring high precision. Recently, other IR applications which can be described as recall-oriented IR tasks have received increased attention in the IR research domain. Prominent among these IR applications are patent search and legal search, where users are typically ready to check hundreds or possibly thousands of documents in order to find any possible relevant document. The main concerns in this kind of application are very different from those in standard precision-oriented IR tasks, where users tend to be focused on finding an answer to their information need that can typically be addressed by one or two relevant documents. For precision-oriented tasks, mean average precision continues to be used as the primary evaluation metric for almost all IR applications. For recall-oriented IR applications the nature of the search task, including objectives, users, queries, and document collections, is different from that of standard precision-oriented search tasks. In this research study, two dimensions in IR are explored for the recall-oriented patent search task. The study includes IR system evaluation and multilingual IR for patent search. In each of these dimensions, current IR techniques are studied and novel techniques developed especially for this kind of recall-oriented IR application are proposed and investigated experimentally in the context of patent retrieval. The techniques developed in this thesis provide a significant contribution toward evaluating the effectiveness of recall-oriented IR in general and particularly patent search, and improving the efficiency of multilingual search for this kind of task

    Query refinement for patent prior art search

    Get PDF
    A patent is a contract between the inventor and the state, granting a limited time period to the inventor to exploit his invention. In exchange, the inventor must put a detailed description of his invention in the public domain. Patents can encourage innovation and economic growth but at the time of economic crisis patents can hamper such growth. The long duration of the application process is a big obstacle that needs to be addressed to maximize the benefit of patents on innovation and economy. This time can be significantly improved by changing the way we search the patent and non-patent literature.Despite the recent advancement of general information retrieval and the revolution of Web Search engines, there is still a huge gap between the emerging technologies from the research labs and adapted by major Internet search engines, and the systems which are in use by the patent search communities.In this thesis we investigate the problem of patent prior art search in patent retrieval with the goal of finding documents which describe the idea of a query patent. A query patent is a full patent application composed of hundreds of terms which does not represent a single focused information need. Other relevance evidences (e.g. classification tags, and bibliographical data) provide additional details about the underlying information need of the query patent. The first goal of this thesis is to estimate a uni-gram query model from the textual fields of a query patent. We then improve the initial query representation using noun phrases extracted from the query patent. We show that expansion in a query-dependent manner is useful.The second contribution of this thesis is to address the term mismatch problem from a query formulation point of view by integrating multiple relevance evidences associated with the query patent. To do this, we enhance the initial representation of the query with the term distribution of the community of inventors related to the topic of the query patent. We then build a lexicon using classification tags and show that query expansion using this lexicon and considering proximity information (between query and expansion terms) can improve the retrieval performance. We perform an empirical evaluation of our proposed models on two patent datasets. The experimental results show that our proposed models can achieve significantly better results than the baseline and other enhanced models

    A ranking framework and evaluation for diversity-based retrieval

    Get PDF
    There has been growing momentum in building information retrieval (IR) systems that consider both relevance and diversity of retrieved information, which together improve the usefulness of search results as perceived by users. Some users may genuinely require a set of multiple results to satisfy their information need as there is no single result that completely fulfils the need. Others may be uncertain about their information need and they may submit ambiguous or broad (faceted) queries, either intentionally or unintentionally. A sensible approach to tackle these problems is to diversify search results to address all possible senses underlying those queries or all possible answers satisfying the information need. In this thesis, we explore three aspects of diversity-based document retrieval: 1) recommender systems, 2) retrieval algorithms, and 3) evaluation measures. This first goal of this thesis is to provide an understanding of the need for diversity in search results from the users’ perspective. We develop an interactive recommender system for the purpose of a user study. Designed to facilitate users engaged in exploratory search, the system is featured with content-based browsing, aspectual interfaces, and diverse recommendations. While the diverse recommendations allow users to discover more and different aspects of a search topic, the aspectual interfaces allow users to manage and structure their own search process and results regarding aspects found during browsing. The recommendation feature mines implicit relevance feedback information extracted from a user’s browsing trails and diversifies recommended results with respect to document contents. The result of our user-centred experiment shows that result diversity is needed in realistic retrieval scenarios. Next, we propose a new ranking framework for promoting diversity in a ranked list. We combine two distinct result diversification patterns; this leads to a general framework that enables the development of a variety of ranking algorithms for diversifying documents. To validate our proposal and to gain more insights into approaches for diversifying documents, we empirically compare our integration framework against a common ranking approach (i.e. the probability ranking principle) as well as several diversity-based ranking strategies. These include maximal marginal relevance, modern portfolio theory, and sub-topic-aware diversification based on sub-topic modelling techniques, e.g. clustering, latent Dirichlet allocation, and probabilistic latent semantic analysis. Our findings show that the two diversification patterns can be employed together to improve the effectiveness of ranking diversification. Furthermore, we find that the effectiveness of our framework mainly depends on the effectiveness of the underlying sub-topic modelling techniques. Finally, we examine evaluation measures for diversity retrieval. We analytically identify an issue affecting the de-facto standard measure, novelty-biased discounted cumulative gain (α-nDCG). This issue prevents the measure from behaving as desired, i.e. assessing the effectiveness of systems that provide complete coverage of sub-topics by avoiding excessive redundancy. We show that this issue is of importance as it highly affects the evaluation of retrieval systems, specifically by overrating top-ranked systems that repeatedly retrieve redundant information. To overcome this issue, we derive a theoretically sound solution by defining a safe threshold on a query-basis. We examine the impact of arbitrary settings of the α-nDCG parameter. We evaluate the intuitiveness and reliability of α-nDCG when using our proposed setting on both real and synthetic rankings. We demonstrate that the diversity of document rankings can be intuitively measured by employing the safe threshold. Moreover, our proposal does not harm, but instead increases the reliability of the measure in terms of discriminative power, stability, and sensitivity

    Evaluating Information Retrieval and Access Tasks

    Get PDF
    This open access book summarizes the first two decades of the NII Testbeds and Community for Information access Research (NTCIR). NTCIR is a series of evaluation forums run by a global team of researchers and hosted by the National Institute of Informatics (NII), Japan. The book is unique in that it discusses not just what was done at NTCIR, but also how it was done and the impact it has achieved. For example, in some chapters the reader sees the early seeds of what eventually grew to be the search engines that provide access to content on the World Wide Web, today’s smartphones that can tailor what they show to the needs of their owners, and the smart speakers that enrich our lives at home and on the move. We also get glimpses into how new search engines can be built for mathematical formulae, or for the digital record of a lived human life. Key to the success of the NTCIR endeavor was early recognition that information access research is an empirical discipline and that evaluation therefore lay at the core of the enterprise. Evaluation is thus at the heart of each chapter in this book. They show, for example, how the recognition that some documents are more important than others has shaped thinking about evaluation design. The thirty-three contributors to this volume speak for the many hundreds of researchers from dozens of countries around the world who together shaped NTCIR as organizers and participants. This book is suitable for researchers, practitioners, and students—anyone who wants to learn about past and present evaluation efforts in information retrieval, information access, and natural language processing, as well as those who want to participate in an evaluation task or even to design and organize one

    A ranking framework and evaluation for diversity-based retrieval

    Get PDF
    There has been growing momentum in building information retrieval (IR) systems that consider both relevance and diversity of retrieved information, which together improve the usefulness of search results as perceived by users. Some users may genuinely require a set of multiple results to satisfy their information need as there is no single result that completely fulfils the need. Others may be uncertain about their information need and they may submit ambiguous or broad (faceted) queries, either intentionally or unintentionally. A sensible approach to tackle these problems is to diversify search results to address all possible senses underlying those queries or all possible answers satisfying the information need. In this thesis, we explore three aspects of diversity-based document retrieval: 1) recommender systems, 2) retrieval algorithms, and 3) evaluation measures. This first goal of this thesis is to provide an understanding of the need for diversity in search results from the users’ perspective. We develop an interactive recommender system for the purpose of a user study. Designed to facilitate users engaged in exploratory search, the system is featured with content-based browsing, aspectual interfaces, and diverse recommendations. While the diverse recommendations allow users to discover more and different aspects of a search topic, the aspectual interfaces allow users to manage and structure their own search process and results regarding aspects found during browsing. The recommendation feature mines implicit relevance feedback information extracted from a user’s browsing trails and diversifies recommended results with respect to document contents. The result of our user-centred experiment shows that result diversity is needed in realistic retrieval scenarios. Next, we propose a new ranking framework for promoting diversity in a ranked list. We combine two distinct result diversification patterns; this leads to a general framework that enables the development of a variety of ranking algorithms for diversifying documents. To validate our proposal and to gain more insights into approaches for diversifying documents, we empirically compare our integration framework against a common ranking approach (i.e. the probability ranking principle) as well as several diversity-based ranking strategies. These include maximal marginal relevance, modern portfolio theory, and sub-topic-aware diversification based on sub-topic modelling techniques, e.g. clustering, latent Dirichlet allocation, and probabilistic latent semantic analysis. Our findings show that the two diversification patterns can be employed together to improve the effectiveness of ranking diversification. Furthermore, we find that the effectiveness of our framework mainly depends on the effectiveness of the underlying sub-topic modelling techniques. Finally, we examine evaluation measures for diversity retrieval. We analytically identify an issue affecting the de-facto standard measure, novelty-biased discounted cumulative gain (α-nDCG). This issue prevents the measure from behaving as desired, i.e. assessing the effectiveness of systems that provide complete coverage of sub-topics by avoiding excessive redundancy. We show that this issue is of importance as it highly affects the evaluation of retrieval systems, specifically by overrating top-ranked systems that repeatedly retrieve redundant information. To overcome this issue, we derive a theoretically sound solution by defining a safe threshold on a query-basis. We examine the impact of arbitrary settings of the α-nDCG parameter. We evaluate the intuitiveness and reliability of α-nDCG when using our proposed setting on both real and synthetic rankings. We demonstrate that the diversity of document rankings can be intuitively measured by employing the safe threshold. Moreover, our proposal does not harm, but instead increases the reliability of the measure in terms of discriminative power, stability, and sensitivity.EThOS - Electronic Theses Online ServiceGBUnited Kingdo
    • …
    corecore