407,207 research outputs found

    Pseudo-Relevance Feedback for Multiple Representation Dense Retrieval

    Full text link
    Pseudo-relevance feedback mechanisms, from Rocchio to the relevance models, have shown the usefulness of expanding and reweighting the users' initial queries using information occurring in an initial set of retrieved documents, known as the pseudo-relevant set. Recently, dense retrieval -- through the use of neural contextual language models such as BERT for analysing the documents' and queries' contents and computing their relevance scores -- has shown a promising performance on several information retrieval tasks still relying on the traditional inverted index for identifying documents relevant to a query. Two different dense retrieval families have emerged: the use of single embedded representations for each passage and query (e.g. using BERT's [CLS] token), or via multiple representations (e.g. using an embedding for each token of the query and document). In this work, we conduct the first study into the potential for multiple representation dense retrieval to be enhanced using pseudo-relevance feedback. In particular, based on the pseudo-relevant set of documents identified using a first-pass dense retrieval, we extract representative feedback embeddings (using KMeans clustering) -- while ensuring that these embeddings discriminate among passages (based on IDF) -- which are then added to the query representation. These additional feedback embeddings are shown to both enhance the effectiveness of a reranking as well as an additional dense retrieval operation. Indeed, experiments on the MSMARCO passage ranking dataset show that MAP can be improved by upto 26% on the TREC 2019 query set and 10% on the TREC 2020 query set by the application of our proposed ColBERT-PRF method on a ColBERT dense retrieval approach.Comment: 10 page

    Understanding the Policy Context of Hiring, Human Trafficking and Modern-Day Slavery A BRIEF FOR RESPONSIBLE BUSINESS

    Get PDF
    This document is part of a digital collection provided by the Martin P. Catherwood Library, ILR School, Cornell University, pertaining to the effects of globalization on the workplace worldwide. Special emphasis is placed on labor rights, working conditions, labor market changes, and union organizing.Verite_HELP_WANTED_Understanding_the_Policy_Context_of_Hiring_Human_Trafficking_and_Modern_Day_Slavery.pdf: 228 downloads, before Oct. 1, 2020

    Modeling an ontology on accessible evacuation routes for emergencies

    Get PDF
    Providing alert communication in emergency situations is vital to reduce the number of victims. However, this is a challenging goal for researchers and professionals due to the diverse pool of prospective users, e.g. people with disabilities as well as other vulnerable groups. Moreover, in the event of an emergency situation, many people could become vulnerable because of exceptional circumstances such as stress, an unknown environment or even visual impairment (e.g. fire causing smoke). Within this scope, a crucial activity is to notify affected people about safe places and available evacuation routes. In order to address this need, we propose to extend an ontology, called SEMA4A (Simple EMergency Alert 4 [for] All), developed in a previous work for managing knowledge about accessibility guidelines, emergency situations and communication technologies. In this paper, we introduce a semi-automatic technique for knowledge acquisition and modeling on accessible evacuation routes. We introduce a use case to show applications of the ontology and conclude with an evaluation involving several experts in evacuation procedures. © 2014 Elsevier Ltd. All rights reserved

    Active Learning Strategies for Technology Assisted Sensitivity Review

    Get PDF
    Government documents must be reviewed to identify and protect any sensitive information, such as personal information, before the documents can be released to the public. However, in the era of digital government documents, such as e-mail, traditional sensitivity review procedures are no longer practical, for example due to the volume of documents to be reviewed. Therefore, there is a need for new technology assisted review protocols to integrate automatic sensitivity classification into the sensitivity review process. Moreover, to effectively assist sensitivity review, such assistive technologies must incorporate reviewer feedback to enable sensitivity classifiers to quickly learn and adapt to the sensitivities within a collection, when the types of sensitivity are not known a priori. In this work, we present a thorough evaluation of active learning strategies for sensitivity review. Moreover, we present an active learning strategy that integrates reviewer feedback, from sensitive text annotations, to identify features of sensitivity that enable us to learn an effective sensitivity classifier (0.7 Balanced Accuracy) using significantly less reviewer effort, according to the sign test (p < 0.01 ). Moreover, this approach results in a 51% reduction in the number of documents required to be reviewed to achieve the same level of classification accuracy, compared to when the approach is deployed without annotation features

    Comprehensive Review of Opinion Summarization

    Get PDF
    The abundance of opinions on the web has kindled the study of opinion summarization over the last few years. People have introduced various techniques and paradigms to solving this special task. This survey attempts to systematically investigate the different techniques and approaches used in opinion summarization. We provide a multi-perspective classification of the approaches used and highlight some of the key weaknesses of these approaches. This survey also covers evaluation techniques and data sets used in studying the opinion summarization problem. Finally, we provide insights into some of the challenges that are left to be addressed as this will help set the trend for future research in this area.unpublishednot peer reviewe
    • …
    corecore