5 research outputs found

    Towards Maximising Openness in Digital Sensitivity Review using Reviewing Time Predictions

    The adoption of born-digital documents, such as email, by governments such as those of the UK and USA has resulted in a large backlog of born-digital documents that must be sensitivity reviewed before they can be opened to the public, to ensure that no sensitive information, e.g. personal or confidential information, is released. However, it is not practical to review all of the backlog with the available reviewing resources and, therefore, there is a need for automatic techniques to increase the number of documents that can be opened within a fixed reviewing time budget. In this paper, we conduct a user study and use the log data to build models to predict reviewing times for an average sensitivity reviewer. Moreover, we show that using our reviewing time predictions to select the order in which documents are reviewed can markedly increase the ratio of reviewed documents that are released to the public, e.g. +30% for collections with high levels of sensitivity, compared to reviewing the shortest documents first. This, in turn, increases the total number of documents that are opened to the public within a fixed reviewing time budget, e.g. an extra 200 documents in 100 hours of reviewing.
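    The ordering idea in the abstract above can be illustrated with a minimal sketch: given per-document reviewing time predictions, a greedy scheduler reviews the quickest-predicted documents first until the time budget runs out. The Document structure and field names below are illustrative assumptions, not the paper's actual model.

```python
# Minimal sketch of budget-constrained review ordering: documents are
# reviewed in ascending order of *predicted* reviewing time, and we count
# how many fit inside a fixed time budget. The data structure here is an
# illustrative stand-in, not the paper's prediction model.
from dataclasses import dataclass

@dataclass
class Document:
    doc_id: str
    predicted_minutes: float  # output of a reviewing-time regressor

def order_by_predicted_time(docs: list[Document]) -> list[Document]:
    """Order documents so the quickest predicted reviews come first."""
    return sorted(docs, key=lambda d: d.predicted_minutes)

def review_within_budget(docs: list[Document], budget_minutes: float) -> list[Document]:
    """Greedily review documents until the time budget is exhausted."""
    reviewed, spent = [], 0.0
    for doc in order_by_predicted_time(docs):
        if spent + doc.predicted_minutes > budget_minutes:
            break
        reviewed.append(doc)
        spent += doc.predicted_minutes
    return reviewed

docs = [Document("a", 4.0), Document("b", 12.5), Document("c", 2.5)]
print([d.doc_id for d in review_within_budget(docs, budget_minutes=10.0)])
# -> ['c', 'a']
```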

    How Sensitivity Classification Effectiveness Impacts Reviewers in Technology-Assisted Sensitivity Review

    All government documents that are released to the public must first be manually reviewed to identify and protect any sensitive information, e.g. confidential information. However, the unassisted manual sensitivity review of born-digital documents is not practical due to, for example, the volume of documents that are created. Previous work has shown that sensitivity classification can be effective for predicting whether a document contains sensitive information. However, since all of the released documents must be manually reviewed, it is important to know whether sensitivity classification can assist sensitivity reviewers in making their sensitivity judgements. Hence, in this paper, we conduct a digital sensitivity review user study to investigate whether the accuracy of sensitivity classification affects the number of documents that a reviewer correctly judges to be sensitive or not (reviewer accuracy) and the time that it takes to sensitivity review a document (reviewing speed). Our results show that providing reviewers with sensitivity classification predictions, from a classifier that achieves 0.7 Balanced Accuracy, results in a 38% increase in mean reviewer accuracy and a 72% increase in mean reviewing speed, compared to when reviewers are not provided with predictions. Overall, our findings demonstrate that sensitivity classification is a viable technology for assisting with the sensitivity review of born-digital government documents.
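    For reference, Balanced Accuracy (the classifier quality quoted above) is the mean of the true-positive rate and the true-negative rate, which makes it robust to the class imbalance typical of sensitivity collections. A small sketch with illustrative confusion counts (not taken from the study) shows how a 0.7 score arises:

```python
def balanced_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Balanced Accuracy = (TPR + TNR) / 2."""
    tpr = tp / (tp + fn)  # recall on sensitive documents
    tnr = tn / (tn + fp)  # recall on non-sensitive documents
    return (tpr + tnr) / 2

# Illustrative confusion counts (not from the study) that yield 0.7:
print(balanced_accuracy(tp=60, fn=40, tn=80, fp=20))  # 0.7
```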

    Hear Me Out: A Study on the Use of the Voice Modality for Crowdsourced Relevance Assessments

    The creation of relevance assessments by human assessors (often nowadays crowdworkers) is a vital step when building IR test collections. Prior works have investigated assessor quality and behaviour, though less attention has been paid to the impact of a document's presentation modality on assessor efficiency and effectiveness. Given the rise of voice-based interfaces, we investigate whether it is feasible for assessors to judge the relevance of text documents via a voice-based interface. We ran a user study (n = 49) on a crowdsourcing platform where participants judged the relevance of short and long documents sampled from the TREC Deep Learning corpus, presented to them in either the text or the voice modality. We found that: (i) participants are equally accurate in their judgements across both the text and voice modalities; (ii) with increased document length it takes participants significantly longer (for documents of length > 120 words it takes almost twice as much time) to make relevance judgements in the voice condition; and (iii) the ability of assessors to ignore stimuli that are not relevant (i.e. inhibition) impacts assessment quality in the voice modality: assessors with higher inhibition are significantly more accurate than those with lower inhibition. Our results indicate that we can reliably leverage the voice modality as a means to effectively collect relevance labels from crowdworkers.

    The influence of topic difficulty, relevance level, and document ordering on relevance judging

    In this study we investigate the relationship between how long it takes an assessor to judge document relevance and three key factors that may influence the judging scenario: the difficulty of the search topic for which relevance is being assessed; the degree to which the documents are relevant to the search topic; and the order in which the documents are presented for judging. Two potential confounding influences on judgment speed are differences in individual reading ability and the length of the documents that are being assessed. We therefore propose two measures to investigate the above factors: normalized processing speed (NPS), which adjusts the number of words that were processed per minute by taking into account differences in reading speed between judges, and normalized dwell time (NDT), which adjusts the duration that a judge spent reading a document relative to document length. Note that these two measures have different relationships with overall judgment speed: a direct relationship for NPS, and an inverse relationship for NDT.
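    The abstract defines NPS and NDT only informally; one plausible formalisation (the exact normalisation below is our assumption, not necessarily the paper's) makes the direct and inverse relationships concrete:

```python
def processing_speed(words: int, dwell_minutes: float) -> float:
    """Raw processing speed: words of the judged document per minute."""
    return words / dwell_minutes

def nps(words: int, dwell_minutes: float, judge_reading_speed_wpm: float) -> float:
    """Normalized processing speed: processing speed relative to the
    judge's own baseline reading speed, so faster readers are not
    conflated with faster judges. Faster judging -> larger NPS."""
    return processing_speed(words, dwell_minutes) / judge_reading_speed_wpm

def ndt(words: int, dwell_minutes: float) -> float:
    """Normalized dwell time: time spent per word of the document, so
    longer documents are not conflated with slower judging.
    Faster judging -> smaller NDT (the inverse relationship)."""
    return dwell_minutes / words

# A judge who reads at 250 wpm spends 2 minutes on a 400-word document:
print(nps(400, 2.0, 250.0))  # 0.8 -> judged more slowly than baseline reading
print(ndt(400, 2.0))         # 0.005 minutes per word
```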

    Identifying latent relationship information in documents for efficient and effective sensitivity review

    Freedom of Information (FOI) laws exist in over a hundred countries to ensure public access to information that is held by government and public institutions. However, FOI laws exempt from public disclosure sensitive information (e.g. personal or confidential information) that could violate the human rights of individuals or endanger a country’s national security. Hence, government documents must undergo a rigorous sensitivity review before they can be considered for public release. Sensitivity review is typically a manual process, since it requires utmost accuracy to ensure that potentially sensitive information is protected from public release. However, due to the massive volume of government documents that must be sensitivity reviewed, it is impractical to conduct a fully manual sensitivity review. Moreover, identifying sensitive information is itself a complex task, which often requires analysing hidden patterns or connections, i.e. latent relations between documents, such as mentions of specific individuals or descriptions of events, activities or discussions that could span multiple documents.

    In this thesis, we argue that automatically identifying latent relations between documents can help the human users involved in the sensitivity review process to efficiently make accurate sensitivity judgements. In particular, we identify two user roles in the sensitivity review process, namely Review Organisers and Sensitivity Reviewers. Review Organisers prioritise and allocate documents for review to maximise openness, i.e. the number of documents selected for public release in a fixed time. Sensitivity Reviewers read the documents to determine whether they contain sensitive information. This thesis aims to address the following challenges in the respective tasks of the Review Organisers and Sensitivity Reviewers: (1) effectively prioritising documents for review to increase openness, (2) effectively allocating documents to reviewers based on their specific interests in different types of documents and content, and (3) accurately and efficiently identifying sensitive information by analysing latent relations between documents.

    In this thesis, we propose novel methods for automatically identifying the latent relations between documents to assist both Review Organisers and Sensitivity Reviewers. We first propose RelDiff, a method for representing knowledge graph entities and relations in a single embedding space, which can improve the effectiveness of automatic sensitivity classification. Through empirical evaluation, we show that representing entire entity-relation-entity triples (e.g. person-isDirectorOf-company) can effectively indicate whether a piece of information (e.g. a person’s salary) should be considered sensitive or non-sensitive. We then propose to leverage document clustering to identify semantic categories that describe a high-level subject domain (e.g. criminality or politics). Through an extensive user study, we show that presenting documents in semantic categories can help reviewers understand the type of content in a collection, thereby improving their reviewing speed without affecting the accuracy of sensitivity review. Moreover, we show that prioritising semantic categories using sensitivity classification can help the Review Organisers release more documents in a fixed time (i.e. increase openness).

    Furthermore, we introduce the task of information threading, i.e. identifying coherent and chronologically evolving information about an event, activity or discussion from multiple documents. We propose novel information threading methods (SeqINT and HINT) and demonstrate their effectiveness through empirical evaluations against existing related methods. In addition, through a detailed user study, we show that reviewing documents in information threads can help reviewers provide sensitivity judgements more quickly and accurately compared to a traditional document-by-document review. Lastly, we propose to learn the reviewers’ interests in specific types of documents, to effectively allocate documents based on the reviewers’ interests and expertise. We propose CluRec, a method for cluster-based recommendation that can effectively identify and recommend clusters of related documents based on the users’ interests. Through another comprehensive user study, we show that recommending documents to reviewers based on their interests can improve both reviewing speed and review accuracy.

    Overall, we present a novel framework for sensitivity review, SERVE, that harnesses our proposed methods for identifying latent relations and provides a series of functionalities to the Sensitivity Reviewers and Review Organisers, namely: (1) sequentially reviewing documents that are organised into semantic categories, to enable the quick and consistent review of similar documents; (2) collectively reviewing related documents in coherent threads, to enable the accurate and efficient review of sensitivities that are spread across multiple documents; (3) customised prioritisation of documents for review, based on the documents’ semantic categories and predicted sensitivity probabilities, to enhance openness; and (4) recommending documents to reviewers based on their interests, to effectively allocate documents to the reviewers who are best equipped to understand and identify sensitive information in specific types of documents and content in a collection.

    This is the first thesis that takes a system-oriented approach and investigates different novel functionalities to assist human sensitivity review. Our primary contributions are our proposed framework for sensitivity review, SERVE, and its underlying methods for identifying latent relations between documents that are potential indicators of sensitive information. Our extensive experiments and evaluations, involving thorough offline experiments and carefully designed user studies, demonstrate the real-world applicability of SERVE in enhancing the ability of government organisations to fulfil their openness obligations while protecting sensitive information to comply with FOI laws. In addition, we demonstrate the applications of our proposed novel methods for information threading and cluster-based recommendation beyond sensitivity review, i.e. in the news domain, which emphasises the generalisability of our contributions.
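    As a rough illustration of the Review Organiser workflow described above (semantic categorisation plus sensitivity-based prioritisation), the sketch below clusters documents with generic TF-IDF and k-means and reviews the least sensitive category first. Both techniques are stand-ins for the thesis's own methods, and predict_sensitivity is a hypothetical classifier, not part of SERVE.

```python
# Generic illustration of the organiser workflow: cluster documents into
# semantic categories, then prioritise categories whose documents look
# least sensitive so that more documents can be released early.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

def predict_sensitivity(texts: list[str]) -> np.ndarray:
    """Hypothetical stand-in: probability that each document is sensitive."""
    return np.array([0.1 if "budget" in t else 0.9 for t in texts])

docs = [
    "annual budget summary for the department",
    "minutes naming a confidential informant",
    "budget forecast tables for next year",
    "personal medical details of a named official",
]

tfidf = TfidfVectorizer().fit_transform(docs)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(tfidf)
probs = predict_sensitivity(docs)

# Prioritise clusters by mean predicted sensitivity (least sensitive first).
order = sorted(set(labels), key=lambda c: probs[labels == c].mean())
for cluster in order:
    members = [i for i, l in enumerate(labels) if l == cluster]
    print(f"cluster {cluster} (mean p_sensitive="
          f"{probs[members].mean():.2f}): docs {members}")
```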