8,638 research outputs found

    Read, White, and Blue: Prosecutors Reading Inmate Emails and the Attorney-Client Privilege, 48 J. Marshall L. Rev. 1119 (2015)

    This Comment addresses whether the attorney-client privilege should extend to emails exchanged between an inmate and his or her attorney over TRULINCS, the prison email system. Section II describes the history of the attorney-client privilege, and compares and contrasts the federal privilege with the New York state privilege in order to directly address Dr. Ahmed’s conflict. Section III juxtaposes other forms of privileged attorney-client contact with inmate emailing, and discusses the confidentiality agreement provided through the prison email system, TRULINCS. Finally, Section IV proposes a fiscally responsible, efficient, and convenient solution to the possible extension of the attorney-client privilege to inmate email.

    k-Nearest Neighbour Classifiers: 2nd Edition (with Python examples)

    Perhaps the most straightforward classifier in the arsenal of machine learning techniques is the Nearest Neighbour Classifier -- classification is achieved by identifying the nearest neighbours to a query example and using those neighbours to determine the class of the query. This approach to classification is of particular importance because issues of poor run-time performance are not such a problem these days with the computational power that is available. This paper presents an overview of techniques for Nearest Neighbour classification, focusing on: mechanisms for assessing similarity (distance), computational issues in identifying nearest neighbours, and mechanisms for reducing the dimension of the data. This paper is the second edition of a paper previously published as a technical report. Sections on similarity measures for time-series, retrieval speed-up and intrinsic dimensionality have been added. An Appendix is included providing access to Python code for the key methods. Comment: 22 pages, 15 figures; an updated edition of an older tutorial on kNN.
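The classification step the abstract describes can be sketched in a few lines. This is a minimal illustrative implementation (not the paper's accompanying code): Euclidean distance for similarity, and a majority vote among the k nearest training examples.

```python
import math
from collections import Counter

def euclidean(a, b):
    # Straight-line distance between two numeric feature vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_X, train_y, query, k=3):
    # Rank all training cases by distance to the query example
    neighbours = sorted(zip(train_X, train_y),
                        key=lambda case: euclidean(case[0], query))[:k]
    # Majority vote among the k nearest labels decides the class
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]
```

A brute-force scan like this is O(n) per query, which is exactly the run-time concern the abstract notes has become less pressing; the speed-up techniques the paper surveys (e.g. index structures) reduce that cost.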

    Textual Case-based Reasoning for Spam Filtering: a Comparison of Feature-based and Feature-free Approaches

    Spam filtering is a text classification task to which Case-Based Reasoning (CBR) has been successfully applied. We describe the ECUE system, which classifies emails using a feature-based form of textual CBR. Then, we describe an alternative way to compute the distances between cases in a feature-free fashion, using a distance measure based on text compression. This distance measure has the advantages of having no set-up costs and being resilient to concept drift. We report an empirical comparison, which shows the feature-free approach to be more accurate than the feature-based system. These results are fairly robust over different compression algorithms, in that we find the accuracy when using a Lempel-Ziv compressor (GZip) is approximately the same as when using a statistical compressor (PPM). We note, however, that the feature-free systems take much longer to classify emails than the feature-based system. Improvements in the classification time of both kinds of systems can be obtained by applying case base editing algorithms, which aim to remove noisy and redundant cases from a case base while maintaining, or even improving, generalisation accuracy. We report empirical results using the Competence-Based Editing (CBE) technique. We show that CBE removes more cases when we use the distance measure based on text compression (without significant changes in generalisation accuracy) than it does when we use the feature-based approach.
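A compression-based distance of the kind the abstract describes can be sketched with a standard Lempel-Ziv compressor. This is a hedged illustration using the well-known normalized compression distance, with Python's `zlib` standing in for GZip; the paper's exact measure may differ in detail.

```python
import zlib

def clen(data: bytes) -> int:
    # Compressed length acts as a proxy for information content
    return len(zlib.compress(data))

def ncd(x: bytes, y: bytes) -> float:
    # Normalized compression distance: if y shares structure with x,
    # compressing their concatenation costs little more than compressing
    # the larger of the two alone, so the distance approaches 0.
    cx, cy, cxy = clen(x), clen(y), clen(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)
```

No feature extraction or vocabulary is needed, which is the "no set-up costs" advantage: a new spam case is compared to stored cases directly on its raw text, so drifting spammer vocabulary requires no re-indexing.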

    A review of spam email detection: analysis of spammer strategies and the dataset shift problem

    Spam emails have traditionally been seen as just annoying and unsolicited emails containing advertisements, but they increasingly include scams, malware or phishing. In order to ensure the security and integrity of users, organisations and researchers aim to develop robust filters for spam email detection. Recently, most spam filters based on machine learning algorithms published in academic journals report very high performance, but users are still reporting a rising number of frauds and attacks via spam emails. Two main challenges can be found in this field: (a) it is a very dynamic environment prone to the dataset shift problem and (b) it suffers from the presence of an adversarial figure, i.e. the spammer. Unlike classical spam email reviews, this one is particularly focused on the problems that this constantly changing environment poses. Moreover, we analyse the different spammer strategies used for contaminating the emails, and we review the state-of-the-art techniques to develop filters based on machine learning. Finally, we empirically evaluate and present the consequences of ignoring the matter of dataset shift in this practical field. Experimental results show that this shift may lead to severe degradation in the estimated generalisation performance, with error rates reaching values up to 48.81%. Open-access publication funded by the Consorcio de Bibliotecas Universitarias de Castilla y León (BUCLE), under Programa Operativo 2014ES16RFOP009 FEDER 2014-2020 de Castilla y León, Actuación 20007-CL - Apoyo Consorcio BUCLE.

    Data-based fault detection in chemical processes: Managing records with operator intervention and uncertain labels

    Developing data-driven fault detection systems for chemical plants requires managing uncertain data labels and dynamic attributes due to operator-process interactions. Mislabeled data is a known problem in computer science that has received scarce attention from the process systems community. This work introduces and examines the effects of operator actions on records and labels, and the consequences for the development of detection models. Using a state space model, this work proposes an iterative relabeling scheme for retraining classifiers that continuously refines dynamic attributes and labels. Three case studies are presented: a reactor as a motivating example, flooding in a simulated de-Butanizer column as a complex case, and foaming in an absorber as an industrial challenge. For the first case, detection accuracy is shown to increase by 14% while operating costs are reduced by 20%. Moreover, regarding the de-Butanizer column, the performance of the proposed strategy is shown to be 10% higher than that of the filtering strategy. Promising results are finally reported regarding efficient strategies to deal with the presented problem.
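The iterative relabeling idea can be sketched generically. This is a hypothetical outline, not the paper's state-space method: `train` and `predict` are caller-supplied stand-ins for any classifier, and labels that the current model contradicts with high confidence are revised before the next retraining pass.

```python
def iterative_relabel(X, y, train, predict, conf_threshold=0.9, max_iters=5):
    # Hypothetical sketch: `train(X, labels)` returns a model;
    # `predict(model, x)` returns (label, confidence in [0, 1]).
    labels = list(y)
    model = train(X, labels)
    for _ in range(max_iters):
        changed = 0
        for i, x in enumerate(X):
            pred, conf = predict(model, x)
            if pred != labels[i] and conf > conf_threshold:
                labels[i] = pred  # revise a suspect (possibly operator-induced) label
                changed += 1
        model = train(X, labels)  # retrain on the revised labels
        if changed == 0:  # labels have stabilised
            break
    return model, labels
```

The loop terminates either when no label changes in a pass or after a fixed number of iterations, mirroring the "continuously refines labels" behaviour described above.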

    Making sense of digital footprints in team-based legal investigations: the acquisition of focus

    Sensemaking occurs when people face the problem of forming an understanding of a situation. One scenario in which technology has a particularly significant impact on sensemaking and its success is in legal investigations. Legal investigations extend over time, are resource intensive, and require the sifting and re-representation of large collections of electronic evidence and close collaboration between multiple investigators. In this paper, we present an account of sensemaking in three corporate legal investigations. We summarise information interaction processes in the form of a model which conceptualises processes as resource transformations triggered and shaped by both bottom-up and top-down resources. The model both extends and validates aspects of a previous account of investigative sensemaking (Pirolli & Card, 2005) and brings to the fore two kinds of focusing. Data focusing involves identifying and structuring information to draw out facts relevant to a given set of investigation issues. Issue focusing involves revising the issues in the light of new insights. Both are essential in sensemaking. We draw this distinction through detailed accounts of two activities in the investigations: reviewing documents for relevance, and the creation and use of external representations. This provides a basis for a number of requirements for sensemaking support systems, particularly in collaborative settings, including: document annotation; dynamically associating documents of a given type; interacting with documents in fluid ways; linking external representation elements to evidence; filtering external representations in flexible ways; and viewing external representations at different levels of scale and fidelity. Finally, we use our data to analyse the conceptual elements within a 'line of enquiry'. This provides a framework which can form the basis for partitioning information into hierarchically embedded enquiry 'contexts' within collaborative sensemaking systems.

    A concept drift-tolerant case-base editing technique

    © 2015 Elsevier B.V. All rights reserved. The evolving nature and accumulating volume of real-world data inevitably give rise to the so-called "concept drift" issue, causing many deployed Case-Based Reasoning (CBR) systems to require additional maintenance procedures. In Case-base Maintenance (CBM), case-base editing strategies to revise the case-base have proven to be effective instance selection approaches for handling concept drift. Motivated by current issues related to CBR techniques in handling concept drift, we present a two-stage case-base editing technique. In Stage 1, we propose a Noise-Enhanced Fast Context Switch (NEFCS) algorithm, which targets the removal of noise in a dynamic environment, and in Stage 2, we develop an innovative Stepwise Redundancy Removal (SRR) algorithm, which reduces the size of the case-base by eliminating redundancies while preserving the case-base coverage. Experimental evaluations on several public real-world datasets show that our case-base editing technique significantly improves accuracy compared to other case-base editing approaches on concept drift tasks, while preserving its effectiveness on static tasks.
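To make the redundancy-removal idea concrete, here is a minimal sketch in the spirit of Hart's classic Condensed Nearest Neighbour rule -- not the paper's SRR algorithm: a case is retained only if the edited case-base would otherwise misclassify it, so coverage of the original data is preserved while duplicated regions shrink.

```python
def condense(X, y, dist):
    # Illustrative redundancy removal (Hart-style CNN, not the paper's SRR):
    # grow a retained set until it classifies every original case correctly.
    keep_X, keep_y = [X[0]], [y[0]]
    changed = True
    while changed:
        changed = False
        for x, label in zip(X, y):
            # Nearest retained case under the supplied distance measure
            nearest = min(range(len(keep_X)), key=lambda i: dist(keep_X[i], x))
            if keep_y[nearest] != label:
                # The edited base would misclassify this case: keep it
                keep_X.append(x)
                keep_y.append(label)
                changed = True
    return keep_X, keep_y
```

Editing schemes like NEFCS/SRR refine this basic picture with noise handling and coverage-aware ordering, which is what makes them viable under concept drift.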