11 research outputs found

    Explainable Text Classification in Legal Document Review: A Case Study of Explainable Predictive Coding

    In today's legal environment, lawsuits and regulatory investigations require companies to embark upon increasingly intensive data-focused engagements to identify, collect and analyze large quantities of data. When documents are staged for review, the process can require companies to dedicate an extraordinary level of resources, both in human resources and in the use of technology-based techniques to intelligently sift through data. For several years, attorneys have been using a variety of tools to conduct this exercise, and most recently, they have begun accepting the use of machine learning techniques like text classification to efficiently cull massive volumes of data to identify responsive documents for use in these matters. In recent years, a group of AI and Machine Learning researchers have been actively researching Explainable AI. In an explainable AI system, actions or decisions are human understandable. In typical legal 'document review' scenarios, a document can be identified as responsive as long as one or more of the text snippets in a document are deemed responsive. In these scenarios, if predictive coding can be used to locate these responsive snippets, then attorneys could easily evaluate the model's document classification decision. When deployed with defined and explainable results, predictive coding can drastically enhance the overall quality and speed of the document review process by reducing the time it takes to review documents. The authors of this paper propose the concept of explainable predictive coding and simple explainable predictive coding methods to locate responsive snippets within responsive documents. We also report our preliminary experimental results using the data from an actual legal matter that entailed this type of document review. Comment: 2018 IEEE International Conference on Big Dat

    An Empirical Study of the Application of Machine Learning and Keyword Terms Methodologies to Privilege-Document Review Projects in Legal Matters

    Protecting privileged communications and data from disclosure is paramount for legal teams. Unrestricted legal advice, such as attorney-client communications or litigation strategy, is vital to the legal process and is exempt from disclosure in litigation or regulatory events. To protect this information from being disclosed, companies and outside counsel must review vast amounts of documents to determine those that contain privileged material. This process is extremely costly and time consuming. As data volumes increase, legal counsel employ methods to reduce the number of documents requiring review while balancing the need to ensure the protection of privileged information. Keyword searching is relied upon as a method to target privileged information and reduce document review populations. Keyword searches are effective at casting a wide net but return over-inclusive results, most of which do not contain privileged information, and without detailed knowledge of the data, keyword lists cannot be crafted to find all privileged material. Overly inclusive keyword searching can also be problematic because, even as it drives up costs, it can also cast 'too far of a net' and thus produce unreliable results. To overcome these weaknesses of keyword searching, legal teams are using a new method to target privileged information called predictive modeling. Predictive modeling can successfully identify privileged material, but little research has been published to confirm its effectiveness when compared to keyword searching. This paper summarizes a study of the effectiveness of keyword searching and predictive modeling when applied to real-world data. With this study, this group of collaborators wanted to examine and understand the benefits and weaknesses of both approaches for legal teams identifying privileged material in document populations. Comment: 2018 IEEE International Conference on Big Data (Big Data

    Explainable Text Classification Techniques in Legal Document Review: Locating Rationales without Using Human Annotated Training Text Snippets

    US corporations regularly spend millions of dollars reviewing electronically-stored documents in legal matters. Recently, attorneys have applied text classification to efficiently cull massive volumes of data to identify responsive documents for use in these matters. While text classification is regularly used to reduce the discovery costs of legal matters, it also faces a perception challenge: amongst lawyers, this technology is sometimes looked upon as a "black box". Put simply, no extra information is provided for attorneys to understand why documents are classified as responsive. In recent years, explainable machine learning has emerged as an active research area. In an explainable machine learning system, predictions or decisions made by a machine learning model are human understandable. In legal 'document review' scenarios, a document is responsive because one or more of its small text snippets are deemed responsive. In these scenarios, if these responsive snippets can be located, then attorneys could easily evaluate the model's document classification decisions; this is especially important in the field of responsible AI. Our prior research identified that predictive models created using annotated training text snippets improved the precision of a model when compared to a model created using the full text of a set of documents as training data. While interesting, manually annotating training text snippets is not generally practical during a legal document review. However, small increases in precision can drastically decrease the cost of large document reviews. Automating the identification of training text snippets without human review could then make the application of training text snippet-based models a practical approach. Comment: arXiv admin note: text overlap with arXiv:1912.0950