
    Explainable Text Classification in Legal Document Review: A Case Study of Explainable Predictive Coding

    In today's legal environment, lawsuits and regulatory investigations require companies to embark upon increasingly intensive data-focused engagements to identify, collect, and analyze large quantities of data. When documents are staged for review, the process can require companies to dedicate an extraordinary level of resources, both human and technological, to intelligently sift through data. For several years, attorneys have used a variety of tools to conduct this exercise, and most recently they have begun accepting machine learning techniques such as text classification to efficiently cull massive volumes of data and identify responsive documents. In recent years, a group of AI and machine learning researchers has been actively researching explainable AI, in which a system's actions or decisions are human understandable. In typical legal 'document review' scenarios, a document is identified as responsive as long as one or more of its text snippets are deemed responsive. In these scenarios, if predictive coding can be used to locate these responsive snippets, then attorneys can easily evaluate the model's document classification decision. When deployed with defined and explainable results, predictive coding can drastically enhance the overall quality and speed of the document review process by reducing the time it takes to review documents. The authors of this paper propose the concept of explainable predictive coding, along with simple explainable predictive coding methods to locate responsive snippets within responsive documents. We also report preliminary experimental results using data from an actual legal matter that entailed this type of document review. Comment: 2018 IEEE International Conference on Big Data
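
    To make the idea concrete, here is a minimal sketch (not the authors' method) of snippet-level explainable predictive coding: a document-level classifier is trained on labeled documents, and each snippet of a new document is then scored so reviewers can see which passages drove a responsive call. The snippet splitter, model choice, and toy data are illustrative assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def split_into_snippets(text, size=3):
    """Split a document into overlapping snippets of `size` sentences."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    return [". ".join(sentences[i:i + size]) for i in range(len(sentences))]

# Toy labeled review set; real matters involve massive document volumes.
train_docs = [
    "The merger terms and the regulator's findings were discussed.",
    "Are we still on for lunch on Friday afternoon?",
]
train_labels = [1, 0]  # 1 = responsive, 0 = not responsive

vectorizer = TfidfVectorizer(ngram_range=(1, 2))
model = LogisticRegression(max_iter=1000)
model.fit(vectorizer.fit_transform(train_docs), train_labels)

def explain(document, top_k=3):
    """Score each snippet and return the top_k most responsive ones."""
    snippets = split_into_snippets(document)
    scores = model.predict_proba(vectorizer.transform(snippets))[:, 1]
    return sorted(zip(scores, snippets), reverse=True)[:top_k]
```

    Under this framing, an attorney reviewing a responsive document would see the top-scoring snippets alongside the classification, rather than an unexplained document-level label.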

    Using Machine Learning and Natural Language Processing to Review and Classify the Medical Literature on Cancer Susceptibility Genes

    PURPOSE: The medical literature relevant to germline genetics is growing exponentially. Clinicians need tools for monitoring and prioritizing this literature to understand the clinical implications of pathogenic genetic variants. We developed and evaluated two machine learning models to classify abstracts as relevant to the penetrance (risk of cancer for germline mutation carriers) or prevalence of germline genetic mutations. METHODS: We conducted literature searches in PubMed and retrieved paper titles and abstracts to create an annotated dataset for training and evaluating the two classification models. Our first model is a support vector machine (SVM), which learns a linear decision rule based on a bag-of-ngrams representation of each title and abstract. Our second model is a convolutional neural network (CNN), which learns a complex nonlinear decision rule based on the raw title and abstract. We evaluated the performance of the two models on classifying papers as relevant to penetrance or prevalence. RESULTS: For penetrance classification, we annotated 3740 paper titles and abstracts and used 60% for training the model, 20% for tuning it, and 20% for evaluating it. The SVM model achieves 89.53% accuracy (the percentage of papers correctly classified), while the CNN model achieves 88.95% accuracy. For prevalence classification, we annotated 3753 paper titles and abstracts. The SVM model achieves 89.14% accuracy, while the CNN model achieves 89.13% accuracy. CONCLUSION: Our models achieve high accuracy in classifying abstracts as relevant to penetrance or prevalence. By facilitating literature review, this tool could help clinicians and researchers keep abreast of the burgeoning knowledge of gene-cancer associations and keep the knowledge bases for clinical decision support tools up to date.
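
    As a rough illustration of the SVM pipeline the abstract describes, the sketch below (assuming scikit-learn; the hyperparameters and `min_df` cutoff are our assumptions, not the paper's) trains a linear SVM on bag-of-ngrams features with the 60/20/20 train/tune/eval split.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

def train_and_evaluate(texts, labels):
    """Train a linear SVM on bag-of-ngrams features; return eval accuracy.

    `texts` are title+abstract strings; `labels` mark relevance (1 or 0).
    """
    # 60% train, 20% tune, 20% eval, as in the abstract.
    rest_x, eval_x, rest_y, eval_y = train_test_split(
        texts, labels, test_size=0.20, random_state=0)
    train_x, tune_x, train_y, tune_y = train_test_split(
        rest_x, rest_y, test_size=0.25, random_state=0)

    # Bag of unigrams and bigrams ("bag-of-ngrams").
    vectorizer = CountVectorizer(ngram_range=(1, 2), min_df=2)
    clf = LinearSVC(C=1.0)  # in practice, C would be chosen on the tune split
    clf.fit(vectorizer.fit_transform(train_x), train_y)
    predictions = clf.predict(vectorizer.transform(eval_x))
    return accuracy_score(eval_y, predictions)
```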

    The Teacher's Role in Facilitating Memory and Study Strategy Development in the Elementary School Classroom

    The efforts of 69 elementary school teachers to instruct children in cognitive processing activities were observed. Although the teaching of such activities was relatively infrequent, it varied by grade (occurring more often in grades 2-3 than in higher or lower grades) and by the content of instruction. Teachers of grade 4 and above more often provided rationales for the use of cognitive strategies than did teachers of younger children. In a second study, children of three achievement levels were selected from classrooms in which teachers varied in their use of suggestions regarding cognitive processes. Subsequent to training in the use of a memory strategy, children's performance on a maintenance trial was evaluated: among average and low achievers, those whose teachers were relatively high in strategy suggestions showed better maintenance and more deliberate use of the trained strategy than did children whose teachers rarely made strategy suggestions. The role of school experience in the development of children's memory skills is discussed.

    Rationalizing Text Matching: Learning Sparse Alignments via Optimal Transport

    Selecting input features of top relevance has become a popular method for building self-explaining models. In this work, we extend this selective rationalization approach to text matching, where the goal is to jointly select and align text pieces, such as tokens or sentences, as a justification for the downstream prediction. Our approach employs optimal transport (OT) to find a minimal-cost alignment between the inputs. However, directly applying OT often produces dense and therefore uninterpretable alignments. To overcome this limitation, we introduce novel constrained variants of the OT problem that result in highly sparse alignments with controllable sparsity. Our model is end-to-end differentiable using the Sinkhorn algorithm for OT and can be trained without any alignment annotations. We evaluate our model on the StackExchange, MultiNews, e-SNLI, and MultiRC datasets. Our model achieves very sparse rationale selections with high fidelity while preserving prediction accuracy compared to strong attention baseline models. Comment: To appear at ACL 2020
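
    For readers unfamiliar with the underlying solver, here is a minimal numpy sketch of entropy-regularized OT via the Sinkhorn algorithm; the constrained, sparsity-inducing variants are the paper's contribution and are not reproduced here. The regularization strength and iteration count are illustrative.

```python
import numpy as np

def sinkhorn(cost, reg=0.1, n_iters=100):
    """Compute an alignment (transport plan) for a pairwise cost matrix.

    Rows and columns index text pieces of the two inputs; marginals are
    uniform, so every piece receives equal total alignment mass.
    """
    n, m = cost.shape
    a = np.full(n, 1.0 / n)      # source marginal
    b = np.full(m, 1.0 / m)      # target marginal
    K = np.exp(-cost / reg)      # Gibbs kernel; smaller reg -> peakier plans
    u = np.ones(n)
    for _ in range(n_iters):     # alternating marginal projections
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]  # plan P = diag(u) @ K @ diag(v)

# Example: align 3 source sentences to 4 target sentences.
P = sinkhorn(np.random.rand(3, 4))
assert np.allclose(P.sum(), 1.0)  # total transported mass is 1
```

    Because every step is differentiable, such a solver can sit inside a neural model and be trained end to end, which is what makes the approach trainable without alignment annotations.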

    Explainable Text Classification Techniques in Legal Document Review: Locating Rationales without Using Human Annotated Training Text Snippets

    US corporations regularly spend millions of dollars reviewing electronically stored documents in legal matters. Recently, attorneys have begun applying text classification to efficiently cull massive volumes of data and identify responsive documents for use in these matters. While text classification is regularly used to reduce the discovery costs of legal matters, it also faces a perception challenge: among lawyers, this technology is sometimes looked upon as a "black box" because no extra information is provided for attorneys to understand why documents are classified as responsive. In recent years, explainable machine learning has emerged as an active research area. In an explainable machine learning system, predictions or decisions made by a machine learning model are human understandable. In legal 'document review' scenarios, a document is responsive because one or more of its small text snippets are deemed responsive. In these scenarios, if these responsive snippets can be located, then attorneys can easily evaluate the model's document classification decisions, which is especially important in the field of responsible AI. Our prior research identified that predictive models trained on annotated text snippets achieved higher precision than models trained on the full text of documents. While interesting, manually annotating training text snippets is generally not practical during a legal document review. However, small increases in precision can drastically decrease the cost of large document reviews, so automating the identification of training text snippets without human review could make training snippet-based models a practical approach. Comment: arXiv admin note: text overlap with arXiv:1912.0950
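
    One plausible reading of the proposed automation (our sketch, not necessarily the authors' exact method) is to let a document-level model nominate its own training snippets: score candidate snippets from known-responsive documents and keep the high-confidence ones as snippet-level training positives. The helper below assumes a scikit-learn-style model exposing `predict_proba` and a snippet splitter like the one sketched earlier; the threshold is an arbitrary assumption.

```python
def harvest_training_snippets(doc_model, vectorizer, responsive_docs,
                              split_fn, threshold=0.8):
    """Collect snippets that a document-level model scores above `threshold`.

    `doc_model` is any classifier with predict_proba; `split_fn` turns a
    document into candidate snippets. The harvested snippets can then serve
    as positives for a snippet-level model, with no human snippet annotation.
    """
    harvested = []
    for doc in responsive_docs:
        snippets = split_fn(doc)
        scores = doc_model.predict_proba(vectorizer.transform(snippets))[:, 1]
        harvested.extend(s for s, p in zip(snippets, scores) if p >= threshold)
    return harvested
```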