Explainable Text Classification in Legal Document Review: A Case Study of Explainable Predictive Coding
In today's legal environment, lawsuits and regulatory investigations require
companies to embark upon increasingly intensive data-focused engagements to
identify, collect, and analyze large quantities of data. When documents are
staged for review, the process can require companies to dedicate an
extraordinary level of resources, in terms of both human reviewers and
technology-based techniques to intelligently sift through the data. For
several years, attorneys have been using a variety of
tools to conduct this exercise, and most recently, they are accepting the use
of machine learning techniques like text classification to efficiently cull
massive volumes of data to identify responsive documents for use in these
matters. In recent years, a group of AI and Machine Learning researchers have
been actively researching Explainable AI. In an explainable AI system, actions
or decisions are human understandable. In typical legal 'document review'
scenarios, a document is identified as responsive as long as one or more
of its text snippets are deemed responsive. In these scenarios,
if predictive coding can be used to locate these responsive snippets, then
attorneys could easily evaluate the model's document classification decision.
When deployed with defined and explainable results, predictive coding can
drastically enhance both the quality and the speed of the document review
process. The authors of this
paper propose the concept of explainable predictive coding and simple
explainable predictive coding methods to locate responsive snippets within
responsive documents. We also report our preliminary experimental results using
the data from an actual legal matter that entailed this type of document
review.

Comment: 2018 IEEE International Conference on Big Data
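One way to make such snippet-level explanation concrete is sketched below. This is our illustrative reading of the idea, not the authors' method: a classifier trained on whole documents scores each candidate snippet of a new document, and the highest-scoring snippet is surfaced as the rationale for the responsiveness decision. All documents, snippets, and labels here are invented.

```python
# Hypothetical sketch: score each text snippet of a document with a
# classifier trained on whole documents, and surface the top-scoring
# snippet as the "responsive" rationale. Toy data, not the paper's.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

docs = [
    "meeting notes about the disputed contract and payment terms",
    "quarterly cafeteria menu and parking announcements",
]
labels = [1, 0]  # 1 = responsive, 0 = non-responsive

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(docs, labels)

# Split a new document into snippets (here: sentences) and rank them by
# the model's responsiveness probability.
snippets = [
    "lunch will be served at noon",
    "the contract payment terms were disputed by counsel",
]
scores = clf.predict_proba(snippets)[:, 1]
best = snippets[scores.argmax()]  # the model's explanation for the reviewer
```

A reviewer then evaluates only `best` rather than the full document, which is the time saving the abstract describes.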
Using Machine Learning and Natural Language Processing to Review and Classify the Medical Literature on Cancer Susceptibility Genes
PURPOSE: The medical literature relevant to germline genetics is growing
exponentially. Clinicians need tools for monitoring and prioritizing the
literature to understand the clinical implications of pathogenic genetic
variants. We
developed and evaluated two machine learning models to classify abstracts as
relevant to the penetrance (risk of cancer for germline mutation carriers) or
prevalence of germline genetic mutations. METHODS: We conducted literature
searches in PubMed and retrieved paper titles and abstracts to create an
annotated dataset for training and evaluating the two machine learning
classification models. Our first model is a support vector machine (SVM) which
learns a linear decision rule based on the bag-of-ngrams representation of each
title and abstract. Our second model is a convolutional neural network (CNN)
which learns a complex nonlinear decision rule based on the raw title and
abstract. We evaluated the performance of the two models on the classification
of papers as relevant to penetrance or prevalence. RESULTS: For penetrance
classification, we annotated 3740 paper titles and abstracts and used 60% for
training the model, 20% for tuning the model, and 20% for evaluating the model.
The SVM model achieves 89.53% accuracy (percentage of papers that were
correctly classified) while the CNN model achieves 88.95% accuracy. For
prevalence classification, we annotated 3753 paper titles and abstracts. The
SVM model achieves 89.14% accuracy while the CNN model achieves 89.13%
accuracy. CONCLUSION: Our models achieve high accuracy in classifying abstracts
as relevant to penetrance or prevalence. By facilitating literature review,
this tool could help clinicians and researchers keep abreast of the burgeoning
knowledge of gene-cancer associations and keep the knowledge bases for clinical
decision support tools up to date.
The Teacher's Role in Facilitating Memory and Study Strategy Development in the Elementary School Classroom
The efforts of 69 elementary school teachers to instruct children in cognitive processing activities were observed. Although the teaching of such activities was relatively infrequent, it varied by grade (occurring more often in grades 2-3 than in higher or lower grades) and by the content of instruction. Teachers of grade 4 and above more often provided rationales for the use of cognitive strategies than did teachers of younger children. In a second study, children of three achievement levels were selected from classrooms in which teachers varied in their use of suggestions regarding cognitive processes. Subsequent to training in the use of a memory strategy, children's performance on a maintenance trial was evaluated: Among average and low achievers, those whose teachers were relatively high in strategy suggestions showed better maintenance and more deliberate use of the trained strategy than did children whose teachers rarely made strategy suggestions. The role of school experience in the development of children's memory skills is discussed
Rationalizing Text Matching: Learning Sparse Alignments via Optimal Transport
Selecting input features of top relevance has become a popular method for
building self-explaining models. In this work, we extend this selective
rationalization approach to text matching, where the goal is to jointly select
and align text pieces, such as tokens or sentences, as a justification for the
downstream prediction. Our approach employs optimal transport (OT) to find a
minimal cost alignment between the inputs. However, directly applying OT often
produces dense and therefore uninterpretable alignments. To overcome this
limitation, we introduce novel constrained variants of the OT problem that
result in highly sparse alignments with controllable sparsity. Our model is
end-to-end differentiable using the Sinkhorn algorithm for OT and can be
trained without any alignment annotations. We evaluate our model on the
StackExchange, MultiNews, e-SNLI, and MultiRC datasets. Our model achieves very
sparse rationale selections with high fidelity while preserving prediction
accuracy compared to strong attention baseline models.

Comment: To appear at ACL 202
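The Sinkhorn iteration underlying such a model can be sketched as follows. This is plain entropy-regularized OT, not the paper's constrained sparse variants; the toy token embeddings, squared-distance cost, and regularization strength `eps` are all illustrative choices.

```python
# Minimal Sinkhorn sketch: entropic optimal transport between two token
# sets, where the transport plan P acts as a soft alignment matrix.
import numpy as np

def sinkhorn(cost, a, b, eps=0.1, n_iters=200):
    """Entropy-regularized OT: returns a plan P with marginals a and b."""
    K = np.exp(-cost / eps)              # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)                # rescale to match column marginals
        u = a / (K @ v)                  # rescale to match row marginals
    return u[:, None] * K * v[None, :]   # P = diag(u) K diag(v)

# Toy "sentences" of 3 and 2 one-dimensional token embeddings;
# cost = squared distance between tokens.
x = np.array([[0.0], [1.0], [2.0]])
y = np.array([[0.1], [1.9]])
cost = (x - y.T) ** 2
a = np.full(3, 1 / 3)                    # uniform source marginal
b = np.full(2, 1 / 2)                    # uniform target marginal

P = sinkhorn(cost, a, b)
# Smaller eps yields a more peaked (sparser-looking) alignment; the
# paper's constrained variants enforce sparsity more directly.
```

Because every step is differentiable, a model using this layer can be trained end to end without alignment annotations, as the abstract notes.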
Explainable Text Classification Techniques in Legal Document Review: Locating Rationales without Using Human Annotated Training Text Snippets
US corporations regularly spend millions of dollars reviewing
electronically-stored documents in legal matters. Recently, attorneys have
begun applying text classification to efficiently cull massive volumes of
data to identify
responsive documents for use in these matters. While text classification is
regularly used to reduce the discovery costs of legal matters, it also faces a
perception challenge: amongst lawyers, this technology is sometimes looked upon
as a "black box". Put simply, no extra information is provided for attorneys to
understand why documents are classified as responsive. In recent years,
explainable machine learning has emerged as an active research area. In an
explainable machine learning system, predictions or decisions made by a machine
learning model are human understandable. In legal 'document review' scenarios,
a document is responsive because one or more of its small text snippets are
deemed responsive. In these scenarios, if these responsive snippets can be
located, then attorneys could easily evaluate the model's document
classification decisions - this is especially important in the field of
responsible AI. Our prior research identified that predictive models created
using annotated training text snippets improved the precision of a model when
compared to a model created using all of a set of documents' text as training.
While interesting, manually annotating training text snippets is not generally
practical during a legal document review. However, small increases in precision
can drastically decrease the cost of large document reviews. Automating the
identification of training text snippets without human review could then make
the application of training text snippet-based models a practical approach.

Comment: arXiv admin note: text overlap with arXiv:1912.0950
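One possible automation loop matching this description is sketched below. This is our hedged reading of the idea, not the paper's exact method: a model trained on whole documents selects the highest-scoring snippet from each responsive document, and those machine-selected snippets then serve as training text for a second, snippet-based model. All documents, the crude half-splitting snippetizer, and labels are invented.

```python
# Hedged sketch: automate the selection of training text snippets without
# human annotation, using a document-level model's own scores.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

docs = [
    "agenda item one the merger agreement was signed yesterday",
    "the office picnic is scheduled for friday afternoon",
    "counsel reviewed the merger agreement and flagged the signing terms",
    "please remember to submit your parking permit renewal",
]
labels = [1, 0, 1, 0]  # 1 = responsive

# Stage 1: document-level model trained on whole documents.
doc_model = make_pipeline(TfidfVectorizer(), LogisticRegression())
doc_model.fit(docs, labels)

# Stage 2: for each responsive document, keep its highest-scoring half
# as a machine-selected training snippet (a crude snippetizer for brevity).
def snippets_of(doc):
    words = doc.split()
    mid = len(words) // 2
    return [" ".join(words[:mid]), " ".join(words[mid:])]

selected = []
for doc, y in zip(docs, labels):
    if y == 1:
        cands = snippets_of(doc)
        probs = doc_model.predict_proba(cands)[:, 1]
        selected.append(cands[probs.argmax()])

# Stage 3: retrain on the machine-selected snippets plus the
# non-responsive documents, yielding a snippet-based model with no
# human snippet annotation in the loop.
negatives = [d for d, y in zip(docs, labels) if y == 0]
snippet_model = make_pipeline(TfidfVectorizer(), LogisticRegression())
snippet_model.fit(selected + negatives,
                  [1] * len(selected) + [0] * len(negatives))
```

Whether such machine-selected snippets recover the precision gains of human-annotated ones is exactly the empirical question the abstract raises.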