27,090 research outputs found
Explainable Text Classification in Legal Document Review: A Case Study of Explainable Predictive Coding
In today's legal environment, lawsuits and regulatory investigations require
companies to embark upon increasingly intensive data-focused engagements to
identify, collect and analyze large quantities of data. When documents are
staged for review, the process can require companies to dedicate an
extraordinary level of resources, both in human resources and in the use of
technology-based techniques to intelligently sift through data. For several
years, attorneys have been using a variety of
tools to conduct this exercise, and most recently, they are accepting the use
of machine learning techniques like text classification to efficiently cull
massive volumes of data to identify responsive documents for use in these
matters. In recent years, a group of AI and Machine Learning researchers have
been actively researching Explainable AI. In an explainable AI system, actions
or decisions are human-understandable. In typical legal "document review"
scenarios, a document can be identified as responsive, as long as one or more
of the text snippets in a document are deemed responsive. In these scenarios,
if predictive coding can be used to locate these responsive snippets, then
attorneys could easily evaluate the model's document classification decision.
When deployed with defined and explainable results, predictive coding can
drastically enhance the overall quality and speed of the document review
process by reducing the time it takes to review documents. The authors of this
paper propose the concept of explainable predictive coding and simple
explainable predictive coding methods to locate responsive snippets within
responsive documents. We also report our preliminary experimental results using
the data from an actual legal matter that entailed this type of document
review. Comment: 2018 IEEE International Conference on Big Data
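The snippet-level idea in this abstract can be illustrated with a minimal sketch. It is not the authors' method: the keyword weights, the window size and stride, the 0.5 threshold, and the function names are all hypothetical stand-ins for a trained classifier and its settings. The sketch only shows the rule stated above, that a document counts as responsive when at least one of its snippets does, and that the best-scoring snippet can be shown to the reviewer as the explanation.

```python
import string
from typing import Callable, List, Tuple

# Hypothetical keyword weights standing in for a trained text classifier.
TOY_WEIGHTS = {"delete": 0.4, "pricing": 0.3, "agreement": 0.2, "audit": 0.3}


def split_into_snippets(text: str, size: int = 8, stride: int = 4) -> List[str]:
    """Cut a document into overlapping word windows (snippets)."""
    words = text.split()
    if len(words) <= size:
        return [" ".join(words)]
    starts = range(0, len(words) - size + stride, stride)
    return [" ".join(words[s:s + size]) for s in starts]


def toy_score(snippet: str) -> float:
    """Sum the toy weights of the words in a snippet, capped at 1.0."""
    tokens = [w.strip(string.punctuation).lower() for w in snippet.split()]
    return min(1.0, sum(TOY_WEIGHTS.get(t, 0.0) for t in tokens))


def explain_document(
    text: str,
    score_fn: Callable[[str], float] = toy_score,
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> Tuple[bool, str, float]:
    """Return (is_responsive, best_snippet, best_score) for one document."""
    snippets = split_into_snippets(text)
    best = max(snippets, key=score_fn)
    best_score = score_fn(best)
    # The document is responsive if its best snippet clears the threshold.
    return best_score >= threshold, best, best_score


if __name__ == "__main__":
    doc = (
        "The quarterly picnic is scheduled for Friday afternoon. "
        "Please delete the pricing agreement emails before the audit "
        "begins next week."
    )
    responsive, snippet, score = explain_document(doc)
    print(responsive, round(score, 2), repr(snippet))
```

In a real review workflow the toy scorer would be replaced by a model trained on labeled snippets or documents; the surrounding logic would stay the same.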
Building a Document Genre Corpus: a Profile of the KRYS I Corpus
This paper describes the KRYS I corpus (http://www.krys-corpus.eu/Info.html), consisting of documents classified into 70 genre classes. It has been constructed as part of an effort to automate document genre classification as distinct from topic detection. Previously there has been very little work on building corpora of texts that have been classified using a non-topical genre palette. This is partly because genre, as a concept, is rooted in philosophy, rhetoric and literature, and is highly complex and domain dependent in its interpretation ([11]). The usefulness of genre in everyday information search is only now starting to be recognised, and no genre classification schema has yet been consolidated to the point of having applicable value in this direction. By presenting here our experiences in constructing the KRYS I corpus, we hope to shed light on information gathering and seeking behaviour and the role of genre in these activities, as well as on a way forward for creating a better corpus for testing automated genre classification tasks and applying these tasks to other domains.
Classification of Radiology Reports Using Neural Attention Models
The electronic health record (EHR) contains a large amount of
multi-dimensional and unstructured clinical data of significant operational and
research value. Unlike previous studies, our approach uses a
double-annotated dataset and moves away from obscure "black-box" models toward
comprehensive deep learning models. In this paper, we present a novel neural
attention mechanism that not only classifies clinically important findings but
also highlights the terms behind each classification.
Specifically, convolutional neural networks (CNN) with attention analysis are
used to classify radiology head computed tomography reports based on five
categories that radiologists would account for in assessing acute and
communicable findings in daily practice. The experiments show that our CNN
attention models outperform non-neural models, especially when trained on a
larger dataset. Our attention analysis demonstrates the intuition behind the
classifier's decision by generating a heatmap that highlights attended terms
used by the CNN model; this is valuable when potential downstream medical
decisions are to be performed by human experts or the classifier information is
to be used in cohort construction, such as for epidemiological studies.
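The heatmap described in this abstract can be pictured with a small sketch, under loud assumptions: the per-token relevance scores below are invented, standing in for whatever a trained CNN attention layer would produce, and the function names are hypothetical. The sketch only shows the generic step of normalizing scores into attention weights with a softmax and listing the most attended terms; it does not reproduce the paper's architecture.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_attended(
    tokens: List[str], scores: List[float], top_k: int = 3
) -> List[Tuple[str, float]]:
    """Return the top_k (token, attention weight) pairs, highest first."""
    weights = softmax(scores)
    ranked = sorted(zip(tokens, weights), key=lambda tw: tw[1], reverse=True)
    return ranked[:top_k]


def render_heatmap(tokens: List[str], scores: List[float], width: int = 20) -> str:
    """Draw a text heatmap: one bar per token, scaled to the largest weight."""
    weights = softmax(scores)
    peak = max(weights)
    lines = []
    for token, weight in zip(tokens, weights):
        bar = "#" * round(width * weight / peak)
        lines.append(f"{token:<12} {weight:.3f} {bar}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical report fragment and made-up relevance scores.
    tokens = ["acute", "subdural", "hemorrhage", "is", "seen"]
    scores = [1.5, 2.0, 3.0, 0.1, 0.2]
    print(render_heatmap(tokens, scores))
    print(top_attended(tokens, scores))
```

A reviewer reading the output would see which terms carried the most weight in the decision, which is the kind of evidence the abstract argues is useful for downstream human review.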