Looking Deeper into Deep Learning Model: Attribution-based Explanations of TextCNN
Layer-wise Relevance Propagation (LRP) and saliency maps have been recently
used to explain the predictions of Deep Learning models, specifically in the
domain of text classification. Given different attribution-based explanations
to highlight relevant words for a predicted class label, experiments based on
word-deletion perturbation are a common evaluation method. This word-removal
approach, however, disregards any linguistic dependencies that may exist
between words or phrases in a sentence, which could semantically guide a
classifier to a particular prediction. In this paper, we present a
feature-based evaluation framework for comparing the two attribution methods on
customer reviews (public data sets) and Customer Due Diligence (CDD) extracted
reports (corporate data set). Instead of removing words based on the relevance
score, we investigate perturbations based on removing embedded features from
intermediate layers of Convolutional Neural Networks. Our experimental study is
carried out on embedded-word, embedded-document, and embedded-ngrams
explanations. Using the proposed framework, we provide a visualization tool to
assist analysts in reasoning toward the model's final prediction.
Comment: NIPS 2018 Workshop on Challenges and Opportunities for AI in
Financial Services: the Impact of Fairness, Explainability, Accuracy, and
Privacy, Montréal, Canada
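The feature-removal evaluation described in this abstract can be illustrated with a minimal sketch. It assumes a toy linear scorer over pooled features, and it uses a simple contribution score (feature times weight) as a stand-in for the LRP or saliency relevance. The function names and all numbers are hypothetical and are not taken from the paper.

```python
# Toy sketch: perturb intermediate features instead of deleting words.
# All names and values are hypothetical illustrations.

def score(features, weights):
    """Linear class score over pooled intermediate features."""
    return sum(f * w for f, w in zip(features, weights))


def perturbation_curve(features, weights, relevance):
    """Zero out features from most to least relevant.

    Returns the class score before any removal, followed by the score
    after each successive removal.
    """
    order = sorted(range(len(features)), key=lambda i: relevance[i], reverse=True)
    current = list(features)
    curve = [score(current, weights)]
    for i in order:
        current[i] = 0.0  # remove this embedded feature
        curve.append(score(current, weights))
    return curve


features = [0.9, 0.1, 0.5, 0.3]
weights = [2.0, -1.0, 1.0, 0.5]
# Stand-in relevance: each feature's contribution to the class score.
relevance = [f * w for f, w in zip(features, weights)]
curve = perturbation_curve(features, weights, relevance)
```

Under these assumptions, a relevance ranking that matches what the model uses should make the score fall quickly over the first few removals, which is how two attribution methods could be compared.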
REST: A thread embedding approach for identifying and classifying user-specified information in security forums
How can we extract useful information from a security forum? We focus on
identifying threads of interest to a security professional: (a) alerts of
worrisome events, such as attacks, (b) offering of malicious services and
products, (c) hacking information to perform malicious acts, and (d) useful
security-related experiences. The analysis of security forums is in its infancy
despite several promising recent works. Novel approaches are needed to address
the challenges in this domain: (a) the difficulty in specifying the "topics" of
interest efficiently, and (b) the unstructured and informal nature of the text.
We propose REST, a systematic methodology to: (a) identify threads of interest
based on a possibly incomplete bag of words, and (b) classify them into one
of the four classes above. The key novelty of the work is a multi-step weighted
embedding approach: we project words, threads and classes in appropriate
embedding spaces and establish relevance and similarity there. We evaluate our
method with real data from three security forums with a total of 164K posts and
21K threads. First, REST is robust to the initial keyword selection: it can extend the
user-provided keyword set and thus recover from missing keywords.
Second, REST categorizes the threads into the classes of interest with superior
accuracy compared to five other methods: REST exhibits an accuracy between
63.3% and 76.9%. We see our approach as a first step for harnessing the wealth of
information of online forums in a user-friendly way, since the user can loosely
specify her keywords of interest.
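The idea of projecting words, threads, and classes into an embedding space and comparing them there can be sketched as follows. This is only an illustration under assumptions: the abstract does not specify the weighting scheme or the steps, so the weighted average, the cosine comparison, the two-dimensional vectors, and the class labels below are all hypothetical.

```python
import math

# Hypothetical 2-D word vectors and keyword weights (illustration only).
WORD_VECTORS = {
    "attack": [1.0, 0.0],
    "ddos": [0.9, 0.1],
    "selling": [0.0, 1.0],
    "exploit": [0.2, 0.8],
}
WORD_WEIGHTS = {"attack": 2.0, "selling": 2.0}  # unlisted words get weight 1.0
CLASS_VECTORS = {"alert": [1.0, 0.0], "service": [0.0, 1.0]}


def weighted_embedding(tokens, vectors, weights):
    """Weighted average of the vectors of known tokens; unknown tokens are skipped."""
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    norm = 0.0
    for token in tokens:
        if token in vectors:
            w = weights.get(token, 1.0)
            norm += w
            for k in range(dim):
                total[k] += w * vectors[token][k]
    return [x / norm for x in total] if norm else total


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def classify(thread_vec, class_vectors):
    """Pick the class whose vector is most similar to the thread vector."""
    return max(class_vectors, key=lambda c: cosine(thread_vec, class_vectors[c]))


thread_a = weighted_embedding(["ddos", "attack", "today"], WORD_VECTORS, WORD_WEIGHTS)
thread_b = weighted_embedding(["selling", "exploit"], WORD_VECTORS, WORD_WEIGHTS)
label_a = classify(thread_a, CLASS_VECTORS)
label_b = classify(thread_b, CLASS_VECTORS)
```

In this toy setup the out-of-vocabulary token "today" is ignored, so a thread is placed by the words the embedding knows about.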
Intelligent Word Embeddings of Free-Text Radiology Reports
Radiology reports are a rich resource for advancing deep learning
applications in medicine by leveraging the large volume of data continuously
being updated, integrated, and shared. However, there are significant
challenges as well, largely due to the ambiguity and subtlety of natural
language. We propose a hybrid strategy that combines semantic-dictionary
mapping and word2vec modeling for creating dense vector embeddings of free-text
radiology reports. Our method leverages the benefits of both
semantic-dictionary mapping and unsupervised learning. Using the vector
representation, we automatically classify the radiology reports into three
classes denoting confidence in the diagnosis of intracranial hemorrhage by the
interpreting radiologist. We performed experiments with varying hyperparameter
settings of the word embeddings and a range of different classifiers. The best
performance achieved was a weighted precision of 88% and a weighted recall of
90%. Our work offers the potential to leverage unstructured electronic health
record data by allowing direct analysis of narrative clinical notes.
Comment: AMIA Annual Symposium 201
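The semantic-dictionary mapping step of such a hybrid pipeline might look like the sketch below, which rewrites known phrases to canonical tokens before any embedding model is trained. The dictionary entries, the canonical tokens, and the function name are hypothetical and are not taken from the paper. The resulting token lists would then be passed to a word2vec trainer, which is not shown here.

```python
import re

# Hypothetical dictionary: surface phrase -> canonical concept token.
SEMANTIC_DICT = {
    "intracranial hemorrhage": "ich",
    "intracranial haemorrhage": "ich",
    "no evidence of": "negex",
}


def map_terms(text, dictionary):
    """Lowercase the text, replace dictionary phrases, then tokenize.

    Longer phrases are replaced first so that a short entry cannot break
    up a longer one.
    """
    text = text.lower()
    for phrase in sorted(dictionary, key=len, reverse=True):
        pattern = r"\b" + re.escape(phrase) + r"\b"
        text = re.sub(pattern, dictionary[phrase], text)
    return re.findall(r"[a-z0-9_]+", text)


tokens = map_terms("No evidence of intracranial hemorrhage.", SEMANTIC_DICT)
other = map_terms("Intracranial haemorrhage is suspected", SEMANTIC_DICT)
```

Mapping spelling variants to one token means the later unsupervised step sees a single vocabulary item for the concept instead of several rare ones.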