CausaLM: Causal Model Explanation Through Counterfactual Language Models
Understanding predictions made by deep neural networks is notoriously
difficult, but also crucial to their dissemination. Like all ML-based methods,
they are only as good as their training data, and can also capture unwanted biases.
While there are tools that can help understand whether such biases exist, they
do not distinguish between correlation and causation, and might be ill-suited
for text-based models and for reasoning about high level language concepts. A
key problem of estimating the causal effect of a concept of interest on a given
model is that this estimation requires the generation of counterfactual
examples, which is challenging with existing generation technology. To bridge
that gap, we propose CausaLM, a framework for producing causal model
explanations using counterfactual language representation models. Our approach
is based on fine-tuning of deep contextualized embedding models with auxiliary
adversarial tasks derived from the causal graph of the problem. Concretely, we
show that by carefully choosing auxiliary adversarial pre-training tasks,
language representation models such as BERT can effectively learn a
counterfactual representation for a given concept of interest, and be used to
estimate its true causal effect on model performance. A byproduct of our method
is a language representation model that is unaffected by the tested concept,
which can be useful in mitigating unwanted bias ingrained in the data.
Comment: Our code and data are available at:
https://amirfeder.github.io/CausaLM/ Under review for the Computational
Linguistics journal
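The causal-effect estimation step described in the abstract can be illustrated with a small sketch (this is an illustration under our own assumptions, not the authors' code; the `treate` helper name and all numbers are invented): once a concept-ablated counterfactual encoder exists, the concept's effect is estimated as the average shift in the model's predictions between the original and counterfactual representations.

```python
# Hypothetical sketch of a TReATE-style causal-effect estimate: the
# average difference between a classifier's output on the original
# representation and on a counterfactual representation from which the
# tested concept has been adversarially removed. All data below is a
# toy illustration, not from the paper.

def treate(preds_original, preds_counterfactual):
    """Average absolute shift in class probability caused by the concept."""
    assert len(preds_original) == len(preds_counterfactual)
    n = len(preds_original)
    return sum(abs(o - c) for o, c in zip(preds_original, preds_counterfactual)) / n

# Toy predictions: probability of the positive class for five examples,
# first from the original encoder, then from the concept-ablated one.
orig = [0.91, 0.15, 0.78, 0.60, 0.33]
cf   = [0.70, 0.14, 0.55, 0.58, 0.30]

effect = treate(orig, cf)
print(round(effect, 3))  # a larger value means the concept drives predictions more
```

A near-zero score would suggest the concept has little causal influence on the model's decisions.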
Combining Visual and Textual Features for Semantic Segmentation of Historical Newspapers
The massive amounts of digitized historical documents acquired over the last
decades naturally lend themselves to automatic processing and exploration.
Research efforts seeking to automatically process facsimiles and extract
information from them are multiplying, with document layout analysis as a first
essential step. While the identification and categorization of segments of
interest in document images have seen significant progress over the last years
thanks to deep learning techniques, many challenges remain, among them the use
of finer-grained segmentation typologies and the handling of complex,
heterogeneous documents such as historical newspapers. Moreover, most
approaches consider visual features only, ignoring the textual signal. In this
context, we introduce a multimodal approach for the semantic segmentation of
historical newspapers that combines visual and textual features. Based on a
series of experiments on diachronic Swiss and Luxembourgish newspapers, we
investigate, among others, the predictive power of visual and textual features
and their capacity to generalize across time and sources. Results show
consistent improvement of multimodal models in comparison to a strong visual
baseline, as well as better robustness to high material variance.
A comparison of word embedding-based extraction feature techniques and deep learning models of natural disaster messages classification
This research compares the classification performance of deep learning models on natural disaster messages from Twitter. The experiment covers three word embedding-based feature extraction techniques and five deep learning models. The word embedding techniques used are Word2Vec, fastText, and GloVe. The five deep learning models are three Convolutional Neural Networks of different dimensions (1D CNN, 2D CNN, 3D CNN), a Long Short-Term Memory network (LSTM), and Bidirectional Encoder Representations from Transformers (BERT). The models are tested for classification performance on four natural disaster message datasets: earthquakes, floods, forest fires, and hurricanes.
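The feature-extraction step shared by Word2Vec, fastText, and GloVe pipelines can be sketched roughly as follows (a toy illustration with an invented 3-dimensional embedding table, not the paper's actual pipeline): each message is reduced to the average of its word vectors before being fed to a downstream classifier.

```python
# Minimal sketch of embedding-based feature extraction: a tweet becomes
# the average of its word vectors. The tiny "embedding table" below is
# made up for illustration; real Word2Vec/fastText/GloVe vectors have
# hundreds of dimensions.

EMB = {
    "earthquake": [0.9, 0.1, 0.0],
    "flood":      [0.1, 0.9, 0.0],
    "water":      [0.2, 0.8, 0.1],
    "shaking":    [0.8, 0.2, 0.1],
}

def doc_vector(tokens, emb, dim=3):
    """Average the vectors of in-vocabulary tokens (zeros if none)."""
    vecs = [emb[t] for t in tokens if t in emb]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

msg = doc_vector("flood water rising".split(), EMB)
print(cosine(msg, EMB["flood"]) > cosine(msg, EMB["earthquake"]))  # True
```

The averaged vector sits closer to the "flood" direction than the "earthquake" one, which is what lets a classifier separate the disaster categories.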
Automatic Recognition and Classification of Future Work Sentences from Academic Articles in a Specific Domain
Future work sentences (FWS) are the particular sentences in academic papers
that contain the author's description of their proposed follow-up research
direction. This paper presents methods to automatically extract FWS from
academic papers and classify them according to the different future directions
embodied in the paper's content. FWS recognition methods will enable subsequent
researchers to locate future work sentences more accurately and quickly and
reduce the time and cost of acquiring the corpus. Existing work on the
automatic identification of future work sentences is relatively sparse and
cannot accurately identify FWS in academic papers, which prevents data mining
on a large scale. Furthermore, the content of future work has many aspects, and
subdividing that content is conducive to the analysis of specific development
directions. In this paper, Natural Language Processing (NLP) is used as a case
study, and FWS are extracted from academic
papers and classified into different types. We manually build an annotated
corpus with six different types of FWS. Then, automatic recognition and
classification of FWS are implemented using machine learning models, and the
performance of these models is compared based on the evaluation metrics. The
results show that the Bernoulli Bayesian model has the best performance in the
automatic recognition task, with the Macro F1 reaching 90.73%, and the SCIBERT
model has the best performance in the automatic classification task, with the
weighted average F1 reaching 72.63%. Finally, we extract keywords from FWS to
gain a deeper understanding of the key content they describe, and we also
demonstrate that the content laid out in FWS is reflected in subsequent
research work by measuring the similarity between future work sentences and
the abstracts.
Evaluating The Explanation of Black Box Decision for Text Classification
Through progressively evolved technology, applications of machine learning
and deep learning methods become prevalent with the increased size of the
collected data and the data processing capacity. Among these methods, deep
neural networks achieve high accuracy results in various classification tasks;
nonetheless, they exhibit an opaqueness that has led them to be called
black box models. As a trade-off, black box models fall short in terms of
interpretability by humans. Without a supporting explanation of why the model
reaches a particular conclusion, the output puts decision-makers who act on
its predictions in a difficult position. In this
context, various explanation methods have been developed to enhance the
interpretability of black box models. LIME, SHAP, and Integrated Gradients
techniques are examples of more adaptive approaches thanks to their
well-developed and easy-to-use libraries. While LIME and SHAP are post-hoc
analysis tools, Integrated Gradients provides model-specific outcomes using the
model's inner workings. In this thesis, four widely used explanation methods
are quantitatively evaluated for text classification tasks using a Bidirectional
LSTM model and a DistilBERT model on four benchmark data sets: SMS Spam, IMDB
Reviews, Yelp Polarity, and Fake News. The results
of the experiments reveal that analysis methods and evaluation metrics
provide an auspicious foundation for assessing the strengths and weaknesses of
explanation methods.
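The perturbation idea underlying such post-hoc explainers can be illustrated with an occlusion-style sketch (this is not the actual LIME or SHAP library; the keyword-based `spam_score` model below is made up and stands in for the thesis's LSTM/DistilBERT classifiers): drop each token in turn and record how much the black box's score changes.

```python
# Occlusion-style local attribution: a token's importance is the drop
# in the model's score when that token is removed. The "classifier"
# here is a toy keyword scorer, purely for illustration.

def spam_score(tokens):
    """Toy black box: probability-like spam score from keyword weights."""
    weights = {"free": 0.4, "winner": 0.35, "call": 0.1}
    return min(1.0, sum(weights.get(t, 0.0) for t in tokens))

def occlusion_attributions(tokens, model):
    """Map each token to the score drop caused by removing it."""
    base = model(tokens)
    return {
        t: base - model(tokens[:i] + tokens[i + 1:])
        for i, t in enumerate(tokens)
    }

msg = "free prize winner call now".split()
attr = occlusion_attributions(msg, spam_score)
top = max(attr, key=attr.get)
print(top, round(attr[top], 2))  # "free" contributes most to the spam score
```

LIME refines this idea by fitting a weighted linear surrogate over many random perturbations; SHAP grounds the weights in Shapley values; Integrated Gradients instead accumulates gradients along a path from a baseline input.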
PoliToHFI at SemEval-2023 Task 6: Leveraging Entity-Aware and Hierarchical Transformers For Legal Entity Recognition and Court Judgment Prediction
The use of Natural Language Processing techniques in the legal domain has become established for supporting attorneys and domain experts in content retrieval and decision-making. However, understanding legal text poses relevant challenges in the recognition of domain-specific entities and in the adaptation and explanation of predictive models. This paper addresses the Legal Entity Name Recognition (L-NER) and Court Judgment Prediction (CJP) and Explanation (CJPE) tasks. The L-NER solution explores the use of various transformer-based models, including an entity-aware method attending to domain-specific entities. The proposed CJPE method relies on hierarchical BERT-based classifiers combined with local input attribution explainers. We propose a broad comparison of eXplainable AI methodologies along with a novel approach based on NER. For the L-NER task, the experimental results highlight the importance of domain-specific pre-training. For CJP, our lightweight solution shows performance in line with existing approaches, and our NER-boosted explanations show promising CJPE results in terms of the conciseness of the prediction explanations.
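The hierarchical classification idea behind such judgment-prediction systems can be sketched roughly as follows (an illustration under our own assumptions: a made-up keyword scorer stands in for the BERT chunk encoder, and the names and thresholds are not from the paper): a long judgment is split into chunks, each chunk is scored, and a document-level decision aggregates the chunk scores.

```python
# Hierarchical classification sketch: chunk a long document, score each
# chunk with a stand-in encoder, then aggregate. All cues, names, and
# thresholds below are invented for illustration.

def chunk(tokens, size):
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]

def chunk_score(tokens):
    """Stand-in for a BERT chunk encoder: fraction of 'guilty-leaning' cues."""
    cues = {"violated", "breach", "liable"}
    return sum(t in cues for t in tokens) / max(len(tokens), 1)

def predict_judgment(document, size=4, threshold=0.1):
    tokens = document.lower().split()
    scores = [chunk_score(c) for c in chunk(tokens, size)]
    # Document-level aggregation: mean of chunk scores
    return "accepted" if sum(scores) / len(scores) > threshold else "rejected"

doc = "the defendant violated the agreement and is liable for the breach"
print(predict_judgment(doc))  # "accepted"
```

Hierarchical aggregation like this is what lets BERT-style encoders, whose input length is limited, handle judgments that run to thousands of tokens.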