CausaLM: Causal Model Explanation Through Counterfactual Language Models

Authors: Feder Amir, Oved Nadav, Reichart Roi, Shalit Uri
Publication date: 14/06/2020

Understanding predictions made by deep neural networks is notoriously difficult, but also crucial to their dissemination. Like all ML-based methods, they are as good as their training data, and can also capture unwanted biases. While there are tools that can help understand whether such biases exist, they do not distinguish between correlation and causation, and might be ill-suited for text-based models and for reasoning about high-level language concepts. A key problem of estimating the causal effect of a concept of interest on a given model is that this estimation requires the generation of counterfactual examples, which is challenging with existing generation technology. To bridge that gap, we propose CausaLM, a framework for producing causal model explanations using counterfactual language representation models. Our approach is based on fine-tuning of deep contextualized embedding models with auxiliary adversarial tasks derived from the causal graph of the problem. Concretely, we show that by carefully choosing auxiliary adversarial pre-training tasks, language representation models such as BERT can effectively learn a counterfactual representation for a given concept of interest, and be used to estimate its true causal effect on model performance. A byproduct of our method is a language representation model that is unaffected by the tested concept, which can be useful in mitigating unwanted bias ingrained in the data. Comment: Our code and data are available at: https://amirfeder.github.io/CausaLM/ Under review for the Computational Linguistics journal

arXiv.org e-Print Archive

Combining Visual and Textual Features for Semantic Segmentation of Historical Newspapers

Authors: Barman Raphaël, Clematide Simon, Ehrmann Maud, Kaplan Frédéric, Oliveira Sofia Ares
Publication venue: 'Centre pour la Communication Scientifique Directe (CCSD)'
Publication date: 14/12/2020

The massive amounts of digitized historical documents acquired over the last decades naturally lend themselves to automatic processing and exploration. Research efforts seeking to automatically process facsimiles and extract information from them are multiplying, with document layout analysis as a first essential step. Although the identification and categorization of segments of interest in document images have seen significant progress over the last years thanks to deep learning techniques, many challenges remain, among them the use of finer-grained segmentation typologies and the consideration of complex, heterogeneous documents such as historical newspapers. Besides, most approaches consider visual features only, ignoring textual signal. In this context, we introduce a multimodal approach for the semantic segmentation of historical newspapers that combines visual and textual features. Based on a series of experiments on diachronic Swiss and Luxembourgish newspapers, we investigate, among others, the predictive power of visual and textual features and their capacity to generalize across time and sources. Results show consistent improvement of multimodal models in comparison to a strong visual baseline, as well as better robustness to high material variance.

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

A comparison of word embedding-based extraction feature techniques and deep learning models of natural disaster messages classification

Authors: Abadi Friska, Budiman Irwan, Faisal Mohammad Reza, Haekal Muhammad, Nugrahadi Dodon Turianto
Publication venue: Lublin University of Technology
Publication date: 30/06/2023

The research aims to compare classification performance on natural disaster messages from Twitter. The experiment covers three word embedding-based feature extraction techniques and five different deep learning models. The word embedding techniques used in this experiment are Word2Vec, fastText, and GloVe. The experiment uses five deep learning models, namely three Convolutional Neural Network models of different dimensions (1D CNN, 2D CNN, 3D CNN), Long Short-Term Memory Network (LSTM), and Bidirectional Encoder Representations from Transformers (BERT). The models are tested on four natural disaster message datasets: earthquakes, floods, forest fires, and hurricanes. Those models are tested for classification performance

Lublin University of Technology Journals

Automatic Recognition and Classification of Future Work Sentences from Academic Articles in a Specific Domain

Authors: Hao Wenke, Li Zhicheng, Qian Yuchen, Wang Yuzhuo, Xiang Yi, Zhang Chengzhi
Publication date: 28/12/2022

Future work sentences (FWS) are the particular sentences in academic papers that contain the author's description of their proposed follow-up research direction. This paper presents methods to automatically extract FWS from academic papers and classify them according to the different future directions embodied in the paper's content. FWS recognition methods will enable subsequent researchers to locate future work sentences more accurately and quickly and reduce the time and cost of acquiring the corpus. Existing work on automatic identification of future work sentences is relatively limited, and it cannot accurately identify FWS from academic papers, and thus cannot support data mining on a large scale. Furthermore, there are many aspects to the content of future work, and the subdivision of the content is conducive to the analysis of specific development directions. In this paper, Natural Language Processing (NLP) is used as a case study, and FWS are extracted from academic papers and classified into different types. We manually build an annotated corpus with six different types of FWS. Then, automatic recognition and classification of FWS are implemented using machine learning models, and the performance of these models is compared based on the evaluation metrics. The results show that the Bernoulli Bayesian model has the best performance in the automatic recognition task, with the Macro F1 reaching 90.73%, and the SciBERT model has the best performance in the automatic classification task, with the weighted average F1 reaching 72.63%. Finally, we extract keywords from FWS and gain a deep understanding of the key content described in FWS, and we also demonstrate that content determination in FWS will be reflected in the subsequent research work by measuring the similarity between future work sentences and the abstracts.
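
The abstract above reports one result as Macro F1 and another as weighted average F1. As a rough illustration of how the two averages differ, here is a minimal pure-Python sketch. The function names, the class labels, and the toy predictions are all hypothetical and are not taken from the paper or its corpus.

```python
from collections import Counter

def per_class_f1(y_true, y_pred):
    """Return {label: F1} for every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = {}
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1: every class counts equally."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)

def weighted_f1(y_true, y_pred):
    """Mean of per-class F1 weighted by each class's number of true examples."""
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[label] * support[label] for label in scores) / len(y_true)

# Hypothetical toy labels, for illustration only
y_true = ["method", "method", "method", "data", "data", "other"]
y_pred = ["method", "method", "data", "data", "data", "method"]

print(round(macro_f1(y_true, y_pred), 4))
print(round(weighted_f1(y_true, y_pred), 4))
```

In this toy case the rare class "other" is never predicted correctly, which pulls the macro average down more than the weighted one. This is why the two figures in an abstract are not directly comparable.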

arXiv.org e-Print Archive

Evaluating The Explanation of Black Box Decision for Text Classification

Author: Gücükbel Esra
Publication date: 01/01/2023

Through progressively evolved technology, applications of machine learning and deep learning methods have become prevalent with the increased size of the collected data and the data processing capacity. Among these methods, deep neural networks achieve high accuracy results in various classification tasks; nonetheless, their opaqueness has led them to be called black box models. As a trade-off, black box models fall short in terms of interpretability by humans. Without a supportive explanation of why the model reaches a particular conclusion, the output causes an intrusive situation for decision-makers who will take action with the outcome of predictions. In this context, various explanation methods have been developed to enhance the interpretability of black box models. LIME, SHAP, and Integrated Gradients techniques are examples of more adaptive approaches due to their well-developed and easy-to-use libraries. While LIME and SHAP are post-hoc analysis tools, Integrated Gradients provides model-specific outcomes using the model’s inner workings. In this thesis, four widely used explanation methods are quantitatively evaluated for text classification tasks using the Bidirectional LSTM model and DistilBERT model on four benchmark data sets: SMS Spam, IMDB Reviews, Yelp Polarity, and Fake News. The results of the experiments reveal that analysis methods and evaluation metrics provide an auspicious foundation for assessing the strengths and weaknesses of explanation methods.
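
To make concrete the remark that Integrated Gradients relies on the model's inner workings, the sketch below applies the technique to a tiny logistic model whose gradient is known in closed form. This is an illustrative toy, not the thesis's setup: the model, its weights, the all-zero baseline, and every function name are assumptions. The path integral is approximated with a midpoint Riemann sum.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def model(x, w, b):
    """Toy classifier: probability from a logistic model (hypothetical)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def model_grad(x, w, b):
    """Analytic gradient of the toy model's output with respect to x."""
    p = model(x, w, b)
    return [p * (1.0 - p) * wi for wi in w]

def integrated_gradients(x, baseline, w, b, steps=200):
    """Approximate IG_i = (x_i - x'_i) * integral over alpha in [0, 1] of
    df/dx_i evaluated at x' + alpha * (x - x'), using a midpoint sum."""
    totals = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [bi + alpha * (xi - bi) for xi, bi in zip(x, baseline)]
        grad = model_grad(point, w, b)
        totals = [t + g for t, g in zip(totals, grad)]
    return [(xi - bi) * t / steps for xi, bi, t in zip(x, baseline, totals)]

# Hypothetical weights and input features, for illustration only
w, b = [0.8, -1.5, 0.3], -0.2
x, baseline = [1.0, 0.0, 2.0], [0.0, 0.0, 0.0]

attributions = integrated_gradients(x, baseline, w, b)
# Completeness: attributions should sum to model(x) - model(baseline)
gap = sum(attributions) - (model(x, w, b) - model(baseline, w, b))
print(attributions, abs(gap) < 1e-4)
```

Unlike perturbation-based tools that only query model outputs, this computation needs gradients of the model at points along the path, so it requires access to a differentiable model.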

Institutional Repository of the Freie Universität Berlin

PoliToHFI at SemEval-2023 Task 6: Leveraging Entity-Aware and Hierarchical Transformers For Legal Entity Recognition and Court Judgment Prediction

Authors: Alkis Koudounas, Elena Baralis, Eliana Pastor, Francesco Tarasconi, Irene Benedetto, Lorenzo Vaiani, Luca Cagliero
Publication venue: ACL Association for Computational Linguistics
Publication date: 01/01/2023

The use of Natural Language Processing techniques in the legal domain has become established for supporting attorneys and domain experts in content retrieval and decision-making. However, understanding the legal text poses relevant challenges in the recognition of domain-specific entities and the adaptation and explanation of predictive models. This paper addresses the Legal Entity Name Recognition (L-NER) and Court Judgment Prediction (CJP) and Explanation (CJPE) tasks. The L-NER solution explores the use of various transformer-based models, including an entity-aware method attending to domain-specific entities. The proposed CJPE method relies on hierarchical BERT-based classifiers combined with local input attribution explainers. We propose a broad comparison of eXplainable AI methodologies along with a novel approach based on NER. For the L-NER task, the experimental results highlight the importance of domain-specific pre-training. For CJP, our lightweight solution shows performance in line with existing approaches, and our NER-boosted explanations show promising CJPE results in terms of the conciseness of the prediction explanations.

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)