3,068 research outputs found
A Systematic Literature Review on Explainability for Machine/Deep Learning-based Software Engineering Research
The remarkable achievements of Artificial Intelligence (AI) algorithms,
particularly in Machine Learning (ML) and Deep Learning (DL), have fueled their
extensive deployment across multiple sectors, including Software Engineering
(SE). However, due to their black-box nature, these promising AI-driven SE
models are still far from being deployed in practice. This lack of
explainability poses unwanted risks for their applications in critical tasks,
such as vulnerability detection, where decision-making transparency is of
paramount importance. This paper endeavors to elucidate this interdisciplinary
domain by presenting a systematic literature review of approaches that aim to
improve the explainability of AI models within the context of SE. The review
canvasses work appearing in the most prominent SE & AI conferences and
journals, and spans 63 papers across 21 unique SE tasks. Based on three key
Research Questions (RQs), we aim to (1) summarize the SE tasks where XAI
techniques have shown success to date; (2) classify and analyze different XAI
techniques; and (3) investigate existing evaluation approaches. Based on our
findings, we identified a set of challenges remaining to be addressed in
existing studies, together with a roadmap highlighting potential opportunities
we deemed appropriate and important for future work.Comment: submitted to ACM Computing Surveys. arXiv admin note: text overlap
with arXiv:2202.06840 by other author
Explainable Software Defect Prediction from Cross Company Project Metrics Using Machine Learning
Predicting the number of defects in a project is critical for project test managers to allocate budget, resources, and schedule for testing, support and maintenance efforts. Software Defect Prediction models predict the number of defects in given projects after training the model with historical defect related information. The majority of defect prediction studies focused on predicting defect-prone modules from methods, and class-level static information, whereas this study predicts defects from project-level information based on a cross-company project dataset. This study utilizes software sizing metrics, effort metrics, and defect density information, and focuses on developing defect prediction models that apply various machine learning algorithms. One notable issue in existing defect prediction studies is the lack of transparency in the developed models. Consequently, the explain-ability of the developed model has been demonstrated using the state-of-the-art post-hoc model-agnostic method called Shapley Additive exPlanations (SHAP). Finally, important features for predicting defects from cross-company project information were identified
Morphological Image Analysis and Feature Extraction for Reasoning with AI-based Defect Detection and Classification Models
As the use of artificial intelligent (AI) models becomes more prevalent in
industries such as engineering and manufacturing, it is essential that these
models provide transparent reasoning behind their predictions. This paper
proposes the AI-Reasoner, which extracts the morphological characteristics of
defects (DefChars) from images and utilises decision trees to reason with the
DefChar values. Thereafter, the AI-Reasoner exports visualisations (i.e.
charts) and textual explanations to provide insights into outputs made by
masked-based defect detection and classification models. It also provides
effective mitigation strategies to enhance data pre-processing and overall
model performance. The AI-Reasoner was tested on explaining the outputs of an
IE Mask R-CNN model using a set of 366 images containing defects. The results
demonstrated its effectiveness in explaining the IE Mask R-CNN model's
predictions. Overall, the proposed AI-Reasoner provides a solution for
improving the performance of AI models in industrial applications that require
defect analysis.Comment: 8 pages, 3 figures, 5 tables; submitted to 2023 IEEE symposium series
on computational intelligence (SSCI
Silent Vulnerable Dependency Alert Prediction with Vulnerability Key Aspect Explanation
Due to convenience, open-source software is widely used. For beneficial
reasons, open-source maintainers often fix the vulnerabilities silently,
exposing their users unaware of the updates to threats. Previous works all
focus on black-box binary detection of the silent dependency alerts that suffer
from high false-positive rates. Open-source software users need to analyze and
explain AI prediction themselves. Explainable AI becomes remarkable as a
complementary of black-box AI models, providing details in various forms to
explain AI decisions. Noticing there is still no technique that can discover
silent dependency alert on time, in this work, we propose a framework using an
encoder-decoder model with a binary detector to provide explainable silent
dependency alert prediction. Our model generates 4 types of vulnerability key
aspects including vulnerability type, root cause, attack vector, and impact to
enhance the trustworthiness and users' acceptance to alert prediction. By
experiments with several models and inputs, we confirm CodeBERT with both
commit messages and code changes achieves the best results. Our user study
shows that explainable alert predictions can help users find silent dependency
alert more easily than black-box predictions. To the best of our knowledge,
this is the first research work on the application of Explainable AI in silent
dependency alert prediction, which opens the door of the related domains
Explainable AI for Machine Fault Diagnosis: Understanding Features' Contribution in Machine Learning Models for Industrial Condition Monitoring
Although the effectiveness of machine learning (ML) for machine diagnosis has been widely established, the interpretation of the diagnosis outcomes is still an open issue. Machine learning models behave as black boxes; therefore, the contribution given by each of the selected features to the diagnosis is not transparent to the user. This work is aimed at investigating the capabilities of the SHapley Additive exPlanation (SHAP) to identify the most important features for fault detection and classification in condition monitoring programs for rotating machinery. The authors analyse the case of medium-sized bearings of industrial interest. Namely, vibration data were collected for different health states from the test rig for industrial bearings available at the Mechanical Engineering Laboratory of Politecnico di Torino. The Support Vector Machine (SVM) and k-Nearest Neighbour (kNN) diagnosis models are explained by means of the SHAP. Accuracies higher than 98.5% are achieved for both the models using the SHAP as a criterion for feature selection. It is found that the skewness and the shape factor of the vibration signal have the greatest impact on the models’ outcomes
Explainable AI for Android Malware Detection: Towards Understanding Why the Models Perform So Well?
Machine learning (ML)-based Android malware detection has been one of the
most popular research topics in the mobile security community. An increasing
number of research studies have demonstrated that machine learning is an
effective and promising approach for malware detection, and some works have
even claimed that their proposed models could achieve 99\% detection accuracy,
leaving little room for further improvement. However, numerous prior studies
have suggested that unrealistic experimental designs bring substantial biases,
resulting in over-optimistic performance in malware detection. Unlike previous
research that examined the detection performance of ML classifiers to locate
the causes, this study employs Explainable AI (XAI) approaches to explore what
ML-based models learned during the training process, inspecting and
interpreting why ML-based malware classifiers perform so well under unrealistic
experimental settings. We discover that temporal sample inconsistency in the
training dataset brings over-optimistic classification performance (up to 99\%
F1 score and accuracy). Importantly, our results indicate that ML models
classify malware based on temporal differences between malware and benign,
rather than the actual malicious behaviors. Our evaluation also confirms the
fact that unrealistic experimental designs lead to not only unrealistic
detection performance but also poor reliability, posing a significant obstacle
to real-world applications. These findings suggest that XAI approaches should
be used to help practitioners/researchers better understand how do AI/ML models
(i.e., malware detection) work -- not just focusing on accuracy improvement.Comment: Accepted by the 33rd IEEE International Symposium on Software
Reliability Engineering (ISSRE 2022
In-situ surface porosity prediction in DED (directed energy deposition) printed SS316L parts using multimodal sensor fusion
This study aims to relate the time-frequency patterns of acoustic emission
(AE) and other multi-modal sensor data collected in a hybrid directed energy
deposition (DED) process to the pore formations at high spatial (0.5 mm) and
time (< 1ms) resolutions. Adapting an explainable AI method in LIME (Local
Interpretable Model-Agnostic Explanations), certain high-frequency waveform
signatures of AE are to be attributed to two major pathways for pore formation
in a DED process, namely, spatter events and insufficient fusion between
adjacent printing tracks from low heat input. This approach opens an exciting
possibility to predict, in real-time, the presence of a pore in every voxel
(0.5 mm in size) as they are printed, a major leap forward compared to prior
efforts. Synchronized multimodal sensor data including force, AE, vibration and
temperature were gathered while an SS316L material sample was printed and
subsequently machined. A deep convolution neural network classifier was used to
identify the presence of pores on a voxel surface based on time-frequency
patterns (spectrograms) of the sensor data collected during the process chain.
The results suggest signals collected during DED were more sensitive compared
to those from machining for detecting porosity in voxels (classification test
accuracy of 87%). The underlying explanations drawn from LIME analysis suggests
that energy captured in high frequency AE waveforms are 33% lower for porous
voxels indicating a relatively lower laser-material interaction in the melt
pool, and hence insufficient fusion and poor overlap between adjacent printing
tracks. The porous voxels for which spatter events were prevalent during
printing had about 27% higher energy contents in the high frequency AE band
compared to other porous voxels. These signatures from AE signal can further
the understanding of pore formation from spatter and insufficient fusion
- …