58 research outputs found
Sanity Checks Revisited: An Exploration to Repair the Model Parameter Randomisation Test
The Model Parameter Randomisation Test (MPRT) is widely acknowledged in the
eXplainable Artificial Intelligence (XAI) community for its well-motivated
evaluative principle: that the explanation function should be sensitive to
changes in the parameters of the model function. However, recent works have
identified several methodological caveats for the empirical interpretation of
MPRT. To address these caveats, we introduce two adaptations to the original
MPRT -- Smooth MPRT and Efficient MPRT, where the former minimises the impact
that noise has on the evaluation results through sampling and the latter
circumvents the need for biased similarity measurements by re-interpreting the
test through the explanation's rise in complexity, after full parameter
randomisation. Our experimental results demonstrate that these proposed
variants lead to improved metric reliability, thus enabling a more trustworthy
application of XAI methods.
Comment: 19 pages, 12 figures, NeurIPS XAIA 202
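The evaluative principle behind MPRT (an explanation should change when the model parameters change) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy model tanh(w . x), the plain gradient as explanation function, full randomisation from a standard normal, and cosine similarity as the comparison. It is not the Smooth MPRT or Efficient MPRT procedure from the paper.

```python
import math
import random


def explain(x, w):
    """Gradient of the toy model tanh(w . x) with respect to the input x."""
    out = math.tanh(sum(wi * xi for wi, xi in zip(w, x)))
    return [(1.0 - out * out) * wi for wi in w]


def cosine_similarity(a, b):
    """Cosine similarity between two explanation vectors."""
    dot = sum(ai * bi for ai, bi in zip(a, b))
    norm_a = math.sqrt(sum(ai * ai for ai in a))
    norm_b = math.sqrt(sum(bi * bi for bi in b))
    return dot / (norm_a * norm_b)


def randomisation_check(x, w, rng):
    """Compare the explanation before and after full parameter randomisation."""
    random_w = [rng.gauss(0.0, 1.0) for _ in w]  # hypothetical randomisation scheme
    return cosine_similarity(explain(x, w), explain(x, random_w))


# Hypothetical input and "trained" weights.
x = [0.5, -1.0, 0.25, 0.8]
w = [1.2, -0.7, 0.3, 0.9]
score = randomisation_check(x, w, random.Random(0))
```

Under this reading, a score close to 1 would suggest the explanation barely reacts to the randomised parameters, which is the behaviour the test is meant to flag.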
NoiseGrad: enhancing explanations by introducing stochasticity to model weights
Many efforts have been made to reveal the decision-making process of
black-box learning machines such as deep neural networks, resulting in useful
local and global explanation methods. For local explanation, stochasticity is
known to help: a simple method, called SmoothGrad, has improved the visual
quality of gradient-based attribution by adding noise in the input space and
taking the average over the noise. In this paper, we extend this idea and
propose NoiseGrad that enhances both local and global explanation methods.
Specifically, NoiseGrad introduces stochasticity in the weight parameter space,
such that the decision boundary is perturbed. NoiseGrad is expected to enhance
the local explanation, similarly to SmoothGrad, due to the dual relationship
between the input perturbation and the decision boundary perturbation.
Furthermore, NoiseGrad can be used to enhance global explanations. We evaluate
NoiseGrad and its fusion with SmoothGrad -- FusionGrad -- qualitatively and
quantitatively with several evaluation criteria, and show that our novel
approach significantly outperforms the baseline methods. Both NoiseGrad and
FusionGrad are method-agnostic and as easy to use as SmoothGrad: simple
heuristics suffice for choosing the hyperparameter settings, with no need for
fine-tuning.
Comment: 19 pages, 16 figures
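The core idea described above (perturb the weights, explain each perturbed model, average the explanations) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy model tanh(w . x), the analytic gradient as base explanation, and the multiplicative Gaussian noise with hypothetical parameters `sigma` and `n_samples` are all choices made for this example.

```python
import math
import random


def gradient_explanation(x, w):
    """Gradient of the toy model tanh(w . x) with respect to the input x."""
    out = math.tanh(sum(wi * xi for wi, xi in zip(w, x)))
    return [(1.0 - out * out) * wi for wi in w]


def noisegrad(x, w, n_samples=50, sigma=0.2, rng=None):
    """Average the base explanation over noisy copies of the weights."""
    rng = rng or random.Random(0)
    total = [0.0] * len(x)
    for _ in range(n_samples):
        # Assumption: multiplicative noise, each weight scaled by N(1, sigma^2).
        noisy_w = [wi * rng.gauss(1.0, sigma) for wi in w]
        attribution = gradient_explanation(x, noisy_w)
        total = [t + a for t, a in zip(total, attribution)]
    return [t / n_samples for t in total]


# Hypothetical input and weights.
x = [0.5, -1.0, 0.25]
w = [1.2, -0.7, 0.3]
smoothed = noisegrad(x, w)
```

SmoothGrad follows the same averaging pattern but perturbs `x` instead of `w`; with `sigma=0` the sketch reduces to the plain gradient explanation.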
Visualizing the Diversity of Representations Learned by Bayesian Neural Networks
Explainable Artificial Intelligence (XAI) aims to make learning machines less
opaque, and offers researchers and practitioners various tools to reveal the
decision-making strategies of neural networks. In this work, we investigate how
XAI methods can be used for exploring and visualizing the diversity of feature
representations learned by Bayesian Neural Networks (BNNs). Our goal is to
provide a global understanding of BNNs by making their decision-making
strategies a) visible and tangible through feature visualizations and b)
quantitatively measurable with a distance measure learned by contrastive
learning. Our work provides new insights into the posterior distribution
in terms of human-understandable feature information with regard to the
underlying decision-making strategies. The main findings of our work are the
following: 1) global XAI methods can be applied to explain the diversity of
decision-making strategies of BNN instances, 2) Monte Carlo dropout with
commonly used dropout rates exhibits increased diversity in feature
representations compared to the multimodal posterior approximation of
MultiSWAG, 3) the diversity of learned feature representations highly
correlates with the uncertainty estimate for the output and 4) the inter-mode
diversity of the multimodal posterior decreases as the network width increases,
while the intra-mode diversity increases. These findings are consistent with
recent deep neural network theory, providing additional intuitions about
what the theory implies in terms of humanly understandable concepts.
Comment: 16 pages, 18 figures
Flying Adversarial Patches: Manipulating the Behavior of Deep Learning-based Autonomous Multirotors
Autonomous flying robots, e.g. multirotors, often rely on a neural network
that makes predictions based on a camera image. These deep learning (DL) models
can compute surprising results if applied to input images outside the training
domain. Adversarial attacks exploit this fault, for example, by computing small
images, so-called adversarial patches, that can be placed in the environment to
manipulate the neural network's prediction. We introduce flying adversarial
patches, where an image is mounted on another flying robot and therefore can be
placed anywhere in the field of view of a victim multirotor. For an effective
attack, we compare three methods that simultaneously optimize the adversarial
patch and its position in the input image. We perform an empirical validation
on a publicly available DL model and dataset for autonomous multirotors.
Ultimately, our attacking multirotor would be able to gain full control over
the motions of the victim multirotor.
Comment: 6 pages, 5 figures, Workshop on Multi-Robot Learning, International
Conference on Robotics and Automation (ICRA)
Kidnapping Deep Learning-based Multirotors using Optimized Flying Adversarial Patches
Autonomous flying robots, such as multirotors, often rely on deep learning
models that make predictions based on a camera image, e.g. for pose estimation.
These models can predict surprising results if applied to input images outside
the training domain. This fault can be exploited by adversarial attacks, for
example, by computing small images, so-called adversarial patches, that can be
placed in the environment to manipulate the neural network's prediction. We
introduce flying adversarial patches, where multiple images are mounted on at
least one other flying robot and therefore can be placed anywhere in the field
of view of a victim multirotor. By introducing the attacker robots, the system
is extended to an adversarial multi-robot system. For an effective attack, we
compare three methods that simultaneously optimize multiple adversarial patches
and their position in the input image. We show that our methods scale well with
the number of adversarial patches. Moreover, we demonstrate physical flights
with two robots, where we employ a novel attack policy that uses the computed
adversarial patches to kidnap a robot that was supposed to follow a human.
Comment: Accepted at MRS 2023, 7 pages, 5 figures. arXiv admin note:
substantial text overlap with arXiv:2305.1285
Labeling Neural Representations with Inverse Recognition
Deep Neural Networks (DNNs) demonstrate remarkable capabilities in learning
complex hierarchical data representations, but the nature of these
representations remains largely unknown. Existing global explainability
methods, such as Network Dissection, face limitations such as reliance on
segmentation masks, lack of statistical significance testing, and high
computational demands. We propose Inverse Recognition (INVERT), a scalable
approach for connecting learned representations with human-understandable
concepts by leveraging their capacity to discriminate between these concepts.
In contrast to prior work, INVERT is capable of handling diverse types of
neurons, exhibits less computational complexity, and does not rely on the
availability of segmentation masks. Moreover, INVERT provides an interpretable
metric that assesses the alignment between the representation and its
corresponding explanation and delivers a measure of statistical significance. We
demonstrate the applicability of INVERT in various scenarios, including the
identification of representations affected by spurious correlations, and the
interpretation of the hierarchical structure of decision-making within the
models.
Comment: 25 pages, 16 figures
The Meta-Evaluation Problem in Explainable AI: Identifying Reliable Estimators with MetaQuantus
Explainable AI (XAI) is a rapidly evolving field that aims to improve
transparency and trustworthiness of AI systems to humans. One of the unsolved
challenges in XAI is estimating the performance of these explanation methods
for neural networks, which has resulted in numerous competing metrics with
little to no indication of which one is to be preferred. In this paper, to
identify the most reliable evaluation method in a given explainability context,
we propose MetaQuantus -- a simple yet powerful framework that meta-evaluates
two complementary performance characteristics of an evaluation method: its
resilience to noise and reactivity to randomness. We demonstrate the
effectiveness of our framework through a series of experiments, targeting
various open questions in XAI, such as the selection of explanation methods and
optimisation of hyperparameters of a given metric. We release our work under an
open-source license to serve as a development tool for XAI researchers and
Machine Learning (ML) practitioners to verify and benchmark newly constructed
metrics (i.e., ``estimators'' of explanation quality). With this work, we
provide clear and theoretically-grounded guidance for building reliable
evaluation methods, thus facilitating standardisation and reproducibility in
the field of XAI.
Comment: 30 pages, 12 figures, 3 tables
Chronic Norovirus Infection after Kidney Transplantation: Molecular Evidence for Immune-Driven Viral Evolution
Background. Norovirus infection is the most common cause of acute self-limiting gastroenteritis. Only 3 cases of chronic norovirus infection in adult solid organ transplant recipients have been reported thus far. Methods. This case series describes 9 consecutive kidney allograft recipients with chronic norovirus infection, with persistent virus shedding and intermittent diarrhea for a duration of 97-898 days. The follow-up includes clinical course, type of immunosuppression, and polymerase chain reaction for norovirus. Detailed molecular analyses of virus isolates from stool specimens over time were performed. Results. The intensity of immunosuppression correlated with the diarrheal symptoms but not with viral shedding. Molecular analysis of virus strains from each patient revealed infection with different variants of GII.4 strains in 7 of 9 patients. The other 2 patients were infected with either the GII.7 or the GII.17 strain. No molecular evidence for nosocomial transmission in our outpatient clinic was found. Capsid sequence alignments from follow-up specimens of 4 patients showed accumulation of mutations over time, resulting in amino acid changes predominantly in the P2 and P1-2 regions. Up to 25 amino acid mutations accumulated over a 683-day period in the patient with an 898-day shedding history. Conclusion. Norovirus infection may persist in adult renal allograft recipients with or without clinical symptoms. No evidence for nosocomial transmission in adult renal allograft recipients was found in our study. Molecular analysis suggests continuous viral evolution in immunocompromised patients who are unable to clear this infection.
Results of the intensified early detection program for breast cancer in high risk patients
Breast cancer is the most common cancer in women worldwide. About 5% of breast
cancer cases are due to genetic predisposition. Early detection offers the best
chance of successful treatment and recovery. The aim of this study was to
assess the successes and failures of the intensified early breast cancer
detection program, based on data from high-risk patients at the Charité Berlin
from 1997 to 2007. The question to be answered was how efficient (in terms of
carcinoma rates and the sensitivity and PPV of the individual radiological
examinations) this time-consuming and costly early detection screening is
for this specific group of women. This was a retrospective study. The patient
cohort consisted of high-risk patients who had been radiologically examined as
part of the intensified early detection program at the Charité between 1997
and 2007. The examination program consisted of a six-monthly palpation
performed by a doctor, a six-monthly ultrasound scan, and an annual
mammography and MRI. In the event of a suspected carcinoma, a punch
biopsy or, in rare cases, open surgery was performed for histological
confirmation. If the suspicion of cancer was substantiated, open surgery was
performed and, in some cases, a mastectomy to remove the carcinoma. Women between
the ages of 21 and 66 years (average age: 41.5 years) with a confirmed BRCA1
and/or BRCA2 gene mutation, a heterozygote risk for such a mutation of greater
than 20 %, or a lifetime risk of developing breast cancer of greater than
30 % were included in the study. Of a total of 264 patients,
132 met the stringent criteria and were enrolled in the study. Of these 132
patients, 45 underwent histological examination, and breast cancer was
confirmed in twelve. All the preoperative radiological examinations
of the breasts were evaluated. The sensitivity and PPV were
100 % and 33.3 % respectively for MRI, 66.7 % and 38.1 % respectively for
mammography, 91.7 % and 55 % respectively for the ultrasound scan, and
50 % and 50 % respectively for the clinical examination. The Charité
intensified early detection program is thus an appropriate alternative to
bilateral prophylactic mastectomy. The high sensitivity values and consequent
high efficiency of the radiological examinations justify the effort and high
costs involved.
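As a worked illustration of how the sensitivity and PPV figures above are defined, the following sketch computes both from true-positive, false-negative, and false-positive counts. The counts are hypothetical: the abstract does not report them, and they were chosen only so that, with twelve confirmed carcinomas, the formulas reproduce the reported percentages.

```python
def sensitivity(tp, fn):
    """Share of confirmed carcinomas that the examination detected."""
    return tp / (tp + fn)


def ppv(tp, fp):
    """Share of positive findings that were confirmed carcinomas."""
    return tp / (tp + fp)


# Hypothetical counts (tp, fn, fp), NOT taken from the study; each row
# assumes 12 confirmed carcinomas and matches the reported percentages.
hypothetical_counts = {
    "MRI": (12, 0, 24),
    "mammography": (8, 4, 13),
    "ultrasound": (11, 1, 9),
    "clinical examination": (6, 6, 6),
}

results = {
    name: (round(100 * sensitivity(tp, fn), 1), round(100 * ppv(tp, fp), 1))
    for name, (tp, fn, fp) in hypothetical_counts.items()
}
```

Each entry of `results` holds the pair (sensitivity in %, PPV in %) for one examination type.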
- …