818 research outputs found
Middle-Level Features for the Explanation of Classification Systems by Sparse Dictionary Methods.
Machine learning (ML) systems are affected by a pervasive lack of transparency. The eXplainable Artificial Intelligence (XAI) research area addresses this problem and the related issue of explaining the behavior of ML systems in terms that are understandable to human beings. In many XAI approaches, the outputs of ML systems are explained in terms of low-level features of their inputs. However, these approaches leave a substantive explanatory burden with human users, insofar as the latter are required to map low-level properties into more salient and readily understandable parts of the input. To alleviate this cognitive burden, an alternative model-agnostic framework is proposed here. This framework is instantiated to address explanation problems in the context of ML image classification systems, without relying on pixel relevance maps and other low-level features of the input. More specifically, one obtains sets of middle-level properties of classification inputs that are perceptually salient by applying sparse dictionary learning techniques. These middle-level properties are used as building blocks for explanations of image classifications. The achieved explanations are parsimonious, owing to their reliance on a limited set of middle-level image properties. And they can be contrastive, because the set of middle-level image properties can be used to explain why the system advanced the proposed classification over other competing classifications. In view of its model-agnostic character, the proposed framework is adaptable to a variety of other ML systems and explanation problems.
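The core idea of representing an input as a sparse combination of dictionary atoms can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: the function name and the greedy matching-pursuit solver are assumptions chosen for brevity.

```python
import numpy as np

def matching_pursuit(x, D, k=3):
    """Greedily express x as a sparse combination of dictionary atoms.

    D: (n_features, n_atoms) dictionary with unit-norm columns.
    Returns a coefficient vector with at most k non-zero entries.
    """
    residual = x.astype(float).copy()
    coeffs = np.zeros(D.shape[1])
    for _ in range(k):
        scores = D.T @ residual             # correlation with each atom
        j = int(np.argmax(np.abs(scores)))  # best-matching atom
        coeffs[j] += scores[j]
        residual -= scores[j] * D[:, j]     # remove the explained part
    return coeffs
```

With a learned dictionary whose atoms correspond to perceptually salient image parts, the non-zero coefficients name the middle-level properties that make up the explanation.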
Watch, read and lookup: learning to spot signs from multiple supervisors
The focus of this work is sign spotting - given a video of an isolated sign,
our task is to identify whether and where it has been signed in a continuous,
co-articulated sign language video. To achieve this sign spotting task, we
train a model using multiple types of available supervision by: (1) watching
existing sparsely labelled footage; (2) reading associated subtitles (readily
available translations of the signed content) which provide additional
weak-supervision; (3) looking up words (for which no co-articulated labelled
examples are available) in visual sign language dictionaries to enable novel
sign spotting. These three tasks are integrated into a unified learning
framework using the principles of Noise Contrastive Estimation and Multiple
Instance Learning. We validate the effectiveness of our approach on low-shot
sign spotting benchmarks. In addition, we contribute a machine-readable British
Sign Language (BSL) dictionary dataset of isolated signs, BSLDict, to
facilitate study of this task. The dataset, models and code are available at
our project page.
Comment: Appears in Asian Conference on Computer Vision 2020 (ACCV 2020), oral presentation. 29 pages.
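The combination of Noise Contrastive Estimation and Multiple Instance Learning mentioned above can be illustrated with a toy MIL-NCE-style objective; this sketch is an assumption about the general loss shape, not the paper's exact formulation.

```python
import numpy as np

def mil_nce_loss(positive_scores, negative_scores, temperature=0.1):
    """MIL-NCE-style objective: pick the best-matching window from the
    positive bag (Multiple Instance Learning) and contrast it against
    negative windows (Noise Contrastive Estimation)."""
    pos = np.exp(np.max(positive_scores) / temperature)
    neg = np.exp(np.asarray(negative_scores, dtype=float) / temperature).sum()
    return float(-np.log(pos / (pos + neg)))
```

Taking the maximum over the positive bag lets sparsely labelled footage supervise the model even when the exact temporal location of the sign is unknown.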
Toward the application of XAI methods in EEG-based systems
An interesting case of the well-known Dataset Shift Problem is the
classification of Electroencephalogram (EEG) signals in the context of
Brain-Computer Interface (BCI). The non-stationarity of EEG signals can lead to
poor generalisation performance in BCI classification systems used across different
sessions, even for the same subject. In this paper, we start from the
hypothesis that the Dataset Shift problem can be alleviated by exploiting
suitable eXplainable Artificial Intelligence (XAI) methods to locate and
transform the relevant characteristics of the input for the goal of
classification. In particular, we focus on an experimental analysis of
explanations produced by several XAI methods on an ML system trained on a
typical EEG dataset for emotion recognition. Results show that many relevant
components found by XAI methods are shared across the sessions and can be used
to build a system able to generalise better. However, relevant components of
the input signal also appear to be highly dependent on the input itself.
Comment: Accepted for presentation at XAI.it 2022 - Italian Workshop on
Explainable Artificial Intelligence.
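The finding that relevant components are shared across sessions suggests a simple analysis: intersect the top-ranked components of per-session relevance maps. The sketch below is an illustrative assumption about how such an overlap could be computed, not the paper's method.

```python
import numpy as np

def shared_components(relevance_a, relevance_b, k=3):
    """Return the indices that rank in the top-k by XAI relevance
    in BOTH sessions (e.g. EEG channels or frequency bands)."""
    top_a = set(np.argsort(relevance_a)[-k:])
    top_b = set(np.argsort(relevance_b)[-k:])
    return sorted(top_a & top_b)
```

Components surviving the intersection are candidates for a session-invariant feature set, which is what a shift-robust BCI classifier would be built on.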
Interpreting CLIP with Sparse Linear Concept Embeddings (SpLiCE)
CLIP embeddings have demonstrated remarkable performance across a wide range
of computer vision tasks. However, these high-dimensional, dense vector
representations are not easily interpretable, restricting their usefulness in
downstream applications that require transparency. In this work, we empirically
show that CLIP's latent space is highly structured, and consequently that CLIP
representations can be decomposed into their underlying semantic components. We
leverage this understanding to propose a novel method, Sparse Linear Concept
Embeddings (SpLiCE), for transforming CLIP representations into sparse linear
combinations of human-interpretable concepts. Distinct from previous work,
SpLiCE does not require concept labels and can be applied post hoc. Through
extensive experimentation with multiple real-world datasets, we validate that
the representations output by SpLiCE can explain and even replace traditional
dense CLIP representations, maintaining equivalent downstream performance while
significantly improving their interpretability. We also demonstrate several use
cases of SpLiCE representations including detecting spurious correlations,
model editing, and quantifying semantic shifts in datasets.Comment: 17 pages, 8 figures, Code is provided at
https://github.com/AI4LIFE-GROUP/SpLiC
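Decomposing a dense embedding into a sparse nonnegative combination of concept vectors can be sketched with proximal gradient descent (ISTA). The solver, step size, and penalty below are illustrative assumptions, not the actual SpLiCE implementation.

```python
import numpy as np

def sparse_concept_decomposition(z, C, alpha=0.01, steps=500, lr=0.1):
    """Approximate embedding z as a sparse nonnegative combination of
    concept vectors.

    C: (d, n_concepts) concept dictionary; the l1 penalty alpha induces
    sparsity. Each step: gradient descent on the reconstruction error,
    then soft-thresholding plus projection onto the nonnegative orthant.
    """
    w = np.zeros(C.shape[1])
    for _ in range(steps):
        grad = C.T @ (C @ w - z)          # gradient of 0.5*||C w - z||^2
        w = w - lr * grad
        w = np.maximum(w - lr * alpha, 0.0)  # prox of alpha*||w||_1, w >= 0
    return w
```

The surviving non-zero weights name the human-interpretable concepts that explain the embedding, which is what enables the downstream uses (spurious-correlation detection, model editing) described above.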
A Survey of Contrastive and Counterfactual Explanation Generation Methods for Explainable Artificial Intelligence
A number of algorithms in the field of artificial intelligence offer poorly interpretable decisions. To disclose the reasoning behind such algorithms, their output can be explained by means of so-called evidence-based (or factual) explanations. Alternatively, contrastive and counterfactual explanations justify why the output of the algorithms is not any different and how it could be changed, respectively. It is of crucial importance to bridge the gap between theoretical approaches to contrastive and counterfactual explanation and the corresponding computational frameworks. In this work we conduct a systematic literature review which provides readers with a thorough and reproducible analysis of the interdisciplinary research field under study. We first examine theoretical foundations of contrastive and counterfactual accounts of explanation. Then, we report the state-of-the-art computational frameworks for contrastive and counterfactual explanation generation. In addition, we analyze how grounded such frameworks are in the insights from the inspected theoretical approaches. As a result, we highlight a variety of properties of the approaches under study and reveal a number of shortcomings thereof. Moreover, we define a taxonomy regarding both theoretical and practical approaches to contrastive and counterfactual explanation.
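A counterfactual explanation answers "how could the input be changed so that the output changes?". A minimal sketch of one family of generation methods — greedy search over single-feature perturbations of increasing magnitude — is shown below; the function and search strategy are illustrative assumptions, not any specific framework from the survey.

```python
def greedy_counterfactual(x, predict, max_step=3, deltas=(-1, 1)):
    """Search for a minimal single-feature change that flips the prediction.

    predict: callable mapping a feature list to a class label.
    Returns the counterfactual input, or None if no flip is found
    within max_step units of change on any single feature.
    """
    original = predict(x)
    for step in range(1, max_step + 1):       # smallest changes first
        for i in range(len(x)):
            for d in deltas:
                candidate = list(x)
                candidate[i] += d * step
                if predict(candidate) != original:
                    return candidate
    return None
```

Returning the *smallest* label-flipping change is what makes the explanation actionable: it tells the user the least they would need to alter.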
VCD: Visual Causality Discovery for Cross-Modal Question Reasoning
Existing visual question reasoning methods usually fail to explicitly
discover the inherent causal mechanism and ignore jointly modeling cross-modal
event temporality and causality. In this paper, we propose a visual question
reasoning framework named Cross-Modal Question Reasoning (CMQR), to discover
temporal causal structure and mitigate visual spurious correlation by causal
intervention. To explicitly discover visual causal structure, the Visual
Causality Discovery (VCD) architecture is proposed to find question-critical
scenes temporally and disentangle visual spurious correlations via an
attention-based front-door causal intervention module named Local-Global Causal
Attention Module (LGCAM). To align the fine-grained interactions between
linguistic semantics and spatial-temporal representations, we build an
Interactive Visual-Linguistic Transformer (IVLT) that builds the multi-modal
co-occurrence interactions between visual and linguistic content. Extensive
experiments on four datasets demonstrate the superiority of CMQR for
discovering visual causal structures and achieving robust question reasoning.Comment: 12 pages, 6 figures. arXiv admin note: substantial text overlap with
arXiv:2207.1264
Event-driven Real-time Retrieval in Web Search
Information retrieval in real-time search presents unique challenges distinct
from those encountered in classical web search. These challenges are
particularly pronounced due to the rapid change of user search intent, which is
influenced by the occurrence and evolution of breaking news events, such as
earthquakes, elections, and wars. Previous dense retrieval methods, which
primarily focused on static semantic representation, lack the capacity to
capture immediate search intent, leading to inferior performance in retrieving
the most recent event-related documents in time-sensitive scenarios. To address
this issue, this paper expands the query with event information that represents
real-time search intent. The event information is then integrated with the
query through a cross-attention mechanism, resulting in a time-context query
representation. We further enhance the model's capacity for event
representation through multi-task training. Since publicly available datasets
such as MS-MARCO do not contain any event information on the query side and
have few time-sensitive queries, we design an automatic data collection and
annotation pipeline to address this issue, which includes ModelZoo-based Coarse
Annotation and LLM-driven Fine Annotation processes. In addition, we share the
training tricks such as two-stage training and hard negative sampling. Finally,
we conduct a set of offline experiments on a million-scale production dataset
to evaluate our approach and deploy an A/B test in a real online system to
verify the performance. Extensive experimental results demonstrate that our
proposed approach significantly outperforms existing state-of-the-art baseline
methods.
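The cross-attention fusion of event information into the query can be sketched with a single scaled dot-product attention step plus a residual connection. The function name, the residual design, and the single-head simplification are assumptions for illustration, not the paper's architecture.

```python
import numpy as np

def event_context_query(query, event_tokens):
    """Fuse event token vectors into the query representation via
    scaled dot-product cross-attention, yielding a time-context query."""
    q = np.atleast_2d(np.asarray(query, dtype=float))   # (1, d) query
    kv = np.asarray(event_tokens, dtype=float)          # (n_events, d) keys/values
    scores = (q @ kv.T) / np.sqrt(q.shape[1])           # attention logits
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                            # softmax over events
    attended = (weights @ kv)[0]                        # event-aware summary
    return attended + np.asarray(query, dtype=float)    # residual connection
```

The residual keeps the static semantic representation intact while the attended term injects the immediate, event-driven search intent.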
Deep hashing with self-supervised asymmetric semantic excavation and margin-scalable constraint
Due to its effectiveness and efficiency, deep hashing approaches are widely used for large-scale visual search. However, it is still challenging to produce compact and discriminative hash codes for images associated with multiple semantics, for two main reasons: 1) similarity constraints designed in most of the existing methods are based upon an oversimplified similarity assignment (i.e., 0 for instance pairs sharing no label, 1 for instance pairs sharing at least 1 label); 2) the exploration of multi-semantic relevance is insufficient or even neglected in many of the existing methods. These problems significantly limit the discrimination of generated hash codes. In this paper, we propose a novel Deep Hashing with Self-Supervised Asymmetric Semantic Excavation and Margin-Scalable Constraint (SADH) approach to cope with these problems. SADH implements a self-supervised network to sufficiently preserve semantic information in a semantic feature dictionary and a semantic code dictionary for the semantics of the given dataset, which efficiently and precisely guides a feature learning network to preserve multi-label semantic information using an asymmetric learning strategy. By further exploiting semantic dictionaries, a new margin-scalable constraint is employed for both precise similarity searching and robust hash code generation. Extensive empirical research on four popular benchmarks validates the proposed method and shows it outperforms several state-of-the-art approaches. The source code of SADH is available at: http://github.com/SWU-CS-MediaLab/SADH. (c) 2022 Elsevier B.V. All rights reserved.
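The hashing-plus-margin idea can be illustrated with a toy sketch: binarise features into codes, measure Hamming distance, and apply a hinge loss whose margin scales with how many labels two images share. The specific scaling rule below is an illustrative assumption, not SADH's actual constraint.

```python
import numpy as np

def hash_codes(features):
    """Binarise real-valued features into +/-1 hash codes."""
    return np.where(np.asarray(features) >= 0, 1, -1)

def hamming_distance(a, b):
    """Number of differing bits between two +/-1 code vectors."""
    return int(np.sum(a != b))

def margin_scalable_loss(dist, shared_labels, base_margin=4.0):
    """Hinge loss whose margin tightens as two images share more labels:
    pairs with larger label overlap must have smaller Hamming distance."""
    margin = base_margin / (1.0 + shared_labels)
    return max(0.0, dist - margin)
```

Scaling the margin by label overlap is what replaces the oversimplified 0/1 similarity assignment criticised above: multi-label pairs are pulled together in proportion to how much semantics they actually share.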