Multimodal Explainable Artificial Intelligence: A Comprehensive Review of Methodological Advances and Future Research Directions
The current study focuses on systematically analyzing the recent advances in
the field of Multimodal eXplainable Artificial Intelligence (MXAI). In
particular, the relevant primary prediction tasks and publicly available
datasets are initially described. Subsequently, a structured presentation of
the MXAI methods in the literature is provided, taking into account the
following criteria: a) the number of involved modalities, b) the stage at
which explanations are produced, and c) the type of adopted methodology
(i.e., the mathematical formalism). Then, the metrics used for MXAI evaluation are
discussed. Finally, a comprehensive analysis of current challenges and future
research directions is provided. Comment: 26 pages, 11 figures
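To make the "stage at which explanations are produced" criterion concrete, the sketch below shows a post-hoc explanation in its simplest form: occlusion-based attribution, which scores each input feature by how much the prediction changes when that feature is replaced with a baseline. This is a minimal illustration of the post-hoc family, not any specific method from the review; the `predict` callable and the baseline value are assumptions for the example.

```python
import numpy as np

def occlusion_attribution(predict, x, baseline=0.0):
    """Post-hoc attribution: score each feature of x by the drop in the
    model's output when that feature is replaced with a baseline value."""
    base = predict(x)
    attr = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        x_occ = x.copy()
        x_occ[i] = baseline          # occlude one feature at a time
        attr[i] = base - predict(x_occ)
    return attr
```

For a multimodal model, the same loop would be run per modality (e.g., occluding image patches versus text tokens), which is one way the number of involved modalities shapes the explanation method.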
Visible and Invisible: Causal Variable Learning and its Application in a Cancer Study
Causal visual discovery is a fundamental yet challenging problem in many research fields. Given visual data and the outcome of interest, the goal is to infer the cause-effect relation. Aside from rich visual ('visible') variables, the outcome is often also determined by 'invisible' variables, i.e., variables from non-visual modalities that have no visual counterparts. This combination is particularly common in the clinical domain.
Building on the promising invariant causal prediction (ICP) framework, we propose a novel -ICP algorithm to resolve the (visible, invisible) setting. To efficiently discover -plausible causal variables and to estimate the cause-effect relation, -ICP is learned under a min-min optimisation scheme. Driven by the need for clinical reliability and interpretability, -ICP is implemented with a typed neural-symbolic functional language. With the built-in program synthesis method, we can synthesize a type-safe program that is comprehensible to clinical experts.
For concept validation of -ICP, we carefully design a series of synthetic experiments on the type of visual-perception tasks encountered in daily life. To further substantiate the proposed method, we demonstrate the application of -ICP on a real-world cancer study dataset, Swiss CRC. This population-based cancer study has spanned over two decades and includes 25 fully annotated tissue micro-array (TMA) images with at least resolution, together with a broad spectrum of clinical metadata for 533 patients. Both the synthetic and clinical experiments demonstrate the advantages of -ICP over state-of-the-art methods. Finally, we discuss the limitations and challenges to be addressed in the future.
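The core idea behind plain ICP, on which the abstract's method builds, can be sketched in a few lines: a subset of predictors is "plausibly causal" if the residuals of a regression on that subset look the same across environments, and the final estimate is the intersection of all accepted subsets. This is a toy numpy version of vanilla ICP under a crude invariance heuristic, not the paper's typed neural-symbolic -ICP; the tolerance and the mean/spread score are assumptions for illustration.

```python
import numpy as np
from itertools import combinations

def invariance_score(X, y, envs, subset):
    """Regress y on the given predictor subset (plus intercept) and measure
    how much the residual mean and spread vary across environments
    (lower = more invariant)."""
    A = np.column_stack([X[:, list(subset)], np.ones(len(y))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    score = 0.0
    for e in np.unique(envs):
        r = resid[envs == e]
        score += abs(r.mean()) + abs(r.std() - resid.std())
    return score

def plausible_causal_set(X, y, envs, tol=0.3):
    """Toy ICP: accept every predictor subset whose residuals look invariant
    across environments, then intersect the accepted subsets."""
    d = X.shape[1]
    accepted = [set(s) for k in range(d + 1)
                for s in combinations(range(d), k)
                if invariance_score(X, y, envs, s) < tol]
    if not accepted:
        return set()
    out = accepted[0]
    for s in accepted[1:]:
        out &= s
    return out
```

On synthetic data where one variable causes the outcome invariantly and a second variable's relation to the outcome shifts with the environment, only subsets containing the true cause survive, and the intersection recovers it.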
Code of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Towards Interaction-level Video Action Understanding
A huge number of videos are created, shared, and viewed daily. Among these massive videos, human actions and activities account for a large part. We want machines to understand human actions in videos, as this is essential to various applications, including but not limited to autonomous driving, security systems, human-robot interaction and healthcare. Towards a real intelligent system that is able to interact with humans, video understanding must go beyond simply answering ``what is the action in the video", and be more aware of what those actions mean to humans and more in line with human thinking, which we call interaction-level action understanding. This thesis identifies three main challenges to approaching interaction-level video action understanding: 1) understanding actions given human consensus; 2) understanding actions based on specific human rules; 3) directly understanding actions in videos via human natural language. For the first challenge, we select video summarization as a representative task, which aims to select informative frames that retain high-level information based on human annotators' experience. Through self-attention architecture and meta-learning, which jointly process dual representations of visual and sequential information for video summarization, the proposed model is capable of understanding video from human consensus (e.g., how humans decide which parts of an action sequence are essential). For the second challenge, our works on action quality assessment use transformer decoders to parse the input action into several sub-actions and assess the more fine-grained qualities of the given action (e.g., how well a diving action is performed, or how well a robot performs surgery), yielding the capability of action understanding given specific human rules. The third key idea explored in this thesis is to use graph neural networks in an adversarial fashion to understand actions through natural language.
We demonstrate the utility of this technique for the video captioning task, which takes an action video as input, outputs natural language, and yields state-of-the-art performance. It can be concluded that the research directions and methods introduced in this thesis provide fundamental components toward interaction-level action understanding.
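The self-attention scoring used for video summarization can be sketched minimally: each frame's feature vector attends over all frames, and the contextualized representation is mapped to a keep-probability. This is a generic single-head numpy sketch, not the thesis's actual dual-representation meta-learned model; all parameter names and shapes are illustrative assumptions.

```python
import numpy as np

def self_attention(F, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over frame features F (T, d)."""
    Q, K, V = F @ Wq, F @ Wk, F @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    A = np.exp(scores)
    A /= A.sum(axis=1, keepdims=True)             # each frame attends over all frames
    return A @ V

def frame_importance(F, params):
    """Contextualize each frame with self-attention, then map it to a
    keep-probability in (0, 1) for summary selection."""
    H = self_attention(F, params["Wq"], params["Wk"], params["Wv"])
    logits = H @ params["w"]
    return 1.0 / (1.0 + np.exp(-logits))
```

In a trained summarizer, frames with the highest probabilities would be selected as the summary, so the score itself is what human-consensus supervision shapes.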
Engineering Background Knowledge for Social Robots
Social robots are embodied agents that continuously perform knowledge-intensive tasks involving several kinds of information coming from different heterogeneous sources. Providing a framework for engineering robots' knowledge raises several problems, such as identifying sources of information and modelling solutions suitable for robots' activities, integrating knowledge coming from different sources, evolving this knowledge with information learned during robots' activities, grounding perceptions in robots' knowledge, assessing robots' knowledge with respect to humans', and so on. In this thesis we investigated the feasibility and benefits of engineering the background knowledge of social robots with a framework based on Semantic Web technologies and Linked Data. This research has been supported and guided by a case study that provided a proof of concept through a prototype tested in a real socially assistive context.
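The knowledge-engineering pattern the abstract describes, representing heterogeneous facts uniformly and grounding perceptions against curated background knowledge, can be illustrated with a minimal in-memory triple store. This stands in for the Semantic Web/Linked Data machinery (RDF graphs, SPARQL) rather than reproducing it; all identifiers below are hypothetical examples.

```python
class TripleStore:
    """Minimal in-memory subject-predicate-object store for a robot's
    background knowledge."""

    def __init__(self):
        self.triples = set()

    def add(self, s, p, o):
        """Assert one fact, whether curated, linked, or learned at runtime."""
        self.triples.add((s, p, o))

    def query(self, s=None, p=None, o=None):
        """Return triples matching the pattern; None acts as a wildcard,
        like a variable in a SPARQL triple pattern."""
        return [(ts, tp, to) for ts, tp, to in self.triples
                if (s is None or ts == s)
                and (p is None or tp == p)
                and (o is None or to == o)]
```

Curated knowledge (e.g., that a perceived object is a container) and grounded perceptions (e.g., where it was last seen) then live in the same store and are retrieved with the same pattern queries, which is the integration property the thesis framework provides at full RDF scale.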
PSA 2016
These preprints were automatically compiled into a PDF from the collection of papers deposited in PhilSci-Archive in conjunction with PSA 2016.