Multimodal Explainable Artificial Intelligence: A Comprehensive Review of Methodological Advances and Future Research Directions
The current study focuses on systematically analyzing the recent advances in
the field of Multimodal eXplainable Artificial Intelligence (MXAI). In
particular, the relevant primary prediction tasks and publicly available
datasets are initially described. Subsequently, a structured presentation of
the MXAI methods in the literature is provided, taking into account the
following criteria: a) the number of involved modalities, b) the stage at
which explanations are produced, and c) the type of adopted methodology
(i.e., the mathematical formalism). Then, the metrics used for MXAI evaluation are
discussed. Finally, a comprehensive analysis of current challenges and future
research directions is provided. Comment: 26 pages, 11 figures
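To make the "stage at which explanations are produced" criterion concrete, the sketch below shows a post-hoc explanation in its simplest form: occlusion-based attribution, which scores each input feature by how much the prediction changes when that feature is replaced with a baseline. This is a minimal illustration of the post-hoc family, not any specific method from the review; the `predict` callable and the baseline value are assumptions for the example.

```python
import numpy as np

def occlusion_attribution(predict, x, baseline=0.0):
    """Post-hoc attribution: score each feature of x by the drop in the
    model's output when that feature is replaced with a baseline value."""
    base = predict(x)
    attr = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        x_occ = x.copy()
        x_occ[i] = baseline          # occlude one feature at a time
        attr[i] = base - predict(x_occ)
    return attr
```

For a multimodal model, the same loop would be run per modality (e.g., occluding image patches versus text tokens), which is one way the number of involved modalities shapes the explanation method.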
Visible and Invisible: Causal Variable Learning and its Application in a Cancer Study
Causal visual discovery is a fundamental yet challenging problem in many research fields. Given visual data and the outcome of interest, the goal is to infer the cause-effect relation. Aside from rich visual ('visible') variables, the outcome is often also determined by 'invisible' variables, i.e., variables from non-visual modalities that have no visual counterparts. This combination is particularly common in the clinical domain.
Building on the promising invariant causal prediction (ICP) framework, we propose a novel -ICP algorithm to resolve the (visible, invisible) setting. To efficiently discover -plausible causal variables and to estimate the cause-effect relation, -ICP is learned under a min-min optimisation scheme. Driven by the need for clinical reliability and interpretability, -ICP is implemented with a typed neural-symbolic functional language. With the built-in program synthesis method, we can synthesize a type-safe program that is comprehensible to clinical experts.
For concept validation of -ICP, we carefully design a series of synthetic experiments on the type of visual-perception tasks encountered in daily life. To further substantiate the proposed method, we demonstrate the application of -ICP on a real-world cancer study dataset, Swiss CRC. This population-based cancer study has spanned over two decades and includes 25 fully annotated tissue micro-array (TMA) images with at least resolution, together with a broad spectrum of clinical metadata for 533 patients. Both the synthetic and clinical experiments demonstrate the advantages of -ICP over state-of-the-art methods. Finally, we discuss the limitations and challenges to be addressed in the future.
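The core idea behind plain ICP, on which the abstract's method builds, can be sketched in a few lines: a subset of predictors is "plausibly causal" if the residuals of a regression on that subset look the same across environments, and the final estimate is the intersection of all accepted subsets. This is a toy numpy version of vanilla ICP under a crude invariance heuristic, not the paper's typed neural-symbolic -ICP; the tolerance and the mean/spread score are assumptions for illustration.

```python
import numpy as np
from itertools import combinations

def invariance_score(X, y, envs, subset):
    """Regress y on the given predictor subset (plus intercept) and measure
    how much the residual mean and spread vary across environments
    (lower = more invariant)."""
    A = np.column_stack([X[:, list(subset)], np.ones(len(y))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    score = 0.0
    for e in np.unique(envs):
        r = resid[envs == e]
        score += abs(r.mean()) + abs(r.std() - resid.std())
    return score

def plausible_causal_set(X, y, envs, tol=0.3):
    """Toy ICP: accept every predictor subset whose residuals look invariant
    across environments, then intersect the accepted subsets."""
    d = X.shape[1]
    accepted = [set(s) for k in range(d + 1)
                for s in combinations(range(d), k)
                if invariance_score(X, y, envs, s) < tol]
    if not accepted:
        return set()
    out = accepted[0]
    for s in accepted[1:]:
        out &= s
    return out
```

On synthetic data where one variable causes the outcome invariantly and a second variable's relation to the outcome shifts with the environment, only subsets containing the true cause survive, and the intersection recovers it.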
Code of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Towards Interaction-level Video Action Understanding
A huge number of videos are created, shared, and viewed daily. Among these massive videos, human actions and activities account for a large part. We want machines to understand human actions in videos, as this is essential to various applications, including but not limited to autonomous driving, security systems, human-robot interaction and healthcare. Towards a real intelligent system that is able to interact with humans, video understanding must go beyond simply answering ``what is the action in the video", and be more aware of what those actions mean to humans and more in line with human thinking, which we call interaction-level action understanding. This thesis identifies three main challenges to approaching interaction-level video action understanding: 1) understanding actions given human consensus; 2) understanding actions based on specific human rules; 3) directly understanding actions in videos via human natural language. For the first challenge, we select video summarization as a representative task, which aims to select informative frames that retain high-level information based on human annotators' experience. Through self-attention architecture and meta-learning, which jointly process dual representations of visual and sequential information for video summarization, the proposed model is capable of understanding video from human consensus (e.g., how humans decide which parts of an action sequence are essential). For the second challenge, our works on action quality assessment use transformer decoders to parse the input action into several sub-actions and assess the more fine-grained qualities of the given action (e.g., how well a diving action is performed, or how well a robot performs surgery), yielding the capability of action understanding given specific human rules. The third key idea explored in this thesis is to use graph neural networks in an adversarial fashion to understand actions through natural language.
We demonstrate the utility of this technique for the video captioning task, which takes an action video as input, outputs natural language, and yields state-of-the-art performance. It can be concluded that the research directions and methods introduced in this thesis provide fundamental components toward interaction-level action understanding.
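The self-attention scoring used for video summarization can be sketched minimally: each frame's feature vector attends over all frames, and the contextualized representation is mapped to a keep-probability. This is a generic single-head numpy sketch, not the thesis's actual dual-representation meta-learned model; all parameter names and shapes are illustrative assumptions.

```python
import numpy as np

def self_attention(F, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over frame features F (T, d)."""
    Q, K, V = F @ Wq, F @ Wk, F @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    A = np.exp(scores)
    A /= A.sum(axis=1, keepdims=True)             # each frame attends over all frames
    return A @ V

def frame_importance(F, params):
    """Contextualize each frame with self-attention, then map it to a
    keep-probability in (0, 1) for summary selection."""
    H = self_attention(F, params["Wq"], params["Wk"], params["Wv"])
    logits = H @ params["w"]
    return 1.0 / (1.0 + np.exp(-logits))
```

In a trained summarizer, frames with the highest probabilities would be selected as the summary, so the score itself is what human-consensus supervision shapes.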
Engineering Background Knowledge for Social Robots
Social robots are embodied agents that continuously perform knowledge-intensive tasks involving several kinds of information coming from different heterogeneous sources. Providing a framework for engineering robots' knowledge raises several problems, such as identifying sources of information and modelling solutions suitable for robots' activities, integrating knowledge coming from different sources, evolving this knowledge with information learned during robots' activities, grounding perceptions in robots' knowledge, assessing robots' knowledge with respect to humans', and so on. In this thesis we investigated the feasibility and benefits of engineering the background knowledge of social robots with a framework based on Semantic Web technologies and Linked Data. This research has been supported and guided by a case study that provided a proof of concept through a prototype tested in a real socially assistive context.
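The knowledge-engineering pattern the abstract describes, representing heterogeneous facts uniformly and grounding perceptions against curated background knowledge, can be illustrated with a minimal in-memory triple store. This stands in for the Semantic Web/Linked Data machinery (RDF graphs, SPARQL) rather than reproducing it; all identifiers below are hypothetical examples.

```python
class TripleStore:
    """Minimal in-memory subject-predicate-object store for a robot's
    background knowledge."""

    def __init__(self):
        self.triples = set()

    def add(self, s, p, o):
        """Assert one fact, whether curated, linked, or learned at runtime."""
        self.triples.add((s, p, o))

    def query(self, s=None, p=None, o=None):
        """Return triples matching the pattern; None acts as a wildcard,
        like a variable in a SPARQL triple pattern."""
        return [(ts, tp, to) for ts, tp, to in self.triples
                if (s is None or ts == s)
                and (p is None or tp == p)
                and (o is None or to == o)]
```

Curated knowledge (e.g., that a perceived object is a container) and grounded perceptions (e.g., where it was last seen) then live in the same store and are retrieved with the same pattern queries, which is the integration property the thesis framework provides at full RDF scale.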
PSA 2016
These preprints were automatically compiled into a PDF from the collection of papers deposited in PhilSci-Archive in conjunction with PSA 2016.