
    Visual-Semantic Learning

    Visual-semantic learning is an attractive and challenging research direction that aims to understand the complex semantics of heterogeneous data from two domains, i.e., visual signals (i.e., images and videos) and natural language (i.e., captions and questions). It requires both memorizing the rich information within a single modality and jointly comprehending multiple modalities. Artificial intelligence (AI) systems with human-level intelligence are claimed to learn like humans, e.g., by efficiently leveraging brain memory for better comprehension, rationally incorporating common-sense knowledge into reasoning, quickly gaining in-depth understanding from a few samples, and analyzing relationships among abundant and informative events. These capacities are effortless for humans but challenging for machines.

To bridge the discrepancy between human-level intelligence and present-day visual-semantic learning, we start from the basic understanding ability by studying visual question answering (e.g., Image-QA and Video-QA) from the perspectives of memory augmentation and common-sense knowledge incorporation. Furthermore, we extend to a more challenging setting with limited and partially unlabeled training data (i.e., Few-shot Visual-Semantic Learning) to imitate the fast learning ability of humans. Finally, to further enhance visual-semantic performance in natural videos with numerous spatio-temporal dynamics, we investigate exploiting event-correlated information for a comprehensive understanding of cross-modal semantics.

To study the essential visual-semantic understanding ability of the human brain with memory, we first propose a novel Memory Augmented Deep Recurrent Neural Network (MA-DRNN) model for Video-QA, which features a new method for encoding videos and questions, and memory augmentation using the emerging Differentiable Neural Computer (DNC). Specifically, we encode semantic (i.e., question) information before visual (i.e., video) information, which leads to better visual-semantic representations, and we leverage the DNC's external memory to store and retrieve valuable information in questions and videos and to model long-term visual-semantic dependencies.

In addition to basic understanding, to tackle visual-semantic reasoning that requires external knowledge beyond the visible content (e.g., KB-Image-QA), we propose a novel framework that endows the model with the capability of answering more general questions and better exploits external knowledge by generating Multiple Clues for Reasoning with Memory Neural Networks (MCR-MemNN). Specifically, a well-defined detector predicts image-question-related relation phrases, each delivering two complementary clues to retrieve supporting facts from an external knowledge base (KB). These facts are encoded into a continuous embedding space using a content-addressable memory. Afterward, mutual interactions between the visual-semantic representation and the supporting facts stored in memory are captured to distill the most relevant information across the three modalities (i.e., image, question, and KB). Finally, the optimal answer is predicted by choosing the supporting fact with the highest score.
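The content-addressable memory at the heart of this design can be pictured with a minimal sketch, assuming standard dot-product attention over embedded supporting facts; the names, dimensions, and the final argmax-over-scores answer selection below are illustrative assumptions, not the published MCR-MemNN implementation:

```python
# Hypothetical sketch of a content-addressable memory read over KB facts.
import torch
import torch.nn.functional as F

def memory_read(query, memory_keys, memory_values):
    """query: (d,) fused image-question vector; keys/values: (n_facts, d)."""
    scores = memory_keys @ query            # dot-product addressing per fact
    weights = F.softmax(scores, dim=0)      # soft attention over stored facts
    read_vector = weights @ memory_values   # distilled KB evidence
    return read_vector, scores

d, n_facts = 256, 10
query = torch.randn(d)                      # fused visual-semantic representation
keys, values = torch.randn(n_facts, d), torch.randn(n_facts, d)
read_vector, scores = memory_read(query, keys, values)
answer_idx = scores.argmax().item()         # supporting fact with the highest score
```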
Furthermore, to enable fast, in-depth understanding from a small number of samples, especially with heterogeneity in multi-modal scenarios such as image question answering (Image-QA) and image captioning (IC), we study few-shot visual-semantic learning and present the Hierarchical Graph ATtention Network (HGAT). This two-stage network models the intra- and inter-modal relationships with limited image-text samples. The main contributions of HGAT can be summarized as follows: 1) it sheds light on tackling few-shot multi-modal learning problems, focusing primarily, but not exclusively, on the visual and semantic modalities, through better exploitation of the intra-relationship of each modality and an attention-based co-learning framework between modalities using a hierarchical graph-based architecture; 2) it achieves superior performance on both visual question answering and image captioning in the few-shot setting; and 3) it can be easily extended to the semi-supervised setting, where image-text samples are partially unlabeled.

Although various attention mechanisms have been utilized to manage contextualized representations by modeling intra- and inter-modal relationships of the two modalities, one limitation of the predominant visual-semantic methods is the lack of reasoning with event correlation, i.e., sensing and analyzing relationships among the abundant and informative events contained in the video. To this end, we introduce the dense-caption modality as a new auxiliary and distill event-correlated information to infer the correct answer. We propose a novel end-to-end trainable model, Event-Correlated Graph Neural Networks (EC-GNNs), to perform cross-modal reasoning over information from the three modalities (i.e., caption, video, and question). Besides exploiting a new modality, we employ cross-modal reasoning modules to explicitly model inter-modal relationships and aggregate relevant information across modalities, and we propose a question-guided self-adaptive multi-modal fusion module that collects question-oriented and event-correlated evidence through multi-step reasoning.

To evaluate the proposed models, we conduct extensive experiments on the VTW, MSVD-QA, and TGIF-QA datasets for the Video-QA task, the Toronto COCO-QA and Visual Genome-QA datasets for the few-shot Image-QA task, the COCO-FITB dataset for the few-shot IC task, and the FVQA and Visual7W + ConceptNet datasets for the KB-Image-QA task. The experimental results justify these models' effectiveness and superiority over baseline methods.
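A hedged sketch of the question-guided fusion idea follows: the question vector repeatedly attends over per-modality summaries (caption, video, question) and updates its query, a generic form of multi-step reasoning; module names, dimensions, and the update rule are assumptions rather than the EC-GNNs implementation:

```python
# Illustrative question-guided multi-modal fusion with multi-step reasoning.
import torch
import torch.nn as nn
import torch.nn.functional as F

class QuestionGuidedFusion(nn.Module):
    def __init__(self, d, n_steps=2):
        super().__init__()
        self.proj = nn.Linear(d, d)          # projects the current query
        self.update = nn.Linear(2 * d, d)    # folds evidence back into the query
        self.n_steps = n_steps

    def forward(self, question_vec, modality_feats):
        # modality_feats: (n_modalities, d) summaries of caption/video/question
        query = question_vec
        for _ in range(self.n_steps):                    # multi-step reasoning
            scores = modality_feats @ self.proj(query)   # relevance per modality
            weights = F.softmax(scores, dim=0)
            evidence = weights @ modality_feats          # question-oriented evidence
            query = torch.tanh(self.update(torch.cat([query, evidence])))
        return query

fusion = QuestionGuidedFusion(d=128)
q = torch.randn(128)
feats = torch.randn(3, 128)    # caption, video, question summaries
answer_evidence = fusion(q, feats)
```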

    Physics-Based Reconstruction and Analysis of Human Motion in Video

    Doctoral dissertation -- Seoul National University Graduate School: College of Engineering, Department of Computer Science and Engineering, February 2021. Advisor: Jehee Lee.

In computer graphics, simulating and analyzing human movement have been interesting research topics since the 1960s. Still, simulating realistic human movements in a 3D virtual world remains a challenging task. In general, motion capture techniques have been used: although motion capture guarantees realistic results and high-quality data, it requires a lot of equipment, and the process is complicated. Recently, techniques for estimating 3D human pose from 2D video have developed remarkably, and researchers in computer graphics and computer vision have attempted to reconstruct various human motions from video data. However, existing methods cannot robustly estimate dynamic actions and do not work on videos filmed with a moving camera.

In this thesis, we propose methods to reconstruct dynamic human motions from in-the-wild videos and to control those motions. First, we developed a framework to reconstruct motion from videos using prior physics knowledge. For dynamic motions such as a backspin, the poses estimated by a state-of-the-art method are incomplete, with unreliable root trajectories or missing intermediate poses. We designed a reward function for a deep reinforcement learning controller using poses and hints extracted from videos, and learned a policy that simultaneously reconstructs motion and controls a virtual character. Second, we simulated figure skating movements from video. Skating sequences consist of fast and dynamic movements on ice, hindering the acquisition of motion data; thus, we extracted 3D key poses from video and successfully replicated several figure skating movements using trajectory optimization and a deep reinforcement learning controller. Third, we devised an algorithm for gait analysis from videos of patients with movement disorders (such as Parkinson's disease or cerebral palsy). After acquiring the patients' joint positions from 2D video processed by a deep learning network, the 3D absolute coordinates were estimated, and gait parameters such as gait velocity, cadence, and step length were calculated. Additionally, we analyzed the optimization criteria of human walking using a 3D musculoskeletal humanoid model and physics-based simulation: for two criteria, namely the minimization of muscle activation and the minimization of joint torque, we compared simulation data with real human data.

To demonstrate the effectiveness of the first two research topics, we verified the reconstruction of dynamic human motions from 2D videos using physics-based simulations; the last two research topics were evaluated against real human data.
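The reward design in the first framework can be illustrated with a minimal sketch in the style of pose-imitation rewards common in physics-based character control; the specific terms, weights, and scales below are assumptions, not the thesis's exact reward:

```python
# Hypothetical imitation reward: track the video-extracted pose and root hint.
import numpy as np

def imitation_reward(sim_joint_angles, ref_joint_angles, sim_root, ref_root,
                     w_pose=0.7, w_root=0.3):
    """ref_* come from (possibly unreliable) video pose estimates."""
    pose_err = np.sum((sim_joint_angles - ref_joint_angles) ** 2)
    root_err = np.sum((sim_root - ref_root) ** 2)
    r_pose = np.exp(-2.0 * pose_err)   # reward close pose tracking
    r_root = np.exp(-5.0 * root_err)   # reward following the root-trajectory hint
    return w_pose * r_pose + w_root * r_root

# toy usage with a 10-joint character
reward = imitation_reward(np.zeros(10), np.full(10, 0.1), np.zeros(3), np.zeros(3))
```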

    On Action Quality Assessment

    In this dissertation, we tackle the task of quantifying the quality of actions, i.e., assessing how well an action was performed, using computer vision. Existing methods used human-body-pose-based features to express the quality contained in an action sample. Human body pose estimation in actions such as sports actions, like diving and gymnastic vault, is particularly challenging, since the athletes undergo convoluted transformations while performing their routines. Moreover, pose-based features do not take into account visual cues, such as the water splash in diving, which human judges do take into account. In our first work, we show that a visual representation -- spatiotemporal features computed using a 3D convolutional neural network -- is more suitable, as it attends to the appearance and salient motion patterns of the athlete's performance. Along with developing three action quality assessment (AQA) frameworks, we also compile a diving and gymnastic vault dataset. Rather than learning an action-specific model, in our second work we show that learning to assess the quality of multiple actions jointly is more efficient, as it can exploit shared/common elements of quality among different actions. All-action modeling makes better use of the data and shows better generalization and adaptation to unseen/novel action classes. Taking inspiration from the 'learning by teaching' method, we propose a multitask learning (MTL) approach to AQA, unlike existing approaches, which follow the single-task learning (STL) paradigm. In our MTL approach, we force the network to delineate the action sample -- recognizing the action in detail and commentating on the good and bad points of the performance -- in addition to the main task of AQA scoring. Through this better characterization of the action sample, we obtain state-of-the-art results on the task of AQA. To enable our MTL approach, we also released the largest multitask AQA dataset, MTL-AQA.
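A minimal sketch of the multitask setup follows: one spatiotemporal backbone feature feeds three heads covering AQA scoring, detailed action recognition, and commentary generation; the layer shapes and the single-linear caption head are simplifying assumptions, not the released MTL-AQA models:

```python
# Illustrative multitask AQA heads on top of 3D-CNN clip features.
import torch
import torch.nn as nn

class MultitaskAQA(nn.Module):
    def __init__(self, feat_dim=512, n_action_classes=20, vocab_size=5000):
        super().__init__()
        self.score_head = nn.Linear(feat_dim, 1)                  # main task: AQA score
        self.action_head = nn.Linear(feat_dim, n_action_classes)  # delineate the action
        self.caption_head = nn.Linear(feat_dim, vocab_size)       # commentary logits

    def forward(self, clip_features):  # (batch, feat_dim) from a 3D CNN backbone
        return (self.score_head(clip_features),
                self.action_head(clip_features),
                self.caption_head(clip_features))

model = MultitaskAQA()
feats = torch.randn(4, 512)
score, action_logits, caption_logits = model(feats)
```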

    Deep learning that scales: leveraging compute and data

    Deep learning has revolutionized the field of artificial intelligence in the past decade. Although the development of these techniques spans several years, the recent advent of deep learning is explained by an increased availability of data and compute that has unlocked the potential of deep neural networks. They have become ubiquitous in domains such as natural language processing, computer vision, speech processing, and control, where enough training data is available. Recent years have seen continuous progress driven by ever-growing neural networks that benefit from large amounts of data and computing power. This thesis is motivated by the observation that scale is one of the key factors driving progress in deep learning research, and aims at devising deep learning methods that scale gracefully with the available data and compute. We narrow this scope down to two main research directions. The first is concerned with designing hardware-aware methods that make the most of the computing resources in current high-performance computing facilities. The second studies the bottlenecks preventing existing methods from scaling up as more data becomes available, providing solutions that contribute towards enabling the training of more complex models. This dissertation studies these research questions for two different learning paradigms, each with its own algorithmic and computational characteristics. The first part of the thesis studies the paradigm where the model needs to learn from a collection of examples, extracting as much information as possible from the given data. The second part is concerned with training agents that learn by interacting with a simulated environment, which introduces unique challenges such as efficient exploration and simulation.
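As one concrete (assumed, not thesis-specific) example of making the most of limited compute, gradient accumulation emulates a larger batch by delaying the optimizer step, a common ingredient of hardware-aware training recipes:

```python
# Minimal gradient-accumulation training loop sketch.
import torch
import torch.nn.functional as F

def train_epoch(model, optimizer, loader, accum_steps=4):
    optimizer.zero_grad()
    for i, (x, y) in enumerate(loader):
        loss = F.cross_entropy(model(x), y) / accum_steps  # average over the virtual batch
        loss.backward()                                    # gradients accumulate in .grad
        if (i + 1) % accum_steps == 0:
            optimizer.step()                               # one step per effective batch
            optimizer.zero_grad()
```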

    Multi-Modality Human Action Recognition

    Human action recognition is useful in many applications and areas, e.g., video surveillance, human-computer interaction (HCI), video retrieval, gaming, and security, and it has recently become an active research topic in computer vision and pattern recognition. A number of action recognition approaches have been proposed. However, most of them are designed for RGB image sequences, where the action data is collected by an RGB/intensity camera; recognition performance therefore depends on the occlusion, background, and lighting conditions of the image sequences. If more information is provided along with the image sequences, so that data sources other than RGB video can be utilized, human actions can be better represented and recognized by the designed computer vision system.

In this dissertation, multi-modality human action recognition is studied. On one hand, we introduce the study of multi-spectral action recognition, which involves information from spectra beyond the visible, e.g., infrared and near-infrared. Action recognition in the individual spectra is explored and new methods are proposed; cross-spectral action recognition is then also investigated, and novel approaches are proposed in our work. On the other hand, since depth imaging technology has made significant progress recently, with depth information captured simultaneously with RGB video, depth-based human action recognition is also investigated. We first propose a method combining different types of depth data to recognize human actions. Then a thorough evaluation is conducted on spatiotemporal interest point (STIP) based features for depth-based action recognition. Finally, we advocate the study of fusing different features for depth-based action analysis. Moreover, human depression recognition is studied by combining a facial appearance model with a facial dynamics model.
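One simple way to combine modalities such as RGB, depth, and infrared is decision-level (late) fusion of per-modality classifier outputs; the sketch below is an illustrative assumption, not the dissertation's specific fusion method:

```python
# Late fusion: weighted average of per-modality class probabilities.
import numpy as np

def late_fusion(per_modality_probs, weights=None):
    """per_modality_probs: list of (n_classes,) probability vectors."""
    probs = np.stack(per_modality_probs)            # (n_modalities, n_classes)
    if weights is None:
        weights = np.ones(len(probs)) / len(probs)  # uniform weighting by default
    fused = weights @ probs                         # weighted average per class
    return int(fused.argmax()), fused

rgb = np.array([0.6, 0.3, 0.1])
depth = np.array([0.2, 0.7, 0.1])
infrared = np.array([0.3, 0.5, 0.2])
label, fused = late_fusion([rgb, depth, infrared])
```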

    Image Manipulation and Image Synthesis

    Image manipulation is of historic importance. Ever since the advent of photography, pictures have been manipulated for various reasons. Throughout history, rulers have often used image manipulation techniques for self-portrayal or propaganda. In many cases, the goal is to manipulate human behaviour by spreading credible misinformation; photographs, by their nature, portray the real world and as such are more credible to humans. However, image manipulation need not only serve evil purposes. In this thesis, we propose and analyse methods for image manipulation that serve a positive purpose; specifically, we treat image manipulation as a tool for solving other tasks. For this, we model image manipulation as an image-to-image translation (I2I) task, i.e., a system that receives an image as input and outputs a manipulated version of it, and we propose multiple I2I-based methods. First, we demonstrate that I2I-based image manipulation can be used to reduce motion blur in videos. Second, we show that I2I-based image manipulation can be used for domain adaptation and domain extension: we present a method that significantly improves the learning of semantic segmentation from synthetic source data, and the same technique can be applied to learning nighttime semantic segmentation from daylight images. Next, we show that I2I can be used to enable weakly supervised object segmentation. We show that each individual task requires and allows for different levels of supervision during the training of deep models in order to achieve the best performance. We discuss the importance of maintaining control over the output of such methods and show that, with reduced levels of supervision, methods for maintaining stability during training and for establishing control over the output of a system become increasingly important. We propose multiple methods that solve the issues that arise in such systems. Finally, we demonstrate that our proposed mechanisms for control can be adapted to synthesise images from scratch.
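A minimal sketch of the I2I formulation is below, assuming a pix2pix-style objective with a conditional discriminator and an L1 reconstruction term; the networks, signatures, and weighting are placeholders, not the thesis models:

```python
# Illustrative generator objective for image-to-image translation.
import torch
import torch.nn.functional as F

def i2i_generator_loss(generator, discriminator, x, target, lambda_l1=100.0):
    """generator: image -> manipulated image; discriminator scores (input, output) pairs."""
    fake = generator(x)                               # manipulated version of x
    d_logits = discriminator(x, fake)                 # conditional critic, pix2pix-style
    adv = F.binary_cross_entropy_with_logits(         # fool the discriminator
        d_logits, torch.ones_like(d_logits))
    rec = F.l1_loss(fake, target)                     # stay close to the target image
    return adv + lambda_l1 * rec
```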

    Natural Language Processing for Motivational Interviewing Counselling: Addressing Challenges in Resources, Benchmarking and Evaluation

    Motivational interviewing (MI) is a counselling style often used in healthcare to improve patient health and quality of life by promoting positive behaviour changes. Natural language processing (NLP) has been explored for supporting MI use cases such as insight/feedback generation and therapist training, for example by automatically assigning behaviour labels to therapist/client utterances and generating possible therapist responses. Despite this progress, significant challenges remain. The most prominent is the lack of publicly available, annotated MI dialogue corpora due to privacy constraints; consequently, there is also a lack of common benchmarks and poor reproducibility across studies. Furthermore, human evaluation for therapist response generation is expensive and difficult to scale because it depends on MI experts as evaluators.

In this thesis, we address these challenges in four directions: low-resource NLP modelling, MI dialogue dataset creation, benchmark development for real-world applicable tasks, and a human evaluation study comparing laypeople and experts. First, we explore zero-shot binary empathy assessment at the utterance level. We experiment with a supervised approach that trains on heuristically constructed empathy vs. non-empathy contrasts in non-therapy dialogues. While this approach performs better than other models without empathy-aware training, it is still suboptimal and therefore highlights the need for a well-annotated MI dataset. Next, we create AnnoMI, the first publicly available dataset of expert-annotated MI dialogues. It contains MI conversations that demonstrate both high- and low-quality counselling, with extensive annotations by domain experts covering key MI attributes, and we conduct comprehensive analyses of the dataset. Then, we investigate two AnnoMI-based real-world applicable tasks: predicting current-turn therapist/client behaviour given the utterance, and forecasting next-turn therapist behaviour given the dialogue history. We find that language models (LMs) predict therapist behaviours well, with good generalisability to new dialogue topics. However, LMs show suboptimal forecasting performance, which reflects therapists' flexibility: multiple next-turn actions may be optimal. Lastly, we ask both laypeople and experts to evaluate the generation of a crucial type of therapist response -- reflections -- on a key quality aspect: coherence and context-consistency. We find that laypeople are a viable alternative to experts, as laypeople show good agreement with each other and correlation with experts. We also find that a large LM generates mostly coherent and consistent reflections.

Overall, the work of this thesis significantly broadens access to NLP for MI and presents a wide range of findings on related natural language understanding/generation tasks with a real-world focus. Our contributions thus lay the groundwork for the broader NLP community to engage more in research for MI, which will ultimately improve the quality of life of recipients of MI counselling.
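A hedged sketch of the current-turn behaviour-prediction task: fine-tuning or probing a pretrained LM to assign a behaviour label to an utterance. The label set, model choice, and example utterance are illustrative assumptions, not the exact AnnoMI experimental setup:

```python
# Illustrative utterance-level behaviour classification with a pretrained LM.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

labels = ["question", "reflection", "input", "other"]  # assumed label set
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(labels))       # fine-tune on AnnoMI-style data

batch = tok(["How do you feel about cutting down?"],
            return_tensors="pt", padding=True, truncation=True)
with torch.no_grad():
    logits = model(**batch).logits
print(labels[logits.argmax(-1).item()])                # predicted behaviour label
```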

    From Anecdotal Evidence to Quantitative Evaluation Methods: A Systematic Review on Evaluating Explainable AI

    The rising popularity of explainable artificial intelligence (XAI) for understanding high-performing black boxes has also raised the question of how to evaluate explanations of machine learning (ML) models. While interpretability and explainability are often presented as a subjectively validated binary property, we consider them a multi-faceted concept. We identify 12 conceptual properties, such as Compactness and Correctness, that should be evaluated for comprehensively assessing the quality of an explanation. Our so-called Co-12 properties serve as a categorization scheme for systematically reviewing the evaluation practice of more than 300 papers that introduce an XAI method, published in the last 7 years at major AI and ML conferences. We find that 1 in 3 papers evaluates exclusively with anecdotal evidence, and 1 in 5 papers evaluates with users. We also contribute to the call for objective, quantifiable evaluation methods by presenting an extensive overview of quantitative XAI evaluation methods. This systematic collection of evaluation methods provides researchers and practitioners with concrete tools to thoroughly validate, benchmark, and compare new and existing XAI methods. It also opens up opportunities to include quantitative metrics as optimization criteria during model training, in order to optimize for accuracy and interpretability simultaneously. A companion website is available at https://utwente-dmb.github.io/xai-papers.
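As an example of the kind of quantitative method such an overview collects, the sketch below implements incremental deletion, a common Correctness-style test: features are removed in order of attributed importance, and a faithful explanation should make the model's confidence drop quickly. The implementation details are illustrative assumptions, not a metric from the paper:

```python
# Illustrative deletion-based faithfulness check for a feature attribution.
import numpy as np

def deletion_score(predict, x, attribution, n_steps=10, baseline=0.0):
    """predict maps a feature vector to the model's confidence for one class."""
    order = np.argsort(-attribution)         # most important features first
    x = x.copy()
    scores = [predict(x)]
    chunk = max(1, len(order) // n_steps)
    for i in range(0, len(order), chunk):
        x[order[i:i + chunk]] = baseline     # "delete" the next chunk of features
        scores.append(predict(x))
    return float(np.mean(scores))            # lower mean = more faithful explanation

# toy usage: a logistic "model" with its own weighted inputs as attribution
w = np.array([2.0, -1.0, 0.5, 0.0])
x = np.array([1.0, 1.0, 1.0, 1.0])
predict = lambda v: 1.0 / (1.0 + np.exp(-(w @ v)))
print(deletion_score(predict, x, attribution=w * x))
```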

    Latent variable methods for visualization through time
