Interactive Decision making using Dissimilarity to visually represented Prototypes
To make informed decisions, an expert has to reason with multidimensional, heterogeneous data and the results of analyzing them. Items in such datasets are typically represented by features. However, as argued in cognitive science, features do not yield an optimal space for human reasoning. In fact, humans tend to organize complex information in terms of prototypes or known cases rather than in absolute terms. When confronted with unknown data items, humans assess them in terms of similarity to these prototypical elements. Interestingly, an analogous similarity-to-prototype approach, where prototypes are taken from the data, has been successfully applied in machine learning. Combining such a machine learning approach with human prototypical reasoning in a Visual Analytics context requires integrating similarity-based classification with interactive visualizations. To that end, the data prototypes should be visually represented to trigger direct associations to cases familiar to the domain experts. In this paper, we propose a set of highly interactive visualizations to explore data and classification results in terms of dissimilarities to visually represented prototypes. We argue that this approach not only supports human reasoning processes, but is also suitable to enhance understanding of heterogeneous data. The proposed framework is applied to a risk assessment case study in Forensic Psychiatry.
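The similarity-to-prototype idea described in this abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the prototype names (`case_A`, `case_B`), the two-dimensional feature vectors, the risk labels, and the choice of Euclidean distance as the dissimilarity measure. The sketch computes an item's dissimilarity to each prototype and assigns the label of the nearest one.

```python
import math

def euclidean(a, b):
    """Euclidean distance, used here as an assumed dissimilarity measure."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dissimilarity_profile(item, prototypes, dist=euclidean):
    """Map each prototype name to the item's dissimilarity to that prototype."""
    return {name: dist(item, vec) for name, (vec, _) in prototypes.items()}

def classify(item, prototypes, dist=euclidean):
    """Return the label of the nearest prototype and the full profile."""
    profile = dissimilarity_profile(item, prototypes, dist)
    nearest = min(profile, key=profile.get)
    return prototypes[nearest][1], profile

# Hypothetical prototypes: name -> (feature vector, class label).
prototypes = {
    "case_A": ((1.0, 1.0), "low"),
    "case_B": ((5.0, 5.0), "high"),
}

label, profile = classify((2.0, 1.0), prototypes)
print(label, profile)
```

The returned profile, rather than the raw feature vector, is the kind of quantity a prototype-oriented visualization could display, since it expresses the item relative to known cases.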
ProtoExplorer: Interpretable Forensic Analysis of Deepfake Videos using Prototype Exploration and Refinement
In high-stakes settings, Machine Learning models that can provide predictions
that are interpretable for humans are crucial. This is even more true with the
advent of complex deep learning based models with a huge number of tunable
parameters. Recently, prototype-based methods have emerged as a promising
approach to make deep learning interpretable. We particularly focus on the
analysis of deepfake videos in a forensics context. Although prototype-based
methods have been introduced for the detection of deepfake videos, their use in
real-world scenarios still presents major challenges, in that prototypes tend
to be overly similar and interpretability varies between prototypes. This paper
proposes a Visual Analytics process model for prototype learning, and, based on
this, presents ProtoExplorer, a Visual Analytics system for the exploration and
refinement of prototype-based deepfake detection models. ProtoExplorer offers
tools for visualizing and temporally filtering prototype-based predictions when
working with video data. It disentangles the complexity of working with
spatio-temporal prototypes, facilitating their visualization. It further
enables the refinement of models by interactively deleting and replacing
prototypes with the aim to achieve more interpretable and less biased
predictions while preserving detection accuracy. The system was designed with
forensic experts and evaluated over several rounds using both open-ended
think-aloud evaluation and interviews. These sessions confirmed the
strength of our prototype-based exploration of deepfake videos and
provided the feedback needed to continuously improve the system.
Comment: 15 pages, 6 figures
Human-assisted self-supervised labeling of large data sets
There is a severe demand for, and shortage of, large accurately labeled datasets to train supervised computational intelligence (CI) algorithms in domains like unmanned aerial systems (UAS) and autonomous vehicles. This has hindered our ability to develop and deploy various computer vision algorithms in and across environments and niche domains for tasks like detection, localization, and tracking. Herein, I propose a new human-in-the-loop (HITL) based growing neural gas (GNG) algorithm to minimize human intervention when labeling large UAS data collections over a shared geospatial area. Specifically, I address human-driven events like new class identification and mistake correction. I also address algorithm-centric operations like new pattern discovery and self-supervised labeling. Pattern discovery and identification through self-supervised labeling is made possible through open set recognition (OSR). Herein, I propose a classifier with the ability to say "I don't know" to identify outliers in the data and bootstrap deep learning (DL) models, specifically convolutional neural networks (CNNs), with the ability to classify on N+1 classes. The effectiveness of the algorithms is demonstrated using simulated, realistic, ray-traced low-altitude UAS data from the Unreal Engine. The results show that it is possible to increase speed and reduce mental fatigue compared with hand labeling large image datasets.
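The notion of a classifier that can say "I don't know" can be sketched with a simple reject option. This is not the GNG-based method of the thesis; it is a swapped-in nearest-centroid classifier with a distance threshold, and the class names, centroids, and threshold value are all hypothetical. A sample whose nearest centroid is farther than the threshold is assigned to an extra "unknown" class, which gives N+1 possible outputs for N known classes.

```python
import math

def predict_open_set(x, centroids, threshold):
    """Nearest-centroid prediction with a reject option.

    Returns the label of the closest centroid, or "unknown" when even
    the closest centroid is farther away than `threshold`.
    """
    best_label, best_dist = None, math.inf
    for label, centroid in centroids.items():
        d = math.dist(x, centroid)  # Euclidean distance (Python 3.8+)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label if best_dist <= threshold else "unknown"

# Hypothetical known classes and their centroids in a 2-D feature space.
centroids = {"car": (0.0, 0.0), "tree": (10.0, 0.0)}

known = predict_open_set((1.0, 0.0), centroids, threshold=3.0)
novel = predict_open_set((5.0, 8.0), centroids, threshold=3.0)
print(known, novel)
```

In a human-in-the-loop setting, samples flagged as "unknown" are the natural candidates to show to a person for new class identification, while confidently classified samples can be labeled automatically.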
Social Identity Enactment Through Linguistic Style: Using Naturally Occurring Online Data to Study Behavioural Prototypicality
Social identity prototypes refer to the quintessential representation of a particular social identity; prototypes define and prescribe the characteristics, behaviours and attitudes of a particular group, as distinguished from other groups (Hogg, 2001). For the most part, identity prototypicality is studied using self-reported methods used to assess perceptions of the prototypicality of self and others. However, in this thesis we provide behavioural evidence to demonstrate how linguistic style data can be used to measure identity-prototypical behaviour in real world contexts. Combining naturally-occurring online data with experimental data, the first chapter demonstrates that individuals behave in an identity-prototypical way regardless of the context in which they are communicating. Further, we show that this identity-prototypical style of communication is robust to topic, demographics, personality and platform, and moreover that the same identity-prototypical communication style can be detected in experimentally controlled conditions. In the second chapter, we demonstrate the small but statistically significant link between identity-prototypical communication and influence in real-world forum data. This finding provides insight into how group members respond to other ingroup members based on their prototypical communication style in real-world situations. Finally, in the third chapter, we use the group prototypical behaviour observed in naturally occurring online forum data to construct a typology of social identities, demonstrating the existence of five different types of social identity in line with the research of Deaux et al. (1995). We also demonstrate that it is possible to use this measurement of behavioural prototypicality to observe identity change over time. 
Using eight years’ worth of forum data, we illustrate the slow movement of the transgender identity from a stigmatised identity in 2012 towards a collective action identity in 2019. In sum, the findings outlined in this thesis provide evidence to support the idea that it is possible to use machine learning algorithms and naturally occurring online data to study behavioural prototypicality in real world environments. Moreover, this methodology enables us to study identities ‘in the wild’, thus transcending the limitations associated with using self-reported methodologies or experimental approaches to study how individuals express and enact their group memberships. Further, we also demonstrate the value in using naturally-occurring online behavioural data to test and extend the key components of social identity theory.
Engineering and Physical Sciences Research Council (EPSRC)
Interpretable Models Capable of Handling Systematic Missingness in Imbalanced Classes and Heterogeneous Datasets
Application of interpretable machine learning techniques to medical datasets facilitates early and fast diagnoses, along with deeper insight into the data. Furthermore, the transparency of these models increases trust among application domain experts. Medical datasets face common issues such as heterogeneous measurements, imbalanced classes with limited sample size, and missing data, which hinder the straightforward application of machine learning techniques. In this paper we present a family of prototype-based (PB) interpretable models which are capable of handling these issues. The models introduced in this contribution show comparable or superior performance to alternative techniques applicable in such situations. However, unlike ensemble-based models, which have to compromise on easy interpretation, the PB models here do not. Moreover, we propose a strategy for harnessing the power of ensembles while maintaining the intrinsic interpretability of the PB models, by averaging the model parameter manifolds. All the models were evaluated on a synthetic (publicly available) dataset in addition to detailed analyses of two real-world medical datasets (one publicly available). Results indicated that the models and strategies we introduced addressed the challenges of real-world medical data, while remaining computationally inexpensive and transparent, as well as similar or superior in performance compared to their alternatives.