The unbearable (technical) unreliability of automated facial emotion recognition
Emotion recognition, and in particular facial emotion recognition (FER), is among the most controversial applications of machine learning, not least because of its ethical implications for human subjects. In this article, we address the controversial conjecture that machines can read emotions from our facial expressions by asking whether this task can be performed reliably. Rather than considering the potential harms or scientific soundness of facial emotion recognition systems, this means focusing on the reliability of the ground truths used to develop such systems, assessing how well different human observers agree on the emotions they detect in subjects' faces. Additionally, we discuss the extent to which sharing context can help observers agree on the emotions they perceive in subjects' faces. Briefly, we demonstrate that when large and heterogeneous samples of observers are involved, the task of emotion detection from static images crumbles into inconsistency. We thus argue that any endeavour to understand human behaviour from large sets of labelled patterns is over-ambitious, even if it were technically feasible. We conclude that we cannot speak of actual accuracy for facial emotion recognition systems for any practical purposes.
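The reliability question the abstract raises is typically operationalized with a chance-corrected inter-annotator agreement statistic. As an illustrative sketch (not the paper's own analysis), the following computes Fleiss' kappa over a hypothetical matrix of per-image label counts; the images, raters, and emotion categories are all invented for the example.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for categorical ratings.

    counts[i][j] = number of raters who assigned category j to item i;
    every item must be rated by the same number of raters.
    """
    N = len(counts)                      # number of items
    n = sum(counts[0])                   # raters per item
    k = len(counts[0])                   # number of categories

    # Per-item agreement: fraction of rater pairs that agree.
    P_bar = sum(
        (sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts
    ) / N

    # Chance agreement from the marginal category proportions.
    p = [sum(row[j] for row in counts) / (N * n) for j in range(k)]
    P_e = sum(pj * pj for pj in p)

    return (P_bar - P_e) / (1 - P_e)

# Hypothetical example: 4 face images, 5 annotators, 3 emotion labels
# (say happy / neutral / angry). Each row holds per-image label counts.
ratings = [
    [5, 0, 0],   # unanimous
    [3, 2, 0],   # split
    [1, 2, 2],   # heavily split
    [0, 0, 5],   # unanimous
]
print(round(fleiss_kappa(ratings), 3))  # → 0.449
```

A value near 1 indicates near-perfect agreement; values in the 0.4 range, as here, are the kind of moderate-at-best agreement that undermines treating a single aggregated label as ground truth.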
The Impact of Gender and Personality in Human-AI Teaming: The Case of Collaborative Question Answering
This paper discusses the results of an exploratory study aimed at investigating the impact of conversational agents (CAs), and specifically their agential characteristics, on collaborative decision-making processes. The study involved 29 participants divided into 8 small teams engaged in a question-and-answer trivia-style game with the support of a text-based CA, characterized by two independent binary variables: personality (gentle and cooperative vs blunt and uncooperative) and gender (female vs male). A semi-structured group interview was conducted at the end of the experimental sessions to investigate the perceived utility of and level of satisfaction with the CAs. Our results show that when users interact with a gentle and cooperative CA, their satisfaction is higher. Furthermore, female CAs are perceived as more useful and satisfying to interact with than male CAs. We show that group performance improves through interaction with the CAs, and that a stereotype favoring the combination of a female CA with a gentle and cooperative personality exists with regard to perceived satisfaction, even though this does not translate into greater perceived utility. Our study extends the current debate about the possible correlation between CA characteristics and human acceptance and suggests future research to investigate the role of gender bias and related biases in human-AI teaming.
Painting the black box white: experimental findings from applying XAI to an ECG reading setting
The shift from symbolic AI systems to black-box, sub-symbolic, and statistical ones has motivated a rapid increase in interest toward explainable AI (XAI), i.e. approaches to make black-box AI systems explainable to human decision makers with the aim of making these systems more acceptable and more usable tools and supports. However, we make the point that, rather than always making black boxes transparent, these approaches are at risk of "painting the black boxes white", thus failing to provide a level of transparency that would increase the system's usability and comprehensibility; or, even, at risk of generating new errors, in what we term the "white-box paradox". To address these usability-related issues, in this work we focus on the cognitive dimension of users' perception of explanations and XAI systems. To this aim, we designed and conducted a questionnaire-based experiment in which we involved 44 cardiology residents and specialists in an AI-supported ECG reading task. In doing so, we investigated different research questions concerning the relationship between users' characteristics (e.g. expertise) and their perception of AI and XAI systems, including their trust, the perceived quality of explanations, and their tendency to defer the decision process to automation (i.e. technology dominance), as well as the mutual relationships among these different dimensions. Our findings provide a contribution to the evaluation of AI-based support systems from a Human-AI interaction-oriented perspective and lay the ground for further investigation of XAI and its effects on decision making and user experience.
The multicenter European Biological Variation Study (EuBIVAS): a new glance provided by Principal Component Analysis (PCA), an unsupervised machine learning algorithm, based on the basic metabolic panel linked measurands
Abstract
Objectives
The European Biological Variation Study (EuBIVAS), which includes 91 healthy volunteers from five European countries, estimated high-quality biological variation (BV) data for several measurands. Previous EuBIVAS papers reported no significant differences among laboratories/populations; however, they focused on specific sets of measurands, without a comprehensive general look. The aim of this paper is to evaluate the homogeneity of EuBIVAS data considering multivariate information, applying Principal Component Analysis (PCA), an unsupervised machine learning algorithm.
Methods
The EuBIVAS data for 13 basic metabolic panel linked measurands (glucose, albumin, total protein, electrolytes, urea, total bilirubin, creatinine, alkaline phosphatase, aminotransferases), together with age, sex, menopause, body mass index (BMI), country, alcohol, smoking habits, and physical activity, were used to generate three databases, developed using the traditional univariate and the multivariate Elliptic Envelope approaches to detect outliers, and different missing-value imputations. Two data matrices for each database, reporting both mean values and "within-person BV" (CVP) values for each measurand/subject, were analyzed using PCA.
Results
A clear clustering between male and female mean values was identified, with menopausal females lying closer to the males. Data interpretations for the three databases are similar. No significant differences in either mean or CVP values were found for country, alcohol, smoking habits, BMI, or physical activity.
Conclusions
The absence of meaningful differences among countries confirms the EuBIVAS sample homogeneity and that the obtained data are widely applicable to deliver analytical performance specifications (APS). Our data suggest that PCA and the multivariate approach may be used to detect outliers, although further studies are required.
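The pipeline described in the Methods (multivariate outlier removal followed by PCA) can be sketched as follows, assuming scikit-learn is available. The data here are synthetic stand-ins for the EuBIVAS matrix, with the sample size, contamination rate, and injected outliers chosen purely for illustration.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical stand-in for the EuBIVAS matrix: 90 subjects x 13
# measurand means, with a few implausible rows injected as outliers.
X = rng.normal(size=(90, 13))
X[:3] += 6.0

# Multivariate outlier detection (Elliptic Envelope), one of the two
# approaches the paper compares with the traditional univariate one.
mask = EllipticEnvelope(contamination=0.05, random_state=0).fit_predict(X)
X_clean = X[mask == 1]  # fit_predict marks inliers with +1

# Standardize, then project onto the first two principal components.
Z = StandardScaler().fit_transform(X_clean)
pc = PCA(n_components=2).fit(Z)
scores = pc.transform(Z)

print(X_clean.shape, scores.shape, pc.explained_variance_ratio_)
```

Plotting the two score columns against covariates such as sex or country is what reveals (or rules out) the clusterings reported in the Results.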
A case study with automatic diagnosis of electrocardiogram
Funding Information: This work was supported by European funds through the Recovery and Resilience Plan, project "Center for Responsible AI", project number C645008882-00000055.
Artificial Intelligence (AI) use in automated Electrocardiogram (ECG) classification has continuously attracted the research community's interest, motivated by promising results. Despite this promise, limited attention has been paid to the robustness of these results, which is a key element for implementation in clinical practice. Uncertainty Quantification (UQ) is critical for trustworthy and reliable AI, particularly in safety-critical domains such as medicine. Estimating uncertainty in Machine Learning (ML) model predictions has been extensively used for Out-of-Distribution (OOD) detection in single-label tasks. However, the use of UQ methods in multi-label classification remains underexplored. This study goes beyond developing highly accurate models by comparing five uncertainty quantification methods using the same Deep Neural Network (DNN) architecture across various validation scenarios, including internal and external validation as well as OOD detection, taking multi-label ECG classification as the example domain. We show the importance of external validation and its impact on classification performance, the quality of uncertainty estimates, and calibration. Ensemble-based methods yield more robust uncertainty estimations than single-network or stochastic methods. Although current methods still have limitations in accurately quantifying uncertainty, particularly in the case of dataset shift, incorporating uncertainty estimates into classification with a rejection option improves the ability to detect such changes.
Moreover, we show that using uncertainty estimates as a criterion for sample selection in an active learning setting yields greater improvements in classification performance than random sampling.
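Two of the ingredients above, ensemble-based uncertainty and classification with a rejection option, can be illustrated in a few lines. This is a minimal sketch, not the paper's method: the ensemble outputs are random placeholders, and the entropy-based score and 10% rejection rate are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical ensemble of 5 multi-label ECG classifiers: each member
# outputs per-label probabilities for 200 recordings x 4 labels.
ensemble_probs = rng.uniform(0.0, 1.0, size=(5, 200, 4))

# Ensemble prediction: average the member probabilities per label.
mean_probs = ensemble_probs.mean(axis=0)

# One simple uncertainty score: mean binary entropy across labels
# (high when the averaged probabilities sit near 0.5).
eps = 1e-12
entropy = -(mean_probs * np.log(mean_probs + eps)
            + (1 - mean_probs) * np.log(1 - mean_probs + eps))
uncertainty = entropy.mean(axis=1)

# Rejection option: abstain on the 10% most uncertain recordings and
# keep thresholded predictions for the rest.
threshold = np.quantile(uncertainty, 0.9)
accepted = uncertainty <= threshold
predictions = (mean_probs[accepted] >= 0.5).astype(int)

print(accepted.sum(), predictions.shape)
```

Under dataset shift, uncertainty scores tend to rise, so the rejected fraction captures exactly the samples on which the classifier should not be trusted, which is the mechanism the abstract credits for improved shift detection.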
Comparing Handcrafted Features and Deep Neural Representations for Domain Generalization in Human Activity Recognition
Human Activity Recognition (HAR) has been studied extensively, yet current approaches are not capable of generalizing across different domains (i.e., subjects, devices, or datasets) with acceptable performance. This lack of generalization hinders the applicability of these models in real-world environments. As deep neural networks are becoming increasingly popular in recent work, there is a need for an explicit comparison between handcrafted and deep representations in Out-of-Distribution (OOD) settings. This paper compares both approaches in multiple domains using homogenized public datasets. First, we compare several metrics to validate three different OOD settings. In our main experiments, we then verify that even though deep learning initially outperforms models with handcrafted features, the situation is reversed as the distance from the training distribution increases. These findings support the hypothesis that handcrafted features may generalize better across specific domains.
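For readers unfamiliar with the "handcrafted features" side of the comparison, a typical HAR pipeline segments the inertial signal into overlapping windows and computes simple per-axis statistics. The sketch below uses invented window sizes, statistics, and synthetic data; it illustrates the general recipe, not the paper's exact feature set.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical raw tri-axial accelerometer stream: 1000 samples x 3 axes.
signal = rng.normal(size=(1000, 3))

def handcrafted_features(window):
    """Classic per-axis statistics commonly used as handcrafted HAR features."""
    feats = [window.mean(axis=0), window.std(axis=0),
             window.min(axis=0), window.max(axis=0),
             np.abs(np.diff(window, axis=0)).mean(axis=0)]  # mean abs. change
    return np.concatenate(feats)

# Segment into fixed-size windows with 50% overlap, then featurize.
win, step = 128, 64
windows = [signal[s:s + win] for s in range(0, len(signal) - win + 1, step)]
X = np.stack([handcrafted_features(w) for w in windows])

print(X.shape)  # → (14, 15): 14 windows x (5 statistics * 3 axes)
```

Because such features encode domain knowledge (amplitude, variability) rather than dataset-specific patterns, they are a plausible explanation for the better far-from-distribution behavior the paper reports.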
Prediction of Choice from Competing Mechanosensory and Choice-Memory Cues during Active Tactile Decision Making
Perceptual decision making is an active process in which animals move their sense organs to extract task-relevant information. To investigate how the brain translates sensory input into decisions during active sensation, we developed a mouse active touch task in which the mechanosensory input can be precisely measured and which challenges animals to use multiple mechanosensory cues. Male mice were trained to localize a pole using a single whisker and to report their decision by selecting one of three choices. Using high-speed imaging and machine vision, we estimated whisker–object mechanical forces at millisecond resolution. Mice solved the task by a sensory-motor strategy in which both the strength and direction of whisker bending were informative cues to pole location. We found competing influences of immediate sensory input and choice memory on mouse choice. On correct trials, choice could be predicted from the direction and strength of whisker bending, but not from previous choice. In contrast, on error trials, choice could be predicted from previous choice but not from whisker bending. This study shows that animal choices during active tactile decision making can be predicted from mechanosensory and choice-memory signals, and provides a new task well suited for the future study of the neural basis of active perceptual decisions.
Toward a Perspectivist Turn in Ground Truthing for Predictive Computing
Most current Artificial Intelligence applications are based on supervised Machine Learning (ML), which ultimately grounds on data annotated by small teams of experts or large ensembles of volunteers. The annotation process is often resolved by majority vote; however, recent evaluation studies have shown this to be problematic.
In this article, we describe and advocate for a different paradigm, which we call perspectivism: this counters the removal of disagreement and, consequently, the assumption of correctness of traditionally aggregated gold-standard datasets, and proposes the adoption of methods that preserve divergence of opinions and integrate multiple perspectives in the ground truthing process of ML development. Drawing on previous works which inspired it, mainly from the crowdsourcing and multi-rater labeling settings, we survey the state of the art and describe the potential of our proposal not only for the more subjective tasks (e.g. those related to human language) but also for tasks commonly understood as objective (e.g. medical decision making). We present the main benefits of adopting a perspectivist stance in ML, as well as possible disadvantages, and various ways in which such a stance can be implemented in practice. Finally, we share a set of recommendations and outline a research agenda to advance the perspectivist stance in ML.
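The contrast between majority-vote aggregation and a perspectivist ground truth can be made concrete in a few lines. This is an illustrative sketch with invented annotations, not a method from the article: the soft-label distribution is one simple way to "preserve divergence of opinions" instead of collapsing it.

```python
from collections import Counter

# Hypothetical annotations: each inner list holds the labels that
# individual raters assigned to one item.
annotations = [
    ["joy", "joy", "surprise"],
    ["anger", "disgust", "anger", "anger"],
    ["fear", "surprise"],          # an exact tie
]

def majority_vote(labels):
    """Traditional aggregation: keep only the most frequent label.
    Note that ties (as in the third item) are resolved arbitrarily."""
    return Counter(labels).most_common(1)[0][0]

def soft_label(labels):
    """Perspectivist alternative: keep the full distribution of opinions."""
    counts = Counter(labels)
    total = len(labels)
    return {lab: n / total for lab, n in counts.items()}

hard = [majority_vote(a) for a in annotations]
soft = [soft_label(a) for a in annotations]
print(hard[0], soft[0])  # 'joy' vs {'joy': 2/3, 'surprise': 1/3}
```

The hard labels discard the fact that a third of the raters disagreed on the first item and that the third item was a coin flip; the soft labels retain both, and can be fed to models trained on distributions rather than single classes.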
Belief Functions and Rough Sets: Survey and New Insights
Rough set theory and belief function theory, two popular mathematical frameworks for uncertainty representation, have been widely applied in different settings and contexts. Despite different origins and mathematical foundations, the fundamental concepts of the two formalisms (i.e., approximations in rough set theory, belief and plausibility functions in belief function theory) are closely related. In this survey article, we review the most relevant contributions studying the links between these two uncertainty representation formalisms. In particular, we discuss the theoretical relationships connecting the two approaches, as well as their applications in knowledge representation and machine learning. Special attention is paid to the combined use of these formalisms as a way of dealing with imprecise and uncertain information. The aim of this work is, thus, to provide a focused picture of these two important fields, discuss some known results and point to relevant future research directions.
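The belief/plausibility pair at the heart of this connection is easy to compute directly from a mass function. The sketch below uses a hypothetical mass function over the frame {a, b, c}; the Bel ≤ Pl sandwich it exhibits is the formal counterpart of the lower/upper approximation pair in rough set theory.

```python
# Hypothetical mass function on the frame {a, b, c}: basic probability
# masses assigned to subsets (focal elements), summing to 1.
mass = {
    frozenset({"a"}): 0.4,
    frozenset({"a", "b"}): 0.3,
    frozenset({"a", "b", "c"}): 0.3,
}

def belief(A):
    """Bel(A): total mass of focal elements fully contained in A."""
    return sum(m for B, m in mass.items() if B <= A)

def plausibility(A):
    """Pl(A): total mass of focal elements intersecting A."""
    return sum(m for B, m in mass.items() if B & A)

A = frozenset({"a", "b"})
print(round(belief(A), 3), round(plausibility(A), 3))  # → 0.7 1.0
```

Belief plays the role of the lower approximation (evidence that certainly supports A) and plausibility the upper one (evidence that does not contradict A), which is the structural analogy the surveyed literature makes precise.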