Understanding and improving subjective measures in human-computer interaction
In Human-Computer Interaction (HCI), research has shifted from a focus on usability and performance towards the holistic notion of User Experience (UX). Research into UX places special emphasis on concepts from psychology, such as emotion, trust, and motivation. Under this paradigm, elaborate methods are needed to capture the richness and diversity of subjective experiences. Although psychology offers a long-standing tradition of developing self-report scales, it is currently undergoing radical changes in research and reporting practice. Hence, UX research faces several challenges, such as the widespread use of ad-hoc questionnaires with unknown or unsatisfactory psychometric properties, and a lack of replication and transparency. This thesis therefore addresses several gaps in the research by developing and validating self-report scales in the domains of user motivation (manuscript 1), perceived user interface language quality (manuscript 2), and user trust (manuscript 3). Furthermore, issues of online research and practical considerations to ensure data quality are examined empirically (manuscript 4). Overall, this thesis provides well-documented templates for scale development and may help improve scientific rigor in HCI.
How to Measure the Game Experience? Analysis of the Factor Structure of Two Questionnaires
We describe and report the analysis of two widely used questionnaires for measuring player experience in digital games. To contribute to the further validation and meaningful application of the PENS and the GEQ, we examined the underlying factorial structure of both questionnaires. Four hundred and forty-seven participants played two different games and rated them on a set of variables including the PENS and GEQ. Consistent with previous research, we gained additional insight into how both measures can be optimized: while the factor structure of the PENS appears to be consistent and invariant across the two games, the GEQ shows weaknesses in fulfilling these requirements.
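For readers who want to run a similar check on their own data, the sketch below fits a confirmatory factor model per game and compares fit indices using the semopy package. The factor structure, item names, and the file pens_ratings.csv are illustrative assumptions, not the questionnaires' actual specifications.

```python
import pandas as pd
import semopy

# Illustrative 3-factor structure in lavaan-style syntax; the real PENS
# specification and item names differ.
DESC = """
competence  =~ item1 + item2 + item3
autonomy    =~ item4 + item5 + item6
relatedness =~ item7 + item8 + item9
"""

# hypothetical data: one row per participant, item columns plus a 'game' column
ratings = pd.read_csv("pens_ratings.csv")

for game, df in ratings.groupby("game"):
    model = semopy.Model(DESC)
    model.fit(df)                       # estimate loadings for this game's data
    fit = semopy.calc_stats(model)
    print(game, fit[["CFI", "TLI", "RMSEA"]].round(3))
```

Comparing fit indices across the two games in this way is only a rough proxy for a formal measurement-invariance test (e.g., a multi-group CFA with equality constraints), but it makes large discrepancies visible quickly.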
Measuring user rated language quality: Development and validation of the user interface Language Quality Survey (LQS)
Written text plays a special role in user interfaces. Key information in interaction elements and content is mostly conveyed through text. The global context, in which software has to run in multiple geographical and cultural regions, requires software developers to translate their interfaces into many different languages. This translation process is prone to errors, so the question of how language quality can be measured is important. This paper presents the development of a questionnaire to measure user interface language quality (LQS). After a first validation of the instrument with 843 participants, a final set of 10 items remained, which was tested again. The survey showed a high internal consistency (Cronbach's α = .82), acceptable discriminatory power coefficients (.34–.47), and a moderate average homogeneity of .36. The LQS also showed a moderate correlation with the UMUX, an established usability metric (convergent validity), and it successfully distinguished high from low language quality (discriminative validity). Application to three different products (YouTube, Google Analytics, Google AdWords) revealed similar key statistics, providing evidence that the survey is product-independent. The survey has since been translated into and applied in more than 60 languages.
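The reliability statistics reported here (Cronbach's α, corrected item-total correlations) are straightforward to compute. Below is a minimal sketch in plain pandas; the file lqs_responses.csv and the column names stand in for a hypothetical item-level data set.

```python
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of the sum score)."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

def corrected_item_total(items: pd.DataFrame) -> pd.Series:
    """Discriminatory power: each item's correlation with the sum of the remaining items."""
    return pd.Series({col: items[col].corr(items.drop(columns=col).sum(axis=1)) for col in items})

# hypothetical responses: columns lqs01 ... lqs10, one row per respondent
lqs = pd.read_csv("lqs_responses.csv")
print(f"alpha = {cronbach_alpha(lqs):.2f}")
print(corrected_item_total(lqs).round(2))
```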
Breaking immersion: A theoretical framework of alienated play to facilitate critical reflection on interactive media
There is a growing interest in understanding how to best represent complexity using interactive digital narratives (IDNs). We conceptualize this as the aim to make players of such IDNs reflect critically on the complexity being represented. We argue that current understandings of player experience do not lend themselves to this aim. Research on interactive media has assumed immersion to be universally positive for the player experience. In this article, however, we argue that immersion into the Magic Circle of an IDN can be antagonistic to a critical experience, because immersion persuades players to suspend their disbelief rather than facilitating critical reflection. On the basis of the Epic Theater, we instead propose an alternative form of play called alienated play: a form of play in which the player is playing while also observing themselves play. This form of play should allow players to benefit from the enjoyable nature of play while simultaneously remaining at a critical distance. To illustrate our theory, we design two models, one for immersed play and one for alienated play. Furthermore, we present examples of designing for alienation in commercial video games, as well as hypotheses to test our theory in future research. This work thus contributes an initial, theoretically and practically informed form of play specifically designed to facilitate critical reflection on IDNs representing complexity.
Online Playtesting With Crowdsourcing: Advantages and Challenges
Answering important design questions and delivering actionable insights within a couple of days is invaluable. Traditional playtests are often time-consuming, expensive, and deliver insights based on only a small sample of participants. Crowdsourced playtests may deliver feedback of comparable quality with fewer resources. However, several aspects have to be considered in order to obtain meaningful and actionable results. Based on our experience, we provide five recommendations to ensure data quality and prevent fraud. Taken together, this suggests that crowdsourced playtesting is a promising alternative for indie, non-profit, and academic Games User Research.
The quality of data collected online: An investigation of careless responding in a crowdsourced sample
Despite recent concerns about data quality, various academic fields rely increasingly on crowdsourced samples. The goal of this study was therefore to systematically assess carelessness in a crowdsourced sample (N = 394) by applying various measures and detection methods. A Latent Profile Analysis revealed that 45.9% of the participants showed some form of careless behavior. Excluding these participants increased the effect size in an experiment included in the survey. Based on our findings, we give several recommendations for easy-to-apply measures to assess data quality.
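Two of the simplest careless-responding indicators used in this line of work, invariant (longstring) responding and implausibly fast completion, can be screened for with a few lines of pandas. The thresholds, column names, and survey.csv below are illustrative assumptions, not the cut-offs used in the study.

```python
import pandas as pd

def longstring(row: pd.Series) -> int:
    """Length of the longest run of identical consecutive answers in one response row."""
    best = run = 1
    values = row.tolist()
    for prev, cur in zip(values, values[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best

responses = pd.read_csv("survey.csv")          # hypothetical: item columns plus duration_sec
items = responses.filter(like="item")

flags = pd.DataFrame({
    "longstring": items.apply(longstring, axis=1),
    "too_fast": responses["duration_sec"] < 2 * items.shape[1],   # < ~2 s per item, illustrative
})
# flag respondents who trip either indicator; thresholds are for illustration only
careless = flags["too_fast"] | (flags["longstring"] >= items.shape[1] // 2)
print(f"{careless.mean():.1%} of respondents flagged as potentially careless")
```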
Certification Labels for Trustworthy AI: Insights From an Empirical Mixed-Method Study
Auditing plays a pivotal role in the development of trustworthy AI. However, current research primarily focuses on creating auditable AI documentation, which is intended for regulators and experts rather than end-users affected by AI decisions. How to communicate to members of the public that an AI has been audited and considered trustworthy remains an open challenge. This study empirically investigated certification labels as a promising solution. Through interviews (N = 12) and a census-representative survey (N = 302), we investigated end-users' attitudes toward certification labels and their effectiveness in communicating trustworthiness in low- and high-stakes AI scenarios. Based on the survey results, we demonstrate that labels can significantly increase end-users' trust and willingness to use AI in both low- and high-stakes scenarios. However, end-users' preferences for certification labels and their effect on trust and willingness to use AI were more pronounced in high-stakes scenarios. Qualitative content analysis of the interviews revealed opportunities and limitations of certification labels, as well as facilitators and inhibitors for the effective use of labels in the context of AI. For example, while certification labels can mitigate data-related concerns expressed by end-users (e.g., privacy and data protection), other concerns (e.g., model performance) are more challenging to address. Our study provides valuable insights and recommendations for designing and implementing certification labels as a promising constituent within the trustworthy AI ecosystem.
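As a rough illustration of the kind of within-subject comparison behind the "labels increase trust" result, the sketch below runs a Wilcoxon signed-rank test per scenario type. The data file, column names, and design details are assumptions for illustration, not the study's actual analysis.

```python
import pandas as pd
from scipy.stats import wilcoxon

# hypothetical data: one row per participant with trust ratings for the same
# scenario shown with and without a certification label, plus a stakes column
survey = pd.read_csv("label_survey.csv")

for stakes, group in survey.groupby("stakes"):
    stat, p = wilcoxon(group["trust_with_label"], group["trust_without_label"])
    print(f"{stakes}: W = {stat:.1f}, p = {p:.4f}")
```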
To Trust or Distrust Trust Measures: Validating Questionnaires for Trust in AI
Despite the importance of trust in human-AI interactions, researchers must adopt questionnaires from other disciplines that lack validation in the AI context. Motivated by the need for reliable and valid measures, we investigated the psychometric quality of two trust questionnaires, the Trust between People and Automation scale (TPA) by Jian et al. (2000) and the Trust Scale for the AI Context (TAI) by Hoffman et al. (2023). In a pre-registered online experiment (N = 1485), participants observed interactions with trustworthy and untrustworthy AI (autonomous vehicle and chatbot). Results support the psychometric quality of the TAI while revealing opportunities to improve the TPA, which we outline in our recommendations for using the two questionnaires. Furthermore, our findings provide additional empirical evidence of trust and distrust as two distinct constructs that may coexist independently. Building on our findings, we highlight the opportunities and added value of measuring both trust and distrust in human-AI research and advocate for further work on both constructs.
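One way to probe whether trust and distrust behave as one construct or two is to fit competing factor models to item-level data and compare their fit, for example with semopy. The item names, model specifications, and tpa_items.csv below are hypothetical; they illustrate the model-comparison logic rather than the paper's actual analysis.

```python
import pandas as pd
import semopy

data = pd.read_csv("tpa_items.csv")   # hypothetical item-level responses, columns tpa1 ... tpa6

MODELS = {
    # all items load on a single trust factor
    "one-factor": "trust =~ tpa1 + tpa2 + tpa3 + tpa4 + tpa5 + tpa6",
    # trust and distrust as separate, correlated factors
    "two-factor": """
        trust    =~ tpa1 + tpa2 + tpa3
        distrust =~ tpa4 + tpa5 + tpa6
        trust ~~ distrust
    """,
}

for name, desc in MODELS.items():
    model = semopy.Model(desc)
    model.fit(data)
    fit = semopy.calc_stats(model)
    print(name, fit[["CFI", "RMSEA", "AIC", "BIC"]].round(3))
```

If the two-factor model fits clearly better, that is consistent with trust and distrust being distinct, potentially coexisting constructs.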
Exploring the effects of human-centered AI explanations on trust and reliance
Transparency is widely regarded as crucial for the responsible real-world deployment of artificial intelligence (AI) and is considered an essential prerequisite for establishing trust in AI. There are several approaches to enabling transparency, with one promising attempt being human-centered explanations. However, there is little research into the effectiveness of human-centered explanations on end-users' trust. What complicates the comparison of existing empirical work is that trust is measured in different ways: some researchers measure subjective trust using questionnaires, while others measure objective trust-related behavior such as reliance. To bridge these gaps, we investigated the effects of two promising human-centered post-hoc explanations, feature importance and counterfactuals, on trust and reliance. We compared these two explanations with a control condition in a decision-making experiment (N = 380). Results showed that human-centered explanations can significantly increase reliance, but the type of decision-making (increasing a price vs. decreasing a price) had an even greater influence. This challenges the presumed importance of transparency over other factors in human decision-making involving AI, such as potential heuristics and biases. We conclude that trust does not necessarily equate to reliance and emphasize the importance of appropriate, validated, and agreed-upon metrics to design and evaluate human-centered AI.
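The distinction between subjective trust and behavioral reliance can be made concrete with a small analysis sketch: reliance as the share of trials in which a participant followed the AI recommendation, correlated with a questionnaire-based trust score. The file and column names are hypothetical.

```python
import pandas as pd

trials = pd.read_csv("decision_trials.csv")   # hypothetical: participant, followed_ai (0/1), trust_score

# Behavioral reliance per participant: proportion of decisions that followed the AI
reliance = trials.groupby("participant")["followed_ai"].mean()
# Subjective trust per participant (assumed constant across that participant's trials)
trust = trials.groupby("participant")["trust_score"].first()

# If subjective trust equated to reliance, this correlation would be near 1
print(f"r(trust, reliance) = {trust.corr(reliance):.2f}")
```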
Many Labs 5: Testing Pre-Data-Collection Peer Review as an Intervention to Increase Replicability
Replication studies in psychological science sometimes fail to reproduce prior findings. If these studies use methods that are unfaithful to the original study or ineffective in eliciting the phenomenon of interest, then a failure to replicate may be a failure of the protocol rather than a challenge to the original finding. Formal pre-data-collection peer review by experts may address shortcomings and increase replicability rates. We selected 10 replication studies from the Reproducibility Project: Psychology (RP:P; Open Science Collaboration, 2015) for which the original authors had expressed concerns about the replication designs before data collection; only one of these studies had yielded a statistically significant effect (p < .05). Commenters suggested that lack of adherence to expert review and low-powered tests were the reasons that most of these RP:P studies failed to replicate the original effects. We revised the replication protocols and received formal peer review prior to conducting new replication studies. We administered the RP:P and revised protocols in multiple laboratories (median number of laboratories per original study = 6.5, range = 3–9; median total sample = 1,279.5, range = 276–3,512) for high-powered tests of each original finding with both protocols. Overall, following the preregistered analysis plan, we found that the revised protocols produced effect sizes similar to those of the RP:P protocols (Δr = .002 or .014, depending on analytic approach). The median effect size for the revised protocols (r = .05) was similar to that of the RP:P protocols (r = .04) and the original RP:P replications (r = .11), and smaller than that of the original studies (r = .37). Analysis of the cumulative evidence across the original studies and the corresponding three replication attempts provided very precise estimates of the 10 tested effects and indicated that their effect sizes (median r = .07, range = .00–.15) were 78% smaller, on average, than the original effect sizes (median r = .37, range = .19–.50).
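The cumulative-evidence analysis described here pools correlations across labs. A minimal version of that pooling, a fixed-effect inverse-variance average of Fisher-z-transformed correlations, looks like the sketch below; the numbers plugged in are illustrative, not the Many Labs 5 data.

```python
import numpy as np

def pool_correlations(rs, ns):
    """Fixed-effect pooled correlation: inverse-variance average in Fisher-z space."""
    rs, ns = np.asarray(rs, dtype=float), np.asarray(ns, dtype=float)
    z = np.arctanh(rs)          # Fisher z-transform of each correlation
    w = ns - 3                  # weights = 1 / Var(z), with Var(z) = 1 / (n - 3)
    return np.tanh((w * z).sum() / w.sum())

# illustrative sample correlations and sample sizes only
print(f"pooled r = {pool_correlations([0.04, 0.05, 0.11], [320, 280, 150]):.3f}")
```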