Can we trust online crowdworkers? Comparing online and offline participants in a preference test of virtual agents
Conducting user studies is a crucial component in many scientific fields.
While some studies require participants to be physically present, other studies
can be conducted both physically (e.g. in-lab) and online (e.g. via
crowdsourcing). Inviting participants to the lab can be a time-consuming and
logistically difficult endeavor, not to mention that sometimes research groups
might not be able to run in-lab experiments, because of, for example, a
pandemic. Crowdsourcing platforms such as Amazon Mechanical Turk (AMT) or
Prolific can therefore be a suitable alternative to run certain experiments,
such as evaluating virtual agents. Although previous studies investigated the
use of crowdsourcing platforms for running experiments, there is still
uncertainty as to whether the results are reliable for perceptual studies. Here
we replicate a previous experiment where participants evaluated a gesture
generation model for virtual agents. The experiment is conducted across three
participant pools -- in-lab, Prolific, and AMT -- with similar demographics
between the in-lab and Prolific participants. Our results show no
difference between the three participant pools with regard to their evaluations
of the gesture generation models and their reliability scores. The results
indicate that online platforms can successfully be used for perceptual
evaluations of this kind.
Comment: Accepted to IVA 2020. Patrik Jonell and Taras Kucherenko contributed
equally to this work.
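As a rough sketch of the kind of statistical check such a comparison involves, the snippet below tests whether preference ratings differ across the three pools. The pool names follow the paper, but the ratings, the 1-5 scale, and the choice of a Kruskal-Wallis test are illustrative assumptions rather than the paper's actual analysis.

from scipy.stats import kruskal

# Hypothetical preference ratings (1-5) per participant pool; the real
# study's data and statistical test may differ.
ratings = {
    "in-lab":   [4, 5, 3, 4, 4, 5, 2, 4],
    "Prolific": [4, 4, 3, 5, 4, 4, 3, 4],
    "AMT":      [3, 5, 4, 4, 3, 4, 4, 5],
}

# Kruskal-Wallis: a non-parametric test for differences between groups.
h_stat, p_value = kruskal(*ratings.values())
print(f"H = {h_stat:.3f}, p = {p_value:.3f}")
if p_value >= 0.05:
    print("No evidence of a difference between participant pools.")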
An Open source Implementation of ITU-T Recommendation P.808 with Validation
The ITU-T Recommendation P.808 provides a crowdsourcing approach for
conducting a subjective assessment of speech quality using the Absolute
Category Rating (ACR) method. We provide an open-source implementation of the
ITU-T Rec. P.808 that runs on the Amazon Mechanical Turk platform. We extended
our implementation to include Degradation Category Ratings (DCR) and Comparison
Category Ratings (CCR) test methods. We also significantly speed up the test
process by integrating the participant qualification step into the main rating
task, rather than running qualification and rating as two separate stages. We
provide program scripts for creating and executing the subjective test, and
for cleansing the data and analyzing the answers to avoid operational errors.
To validate
the implementation, we compare the Mean Opinion Scores (MOS) collected through
our implementation with MOS values from a standard laboratory experiment
conducted based on the ITU-T Rec. P.800. We also evaluate the reproducibility
of the result of the subjective speech quality assessment through crowdsourcing
using our implementation. Finally, we quantify the impact of parts of the
system designed to improve the reliability: environmental tests, gold and
trapping questions, rating patterns, and a headset usage test.
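A minimal sketch of the core ACR computation, assuming a simple answer format and an all-or-nothing trapping-question rule; the actual toolkit's scripts, data schema, and cleansing criteria are more elaborate, so treat the field names below as illustrative.

from collections import defaultdict

def mean_opinion_scores(answers, min_trap_accuracy=1.0):
    """answers: dicts with keys 'worker', 'clip', 'rating' (ACR, 1-5),
    and 'trap_correct' (bool for trapping questions, else None)."""
    # Reject workers who fall below the trapping-question threshold.
    trap_hits = defaultdict(list)
    for a in answers:
        if a["trap_correct"] is not None:
            trap_hits[a["worker"]].append(a["trap_correct"])
    rejected = {w for w, hits in trap_hits.items()
                if sum(hits) / len(hits) < min_trap_accuracy}

    # Average the remaining ratings per clip to obtain MOS values.
    per_clip = defaultdict(list)
    for a in answers:
        if a["worker"] not in rejected and a["trap_correct"] is None:
            per_clip[a["clip"]].append(a["rating"])
    return {clip: sum(r) / len(r) for clip, r in per_clip.items()}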
Training and Application of Correct Information Unit Analysis to Structured and Unstructured Discourse
Correct Information Unit (CIU) analysis is one of the few discourse measures that attempts to quantify discourse as a function of how efficiently information is communicated. Though this analysis is used reliably as a research tool, most studies apply CIUs to structured discourse tasks and do not specifically describe how raters are trained. If certified clinical speech-language pathologists (SLPs) can likewise reliably apply CIU analysis within clinical settings to unstructured discourse, such as the discourse of people with aphasia (PWA), it may allow clinicians to quantify the information communicated efficiently in clinical populations with discourse deficits. Purpose: The purpose of this study is to determine whether, using the outlined training module, clinicians are able to score CIUs across both structured and unstructured discourse samples with inter-rater reliability similar to that of researchers. Method: Four certified SLPs will undergo a two-hour training session in CIU analysis similar to a university research staff's CIU training protocol. Each SLP will score CIUs in structured and unstructured language samples collected from individuals diagnosed with aphasia. The SLPs' scores on the structured and unstructured discourse samples will be compared to those of a university research lab staff. This will determine (1) whether SLPs can reliably code CIUs, compared with research raters in a lab setting, when both use the same two-hour CIU training and allotted resources; and (2) whether there is a significant difference in reliability when structured versus unstructured discourse is analyzed.
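For concreteness, the sketch below shows two efficiency measures commonly derived from CIU counts (%CIU and CIUs per minute) together with a simple point-to-point agreement check between two raters. The function names and the word-level boolean representation are illustrative assumptions; identifying the CIUs themselves remains the trained rater's task, and the study may use a different reliability statistic such as ICC.

def percent_ciu(n_cius: int, n_words: int) -> float:
    # Proportion of produced words that are correct information units.
    return 100.0 * n_cius / n_words

def cius_per_minute(n_cius: int, duration_sec: float) -> float:
    # Rate measure: how efficiently information is communicated over time.
    return 60.0 * n_cius / duration_sec

def agreement(rater_a: list[bool], rater_b: list[bool]) -> float:
    # Point-to-point agreement on word-level CIU judgments
    # (True = the word was counted as part of a CIU).
    assert len(rater_a) == len(rater_b)
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * matches / len(rater_a)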
Automated Virtual Coach for Surgical Training
Surgical educators have recommended individualized coaching for acquisition, retention and improvement of expertise in technical skills. Such one-on-one coaching is limited to institutions that can afford surgical coaches and is certainly not feasible at national and global scales. We hypothesize that automated methods that model intraoperative video, surgeon's hand and instrument motion, and sensor data can provide effective and efficient individualized coaching. With the advent of instrumented operating rooms and training laboratories, access to such large scale intra-operative data has become feasible. Previous methods for automated skill assessment present an overall evaluation at the task/global level to the surgeons without any directed feedback and error analysis. Demonstration, if at all, is present in the form of fixed instructional videos, while deliberate practice is completely absent from automated training platforms. We believe that an effective coach should: demonstrate expert behavior (how do I do it correctly), evaluate trainee performance (how did I do) at task and segment-level, critique errors and deficits (where and why was I wrong), recommend deliberate practice (what do I do to improve), and monitor skill progress (when do I become proficient).
In this thesis, we present new methods and solutions towards these coaching interventions in different training settings, viz. virtual reality simulation, bench-top simulation and the operating room. First, we outline a summarizations-based approach for surgical phase modeling using various sources of intra-operative procedural data, such as system events (sensors) as well as crowdsourced surgical activity context. We validate a crowdsourced approach to obtain context summarizations of intra-operative surgical activity. Second, we develop a new scoring method to evaluate task segments using rankings derived from pairwise comparisons of performances obtained via crowdsourcing. We show that reliable and valid crowdsourced pairwise comparisons can be obtained across multiple training task settings. Additionally, we present preliminary results comparing inter-rater agreement in relative ratings and absolute ratings for crowdsourced assessments of an endoscopic sinus surgery training task data set. Third, we implement a real-time feedback and teaching framework using virtual reality simulation to present teaching cues and deficit metrics that are targeted at critical learning elements of a task. We compare the effectiveness of this real-time coach to independent self-driven learning on a needle passing task in a pilot randomized controlled trial. Finally, we present an integration of the above components of task progress detection, segment-level evaluation and real-time feedback towards the first end-to-end automated virtual coach for surgical training.
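As an illustration of the second component, the sketch below turns crowdsourced pairwise comparisons into per-performance scores with a Bradley-Terry model fitted by the standard MM updates. The thesis specifies only that rankings are derived from pairwise comparisons, so the particular model and the example win counts here are assumptions.

import numpy as np

def bradley_terry(wins: np.ndarray, iters: int = 200) -> np.ndarray:
    # wins[i, j] = number of crowd votes saying performance i beat j.
    n = wins.shape[0]
    p = np.ones(n)
    for _ in range(iters):
        for i in range(n):
            total_wins = wins[i].sum()
            denom = sum((wins[i, j] + wins[j, i]) / (p[i] + p[j])
                        for j in range(n) if j != i)
            if denom > 0:
                p[i] = total_wins / denom
        p /= p.sum()
    return p  # higher value = stronger performance

# Example: three task segments compared pairwise by crowdworkers.
wins = np.array([[0, 8, 9],
                 [2, 0, 7],
                 [1, 3, 0]])
print(np.argsort(-bradley_terry(wins)))  # ranking, best first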
Human-in-the-Loop Learning From Crowdsourcing and Social Media
Computational social studies using public social media data have become increasingly popular because of the large amount of user-generated data available. The richness of social media data, coupled with noise and subjectivity, raises significant challenges for computationally studying social issues in a feasible and scalable manner. Machine learning problems are, as a result, often subjective or ambiguous when humans are involved. That is, humans solving the same problems might come to legitimate but completely different conclusions, based on their personal experiences and beliefs. When building supervised learning models, particularly when using crowdsourced training data, multiple annotations per data item are usually reduced to a single label representing ground truth. This inevitably hides a rich source of diversity and subjectivity of opinions about the labels.
Label distribution learning associates with each data item a probability distribution over the labels for that item, so it can preserve the diversity of opinions, beliefs, etc. that conventional learning hides or ignores. We propose a human-in-the-loop learning framework to model and study large volumes of unlabeled subjective social media data with less human effort. We study various annotation tasks given to crowdsourced annotators and methods for aggregating their contributions in a manner that preserves subjectivity and disagreement. We introduce a strategy for learning label distributions with only five to ten labels per item by aggregating human-annotated labels over multiple, semantically related data items. We conduct experiments using our learning framework on data related to two subjective social issues (work and employment, and suicide prevention) that touch many people worldwide. Our methods can be applied to a broad variety of problems, particularly social problems. Our experimental results suggest that specific label aggregation methods can help provide reliable representative semantics at the population level.
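A minimal sketch of the aggregation idea, assuming items have already been grouped by semantic relatedness (the grouping criterion is the framework's key ingredient and is simply given as input here):

from collections import Counter

def label_distributions(groups: dict[str, list[str]],
                        item_labels: dict[str, list[str]]) -> dict:
    # groups: group id -> ids of semantically related items.
    # item_labels: item id -> raw crowd labels (e.g. five to ten per item).
    dists = {}
    for gid, items in groups.items():
        pooled = Counter(lab for it in items for lab in item_labels[it])
        total = sum(pooled.values())
        # Normalize pooled counts into a label distribution for the group.
        dists[gid] = {lab: c / total for lab, c in pooled.items()}
    return dists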
A framework for evaluating automatic indexing or classification in the context of retrieval
Tools for automatic subject assignment help deal with scale and sustainability in creating and enriching metadata, establishing more connections across and between resources and enhancing consistency. While some software vendors and experimental researchers claim the tools can replace manual subject indexing, hard scientific evidence of their performance in operating information environments is scarce. A major reason for this is that research is usually conducted in laboratory conditions, excluding the complexities of real-life systems and situations. The paper reviews and discusses issues with existing evaluation approaches, such as problems of aboutness and relevance assessments, implying the need to use more than a single "gold standard" method when evaluating indexing and retrieval, and proposes a comprehensive evaluation framework. The framework is informed by a systematic review of the literature on indexing and classification evaluation approaches: evaluating indexing quality directly, through assessment by an evaluator or through comparison with a gold standard; evaluating the quality of computer-assisted indexing directly in the context of an indexing workflow; and evaluating indexing quality indirectly by analyzing retrieval performance.
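For the gold-standard strand of the framework, the comparison typically reduces to set overlap between automatically assigned and indexer-assigned subject terms. A small sketch, with made-up terms, and with the caveat the paper itself raises: this is only one evaluation method among several.

def indexing_prf(auto: set[str], gold: set[str]) -> tuple[float, float, float]:
    # Precision, recall, and F1 of automatic terms against a gold standard.
    if not auto or not gold:
        return 0.0, 0.0, 0.0
    true_pos = len(auto & gold)
    precision = true_pos / len(auto)
    recall = true_pos / len(gold)
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# One document: tool-assigned terms vs. a human indexer's terms.
print(indexing_prf({"crowdsourcing", "metadata", "indexing"},
                   {"subject indexing", "metadata", "indexing"}))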
Fully Automatic Analysis of Engagement and Its Relationship to Personality in Human-Robot Interactions
Engagement is crucial to designing intelligent systems that can adapt to the characteristics of their users. This paper focuses on automatic analysis and classification of engagement based on humans’ and robot’s personality profiles in a triadic human-human-robot interaction setting. More explicitly, we present a study that involves two participants interacting with a humanoid robot, and investigate how participants’ personalities can be used together with the robot’s personality to predict the engagement state of each participant. The fully automatic system is firstly trained to predict the Big Five personality traits of each participant by extracting individual and interpersonal features from their nonverbal behavioural cues. Secondly, the output of the personality prediction system is used as an input to the engagement classification system. Thirdly, we focus on the concept of “group engagement”, which we define as the collective engagement of the participants with the robot, and analyse the impact of similar and dissimilar personalities on the engagement classification. Our experimental results show that (i) using the automatically predicted personality labels for engagement classification yields an F-measure on par with using the manually annotated personality labels, demonstrating the effectiveness of the automatic personality prediction module proposed; (ii) using the individual and interpersonal features without utilising personality information is not sufficient for engagement classification, instead incorporating the participants’ and robot’s personalities with individual/interpersonal features increases engagement classification performance; and (iii) the best classification performance is achieved when the participants and the robot are extroverted, while the worst results are obtained when all are introverted.

This work was performed within the Labex SMART project (ANR-11-LABX-65) supported by French state funds managed by the ANR within the Investissements d’Avenir programme under reference ANR-11-IDEX-0004-02. The work of Oya Celiktutan and Hatice Gunes is also funded by the EPSRC under its IDEAS Factory Sandpits call on Digital Personhood (Grant Ref.: EP/L00416X/1).

This is the author accepted manuscript. The final version is available from Institute of Electrical and Electronics Engineers via http://dx.doi.org/10.1109/ACCESS.2016.261452
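The two-stage design can be sketched schematically as below: nonverbal features feed a personality predictor, whose outputs are appended to the features for engagement classification. The random data, feature dimensionality, and random-forest models are placeholders, not the paper's actual feature extractor or classifiers.

import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))            # individual + interpersonal features
y_traits = rng.normal(size=(200, 5))      # Big Five annotations (training only)
y_engaged = rng.integers(0, 2, size=200)  # engagement labels

# Stage 1: predict Big Five personality traits from nonverbal cues.
trait_model = RandomForestRegressor(n_estimators=100).fit(X, y_traits)
predicted_traits = trait_model.predict(X)

# Stage 2: classify engagement from features plus predicted traits.
clf = RandomForestClassifier(n_estimators=100)
clf.fit(np.hstack([X, predicted_traits]), y_engaged)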