Predicting the confusion level of text excerpts with syntactic, lexical and n-gram features
Distance learning, offline presentations (presentations that are pre-recorded rather than delivered live) and other activities whose main goal is to convey information are becoming increasingly relevant with digital media such as Virtual Reality (VR) and Massive Open Online Courses (MOOCs). While MOOCs are a well-established reality in the learning environment, VR is also being used to promote learning in virtual rooms, both in academia and in industry. These methods are often based on written scripts that take the learner through the content, making them critical components of these tools. Given this central role, it is essential to ensure that these scripts are effective.
Confusion is a non-basic emotion associated with learning. It often arises from a cognitive disequilibrium caused either by the content itself or by the way it is conveyed, in terms of its syntactic and lexical features. We propose a supervised model that predicts how much confusion an input text excerpt is likely to cause in the learner. To achieve this, we performed syntactic and lexical analyses of 300 text excerpts and collected 5 confusion-level ratings (0 – 6) per excerpt from 51 annotators, using the mean rating of each excerpt as its label. The excerpts that compose the dataset were collected from randomly selected presentation transcripts across various fields of knowledge. The model was trained on these data, and the results are reported in the body of the paper.
This model supports the design of clearer scripts for offline presentations and similar formats, and we expect it to improve the effectiveness of these speeches. While the model is applied to this specific case, we hope to pave the way towards generalizing the approach to other contexts where textual clarity is critical, such as the scripts of MOOCs or academic abstracts.
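A minimal sketch of how a confusion predictor of this kind might be trained, assuming scikit-learn is available; the hand-picked lexical features, the TF-IDF n-grams, the Ridge regressor and the toy excerpts are illustrative assumptions, not the exact pipeline used in the paper:

```python
# Illustrative sketch: regress text excerpts onto mean confusion ratings (0-6).
# The features below (excerpt length, mean word length, type-token ratio,
# word uni/bigram TF-IDF) stand in for the paper's lexical, syntactic and
# n-gram features; feature set and model choice are assumptions.
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import FeatureUnion, Pipeline


class LexicalStats(BaseEstimator, TransformerMixin):
    """Simple lexical features computed per excerpt."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        rows = []
        for text in X:
            words = text.split()
            n_words = max(len(words), 1)
            rows.append([
                n_words,                                       # excerpt length
                sum(len(w) for w in words) / n_words,          # mean word length
                len(set(w.lower() for w in words)) / n_words,  # type-token ratio
            ])
        return np.array(rows)


model = Pipeline([
    ("features", FeatureUnion([
        ("lexical", LexicalStats()),
        ("ngrams", TfidfVectorizer(ngram_range=(1, 2), min_df=1)),
    ])),
    ("regressor", Ridge(alpha=1.0)),
])

# Toy data in place of the 300 annotated excerpts; labels play the role of
# the mean of the 0-6 ratings collected per excerpt.
texts = ["The algorithm iterates until the residual norm converges.",
         "We walk through the steps slowly, one at a time.",
         "Eigendecomposition of the covariance operator yields the modes."]
labels = [4.2, 1.1, 5.0]

scores = cross_val_score(model, texts, labels, cv=3,
                         scoring="neg_mean_absolute_error")
print("MAE per fold:", -scores)
```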
Affective learning: improving engagement and enhancing learning with affect-aware feedback
This paper describes the design and ecologically valid evaluation of a learner model that lies at the heart of an intelligent learning environment called iTalk2Learn. A core objective of the learner model is to adapt formative feedback based on students’ affective states. Types of adaptation include what type of formative feedback should be provided and how it should be presented. Two Bayesian networks trained with data gathered in a series of Wizard-of-Oz studies are used for the adaptation process. This paper reports results from a quasi-experimental evaluation, in authentic classroom settings, which compared a version of iTalk2Learn that adapted feedback based on students’ affective states as they were talking aloud with the system (the affect condition) with one that provided feedback based only on the students’ performance (the non-affect condition). Our results suggest that affect-aware support contributes to reducing boredom and off-task behavior, and may have an effect on learning. We discuss the internal and ecological validity of the study, in light of pedagogical considerations that informed the design of the two conditions. Overall, the results of the study have implications both for the design of educational technology and for classroom approaches to teaching, because they highlight the important role that affect-aware modelling plays in the adaptive delivery of formative feedback to support learning.
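A minimal sketch of what affect-conditioned feedback selection can look like; the affective states, feedback types and probabilities below are placeholders, not the distributions the two Bayesian networks learned from the Wizard-of-Oz data:

```python
# Illustrative sketch of affect-aware feedback selection: a hand-rolled
# conditional probability table P(feedback_type | affect, performance).
# States, feedback types and probabilities are placeholders.
import random

CPT = {
    ("confused", "low"):  {"worked_example": 0.6, "hint": 0.3, "praise": 0.1},
    ("confused", "high"): {"worked_example": 0.2, "hint": 0.6, "praise": 0.2},
    ("bored",    "low"):  {"worked_example": 0.3, "hint": 0.5, "praise": 0.2},
    ("bored",    "high"): {"worked_example": 0.1, "hint": 0.3, "praise": 0.6},
}


def choose_feedback(affect: str, performance: str, rng: random.Random) -> str:
    """Sample a feedback type from the conditional distribution."""
    dist = CPT[(affect, performance)]
    types, probs = zip(*dist.items())
    return rng.choices(types, weights=probs, k=1)[0]


rng = random.Random(0)
print(choose_feedback("confused", "low", rng))   # likely a worked example
print(choose_feedback("bored", "high", rng))     # likely praise
```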
Towards higher sense of presence: a 3D virtual environment adaptable to confusion and engagement
Virtual Reality scenarios in which emitters convey information to receptors can be used as a tool for distance learning and to enable virtual visits to a company’s physical headquarters. However, immersive Virtual Reality setups usually require visualization interfaces such as head-mounted displays, powerwalls or CAVE systems, supported by interaction devices (Microsoft Kinect, Wii Motion, among others) that foster natural interaction but are often inaccessible to users. We propose a virtual presentation scenario, supported by a framework, that provides emotion-driven interaction through ubiquitous devices. An experiment with three conditions was designed: a control condition; a condition with a less confusing text script, based on its lexical, syntactical, and bigram features; and a third condition in which an adaptive lighting system dynamically reacted to the user’s engagement. Results show that users exposed to the less confusing script reported a higher sense of presence, albeit without statistical significance. Users in the last condition reported a lower sense of presence, which contradicts our hypothesis, also without statistical significance. We theorize that, because the presentation was given orally and the adaptive lighting system acts on the visual channel, this conflict may have overloaded the users’ cognitive capacity and thus reduced the resources available to process the presentation content.
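A minimal sketch of an engagement-driven adaptive lighting rule of the kind used in the third condition; the smoothing factor, thresholds and intensity mapping are assumptions, not the study’s actual parameters:

```python
# Illustrative sketch: an exponentially smoothed engagement estimate in [0, 1]
# is mapped to a dimming level. Thresholds and smoothing factor are assumptions.
def smooth(previous: float, estimate: float, alpha: float = 0.2) -> float:
    """Exponential smoothing to avoid abrupt lighting changes."""
    return (1.0 - alpha) * previous + alpha * estimate


def light_intensity(engagement: float, low: float = 0.4, high: float = 0.7) -> float:
    """Dim the scene lights when engagement drops below `low`,
    keep full brightness above `high`, interpolate in between."""
    if engagement >= high:
        return 1.0
    if engagement <= low:
        return 0.5  # never go fully dark during the presentation
    return 0.5 + 0.5 * (engagement - low) / (high - low)


level = 0.8
for estimate in [0.9, 0.6, 0.3, 0.2, 0.7]:  # per-frame engagement estimates
    level = smooth(level, estimate)
    print(f"engagement={level:.2f} -> intensity={light_intensity(level):.2f}")
```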
Virtual environments promoting interaction
Virtual reality (VR) has been widely researched in the academic environment and is now breaking into industry. Regular companies do not have access to this technology as a collaboration tool because these solutions usually require specific devices that are not within reach of the common office user. Other collaboration platforms exist based on video, speech and text, but VR allows users to share the same 3D space. This 3D space can include additional functionality or information that would not be possible in a real-world environment, something intrinsic to VR.
This dissertation has produced a 3D framework that promotes nonverbal communication, which plays a fundamental role in human interaction and is largely based on emotion. In academia, confusion is known to influence learning gains if it is properly managed. We designed a study to evaluate how lexical, syntactic and n-gram features influence perceived confusion and found results (not statistically significant) suggesting that it is possible to build a machine learning model that predicts the level of confusion based on these features. This model was used to manipulate the script of a given presentation, and user feedback shows a trend whereby manipulating these features to theoretically lower the confusion of the text not only reduces reported confusion but also increases the reported sense of presence. Another contribution of this dissertation comes from the intrinsic features of a 3D environment, where one can carry out actions that are not possible in the real world. We designed an automatic adaptive lighting system that reacts to the perceived engagement of the user. This hypothesis was partially rejected, as the results go against what we hypothesized but lack statistical significance.
Three lines of research may stem from this dissertation. First, more complex features, such as syntax trees, could be used to train the machine learning model; in an Intelligent Tutoring System, this model could also adjust the avatar's speech in real time if fed by a real-time confusion detector (see the sketch below). Second, in a social scenario, the set of basic emotions is well suited and can enrich it; facial emotion recognition can extend this effect to the avatar's body to fuel this synchronization and increase the sense of presence. Finally, we based this dissertation on the premise of using ubiquitous devices, but with the rapid evolution of technology we should consider that new devices will become common in offices, which opens new possibilities for other modalities.
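A minimal sketch of the real-time use suggested above, where a confusion predictor picks the clearest of several candidate phrasings for the avatar's next line; `predict_confusion` and the toy predictor are placeholders, not the dissertation's implementation:

```python
# Illustrative sketch: choose, in real time, the clearest of several candidate
# phrasings for the avatar's next line, using a confusion predictor (0-6 scale).
# The toy predictor is a placeholder for a trained model.
from typing import Callable, Sequence


def pick_clearest(candidates: Sequence[str],
                  predict_confusion: Callable[[str], float]) -> str:
    """Return the candidate phrasing with the lowest predicted confusion."""
    return min(candidates, key=predict_confusion)


def toy_predictor(text: str) -> float:
    # Placeholder heuristic: longer sentences with longer words read as more confusing.
    words = text.split()
    mean_len = sum(map(len, words)) / max(len(words), 1)
    return min(6.0, 0.1 * len(words) + 0.5 * mean_len)


candidates = [
    "The optimiser minimises the regularised empirical risk functional.",
    "The training step looks for the settings that make the fewest mistakes.",
]
print(pick_clearest(candidates, toy_predictor))
```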
Recognising Complex Mental States from Naturalistic Human-Computer Interactions
New advances in computer vision techniques will revolutionize the way we interact with computers, as they, together with other improvements, will help us build machines that understand us better. The face is the main non-verbal channel for human-human communication and contains valuable information about emotion, mood, and mental state. Affective computing researchers have widely investigated how facial expressions can be used for automatically recognizing affect and mental states. Nowadays, physiological signals can be measured by video-based techniques, which can also be utilised for emotion detection. Physiological signals are an important indicator of internal feelings and are more robust against social masking. This thesis focuses on computer vision techniques to detect facial expressions and physiological changes for recognizing non-basic and natural emotions during human-computer interaction. It covers all stages of the research process, from data acquisition to integration and application. Most previous studies focused on acquiring data from prototypic basic emotions acted out under laboratory conditions. To evaluate the proposed method under more practical conditions, two different scenarios were used for data collection. In the first scenario, a set of controlled stimuli was used to trigger the user’s emotions. The second scenario aimed at capturing more naturalistic emotions that might occur during a writing activity; there, the system targeted the participants’ engagement level along with other affective states. For the first time, this thesis explores how video-based physiological measures can be used in affect detection. Video-based measurement of physiological signals is a new technique that needs further improvement before it can be used in practical applications. A machine learning approach is proposed and evaluated to improve the accuracy of heart rate (HR) measurement using an ordinary camera during naturalistic interaction with a computer.
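A minimal sketch of a camera-based heart-rate baseline (remote photoplethysmography) of the kind such work builds on; the band limits and the synthetic trace are assumptions, and this is a standard signal-processing baseline rather than the thesis’s machine learning approach:

```python
# Illustrative sketch of camera-based HR estimation (remote PPG): average the
# green channel over a face region per frame, band-pass filter the signal to
# the plausible heart-rate band, and read the dominant frequency.
import numpy as np
from scipy.signal import butter, filtfilt


def estimate_bpm(green_means: np.ndarray, fps: float) -> float:
    """Estimate heart rate (beats per minute) from per-frame green-channel means."""
    signal = green_means - green_means.mean()
    # Keep 0.7-3.0 Hz (42-180 bpm), where human heart rates lie.
    b, a = butter(3, [0.7, 3.0], btype="bandpass", fs=fps)
    filtered = filtfilt(b, a, signal)
    spectrum = np.abs(np.fft.rfft(filtered))
    freqs = np.fft.rfftfreq(len(filtered), d=1.0 / fps)
    band = (freqs >= 0.7) & (freqs <= 3.0)
    dominant = freqs[band][np.argmax(spectrum[band])]
    return 60.0 * dominant


# Synthetic 10-second clip at 30 fps with a 1.2 Hz (72 bpm) pulse plus noise.
fps, t = 30.0, np.arange(0, 10, 1 / 30.0)
trace = 0.5 * np.sin(2 * np.pi * 1.2 * t) + 0.1 * np.random.randn(t.size) + 120.0
print(f"Estimated HR: {estimate_bpm(trace, fps):.1f} bpm")
```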
Supporting Children’s Metacognition with a Facial Emotion Recognition based Intelligent Tutor System
The present study aims to investigate the relationship between emotions experienced during learning and metacognition in typically developing (TD) children and those with autism spectrum disorder (ASD). This will assist us in using machine learning (ML) to develop a facial emotion recognition (FER) based intelligent tutor system (ITS) to support children’s metacognitive monitoring process in order to enhance their learning outcomes. In this paper, we first report the results of our preliminary research, which utilized an ML-based FER algorithm to detect four spontaneous epistemic emotions (neutral, confusion, frustration, and boredom) and six spontaneous basic emotions (anger, disgust, fear, happiness, sadness, and surprise). Subsequently, we adapted an application (‘BrainHood’) to create ‘Meta-BrainHood’, which embeds our proposed ML-based FER algorithm, to examine the relationship between facial emotion expressions and metacognitive monitoring performance in TD children and those with ASD. Finally, we outline the future steps in our research, which will build on the outcomes of the first two steps to construct an ITS that improves children’s metacognitive monitoring performance and learning outcomes.
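A minimal sketch of the tutoring loop implied by this design, in which per-frame epistemic-emotion probabilities from an FER model are aggregated over a task window and a metacognitive prompt is triggered; the emotion labels, majority-vote aggregation and prompt texts are assumptions, not the Meta-BrainHood design:

```python
# Illustrative sketch: aggregate per-frame FER probabilities for epistemic
# emotions and map the dominant emotion to a metacognitive prompt.
from collections import Counter
from typing import Iterable, Mapping

EPISTEMIC = ("neutral", "confusion", "frustration", "boredom")


def dominant_emotion(frames: Iterable[Mapping[str, float]]) -> str:
    """Majority vote over the per-frame argmax of FER probabilities."""
    votes = Counter(max(frame, key=frame.get) for frame in frames)
    return votes.most_common(1)[0][0]


def metacognitive_prompt(emotion: str) -> str:
    prompts = {
        "confusion": "Pause: can you restate the task in your own words?",
        "frustration": "Try a smaller step. Would a hint help?",
        "boredom": "Ready for a harder challenge?",
    }
    return prompts.get(emotion, "Keep going, you are doing fine.")


frames = [
    {"neutral": 0.2, "confusion": 0.6, "frustration": 0.1, "boredom": 0.1},
    {"neutral": 0.3, "confusion": 0.5, "frustration": 0.1, "boredom": 0.1},
    {"neutral": 0.6, "confusion": 0.2, "frustration": 0.1, "boredom": 0.1},
]
print(metacognitive_prompt(dominant_emotion(frames)))
```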
Automatic Context-Driven Inference of Engagement in HMI: A Survey
An integral part of seamless human-human communication is engagement, the process by which two or more participants establish, maintain, and end their perceived connection. Therefore, to develop successful human-centered human-machine interaction applications, automatic engagement inference is one of the tasks required to achieve engaging interactions between humans and machines and to make machines attuned to their users, hence enhancing user satisfaction and technology acceptance. Several factors contribute to engagement state inference, including the interaction context and the interactants' behaviours and identity. Indeed, engagement is a multi-faceted and multi-modal construct that requires high accuracy in the analysis and interpretation of contextual, verbal and non-verbal cues. Thus, the development of an automated and intelligent system that accomplishes this task has so far proven challenging. This paper presents a comprehensive survey of previous work on engagement inference for human-machine interaction, covering interdisciplinary definitions, engagement components and factors, publicly available datasets, ground-truth assessment, and the most commonly used features and methods, serving as a guide for the development of future human-machine interaction interfaces with reliable context-aware engagement inference capability. An in-depth review across embodied and disembodied interaction modes, and an emphasis on the interaction context in which engagement perception modules are integrated, set the presented survey apart from existing surveys.
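A minimal sketch of context-driven late fusion for engagement inference, one recurring pattern in the work such a survey covers; the modalities, contexts and weights are placeholders chosen to show the structure, not values from any surveyed system:

```python
# Illustrative sketch: combine per-modality engagement scores in [0, 1] with
# weights chosen by interaction context (late fusion).
from typing import Dict

# Context determines how much each cue can be trusted, e.g. gaze is not
# available in a disembodied (voice-only) interaction.
CONTEXT_WEIGHTS: Dict[str, Dict[str, float]] = {
    "embodied":    {"gaze": 0.4, "posture": 0.3, "speech": 0.3},
    "disembodied": {"gaze": 0.0, "posture": 0.0, "speech": 1.0},
}


def fuse_engagement(scores: Dict[str, float], context: str) -> float:
    """Weighted average of modality scores; missing modalities are skipped."""
    weights = CONTEXT_WEIGHTS[context]
    available = {m: w for m, w in weights.items() if m in scores and w > 0}
    total = sum(available.values())
    if total == 0:
        return 0.0
    return sum(scores[m] * w for m, w in available.items()) / total


print(fuse_engagement({"gaze": 0.8, "posture": 0.6, "speech": 0.4}, "embodied"))
print(fuse_engagement({"speech": 0.4}, "disembodied"))
```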