
    Towards the Use of Dialog Systems to Facilitate Inclusive Education

    Continuous advances in information technologies now make it possible to access learning contents from anywhere, at any time, and almost instantaneously. However, accessibility is not always a main objective in the design of educational applications, which hinders their adoption by people with disabilities. Different technologies have recently emerged to foster the accessibility of computers and new mobile devices, favoring a more natural communication between the student and the educative system. This chapter describes innovative uses of multimodal dialog systems in education, with special emphasis on the advantages they provide for creating inclusive applications and learning activities.

    General and Interval Type-2 Fuzzy Face-Space Approach to Emotion Recognition

    Facial expressions of a person representing the same emotion are not always unique. Naturally, the facial features of a subject taken from different instances of the same emotion show wide variations. When two or more facial features are considered together, their joint variation makes the emotion recognition problem more complicated. This variation is the main source of uncertainty in the emotion recognition problem, which is addressed here in two steps using type-2 fuzzy sets. First, a type-2 fuzzy face space is constructed with the background knowledge of the facial features of different subjects for different emotions. Second, the emotion of an unknown facial expression is determined based on the consensus of the measured facial features with the fuzzy face space. Both interval type-2 fuzzy sets (IT2FS) and general type-2 fuzzy sets (GT2FS) have been used separately to model the fuzzy face space. The IT2FS involves primary membership functions for m facial features obtained from n subjects, each having l instances of facial expressions for a given emotion. The GT2FS, in addition to employing the primary membership functions mentioned above, also involves secondary memberships for each primary membership curve, obtained here by formulating and solving an optimization problem. The optimization problem attempts to minimize the difference between two decoded signals: the first is the type-1 defuzzification of the average primary membership functions obtained from the n subjects, while the second is the type-2 defuzzified signal for a given primary membership function with the secondary memberships as unknowns. The uncertainty management policy adopted using GT2FS results in a classification accuracy of 98.333%, compared with 91.667% obtained by its interval type-2 counterpart. A small improvement (approximately 2.5%) in the classification accuracy of IT2FS has been attained by pre-processing the measurements using the well-known interval approach.
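
    The following is a minimal, illustrative sketch (not the authors' implementation) of the interval type-2 idea described above: the primary membership curves obtained from several subjects are collapsed into a lower and an upper membership envelope per feature, and an unknown measurement is scored against that footprint of uncertainty. The Gaussian form of the membership functions, the min-based aggregation across features, and all parameter values are assumptions made for illustration.

        # Minimal sketch of interval type-2 fuzzy emotion scoring (illustrative only).
        import numpy as np

        def gaussian_mf(x, mean, sigma):
            # Type-1 Gaussian primary membership function.
            return np.exp(-0.5 * ((x - mean) / sigma) ** 2)

        def it2_interval(x, means, sigmas):
            # Lower/upper membership of measurement x for one facial feature,
            # taken as the envelope of the per-subject primary memberships.
            grades = np.array([gaussian_mf(x, m, s) for m, s in zip(means, sigmas)])
            return grades.min(), grades.max()

        def emotion_score(features, model):
            # Consensus of the measured features with the fuzzy face space of one
            # emotion; min across features models the conjunction (an assumption,
            # not necessarily the paper's aggregation operator).
            lowers, uppers = zip(*(it2_interval(x, *model[i])
                                   for i, x in enumerate(features)))
            return (min(lowers) + min(uppers)) / 2.0  # centre of the interval

        # Toy usage: two features, three subjects, one emotion class.
        model = {0: ([0.40, 0.50, 0.45], [0.10, 0.12, 0.08]),
                 1: ([0.70, 0.65, 0.72], [0.05, 0.07, 0.06])}
        print(emotion_score([0.47, 0.69], model))  # higher = better match

    In the full method, the class with the highest consensus score would be reported as the recognized emotion; roughly speaking, the GT2FS variant additionally exploits the optimized secondary memberships of each primary curve when computing that consensus.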

    Attention-based Approaches for Text Analytics in Social Media and Automatic Summarization

    Nowadays, society has access, and the possibility to contribute, to large amounts of content on the internet, such as social networks, online newspapers, forums, blogs, and multimedia content platforms. During the last years, these platforms have had an overwhelming impact on the daily life of individuals and organizations, becoming the predominant media for sharing, discussing, and analyzing online content. It is therefore of great interest to work with these platforms, from different points of view, under the umbrella of Natural Language Processing. This thesis focuses on two broad areas within this field, applied to the analysis of online content: text analytics in social media and automatic summarization. Neural networks are also a central topic of this thesis: all the experimentation has been performed using deep learning approaches, mainly based on attention mechanisms. Moreover, we mostly work with Spanish, as it is an underexplored language of great interest to the research projects in which we participated. For text analytics in social media, we focus on affective analysis tasks, including sentiment analysis and emotion detection, together with irony analysis. In this regard, an approach based on Transformer Encoders, which contextualizes word embeddings pretrained on Spanish tweets, is presented to address sentiment analysis and irony detection tasks. We also propose the use of evaluation metrics as loss functions to train neural networks, in order to reduce the impact of class imbalance in multi-class and multi-label emotion detection tasks. Additionally, a specialization of BERT for both the Spanish language and the Twitter domain, which takes into account inter-sentence coherence in Twitter conversation flows, is presented. The performance of all these approaches has been tested on different corpora from several reference evaluation benchmarks, showing very competitive results in all the tasks addressed. For automatic summarization, we focus on extractive summarization of news articles and TV talk shows. Regarding news articles, a framework for extractive summarization based on siamese hierarchical networks with attention mechanisms is presented, along with two instantiations of it: Siamese Hierarchical Attention Networks and Siamese Hierarchical Transformer Encoders. These systems were evaluated on the CNN/DailyMail and NewsRoom corpora, obtaining competitive results compared with other contemporary extractive approaches. Concerning TV talk shows, we propose a task that consists of summarizing the transcribed interventions of the speakers, on a given topic, in the Spanish TV program "La Noche en 24 Horas". In addition, a corpus of news articles, collected from several Spanish online newspapers, is proposed in order to study the domain transferability of the siamese hierarchical approaches between news articles and the interventions of debate participants. This approach shows better results than other extractive techniques, along with very promising domain transferability.
    González Barba, JÁ. (2021). Attention-based Approaches for Text Analytics in Social Media and Automatic Summarization [Doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/172245
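
    As a minimal illustration of the "evaluation metrics as loss functions" idea mentioned in the abstract, the sketch below implements a differentiable (soft) macro-F1 loss for multi-label classification in PyTorch. This is a generic reconstruction of the technique, not the thesis code; the function name and all details are assumptions.

        # Minimal sketch of a differentiable (soft) macro-F1 loss (illustrative only).
        import torch

        def soft_macro_f1_loss(logits, targets, eps=1e-8):
            # logits:  (batch, n_labels) raw scores
            # targets: (batch, n_labels) binary ground-truth labels
            probs = torch.sigmoid(logits)            # soft predictions in [0, 1]
            tp = (probs * targets).sum(dim=0)        # soft true positives
            fp = (probs * (1 - targets)).sum(dim=0)  # soft false positives
            fn = ((1 - probs) * targets).sum(dim=0)  # soft false negatives
            f1 = 2 * tp / (2 * tp + fp + fn + eps)   # per-label soft F1
            return 1 - f1.mean()                     # minimize 1 - macro-F1

        # Toy usage on a random 4-label batch.
        logits = torch.randn(8, 4, requires_grad=True)
        targets = torch.randint(0, 2, (8, 4)).float()
        loss = soft_macro_f1_loss(logits, targets)
        loss.backward()

    Because macro-F1 averages over labels, rare labels contribute as much to the loss as frequent ones, which is what makes metric-based losses attractive under class imbalance.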

    Affective Computing

    This book provides an overview of state-of-the-art research in Affective Computing. It presents new ideas, original results, and practical experiences in this increasingly important research field. The book consists of 23 chapters organized into four sections. Since one of the most important means of human communication is facial expression, the first section of this book (Chapters 1 to 7) presents research on the synthesis and recognition of facial expressions. Given that we use not only the face but also body movements to express ourselves, the second section (Chapters 8 to 11) presents research on the perception and generation of emotional expressions using full-body motion. The third section of the book (Chapters 12 to 16) presents computational models of emotion, as well as findings from neuroscience research. The last section of the book (Chapters 17 to 22) presents applications related to affective computing.

    Human-Robot Interaction architecture for interactive and lively social robots

    Society is experiencing a series of demographic changes that can result in an imbalance between the working-age and non-working-age populations. One of the solutions considered to mitigate this problem is the introduction of robots in multiple sectors, including the service sector. For this to be a viable solution, however, robots need, among other capabilities, to be able to interact with humans successfully. This thesis seeks to endow a social robot with the abilities required for natural human-robot interaction. The main objective is to contribute to the body of knowledge in the area of Human-Robot Interaction with a new, platform-independent, modular approach that focuses on giving roboticists the tools required to develop applications that involve interactions with humans. In particular, this thesis focuses on three problems that need to be addressed: (i) modelling interactions between a robot and a user; (ii) endowing the robot with the expressive capabilities required for successful communication; and (iii) giving the robot a lively appearance. The approach to dialogue modelling presented in this thesis proposes to model dialogues as sequences of atomic interaction units, called Communicative Acts, or CAs. These can be parametrized at runtime to achieve different communicative goals, and are endowed with mechanisms for handling some of the uncertainties that arise during interaction. Two dimensions have been used to identify the required CAs: initiative (the robot or the user) and intention (either to retrieve information or to convey it). These basic CAs can be combined hierarchically to create more complex, reusable structures. This approach simplifies the creation of new interactions by allowing developers to focus exclusively on designing the flow of the dialogue, without having to re-implement functionalities that are common to all dialogues (such as error handling). The expressiveness of the robot is based on a library of predefined multimodal gestures, or expressions, modelled as state machines. The module managing the expressiveness receives requests for performing gestures, schedules their execution to avoid any conflicts that might arise, loads them, and ensures that their execution completes without problems. The proposed approach can also generate expressions at runtime from a list of unimodal actions (an utterance, the motion of a limb, etc.). One of the key features of the proposed expressiveness management approach is the integration of a series of modulation techniques that can be used to modify the robot's expressions at runtime. This allows the robot to adapt its expressions to the particularities of a given situation (which also increases the variability of the robot's expressiveness), and to display different internal states with the same expressions. Considering that being recognized as a living being is a prerequisite for engaging in social encounters, the perception of a social robot as a living entity is a key requirement for fostering human-robot interaction. To this end, two approaches are proposed in this dissertation. The first method generates actions on the robot's different interfaces at certain intervals. The frequency and intensity of these actions are defined by a signal that represents the pulse of the robot, which can be adapted to the context of the interaction or to the internal state of the robot. The second method enhances the robot's utterances by predicting the appropriate non-verbal expressions that should accompany them, according to the content of the robot's message and its communicative intention. A deep learning model receives the transcription of the robot's utterance, predicts which expressions should accompany it, and synchronizes them so that each selected gesture starts at the appropriate time. The model combines a Long Short-Term Memory network-based encoder with a Conditional Random Field to generate the sequence of gestures that accompanies the robot's utterance. All the elements presented above form the core of a modular Human-Robot Interaction architecture that has been integrated into multiple platforms and tested under different conditions.
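
    A minimal sketch of the gesture-prediction component described above: a bidirectional LSTM encoder reads the tokens of the robot's utterance and produces per-token gesture scores. The thesis pairs the encoder with a Conditional Random Field for joint decoding; here a per-token classification head and greedy decoding stand in for the CRF so the example stays self-contained, and all names and sizes are illustrative assumptions.

        # Minimal sketch of utterance-to-gesture prediction (illustrative only).
        import torch
        import torch.nn as nn

        class GesturePredictor(nn.Module):
            def __init__(self, vocab_size, n_gestures, emb_dim=64, hidden=128):
                super().__init__()
                self.embed = nn.Embedding(vocab_size, emb_dim)
                # Bidirectional LSTM encoder over the utterance transcription.
                self.encoder = nn.LSTM(emb_dim, hidden, batch_first=True,
                                       bidirectional=True)
                # Per-token emission scores; a CRF would decode these jointly.
                self.head = nn.Linear(2 * hidden, n_gestures)

            def forward(self, token_ids):
                states, _ = self.encoder(self.embed(token_ids))
                return self.head(states)  # (batch, seq_len, n_gestures)

        # Toy usage: predict a gesture tag for each of 6 tokens.
        model = GesturePredictor(vocab_size=1000, n_gestures=12)
        tokens = torch.randint(0, 1000, (1, 6))
        tags = model(tokens).argmax(dim=-1)  # greedy stand-in for CRF decoding
        print(tags.shape)  # torch.Size([1, 6])

    Aligning each predicted gesture tag with the start time of its token would then give the synchronization between speech and non-verbal expression that the abstract describes.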

    Socially assistive robots: the specific case of the NAO

    Numerous studies have examined the development of robotics, especially socially assistive robots (SAR), including the NAO robot. This small humanoid robot has great potential in social assistance. The NAO robot's features and capabilities, such as motor skills, functionality, and affective capacities, have been studied in various contexts. The principal aim of this study is to gather all the research that has been done using this robot, to see how the NAO can be used and what its potential as a SAR could be. Articles using the NAO in any situation were found by searching the PsycINFO, Computer and Applied Sciences Complete, and ACM Digital Library databases. The main inclusion criterion was that studies had to use the NAO robot. Studies comparing it with other robots or intervention programs were also included. Articles about technical improvements were excluded, since they did not involve concrete use of the NAO. Duplicates and articles with an important lack of information on the sample were also excluded. A total of 51 publications (1,895 participants) were included in the review. Six categories were defined: social interactions, affectivity, intervention, assisted teaching, mild cognitive impairment/dementia, and autism/intellectual disability. A great majority of the findings concerning the NAO robot are positive. Its multimodality makes it a SAR with potential.

    Building Embodied Conversational Agents: Observations on human nonverbal behaviour as a resource for the development of artificial characters

    "Wow this is so cool!" This is what I most probably yelled, back in the 90s, when my first computer program on our MSX computer turned out to do exactly what I wanted it to do. The program contained the following instruction: COLOR 10(1.1) After hitting enter, it would change the screen color from light blue to dark yellow. A few years after that experience, Microsoft Windows was introduced. Windows came with an intuitive graphical user interface that was designed to allow all people, so also those who would not consider themselves to be experienced computer addicts, to interact with the computer. This was a major step forward in human-computer interaction, as from that point forward no complex programming skills were required anymore to perform such actions as adapting the screen color. Changing the background was just a matter of pointing the mouse to the desired color on a color palette. "Wow this is so cool!". This is what I shouted, again, 20 years later. This time my new smartphone successfully skipped to the next song on Spotify because I literally told my smartphone, with my voice, to do so. Being able to operate your smartphone with natural language through voice-control can be extremely handy, for instance when listening to music while showering. Again, the option to handle a computer with voice instructions turned out to be a significant optimization in human-computer interaction. From now on, computers could be instructed without the use of a screen, mouse or keyboard, and instead could operate successfully simply by telling the machine what to do. In other words, I have personally witnessed how, within only a few decades, the way people interact with computers has changed drastically, starting as a rather technical and abstract enterprise to becoming something that was both natural and intuitive, and did not require any advanced computer background. Accordingly, while computers used to be machines that could only be operated by technically-oriented individuals, they had gradually changed into devices that are part of many people’s household, just as much as a television, a vacuum cleaner or a microwave oven. The introduction of voice control is a significant feature of the newer generation of interfaces in the sense that these have become more "antropomorphic" and try to mimic the way people interact in daily life, where indeed the voice is a universally used device that humans exploit in their exchanges with others. The question then arises whether it would be possible to go even one step further, where people, like in science-fiction movies, interact with avatars or humanoid robots, whereby users can have a proper conversation with a computer-simulated human that is indistinguishable from a real human. An interaction with a human-like representation of a computer that behaves, talks and reacts like a real person would imply that the computer is able to not only produce and understand messages transmitted auditorily through the voice, but also could rely on the perception and generation of different forms of body language, such as facial expressions, gestures or body posture. At the time of writing, developments of this next step in human-computer interaction are in full swing, but the type of such interactions is still rather constrained when compared to the way humans have their exchanges with other humans. It is interesting to reflect on how such future humanmachine interactions may look like. 
    When we consider other products that have been created throughout history, it is sometimes striking to see that some of them have been inspired by things that can be observed in our environment, yet at the same time do not have to be exact copies of those phenomena. For instance, an airplane has wings just as birds do, yet the wings of an airplane do not make the typical movements a bird would produce to fly. Moreover, an airplane has wheels, whereas a bird has legs. At the same time, the airplane has made it possible for humans to cover long distances in a fast and smooth manner in a way that was unthinkable before it was invented. The example of the airplane shows how new technologies can have "unnatural" properties, but can nonetheless be very beneficial and impactful for human beings. This dissertation centers on the practical question of how virtual humans can be programmed to act more human-like. The four studies presented in this dissertation all share the same underlying question: how can parts of human behavior be captured, such that computers can use them to become more human-like? Each study differs in method, perspective and specific questions, but all of them aim to gain insights and directions that will help further push the development of human-like computer behavior and investigate (the simulation of) human conversational behavior. The rest of this introductory chapter gives a general overview of virtual humans (also known as embodied conversational agents), their potential uses and the engineering challenges, followed by an overview of the four studies.
