12 research outputs found

    Human desire inference process and analysis

    Ubiquitous computing has become a fascinating research area because it offers an unobtrusive way to assist users in environments that integrate surrounding objects and activities. To date, numerous studies have focused on how a user's activity can be identified and predicted, without considering the motivation driving the action. However, understanding the underlying motivation is key to activity analysis. Moreover, a user's desires often generate the motivations to engage in activities that fulfill those desires. Thus, we must study users' desires in order to provide proper services that make their lives more comfortable. In this study, we present the design and implementation of a computational model for inferring a user's desire. First, we devised a hierarchical desire inference process based on Bayesian Belief Networks (BBNs) that considers the affective states, behavior contexts, and environmental contexts of a user at given points in time. The inferred desire with the highest probability from the BBNs is then used in subsequent decision making. Second, we extended a probabilistic framework based on Dynamic Bayesian Belief Networks (DBBNs), which models observation sequences and draws on information theory. This generic hierarchical probabilistic framework for desire inference models the context information and the visual sensory observations, and it evolves dynamically to account for temporal changes in context information along with changes in the user's desire. Third, we identified which factors are relevant to determining a user's desire. To achieve this, a full-scale experiment was conducted: raw data from sensors were interpreted as context information, and we observed users' activities and recorded their emotions as part of the input parameters.
Throughout the experiment, a complete analysis was conducted in which 30 factors were considered and the most relevant were selected using the correlation coefficient and a delta value. Our results show that 11 factors (3 emotions, 7 behaviors, and 1 location factor) are relevant to inferring a user's desire. Finally, we established an evaluation environment within the Smart Home Lab to validate our approach. To train and verify the desire inference model, multiple stimuli were provided to induce users' desires, and pilot data were collected during the experiments. For evaluation, we used recall and precision, two basic measures. Average precision was calculated to be 85% for human desire inference and 81% for Think-Aloud
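The hierarchical BBN inference described above can be sketched as a toy posterior computation over desires. This is a minimal hand-rolled naive-Bayes sketch, not the study's actual network: all desire labels, evidence names, and probabilities below are illustrative assumptions.

```python
# Minimal sketch of desire inference over discrete context evidence.
# All labels and probabilities are hypothetical, not values from the study.

# Prior over candidate desires, P(D)
prior = {"drink": 0.3, "rest": 0.3, "entertainment": 0.4}

# Likelihoods P(evidence | desire) for observed context variables
# (emotion, behavior, location), assumed conditionally independent given D
likelihood = {
    "drink":         {"emotion:tired": 0.5, "location:kitchen": 0.7},
    "rest":          {"emotion:tired": 0.8, "location:kitchen": 0.2},
    "entertainment": {"emotion:tired": 0.1, "location:kitchen": 0.1},
}

def infer_desire(evidence):
    """Return the normalized posterior P(desire | evidence)."""
    posterior = {}
    for desire, p in prior.items():
        for e in evidence:
            p *= likelihood[desire][e]  # multiply in each observation
        posterior[desire] = p
    total = sum(posterior.values())
    return {d: p / total for d, p in posterior.items()}

post = infer_desire(["emotion:tired", "location:kitchen"])
best = max(post, key=post.get)  # desire of highest probability drives the decision
```

As in the abstract, the desire with the highest posterior probability would then be passed to the subsequent decision-making step.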

    Influence of Selected Factors on a Counselor's Attention Level to and Counseling Performance with a Virtual Human in a Virtual Counseling Session

    Virtual humans serve as role-players in social skills training environments simulating situational face-to-face conversations. Previous research indicates that virtual humans in instructional roles can increase a learner's engagement and motivation towards the training. Left unaddressed is whether the learner looks at the virtual human as one would in a human-to-human, face-to-face interaction. Using a modified version of the Emergent Leader Immersive Training Environment (ELITE-Lite), this study tracks the visual attention and other behavior of 120 counselor trainees counseling a virtual human role-playing a counselee. Specific study elements include: (1) the counselor's level of visual attention toward the virtual counselee; (2) how changes to the counselor's viewpoint may influence the counselor's visual focus; and (3) how levels of the virtual human's behavior may influence the counselor's visual focus. Secondary considerations include aspects of learner performance, acceptance of the virtual human, and the impacts of age and rank. Results indicate that counselor visual attention could be separated into two phases: when the virtual human was speaking and when it was not. While the virtual human is speaking, the counselor's primary visual attention is on the counselee but is also split toward the pre-scripted responses required for the training session. During the non-speaking phase, the counselor's visual focus was on the pre-scripted responses. Other findings included that participants did not consider the experience to be like a conversation with a human, yet they indicated acceptance of the virtual human as a partner within the training environment and considered the simulation a useful experience. Additionally, the research indicates that behavior may differ by age or rank. Future study and design considerations for enhancing social skills training environments are provided

    A psychology and game theory approach to human–robot cooperation

    Social robots have great practical potential in, for example, education, autism therapy, and commercial settings. Currently, however, few commercially available social robots meet our expectations of ‘social agents’, owing to their limited social skills and limited ability to maintain smooth, sophisticated real-life social interactions. Psychological and human-centred perspectives are therefore crucial for better understanding and developing social robots that can be deployed as assistants and companions to enhance human quality of life. In this thesis, I present a research approach that draws together psychological literature, Open Science initiatives, and game theory paradigms, aiming to systematically and structurally investigate the cooperative and social aspects of human–robot interactions. Chapter 1 illustrates the three components of this research approach, with the main focus on their relevance and value in researching human–robot interactions more rigorously. Chapters 2 to 4 describe three empirical studies in which I adopted this approach to examine the roles of contextual, personal, and robotic factors in human–robot interactions. Specifically, findings in Chapter 2 revealed that people’s cooperative decisions in prisoner’s dilemma games played with the embodied Cozmo robot were not influenced by the incentive structures of the games, contrary to the evidence from interpersonal prisoner’s dilemma games, but their decisions demonstrated a reciprocal (tit-for-tat) pattern in response to the robot opponent. In Chapter 3, we verified that the Cozmo robotic platform can display highly recognisable emotional expressions, and that people’s affective empathy might, counterintuitively, be associated with the emotion-contagion effects of Cozmo’s emotional displays.
Chapter 4 presents a study that examined the effects of Cozmo’s negative emotional displays on shaping people’s cooperative tendencies in prisoner’s dilemma games. We did not find evidence for an interaction between the effects of the robots’ emotions and people’s cooperative predispositions, which was inconsistent with our predictions informed by psychological emotion theories. However, exploratory analyses suggested that people who correctly recognised the Cozmo robots’ sad and angry expressions were less cooperative towards the robots in the games. Across the two studies on prisoner’s dilemma games played with the embodied Cozmo robots, we observed a consistent pattern: people’s cooperative willingness was highest at the start of the games and gradually decreased as more rounds were played. In Chapter 5, I summarise the findings and identify some limitations of these studies. I also outline future directions, including further investigation into the generalisability of different robotic platforms and the incorporation of neurocognitive and qualitative methods for an in-depth understanding of the mechanisms supporting people’s cooperative willingness towards social robots. Social interactions with robots are highly dynamic and complex, which has brought unique challenges to robotic designers and researchers in the relevant fields. The thesis provides a point of departure for understanding cooperative willingness towards small social robots at a behavioural level. The research approach and empirical findings presented here could help enhance reproducibility in human–robot interaction research and, more importantly, have practical implications for real-life human–robot cooperation
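The reciprocal (tit-for-tat) pattern reported in Chapter 2 can be sketched as a few lines of code: cooperate on the first round, then copy the opponent's previous move. The payoff values use the canonical prisoner's dilemma ordering T > R > P > S and the robot's move sequence is invented for illustration; neither is taken from the thesis.

```python
# Sketch of a tit-for-tat responder in an iterated prisoner's dilemma.
# Payoffs follow the canonical ordering T(5) > R(3) > P(1) > S(0).

PAYOFF = {  # (my_move, opponent_move) -> my payoff
    ("C", "C"): 3,  # R: reward for mutual cooperation
    ("C", "D"): 0,  # S: sucker's payoff
    ("D", "C"): 5,  # T: temptation to defect
    ("D", "D"): 1,  # P: punishment for mutual defection
}

def tit_for_tat(opponent_history):
    """Cooperate first, then mirror the opponent's previous move."""
    return "C" if not opponent_history else opponent_history[-1]

def play(rounds, robot_moves):
    """Play a tit-for-tat human against a fixed (hypothetical) robot sequence."""
    human_moves, score = [], 0
    for r in range(rounds):
        h = tit_for_tat(robot_moves[:r])     # human sees robot's past moves only
        score += PAYOFF[(h, robot_moves[r])]
        human_moves.append(h)
    return human_moves, score

moves, score = play(4, ["C", "D", "C", "D"])
# Each human move from round 2 onward mirrors the robot's previous move.
```

A decision pattern like `moves` here, lagging the opponent's choices by one round, is the behavioural signature the Chapter 2 analysis identified in participants' responses to the robot.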

    Automatic affect recognition through the analysis of acoustic and linguistic parameters of spontaneous speech

    This thesis addresses the automatic recognition of spontaneous emotions from analysis of the speech signal. It was carried out within the Grup de recerca de Tecnologies Mèdia at Enginyeria i Arquitectura La Salle, originating at a time when several research lines related to affective synthesis were in progress but none related to its analysis. The motivation is to improve human-machine interaction by adding an analysis module at the system input that subsequently allows an appropriate response to be generated through the synthesis modules at the output. The focus is on affective expressiveness, aiming to endow artificial-intelligence systems with emotional-intelligence skills so that human-machine interaction resembles human communication as closely as possible.
First, we carried out a preliminary analysis of utterances recorded under ideal conditions. Vocal expression in this case was acted, and the recordings followed a script that determined a priori the label describing their affective content. Although this is not the paradigm of interaction in a realistic environment, this first step served to test initial approaches to corpus parameterisation, feature-selection methods and their utility in optimising the proposed procedures, and to assess the viability of treating affect recognition as a categorical classification exercise. It also allowed the results obtained in this scenario to be compared with those obtained later in the realistic one. Beyond that comparison, the framework has a further use: we propose a system based on it whose purpose is the automatic validation of an expressive speech corpus intended for synthesis, since in synthesis the corpus must be recorded under optimal conditions because it will be used to generate new utterances.
Second, the thesis analyses the FAU Aibo corpus, a multi-speaker corpus of spontaneous expressive speech recorded in German from the natural interactions of a group of children with a microphone-equipped robot. Here the approach is completely different, starting from the definition of the corpus itself: the utterances do not follow a script, and the affective labels were assigned afterwards through subjective evaluation. Moreover, the degree of emotional expressiveness of these utterances is lower than in those recorded by an actor, since they are spontaneous and the emotions, being generated naturally, do not necessarily match a prototypical definition. In addition, the recording conditions differ from those of a professional recording studio. In this scenario the results are very different from those of the previous one, so a more detailed study was needed. We therefore propose two parameterisations, one acoustic and one linguistic, since the latter may be less affected by elements that degrade the former, such as noise and other artefacts. We propose several classification systems of varying complexity, although the simplest systems often produce adequate results. We also propose different groupings of parameters, seeking the smallest possible feature set capable of performing automatic affect recognition effectively. The results obtained on spontaneous expressions confirm the complexity of the problem and are lower than those obtained from corpora recorded under ideal conditions. Nevertheless, the proposed schemes outperform the results published to date in studies conducted under analogous conditions and thus open the door to future research in this area
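One common way to obtain the kind of reduced feature set the thesis seeks is to rank features by their correlation with the target label. This is a hand-rolled sketch of correlation-based ranking; the feature names, values, and binary labels are invented for illustration and are not drawn from the FAU Aibo corpus.

```python
# Sketch of correlation-based feature ranking for a small feature set.
# Feature names, values, and labels are illustrative only.
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Rows: utterances; columns: hypothetical acoustic features
features = {
    "pitch_mean":  [220, 180, 260, 150, 240],
    "energy":      [0.8, 0.4, 0.9, 0.3, 0.7],
    "speech_rate": [4.1, 4.0, 4.2, 3.9, 4.1],
}
labels = [1, 0, 1, 0, 1]  # toy binary labels: 1 = emotional, 0 = neutral

# Rank features by absolute correlation with the label; a reduced set
# keeps only the top-ranked features before training a classifier.
ranked = sorted(features,
                key=lambda f: abs(pearson(features[f], labels)),
                reverse=True)
```

Keeping only the top of `ranked` yields a smaller input set, in the spirit of the thesis's search for the smallest feature set that still recognises affect effectively.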