
4 research outputs found

A Deep Learning Approach for Multi-View Engagement Estimation of Children in a Child-Robot Joint Attention Task

Authors: Chalvatzaki Georgia, Hadfield Jack, Khamassi Mehdi, Koutras Petros, Maragos Petros, Tzafestas Costas
Publication venue: HAL CCSD
Publication date: 03/11/2019

International audience.
In this work we tackle the problem of estimating child engagement while children freely interact with a robot in a friendly, room-like environment. We propose a multi-view deep learning solution that takes advantage of recent developments in human pose detection. We extract the child's pose from several RGB-D cameras distributed around the room, fuse the results, and feed them to a deep neural network trained to classify engagement levels. The network contains a recurrent layer in order to exploit the rich temporal information in the pose data. The resulting method outperforms a number of baseline classifiers and provides a promising tool for better automatic understanding of a child's attitude, interest and attention while cooperating with a robot. The goal is to integrate this model into next-generation social robots as an attention-monitoring tool during various Child-Robot Interaction (CRI) tasks, both for typically developing (TD) children and for children with autism spectrum disorder (ASD).
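The abstract does not specify how the per-camera poses are fused. As a minimal sketch only, assuming each RGB-D camera yields 3D keypoints already calibrated into a shared world frame together with a per-joint detection confidence, a confidence-weighted average per joint is one simple fusion rule. The function `fuse_views`, the `min_conf` threshold and the dictionary layout are hypothetical illustrations, not the authors' method.

```python
from typing import Dict, List, Optional, Tuple

Point3D = Tuple[float, float, float]
# One camera's output for one frame: joint name -> (x, y, z, confidence).
# Hypothetical layout; assumes all cameras share one calibrated world frame.
View = Dict[str, Tuple[float, float, float, float]]


def fuse_views(views: List[View], min_conf: float = 0.1) -> Dict[str, Optional[Point3D]]:
    """Confidence-weighted average of each joint across camera views.

    Detections below min_conf are ignored; a joint that no view detects
    reliably is mapped to None so that downstream code can handle the gap.
    """
    joints = sorted({name for view in views for name in view})
    fused: Dict[str, Optional[Point3D]] = {}
    for name in joints:
        wsum = 0.0
        acc = [0.0, 0.0, 0.0]
        for view in views:
            if name not in view:
                continue  # this camera did not report the joint
            x, y, z, c = view[name]
            if c < min_conf:
                continue  # drop unreliable detections
            wsum += c
            acc[0] += c * x
            acc[1] += c * y
            acc[2] += c * z
        fused[name] = tuple(a / wsum for a in acc) if wsum > 0 else None
    return fused
```

Calling `fuse_views` on each frame gives a sequence of fused poses, which is the kind of time series a recurrent classifier consumes. Selecting the single most confident view per joint, or learning the fusion inside the network, would be equally plausible alternatives.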

Automatic assessment of nonverbal interaction from smartphone videos

Author: Lehtimäki Laura
Publication venue: Helsingfors universitet (University of Helsinki)
Publication date: 01/01/2019

The assessment of nonverbal interaction is currently based largely on observations, interviews and questionnaires, and quantitative methods for it are few. New technology opens new ways to perform assessment, and new methods are constantly being developed; many of them rely on movement tracking with sensors, cameras and computer vision. This study investigated the use of OpenPose, a pose estimation algorithm, for detecting nonverbal interactional events. The aim was to find out whether the algorithm could identify the same meaningful interactional events in videos as human annotators, and also to determine the best way to annotate the videos in a study of this kind. The research material consisted of four videos of a child and a parent blowing soap bubbles. The videos were first processed with OpenPose to track the poses of the child and the parent frame by frame. The resulting data were further processed in Matlab to extract the activities of the child and the parent, the coupling of these activities, and the closeness of the child's and parent's hands at each time point. The videos were manually annotated at two levels: basic units, such as gaze directions and handling of the soap bubble jar, and interactional events, such as communication initiatives, turn-taking and joint attention. The results obtained by the algorithm were compared visually with the annotations.
The communication initiatives and turn-taking appeared as peaks in hand closeness and as alternation between the child's and the parent's activities. However, interaction events were not the only cause of changes in hand closeness and activity, so they could not be distinguished from other actions by these measures alone. Some interaction was also unrelated to handling the jar and therefore did not show up in the hand closeness curves. With the current recording setup, the algorithm could not detect gaze directions, so moments of joint attention could not be determined either. To enable gaze detection in future studies, the recording setup should keep the subjects' faces visible at all times. Distinguishing individual interaction events may not be the best way to assess interaction; instead, assessment should focus on global units, such as synchrony between interaction partners. The best way to annotate the videos depends on the aim of the study.

Helsingin yliopiston digitaalinen arkisto (University of Helsinki Digital Archive)
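The Lehtimäki abstract names the measures computed from the OpenPose output (per-person activity, activity coupling and hand closeness) but not their definitions, and the original processing was done in Matlab. The Python sketch below shows one plausible reading under stated assumptions: activity is the mean frame-to-frame displacement of reliably detected keypoints, hand closeness is the smallest child-to-parent wrist distance, and coupling is a Pearson correlation of the two activity series. All function names and the `min_conf` threshold are hypothetical; wrist indices 4 and 7 follow OpenPose's BODY_25 and COCO keypoint layouts.

```python
import math
from typing import List, Optional, Sequence, Tuple

Keypoint = Tuple[float, float, float]  # (x, y, confidence) in pixels; OpenPose reports missing joints as zeros
Pose = Sequence[Keypoint]              # one person's keypoints in one frame

R_WRIST, L_WRIST = 4, 7  # wrist indices in OpenPose's BODY_25 and COCO layouts


def _valid(kp: Keypoint, min_conf: float) -> bool:
    return kp[2] >= min_conf


def activity(prev: Pose, curr: Pose, min_conf: float = 0.1) -> Optional[float]:
    """Hypothetical activity measure: mean displacement of keypoints detected in both frames."""
    moves = [math.dist(p[:2], c[:2]) for p, c in zip(prev, curr)
             if _valid(p, min_conf) and _valid(c, min_conf)]
    return sum(moves) / len(moves) if moves else None


def hand_distance(child: Pose, parent: Pose, min_conf: float = 0.1) -> Optional[float]:
    """Smallest distance between any child wrist and any parent wrist (smaller means closer)."""
    dists = [math.dist(child[i][:2], parent[j][:2])
             for i in (R_WRIST, L_WRIST) for j in (R_WRIST, L_WRIST)
             if _valid(child[i], min_conf) and _valid(parent[j], min_conf)]
    return min(dists) if dists else None


def pearson(a: List[float], b: List[float]) -> float:
    """Pearson correlation of two equal-length series; one possible coupling measure."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(va * vb)
```

Applying `activity` and `hand_distance` to every consecutive frame pair yields time series that can be plotted against the manual annotations, mirroring the visual comparison described in the abstract. A windowed correlation would give coupling per time point rather than one value per video.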