186 research outputs found

    WATCHING PEOPLE: ALGORITHMS TO STUDY HUMAN MOTION AND ACTIVITIES

    Human motion analysis is nowadays one of the most active research topics in Computer Vision, and it is receiving increasing attention from both the industrial and scientific communities. This growing interest is motivated by a large number of promising applications, ranging from surveillance, human–computer interaction and virtual reality to healthcare, sports, computer games and video conferencing, to name a few. The aim of this thesis is to give an overview of the various tasks involved in visual motion analysis of the human body and to present the related issues and possible solutions. Visual motion analysis is categorized into three major areas related to the interpretation of human motion: tracking of human motion using a virtual pan-tilt-zoom (vPTZ) camera, recognition of human motions, and segmentation of human behaviors. In the field of human motion tracking, a virtual environment for PTZ cameras (vPTZ) is presented to overcome the mechanical limitations of PTZ cameras. The vPTZ is built on equirectangular images acquired by 360° cameras and allows not only the development of pedestrian tracking algorithms but also the comparison of their performance. On the basis of this virtual environment, three novel pedestrian tracking algorithms for 360° cameras were developed; two adopt a tracking-by-detection approach while the third adopts a Bayesian approach. The action recognition problem is addressed by an algorithm that represents actions in terms of multinomial distributions of frequent sequential patterns of different lengths. Frequent sequential patterns are series of data descriptors that occur many times in the data. The proposed method learns a codebook of frequent sequential patterns by means of an Apriori-like algorithm; an action is then represented with a Bag-of-Frequent-Sequential-Patterns approach. In the last part of the thesis, a methodology is presented to semi-automatically annotate behavioral data given a small set of manually annotated data. The resulting methodology is not only effective in the semi-automated annotation task but can also be used in the presence of abnormal behaviors, as demonstrated empirically by testing the system on data collected from children affected by neurodevelopmental disorders.
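As a concrete illustration of the Bag-of-Frequent-Sequential-Patterns representation described above, here is a minimal Python sketch: frequent contiguous patterns of descriptor symbols are mined with an Apriori-like level-wise search, and an action is encoded as a normalized histogram over that codebook. The symbol alphabet, support threshold and function names are illustrative assumptions, not the thesis implementation.

```python
# Minimal sketch (assumptions: actions are already encoded as sequences of
# discrete descriptor symbols; support threshold, pattern length and names
# are illustrative, not the thesis implementation).
from itertools import product

def count_support(pattern, sequences):
    """Number of sequences containing `pattern` as a contiguous subsequence."""
    k = len(pattern)
    return sum(
        any(tuple(seq[i:i + k]) == pattern for i in range(len(seq) - k + 1))
        for seq in sequences
    )

def mine_frequent_patterns(sequences, min_support=2, max_len=4):
    """Apriori-like level-wise mining of frequent contiguous sequential patterns."""
    alphabet = {s for seq in sequences for s in seq}
    frequent, current = [], [(s,) for s in alphabet]
    for _ in range(max_len):
        current = [p for p in current if count_support(p, sequences) >= min_support]
        if not current:
            break
        frequent.extend(current)
        # candidate generation: extend each surviving pattern by one symbol
        current = [p + (s,) for p, s in product(current, alphabet)]
    return frequent

def bag_of_patterns(sequence, codebook):
    """Represent one action as a (multinomial) histogram over the codebook."""
    counts = []
    for pattern in codebook:
        k = len(pattern)
        n = sum(tuple(sequence[i:i + k]) == pattern
                for i in range(len(sequence) - k + 1))
        counts.append(n)
    total = sum(counts) or 1
    return [c / total for c in counts]

# toy usage with made-up descriptor sequences
train = [["walk", "walk", "turn"], ["walk", "turn", "stop"], ["walk", "walk", "stop"]]
codebook = mine_frequent_patterns(train, min_support=2, max_len=2)
print(bag_of_patterns(["walk", "walk", "turn", "stop"], codebook))
```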

    Spatial release from masking in children with and without auditory processing disorder in real and virtual auditory environments

    Auditory Processing Disorder (APD) is a developmental disorder characterised by difficulties in listening to speech-in-noise despite normal audiometric thresholds. It is still poorly understood and much disputed, and there is a need for better diagnostic tools. One promising finding is that some children referred for APD assessment have a reduced spatial release from masking (SRM). Current clinical tests measure SRM in virtual auditory environments created from head-related transfer functions (HRTFs) of a standardised adult head. Adults and children, however, have different head dimensions, and mismatched HRTFs are known to affect aspects of binaural hearing such as localisation. There has been little research on HRTFs in children, and it is unclear whether a large mismatch can impact speech perception, especially for children with APD who have difficulties with accurately processing auditory information. In this project, we examined the effect of nonindividualised virtual auditory environments on the SRM in adults and children with and without APD. The first study, with normal-hearing adults, compared environments created from individually measured HRTFs and two nonindividualised sets of HRTFs to a real anechoic environment. Speech reception thresholds (SRTs) were measured for target sentences at 0° and two symmetric speech maskers at 0° or ±90° azimuth. No significant effect of auditory environment on SRTs or SRM was observed. A larger study was then conducted with APD and typically-developing children aged 7 to 12 years. Individual HRTFs were measured for each child. The SRM was measured in environments created from these individualised HRTFs or artificial-head HRTFs and in the real anechoic environment. To assess the influence of spectral cues, SRTs were also measured for HRTFs from a spherical head model that only contains interaural time and level differences. Additionally, the study included an extended high-frequency audiogram, a receptive language test and two parental questionnaires. The SRTs of children with APD were worse than those of typically-developing children in all conditions, but SRMs were similar. Only small differences in SRTs were found across environments, mainly for the spherical head HRTFs. SRTs in children were higher than in adults but improved with age. Children with APD also had higher hearing thresholds and performed worse in the language test.
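For readers unfamiliar with the measure, the sketch below shows how spatial release from masking is derived from the measured speech reception thresholds (maskers colocated with the target versus spatially separated); the dB values are invented for illustration and do not come from the study.

```python
# Minimal sketch of how spatial release from masking (SRM) is computed from
# speech reception thresholds (SRTs); all dB values below are hypothetical.
def spatial_release_from_masking(srt_colocated_db, srt_separated_db):
    """SRM (dB) = SRT with maskers colocated at 0° minus SRT with maskers at ±90°.
    A positive value means the listener benefits from spatial separation."""
    return srt_colocated_db - srt_separated_db

# one hypothetical listener, per auditory environment
srts = {
    "individual HRTFs":      {"colocated": -2.0, "separated": -8.5},
    "artificial-head HRTFs": {"colocated": -1.8, "separated": -8.0},
}
for env, s in srts.items():
    srm = spatial_release_from_masking(s["colocated"], s["separated"])
    print(f"{env}: SRM = {srm:.1f} dB")
```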

    Automatic assessment of nonverbal interaction from smartphone videos

    The assessment of nonverbal interaction is currently based on observations, interviews and questionnaires; quantitative methods are few. Novel technology allows new ways to perform assessment, and new methods are constantly being developed. Many of them are based on movement tracking by sensors, cameras and computer vision. In this study, the use of OpenPose, a pose estimation algorithm, was investigated for the detection of nonverbal interactional events. The aim was to find out whether the same meaningful interactional events could be identified in videos by the algorithm and by human annotators. Another purpose was to find out the best way to annotate the videos in a study like this. The research material consisted of four videos of a child and a parent blowing soap bubbles. The videos were first processed with OpenPose to track the poses of the child and the parent frame by frame. The data obtained by the algorithm were further processed in Matlab to extract the activities of the child and the parent, the coupling of the activities and the closeness of the child’s and parent’s hands at each time point.
The videos were manually annotated in two different ways: both the basic units, such as gaze directions and the handling of the soap bubble jar, and the interactional events, such as communication initiatives, turn-taking and moments of joint attention, were annotated. The results obtained by the algorithm were visually compared to the annotations. The communication initiatives and turn-taking could be seen as peaks in hand closeness and as alternation in activities. However, interaction was not the only activity that caused changes in hand closeness and activity levels, so interaction could not be distinguished from other actions solely by these factors. There was also interaction that was not related to handling the jar, and this could not be seen from the hand-closeness curves. With the current recording arrangements, the gaze directions could not be detected by the algorithm, and therefore the moments of joint attention could not be determined automatically either. To enable the detection of gaze directions in future studies, the recording arrangements should be changed so that the subjects’ faces are visible at all times. Distinguishing individual interaction events may not be the best way to assess interaction; instead, the focus of assessment should be on global units, such as synchrony between interaction partners. The best way to annotate the videos depends on the aim of the study.
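A minimal sketch of the kind of post-processing described above, assuming OpenPose BODY_25 keypoints have already been extracted for the child and the parent; array shapes, wrist indices and function names are illustrative (the original analysis was done in Matlab).

```python
# Minimal sketch (assumption: per-person OpenPose BODY_25 keypoints are
# available as arrays of shape [n_frames, 25, 2]; names are illustrative).
import numpy as np

WRISTS = [4, 7]  # right and left wrist indices in the BODY_25 layout

def activity(keypoints):
    """Mean frame-to-frame displacement of all joints (proxy for movement activity)."""
    diffs = np.linalg.norm(np.diff(keypoints, axis=0), axis=-1)  # [n_frames-1, 25]
    return diffs.mean(axis=1)                                    # one value per frame step

def hand_closeness(child_kp, parent_kp):
    """Smallest wrist-to-wrist distance between child and parent at each frame."""
    c = child_kp[:, WRISTS, :]                                   # [n_frames, 2, 2]
    p = parent_kp[:, WRISTS, :]
    d = np.linalg.norm(c[:, :, None, :] - p[:, None, :, :], axis=-1)  # [n_frames, 2, 2]
    return d.reshape(len(d), -1).min(axis=1)

# toy usage with random data standing in for real pose tracks
child = np.random.rand(100, 25, 2)
parent = np.random.rand(100, 25, 2)
print(activity(child).shape, hand_closeness(child, parent).shape)
```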

    Implementation of new assistive technologies for people affected by Autistic Spectrum Disorders (ASDs)

    Individuals with Autistic Spectrum Disorders (ASDs) have impairments in the processing of social and emotional information. The number of children known to have autism has increased dramatically since the 1980s. This has sensitized the scientific community to the design and development of technologies suitable for treating autistic patients in order to broaden their emotive responsiveness, such as the employment of robotic systems to elicit proactive interactive responses in children with ASDs. My PhD work focuses on the design and development of new technologies for therapy with individuals affected by ASD. The main challenge of my work has been to design and develop a novel control architecture able to reproduce the brain’s characteristics in terms of highly concurrent processing, flexibility and the ability to learn new behaviors. The main difficulties in implementing Artificial Neural Networks (ANNs) in hardware, in terms of accuracy, gate complexity and speed performance, are discussed. A new wearable eye-tracking system able to investigate attention disorders early in infancy is proposed. Technological choices are emphasized with respect to unobtrusiveness and ecological validity in order to adapt the device to infants. New algorithms to increase the system’s robustness under illumination changes and during the calibration process have been developed and are presented here. Experimental tests prove the effectiveness of the solutions. Considerations on future research directions are addressed, stressing the multiple application fields of the designed device.

    State of the art of audio- and video based solutions for AAL

    Working Group 3. Audio- and Video-based AAL Applications
It is a matter of fact that Europe is facing more and more crucial challenges regarding health and social care due to the demographic change and the current economic context. The recent COVID-19 pandemic has stressed this situation even further, thus highlighting the need for taking action. Active and Assisted Living (AAL) technologies come as a viable approach to help facing these challenges, thanks to the high potential they have in enabling remote care and support. Broadly speaking, AAL can be referred to as the use of innovative and advanced Information and Communication Technologies to create supportive, inclusive and empowering applications and environments that enable older, impaired or frail people to live independently and stay active longer in society. AAL capitalizes on the growing pervasiveness and effectiveness of sensing and computing facilities to supply the persons in need with smart assistance, by responding to their necessities of autonomy, independence, comfort, security and safety. The application scenarios addressed by AAL are complex, due to the inherent heterogeneity of the end-user population, their living arrangements, and their physical conditions or impairments. Despite aiming at diverse goals, AAL systems should share some common characteristics. They are designed to provide support in daily life in an invisible, unobtrusive and user-friendly manner. Moreover, they are conceived to be intelligent, to be able to learn and adapt to the requirements and requests of the assisted people, and to synchronise with their specific needs. Nevertheless, to ensure the uptake of AAL in society, potential users must be willing to use AAL applications and to integrate them in their daily environments and lives. In this respect, video- and audio-based AAL applications have several advantages, in terms of unobtrusiveness and information richness. Indeed, cameras and microphones are far less obtrusive with respect to the hindrance other wearable sensors may cause to one’s activities. In addition, a single camera placed in a room can record most of the activities performed in the room, thus replacing many other non-visual sensors. Currently, video-based applications are effective in recognising and monitoring the activities, the movements, and the overall conditions of the assisted individuals as well as in assessing their vital parameters (e.g., heart rate, respiratory rate). Similarly, audio sensors have the potential to become one of the most important modalities for interaction with AAL systems, as they have a large sensing range, do not require physical presence at a particular location and are physically intangible. Moreover, relevant information about individuals’ activities and health status can be derived from processing audio signals (e.g., speech recordings). Nevertheless, as the other side of the coin, cameras and microphones are often perceived as the most intrusive technologies from the viewpoint of the privacy of the monitored individuals. This is due to the richness of the information these technologies convey and the intimate setting where they may be deployed. Solutions able to ensure privacy preservation by context and by design, as well as to ensure high legal and ethical standards, are in high demand. After the review of the current state of play and the discussion in GoodBrother, we may claim that the first solutions in this direction are starting to appear in the literature.
A multidisciplinary debate among experts and stakeholders is paving the way towards AAL ensuring ergonomics, usability, acceptance and privacy preservation. The DIANA, PAAL, and VisuAAL projects are examples of this fresh approach. This report provides the reader with a review of the most recent advances in audio- and video-based monitoring technologies for AAL. It has been drafted as a collective effort of WG3 to supply an introduction to AAL, its evolution over time and its main functional and technological underpinnings. In this respect, the report contributes to the field with the outline of a new generation of ethical-aware AAL technologies and a proposal for a novel comprehensive taxonomy of AAL systems and applications. Moreover, the report allows non-technical readers to gather an overview of the main components of an AAL system and how these function and interact with the end-users. The report illustrates the state of the art of the most successful AAL applications and functions based on audio and video data, namely (i) lifelogging and self-monitoring, (ii) remote monitoring of vital signs, (iii) emotional state recognition, (iv) food intake monitoring, activity and behaviour recognition, (v) activity and personal assistance, (vi) gesture recognition, (vii) fall detection and prevention, (viii) mobility assessment and frailty recognition, and (ix) cognitive and motor rehabilitation. For these application scenarios, the report illustrates the state of play in terms of scientific advances, available products and research projects. The open challenges are also highlighted. The report ends with an overview of the challenges, the hindrances and the opportunities posed by the uptake of AAL technologies in real-world settings. In this respect, the report illustrates the current procedural and technological approaches to cope with acceptability, usability and trust in AAL technology, by surveying strategies and approaches to co-design, to privacy preservation in video and audio data, to transparency and explainability in data processing, and to data transmission and communication. User acceptance and ethical considerations are also debated. Finally, the potential coming from the silver economy is overviewed.

    An Intelligent Robot and Augmented Reality Instruction System

    Human-Centered Robotics (HCR) is a research area that focuses on how robots can empower people to live safer, simpler, and more independent lives. In this dissertation, I present a combination of two technologies to deliver human-centric solutions to an important population. The first nascent area that I investigate is the creation of an Intelligent Robot Instructor (IRI) as a learning and instruction tool for human pupils. The second technology is the use of augmented reality (AR) to create an Augmented Reality Instruction (ARI) system to provide instruction via a wearable interface. To function in an intelligent and context-aware manner, both systems require the ability to reason about their perception of the environment and make appropriate decisions. In this work, I construct a novel formulation of several education methodologies, particularly those known as response prompting, as part of a cognitive framework to create a system for intelligent instruction, and compare these methodologies in the context of intelligent decision making using both technologies. The IRI system is demonstrated through experiments with a humanoid robot that uses object recognition and localization for perception and interacts with students through speech, gestures, and object interaction. The ARI system uses augmented reality, computer vision, and machine learning methods to create an intelligent, contextually aware instructional system. By using AR to teach prerequisite skills that lend themselves well to visual, augmented reality instruction prior to a robot instructor teaching skills that lend themselves to embodied interaction, I am able to demonstrate the potential of each system independently as well as in combination to facilitate students' learning. I identify people with intellectual and developmental disabilities (I/DD) as a particularly significant use case and show that IRI and ARI systems can help fulfill the compelling need to develop tools and strategies for people with I/DD. I present results that demonstrate both systems can be used independently by students with I/DD to quickly and easily acquire the skills required for performance of relevant vocational tasks. This is the first successful real-world application of response-prompting for decision making in a robotic and augmented reality intelligent instruction system.
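As an illustration of how a response-prompting strategy can drive intelligent decision making, here is a minimal sketch of a least-to-most prompting loop, one common response-prompting procedure; the prompt hierarchy, timing and function names are assumptions for illustration, not the dissertation's cognitive framework.

```python
# Minimal sketch of a least-to-most response-prompting policy; the hierarchy,
# timings and callback names are hypothetical, not the dissertation's system.
import time

PROMPT_HIERARCHY = ["independent", "verbal", "gesture", "model", "physical"]

def run_trial(get_learner_response, deliver_prompt, wait_s=5.0):
    """Escalate through the prompt hierarchy until the target response occurs.
    Returns the prompt level at which the learner succeeded (lower is better)."""
    for level, prompt in enumerate(PROMPT_HIERARCHY):
        if prompt != "independent":
            deliver_prompt(prompt)              # e.g. robot speech/gesture or AR overlay
        deadline = time.time() + wait_s
        while time.time() < deadline:
            if get_learner_response():          # e.g. object placed correctly
                return level
            time.sleep(0.1)
    return len(PROMPT_HIERARCHY)                # no correct response at any level

# toy usage: a simulated learner who succeeds once a "model" prompt is given
state = {"prompted": None}
level = run_trial(
    get_learner_response=lambda: state["prompted"] == "model",
    deliver_prompt=lambda p: state.update(prompted=p),
    wait_s=0.2,
)
print("succeeded at prompt level", level)
```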

    Non Invasive Tools for Early Detection of Autism Spectrum Disorders

    Autism Spectrum Disorders (ASDs) describe a set of neurodevelopmental disorders and represent a significant public health problem. Currently, ASDs are not diagnosed before the second year of life, but earlier identification would be crucial, since early interventions are much more effective than specific therapies started in later childhood. To this aim, cheap and contactless automatic approaches have recently aroused great clinical interest. Among them, the cry and the movements of the newborn, both involving the central nervous system, are proposed as possible indicators of neurological disorders. This PhD work is a first step towards solving this challenging problem. An integrated system is presented that enables the recording of audio (crying) and video (movement) data of the newborn, their automatic analysis with innovative techniques for the extraction of clinically relevant parameters, and their classification with data-mining techniques. New robust algorithms were developed for the selection of the voiced parts of the cry signal, the estimation of acoustic parameters based on the wavelet transform, and the analysis of the infant’s general movements (GMs) through a new body model for segmentation and 2D reconstruction. In addition to a thorough literature review, this thesis presents the state of the art on these topics, which shows that no studies exist concerning normative ranges for the newborn infant cry in the first six months of life, nor on the correlation between cry and movements. Through the new automatic methods, a population of control infants (“low-risk”, LR) was compared to a group of “high-risk” (HR) infants, i.e. siblings of children already diagnosed with ASD. A subset of LR infants clinically diagnosed with Typical Development (TD) and one infant affected by ASD were also compared. The results show that the selected acoustic parameters allow good differentiation between the two groups. This result provides new perspectives, both diagnostic and therapeutic.
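To make the signal-processing step more tangible, here is a minimal sketch of voiced-segment selection on a cry recording using short-time energy and zero-crossing rate; the frame sizes and thresholds are illustrative, and this simple criterion stands in for, rather than reproduces, the thesis's wavelet-based analysis.

```python
# Minimal sketch of voiced-part selection for a cry signal; frame sizes and
# thresholds are illustrative assumptions, not the thesis algorithm.
import numpy as np

def frame_signal(x, frame_len, hop):
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n)])

def voiced_mask(x, sr, frame_ms=30, hop_ms=10, energy_q=0.6, zcr_max=0.25):
    """Flag frames that are loud (high energy) and periodic-like (low zero-crossing rate)."""
    frame_len, hop = int(sr * frame_ms / 1000), int(sr * hop_ms / 1000)
    frames = frame_signal(x, frame_len, hop)
    energy = (frames ** 2).mean(axis=1)
    zcr = (np.abs(np.diff(np.sign(frames), axis=1)) > 0).mean(axis=1)
    return (energy > np.quantile(energy, energy_q)) & (zcr < zcr_max)

# toy usage: 1 s of noise with a louder tonal "cry" burst in the middle
sr = 16000
t = np.linspace(0, 1, sr, endpoint=False)
x = 0.01 * np.random.randn(sr)
x[6000:10000] += 0.5 * np.sin(2 * np.pi * 450 * t[6000:10000])  # ~450 Hz burst
print(voiced_mask(x, sr).astype(int))
```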

    A comparison of stuttering behavior and fluency improvement in English-Mandarin bilinguals who stutter

    Despite the number of bilinguals and speakers of English and Mandarin worldwide, up till now there have been no investigations of stuttering in any of the Chinese languages, or in bilinguals who speak both English and Mandarin. Hence, it is not known whether stuttering behavior in Mandarin mimics that in English, or whether speech restructuring techniques such as Prolonged Speech produce the same fluency outcomes in Mandarin speakers as they do for English speakers. Research into stuttering in bilinguals is available but far from adequate. Although the limited extant studies show that bilinguals who stutter (BWS) may stutter either the same or differently across languages, and that treatment effects in one language can automatically carry over to the other language, it is unclear whether these findings are influenced by factors such as language dominance or language structure. These issues need to be clarified because speech language pathologists (SLPs) who work with bilinguals often do not speak the dominant language of their clients. Thus, the language of assessment and treatment becomes an important clinical consideration. The aim of this thesis was to investigate (a) whether the severity and type of stuttering was different in English and Mandarin in English-Mandarin bilingual adults, (b) whether this difference was influenced by language dominance, (c) whether stuttering reductions in English generalized to Mandarin following treatment in English only, and (d) whether treatment generalization was influenced by language dominance. To achieve these aims, a way of establishing the dominant language in bilinguals was a necessary first step. The first part of this thesis reviews the disorder of stuttering and the treatment for adults who stutter, the differences between English and Chinese languages, and stuttering in bilinguals. Part Two of this thesis describes the development of a tool for determining language dominance in a multilingual Asian population such as that found in Singapore. This study reviews the complex issues involved in assessing language dominance. It presents the rationale for and description of a self-report classification tool for identifying the dominant language in English-Mandarin bilingual Singaporeans. The decision regarding language dominance was based on a predetermined set of criteria using self-report questionnaire data on language proficiency, frequency of language use, and domain of language use. The tool was administered to 168 English-Mandarin bilingual participants, and the self-report data were validated against the results of a discriminant analysis. The discriminant analysis revealed a reliable three-way classification into English-dominant, Mandarin-dominant, and balanced bilinguals. Scores on a single word receptive vocabulary test supported these dominance classifications. Part Three of this thesis contains two studies investigating stuttering in BWS. The second study of this thesis examined the influence of language dominance on the manifestation of stuttering in English-Mandarin BWS. Results are presented for 30 English-Mandarin BWS who were divided according to their bilingual classification group: 15 English-dominant, four Mandarin-dominant, and 11 balanced bilinguals. All participants underwent comprehensive speech evaluations in both languages. 
The English-dominant and Mandarin-dominant BWS were found to exhibit greater stuttering in their less dominant language, whereas the balanced bilinguals evidenced similar levels of stuttering in both languages. An analysis of the types of stutter using the Lidcombe Behavioral Data Language showed no significant differences between English and Mandarin for all bilingual groups. In the third study of this thesis, the influence of language dominance on the generalization of stuttering reductions from English to Mandarin was investigated. Results are provided for seven English-dominant, three Mandarin-dominant, and four balanced bilinguals who underwent a Smooth Speech intensive program in English only. A comparison of stuttering between their pretreatment scores and three posttreatment interval scores indicated that the degree of fluency transfer from the treated to the untreated language was disproportionate. English-dominant and Mandarin-dominant participants showed greater fluency improvement in their dominant language even if this language was not directly treated. In the final chapter, Part Four, a hypothesis is provided to explain the findings of this thesis. A discussion of the limitations of the thesis and suggestions for future research are also presented. The chapter concludes with a summary of the main contributions that this thesis makes to the field of stuttering in bilinguals.
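As an illustration of the dominance-classification step, the sketch below validates a rule-based three-way grouping with a linear discriminant analysis, in the spirit of the procedure described above; the self-report features, scores and labels are invented for illustration.

```python
# Minimal sketch of checking rule-based dominance labels against a
# discriminant analysis; features, scores and labels are hypothetical.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# self-report scores per participant: [English proficiency, Mandarin proficiency,
# % use of English, % use of Mandarin]  (invented values)
X = np.array([
    [6, 3, 80, 20], [7, 2, 90, 10],   # rule-based label: English-dominant
    [3, 6, 25, 75], [2, 7, 10, 90],   # Mandarin-dominant
    [5, 5, 55, 45], [6, 6, 50, 50],   # balanced
])
y = ["English", "English", "Mandarin", "Mandarin", "balanced", "balanced"]

lda = LinearDiscriminantAnalysis()
lda.fit(X, y)
agreement = (lda.predict(X) == np.array(y)).mean()
print("agreement between discriminant analysis and rule-based labels:", agreement)
```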