
    Automatic annotation of head velocity and acceleration in Anvil


    Classifying head movements in video-recorded conversations based on movement velocity, acceleration and jerk

    This paper is about the automatic annotation of head movements in videos of face-to-face conversations. Manual annotation of gestures is resource-consuming, and modelling gesture behaviour in different types of communicative settings requires many types of annotated data, so developing methods for automatic annotation is crucial. We present an approach in which an SVM classifier learns to classify head movements based on measurements of velocity, acceleration, and jerk, the third derivative of position with respect to time. Annotations of head movements are then added to new video data. The automatic annotations are evaluated against manual annotations of the same data and reach an accuracy of 73.47%. The results also show that using jerk improves accuracy.
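
    The core of the method is straightforward to sketch. The following minimal Python illustration, assuming head positions have already been tracked per frame, derives velocity, acceleration and jerk numerically and feeds their magnitudes to an SVM; the feature layout, frame rate and label set are illustrative, not the paper's exact pipeline.

```python
import numpy as np
from sklearn.svm import SVC

def kinematic_features(positions, fps=25.0):
    """positions: (n_frames, 2) float array of tracked head (x, y) coordinates."""
    dt = 1.0 / fps
    velocity = np.gradient(positions, dt, axis=0)
    acceleration = np.gradient(velocity, dt, axis=0)
    jerk = np.gradient(acceleration, dt, axis=0)  # third derivative of position
    # One feature vector per frame: the magnitude of each derivative.
    return np.column_stack([
        np.linalg.norm(velocity, axis=1),
        np.linalg.norm(acceleration, axis=1),
        np.linalg.norm(jerk, axis=1),
    ])

# With manually annotated training videos (X_train, y_train assumed given):
# clf = SVC(kernel="rbf").fit(X_train, y_train)
# labels = clf.predict(kinematic_features(new_positions))
```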

    Automatic detection and classification of head movements in face-to-face conversations

    This paper presents an approach to automatic head movement detection and classification in data from a corpus of video-recorded face-to-face conversations in Danish involving 12 different speakers. A number of classifiers were trained with different combinations of visual, acoustic and word features and tested in a leave-one-out cross-validation scenario. The visual movement features were extracted from the raw video data using OpenPose, the acoustic ones from the sound files using Praat, and the word features from the transcriptions. The best results were obtained by a Multilayer Perceptron classifier, which reached an average 0.68 F1 score across the 12 speakers for head movement detection, and 0.40 for head movement classification given four different classes. In both cases, the classifier outperformed a simple most-frequent-class baseline, a more advanced baseline relying only on velocity features, and linear classifiers using different combinations of features.
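
    A hedged sketch of the evaluation setup described above: leave-one-speaker-out cross-validation with an MLP over precomputed feature vectors. The feature extraction itself (OpenPose keypoints, Praat acoustics, word features) is assumed to have produced the per-frame matrix X already, and the hyperparameters are illustrative.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import f1_score

def leave_one_speaker_out(X, y, speakers):
    """X: (n, d) per-frame features; y: labels; speakers: (n,) speaker ids."""
    scores = []
    for held_out in np.unique(speakers):
        train = speakers != held_out          # train on the remaining speakers
        clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500)
        clf.fit(X[train], y[train])
        pred = clf.predict(X[~train])
        scores.append(f1_score(y[~train], pred, average="macro"))
    return float(np.mean(scores))             # average F1 across speakers
```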

    Hand-tracking in video conversations

    The goal of this thesis was to describe various object-tracking methods and to create a tool for automatic gesture annotation. Annotation of video data is an important prerequisite for studies of human communication, but doing it manually is time- and resource-consuming, so it is important to study automatic annotation tools. The tool implements the CAMShift object-tracking algorithm and runs as a plugin for ANVIL, a freely available annotation program. It can track hands and other coloured objects in a video and detect movements; the initial detection and marking of the hand region or other object is left to the user. Movements are annotated automatically by writing their start and end points and average velocity to a specified annotation track in ANVIL. The tool was tested on recordings of actual two-person dialogues and used to track both bare hands and coloured objects. Tracking and movement-detection precision depends on the quality of the video and on user-specified settings such as the minimum colour saturation taken into account. All in all, the tool meets the goals of the thesis, as it can automatically track and annotate gestures in recorded video conversations. However, adding functionality to detect hands in a video frame automatically, without user intervention, and to classify gestures based on the collected movement data would further reduce the need for user input during annotation.
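
    OpenCV ships a CAMShift implementation, so the tracking loop the thesis describes can be sketched briefly. The snippet below is a minimal illustration rather than the thesis tool itself: the user marks the initial hand region, which is then followed via hue back-projection. The input file name and termination criteria are assumptions.

```python
import cv2

cap = cv2.VideoCapture("conversation.mp4")        # assumed input file
ok, frame = cap.read()
x, y, w, h = cv2.selectROI("mark hand", frame)    # user marks the hand region
roi = cv2.cvtColor(frame[y:y + h, x:x + w], cv2.COLOR_BGR2HSV)
hist = cv2.calcHist([roi], [0], None, [180], [0, 180])   # hue histogram
cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)
window = (x, y, w, h)
criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    back = cv2.calcBackProject([hsv], [0], hist, [0, 180], 1)
    rot_rect, window = cv2.CamShift(back, window, criteria)
    # The window centres over time yield a trajectory; thresholding its
    # per-frame displacement gives movement start/end points and the
    # average velocity to be written to the annotation track.
```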

    Computer vision methods for unconstrained gesture recognition in the context of sign language annotation

    This PhD thesis concerns the study of computer vision methods for the automatic recognition of unconstrained gestures in the context of sign language annotation. Sign Language (SL) is a visual-gestural language developed by deaf communities. Continuous SL consists of a sequence of signs performed one after another, involving manual and non-manual features that convey information simultaneously. Even though standard signs are defined in dictionaries, their realisation shows huge context-dependent variability. In addition, signs are often linked by movement epenthesis, the meaningless transitional gesture between signs. This variability and the co-articulation effect make automatic SL processing a challenging problem, and large annotated video corpora are needed to train statistical machine translation systems and to study the language. The annotation of SL video corpora is generally performed manually by linguists or computer scientists experienced in SL; manual annotation, however, is error-prone, unreproducible and time-consuming, and the quality of the results depends on the annotator's knowledge of SL. Combining annotator expertise with image processing techniques facilitates the annotation task, increasing robustness and reducing the time required. The goal of this research is the study and development of image processing techniques to assist the annotation of SL video corpora: body tracking, hand segmentation, temporal segmentation, and gloss recognition. In this thesis we address the problem of gloss annotation of SL video corpora. First, we aim to detect the boundaries marking the beginning and end of a sign. This annotation method requires several low-level components for temporal segmentation and for extracting motion and hand-shape features. We first propose a particle-filter-based approach for tracking the hands and face robustly under occlusion. We then develop a segmentation method for extracting the hand region even when it is in front of the face. Motion features are used for a first temporal segmentation of the signs, which is subsequently improved using hand-shape features; hand shape makes it possible to discard segmentation boundaries detected in the middle of a sign. Once the signs are segmented, we extract visual features and recognise them as glosses using phonological models of signs. We evaluated our algorithms on international corpora to show their advantages and limitations. The evaluation demonstrates the robustness of the proposed methods with respect to high dynamics and numerous occlusions between body parts. The resulting annotation is independent of the annotator and represents a substantial gain in annotation consistency.
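
    The particle-filter tracking step lends itself to a compact sketch. The following minimal Python function shows one resample-predict-update cycle for a 2D hand position; the random-walk motion model and the placeholder likelihood are illustrative assumptions, the thesis's actual observation model and occlusion handling being considerably richer.

```python
import numpy as np

def particle_filter_step(particles, weights, likelihood, motion_std=5.0):
    """particles: (n, 2) candidate hand positions; weights sum to 1;
    likelihood(p) scores a position against the current frame."""
    n = len(particles)
    # 1. Resample particles in proportion to the previous weights.
    particles = particles[np.random.choice(n, size=n, p=weights)]
    # 2. Predict: diffuse with a random-walk motion model.
    particles = particles + np.random.normal(0.0, motion_std, particles.shape)
    # 3. Update: re-weight each particle by the image likelihood.
    weights = np.array([likelihood(p) for p in particles]) + 1e-12
    weights /= weights.sum()
    # The weighted mean of the particles is the frame's position estimate.
    return particles, weights, np.average(particles, axis=0, weights=weights)
```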

    Collection and analysis of radar rainfall and satellite data for the Darwin TRMM experiment

    The following subject areas are covered: the video cloud camera (purpose, design, operation, data); special observing periods (SOP-2 and SOP 2.5, an extension of SOP-2); the Garand algorithm; and warm rain.

    From head to toe: body movement for human-computer interaction

    Our bodies are the medium through which we experience the world around us, so human-computer interaction can highly benefit from the richness of body movements and postures as an input modality. In recent years, the widespread availability of inertial measurement units and depth sensors led to the development of a plethora of applications for the body in human-computer interaction. However, the main focus of these works has been on using the upper body for explicit input. This thesis investigates the research space of full-body human-computer interaction through three propositions. The first proposition is that there is more to be inferred from users' natural movements and postures, such as the quality of activities and psychological states. We develop this proposition in two domains. First, we explore how to support users in performing weight-lifting activities. We propose a system that classifies different ways of performing the same activity; an object-oriented model-based framework for formally specifying activities; and a system that automatically extracts an activity model by demonstration. Second, we explore how to automatically capture nonverbal cues for affective computing. We developed a system that annotates motion and gaze data according to the Body Action and Posture coding system. We show that quality analysis can add another layer of information to activity recognition, and that systems that support the communication of quality information should strive to support how we implicitly communicate movement through nonverbal communication. Further, we argue that by working at a higher level of abstraction, affect recognition systems can more directly translate findings from other areas into their algorithms, and also contribute new knowledge to these fields. The second proposition is that the lower limbs can provide an effective means of interacting with computers beyond assistive technology. To address the problem of the dispersed literature on the topic, we conducted a comprehensive survey on the lower body in HCI, under the lenses of users, systems and interactions. To address the lack of a fundamental understanding of foot-based interactions, we conducted a series of studies that quantitatively characterises several aspects of foot-based interaction, including Fitts's Law performance models, the effects of movement direction, foot dominance and visual feedback, and the overhead incurred by using the feet together with the hand. To enable all these studies, we developed a foot tracker based on a Kinect mounted under the desk. We show that the lower body can be used as a valuable complementary modality for computing input. Our third proposition is that by treating body movements as multiple modalities, rather than a single one, we can enable novel user experiences. We develop this proposition in the domain of 3D user interfaces, as it requires input with multiple degrees of freedom and offers a rich set of complex tasks. We propose an approach for tracking the whole body up close, by splitting the sensing of different body parts across multiple sensors. Our setup allows tracking gaze, head, mid-air gestures, multi-touch gestures, and foot movements. We investigate specific applications for multimodal combinations in the domain of 3DUI, specifically how gaze and mid-air gestures can be combined to improve selection and manipulation tasks; how the feet can support the canonical 3DUI tasks; and how a multimodal sensing platform can inspire new 3D game mechanics.
    We show that the combination of multiple modalities can lead to enhanced task performance, that offloading certain tasks to alternative modalities not only frees the hands but also allows simultaneous control of multiple degrees of freedom, and that by sensing different modalities separately, we achieve a more detailed and precise full-body tracking.
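
    Since the foot-pointing studies fit Fitts's Law performance models, a worked example of the model may help. The sketch below uses the Shannon formulation MT = a + b * log2(D/W + 1); the coefficients a and b are illustrative placeholders, not the thesis's fitted values.

```python
import math

def fitts_mt(distance, width, a=0.2, b=0.3):
    """Predicted movement time (s): MT = a + b * log2(D/W + 1)."""
    return a + b * math.log2(distance / width + 1)

# e.g. a foot movement to a 40 px wide target 400 px away:
# fitts_mt(400, 40) = 0.2 + 0.3 * log2(11) ≈ 1.24 s with these coefficients
```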

    Activity Recognition for Ergonomics Assessment of Industrial Tasks with Automatic Feature Selection

    In industry, ergonomic assessment is currently performed manually, based on the identification of postures and actions by experts. We aim to propose a system for automatic ergonomic assessment based on activity recognition. In this paper, we define a four-level taxonomy of activities compatible with the items evaluated in standard ergonomic worksheets. The proposed taxonomy is used to learn activity recognition models based on Hidden Markov Models. We also identify dedicated sets of features to be used as input to the recognition models so as to maximize recognition performance for each level of the taxonomy, comparing three feature selection methods to obtain these subsets. Data from 13 participants performing a series of tasks mimicking industrial tasks were collected to train and test the recognition module. Results show that the selected subsets allow us to successfully infer ergonomically relevant postures and actions.
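
    A hedged sketch of how feature selection and per-class HMMs might be combined, in the spirit of the pipeline above. It uses scikit-learn's SelectKBest as one of many possible filter-style selectors and hmmlearn's GaussianHMM as a stand-in for the paper's models; the data layout and hyperparameters are assumptions.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from hmmlearn.hmm import GaussianHMM  # third-party: pip install hmmlearn

def train_recognizer(X_train, y_train, sequences_by_label, k=10):
    """X_train: (n, d) pooled frame features; y_train: (n,) frame labels;
    sequences_by_label: dict mapping label -> list of (t_i, d) arrays."""
    # 1. Filter-style feature selection on the pooled frame-level data.
    selector = SelectKBest(f_classif, k=k).fit(X_train, y_train)
    # 2. One HMM per activity class, trained on that class's sequences.
    models = {}
    for label, seqs in sequences_by_label.items():
        feats = [selector.transform(s) for s in seqs]
        models[label] = GaussianHMM(n_components=3).fit(
            np.vstack(feats), [len(f) for f in feats])
    return selector, models

def classify(selector, models, seq):
    # 3. Label a new sequence by the highest-likelihood class model.
    feats = selector.transform(seq)
    return max(models, key=lambda lab: models[lab].score(feats))
```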

    Physical testing of potential football helmet design enhancements

    Football is a much-loved sport in the United States. Unfortunately, it is also hard on the players and puts them at very high risk of concussion. To combat this, an inventor in Santa Barbara brought a new helmet design to Cal Poly to be tested. The design was first tested at small scale to draw some preliminary conclusions. Fully evaluating the helmet design, however, required full-scale testing, so a drop tower was built to the National Operating Committee on Standards for Athletic Equipment (NOCSAE) specification. The drop tower designed for Cal Poly is a lower-cost, highly portable version of the standard NOCSAE design. Using this drop tower and a 3D-printed prototype, the new design was tested at full scale.
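
    As a back-of-the-envelope aid, the basic drop-tower kinematics are easy to state: impact speed follows v = sqrt(2gh), so the drop height needed for a target speed is h = v^2 / (2g). The snippet below is standard free-fall physics, not the NOCSAE document's exact procedure, and it ignores guide-rail friction.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2

def impact_velocity(height_m):
    """Free-fall impact speed from drop height: v = sqrt(2 g h)."""
    return math.sqrt(2 * G * height_m)

def height_for_velocity(v_target):
    """Drop height needed for a target impact speed: h = v^2 / (2 g)."""
    return v_target ** 2 / (2 * G)

# e.g. height_for_velocity(5.46) ≈ 1.52 m of free fall for a 5.46 m/s impact
```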

    Effort in gestural interactions with imaginary objects in Hindustani Dhrupad vocal music

    Physical effort has often been regarded as a key factor of expressivity in music performance; nevertheless, systematic experimental approaches to the subject have been rare. In North Indian classical (Hindustani) vocal music, singers often engage with melodic ideas during improvisation by manipulating intangible, imaginary objects with their hands, for instance by stretching, pulling, pushing or throwing them. This observation suggests that some patterns of change in acoustic features allude to interactions that real objects afford through their physical properties. The present study explores the relationships between movement and sound by accounting for the physical effort that such interactions require in the Dhrupad genre of Hindustani vocal improvisation. The work follows a mixed methodological approach, combining qualitative and quantitative methods to analyse interviews, audio-visual material and movement data. Findings indicate that despite the flexibility in how a Dhrupad vocalist may use his or her hands while singing, there is a certain degree of consistency in how performers associate effort levels with melody and with types of gestural interaction with imaginary objects. However, different schemes of cross-modal association emerge for the vocalists analysed, depending on the pitch-space organisation of each particular melodic mode (rāga), the mechanical requirements of voice production, the macro-structure of the ālāp improvisation and morphological cross-domain analogies. Results further suggest that a good part of the variance in both physical effort and gesture type can be explained through a small set of sound and movement features. Based on the findings, I argue that gesturing in Dhrupad singing is guided by the know-how humans have of interacting with and exerting effort on real objects in the environment, by the movement–sound relationships transmitted from teacher to student in oral music training, and by the mechanical demands of vocalisation.
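
    The variance-explained claim corresponds to a simple regression analysis, loosely sketched below; the feature set and effort ratings named here are assumptions for illustration, not the study's actual variables.

```python
from sklearn.linear_model import LinearRegression

def variance_explained(features, effort_ratings):
    """features: (n, d) sound/movement descriptors, e.g. pitch height and
    hand speed (assumed); effort_ratings: (n,) annotated effort levels."""
    model = LinearRegression().fit(features, effort_ratings)
    return model.score(features, effort_ratings)  # R^2 on the fitted data
```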