6 research outputs found

    Advances in Monocular Exemplar-based Human Body Pose Analysis: Modeling, Detection and Tracking

    This thesis contributes to the analysis of human body pose from image sequences acquired with a single camera. The topic has a wide range of potential applications in video surveillance, video games, and biomedical systems. Exemplar-based techniques have been successful; however, their accuracy depends on how similar the camera viewpoint and scene properties are between the training and test images. Given a training dataset captured with a small number of fixed cameras parallel to the ground, three scenarios of increasing difficulty have been identified and analyzed: 1) a static camera parallel to the ground, 2) a fixed surveillance camera with a considerably different viewing angle, and 3) a video sequence captured with a moving camera, or simply a single static image.

    Efficient Deep Learning-Driven Systems for Real-Time Video Expression Recognition

    The ability to detect, recognize, and interpret facial expressions is an important skill for humans to have due to the abundance of social interactions one faces on a daily basis, yet it is also something most take for granted. As social animals, we rely on expression understanding not only to gauge current emotional states, but also to recognize conversational cues such as level of interest, speaking turns, and level of comprehension. For individuals with autism spectrum disorder, a core challenge is an impaired ability to infer other people's emotions from their facial expressions, which can hinder creating and sustaining meaningful, positive relationships, leading to difficulties integrating into society and a higher prevalence of depression and loneliness. With significant recent advances in machine learning, one potential solution is to leverage assistive technology to help these individuals better recognize facial expressions. Such a technology requires reasonable accuracy in order to provide users with correct information, but must also satisfy a real-time constraint to be relevant and seamless in a social setting. Because human facial expressions are dynamic and transient, a key challenge during classification is using temporal information to provide additional context to a scene; since many applications must remain real-time, this temporal information has to be leveraged efficiently. Consequently, we explore the dynamic and transient nature of facial expressions through a novel deep time-windowed convolutional neural network design called TimeConvNets, which encodes spatiotemporal information in an efficient manner. We compare against other methods capable of leveraging temporal information and show that TimeConvNets provide a real-time solution that is accurate while being architecturally and computationally less complex. Even with the strong performance the TimeConvNet architecture offers, additional architectural modifications tailored specifically to human facial expression classification can likely yield further gains. We therefore explore a human-machine collaborative design strategy to further reduce and optimize these facial expression classifiers. EmotionNet Nano was created and tailored specifically for expression classification on edge devices, leveraging human experience combined with the meticulousness and speed of machines. Experimental results on the CK+ facial expression benchmark dataset demonstrate that the proposed EmotionNet Nano networks achieve accuracy comparable to the state of the art while requiring significantly fewer parameters, and are capable of performing inference in real time, making them suitable for deployment on a variety of platforms including mobile phones. Training these models requires a high-quality expression dataset, specifically one that retains temporal information between consecutive image frames. We introduce FaceParty as a solution: a more challenging dataset created by the modified aggregation of six public video facial expression datasets, and we provide details for replication. We hope that models trained on FaceParty can achieve increased generalization to faces in the wild due to the nature of the dataset.
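    A minimal sketch of the time-windowed idea described above, assuming only that a short window of consecutive frames is folded into the channel dimension of a single 2D convolutional pass; the layer sizes, window length, and class count below are illustrative assumptions, not the published TimeConvNet or EmotionNet Nano architecture.

        # Minimal sketch of a time-windowed expression classifier (assumed design,
        # not the authors' published architecture).
        import torch
        import torch.nn as nn

        class TimeWindowedCNN(nn.Module):
            def __init__(self, window: int = 3, num_classes: int = 7):
                super().__init__()
                # A window of grayscale frames is stacked along the channel axis,
                # so temporal context is encoded without a recurrent module.
                self.features = nn.Sequential(
                    nn.Conv2d(window, 16, kernel_size=3, padding=1), nn.ReLU(),
                    nn.MaxPool2d(2),
                    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
                    nn.AdaptiveAvgPool2d(1),
                )
                self.classifier = nn.Linear(32, num_classes)

            def forward(self, frames: torch.Tensor) -> torch.Tensor:
                # frames: (batch, window, height, width) of consecutive video frames
                x = self.features(frames)
                return self.classifier(x.flatten(1))

        # Example: classify a window of 3 consecutive 48x48 frames.
        logits = TimeWindowedCNN()(torch.randn(1, 3, 48, 48))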

    Toward Realizing Human-Robot Interaction with Flying Robots: A User-Accompanying Model and Sensing Interface

    Degree type: Doctorate by coursework. Examination committee: (chief examiner) Associate Professor Takehisa Yairi (The University of Tokyo), Professor Koichi Hori (The University of Tokyo), Professor Akira Iwasaki (The University of Tokyo), Professor Takeshi Tsuchiya (The University of Tokyo), Professor Hiroshi Mizoguchi (Tokyo University of Science). University of Tokyo (東京大学)