
    AFFECT-PRESERVING VISUAL PRIVACY PROTECTION

    The prevalence of wireless networks and the convenience of mobile cameras enable many new video applications beyond security and entertainment. From behavioral diagnosis to wellness monitoring, cameras are increasingly used for observation in various educational and medical settings. Videos collected for such applications are considered protected health information under privacy laws in many countries. Visual privacy protection techniques, such as blurring or object removal, can mitigate privacy concerns, but they also obliterate important visual cues of affect and social behavior that are crucial for the target applications. In this dissertation, we propose to balance privacy protection and the utility of the data by preserving privacy-insensitive information, such as pose and expression, which is useful in many applications involving visual understanding. The Intellectual Merits of the dissertation include a novel framework for visual privacy protection that manipulates the facial image and body shape of individuals and thereby: (1) conceals the identity of individuals; (2) preserves the utility of the data, such as expression and pose information; and (3) balances the utility of the data against the capacity of the privacy protection. The Broader Impacts of the dissertation concern the significance of privacy protection for visual data and the inadequacy of current privacy-enhancing technologies in preserving the affective and behavioral attributes of visual content, which are highly useful for behavior observation in educational and medical settings. The work in this dissertation represents one of the first attempts to achieve both goals simultaneously.
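    As a loose, hypothetical illustration of this idea (not the dissertation's actual method), the sketch below conceals identity by pixelating detected faces while exporting coarse, non-identifying cues for downstream analysis; `extract_expression` is a placeholder for whatever expression model a real system would use.

```python
# Hedged sketch: anonymize faces while retaining coarse affect cues.
# Not the dissertation's method; pixelation stands in for its appearance
# manipulation, and extract_expression is a hypothetical placeholder.
import cv2
import numpy as np

def extract_expression(face_crop: np.ndarray) -> np.ndarray:
    # Placeholder descriptor: an 8x8 grayscale thumbnail of the face.
    gray = cv2.cvtColor(face_crop, cv2.COLOR_BGR2GRAY)
    return cv2.resize(gray, (8, 8)).flatten()

def anonymize_frame(frame: np.ndarray, cascade: cv2.CascadeClassifier):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    cues = []
    for (x, y, w, h) in cascade.detectMultiScale(gray, 1.1, 5):
        face = frame[y:y + h, x:x + w]
        cues.append({"box": (x, y, w, h), "expression": extract_expression(face)})
        # Pixelate the face region: downsample, then upsample nearest-neighbor.
        small = cv2.resize(face, (8, 8), interpolation=cv2.INTER_LINEAR)
        frame[y:y + h, x:x + w] = cv2.resize(small, (w, h), interpolation=cv2.INTER_NEAREST)
    return frame, cues

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
```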

    Understanding Person Identification Through Gait

    Gait recognition is the process of identifying humans from their bipedal locomotion, such as walking or running. As such, gait data is privacy-sensitive information and should be anonymized where possible. With the rise of higher-quality gait recording techniques, such as depth cameras or motion-capture suits, an increasing amount of detailed gait data is captured and processed. The introduction and rise of the Metaverse is but one popular application scenario in which the gait of users is transferred onto digital avatars. As a first step towards developing effective anonymization techniques for high-quality gait data, we study different aspects of movement data to quantify their contribution to gait recognition. We first extract categories of features from the literature on human gait perception and then design experiments for each category to assess how much the information they contain contributes to recognition success. Our results show that gait anonymization will be challenging, as the data is highly redundant and interdependent.
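    To make the notion of feature categories concrete, here is a minimal sketch (our own simplification, not the paper's protocol) that computes a few coarse gait descriptors from hypothetical 3D ankle trajectories:

```python
# Hedged sketch: coarse gait descriptors from 3D ankle trajectories.
# The (T, 3) input arrays and feature choices are assumptions for illustration.
import numpy as np
from scipy.signal import find_peaks

def gait_features(left_ankle: np.ndarray, right_ankle: np.ndarray, fps: float) -> dict:
    # Inter-ankle distance peaks roughly once per step.
    sep = np.linalg.norm(left_ankle - right_ankle, axis=1)
    peaks, _ = find_peaks(sep, distance=max(1, int(0.3 * fps)))
    step_times = np.diff(peaks) / fps
    return {
        "cadence_steps_per_s": 1.0 / step_times.mean() if step_times.size else float("nan"),
        "mean_step_separation": sep[peaks].mean() if peaks.size else float("nan"),
        "step_time_variability": step_times.std() if step_times.size else float("nan"),
    }
```

    Even descriptors this coarse tend to correlate with one another, hinting at the redundancy the paper identifies as the main obstacle to anonymization.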

    Adaptive Body Gesture Representation for Automatic Emotion Recognition

    We present a computational model and a system for the automated recognition of emotions from full-body movement. Three-dimensional motion data of full-body movements are obtained either from professional optical motion-capture systems (Qualisys) or from low-cost RGB-D sensors (Kinect and Kinect2). A number of features are then automatically extracted at different levels, from the kinematics of a single joint to more global expressive features inspired by psychology and humanistic theories (e.g., contraction index, fluidity, and impulsiveness). An abstraction layer based on dictionary learning further processes these movement features to increase the model's generality and to deal with the intraclass variability, noise, and incomplete information that characterize emotion expression in human movement. The resulting feature vector is the input to a classifier performing real-time automatic emotion recognition based on linear support vector machines. The recognition performance of the proposed model is presented and discussed, including the trade-off between the precision of the tracking measures (we compare the Kinect RGB-D sensor and the Qualisys motion-capture system) and the size of the training dataset. The resulting model and system have been successfully applied in the development of serious games that help autistic children learn to recognize and express emotions by means of their full-body movement.
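    As a hedged sketch of how one such expressive feature and the final classification stage might fit together, the snippet below computes a simplified contraction index (mean joint distance from the body centroid, normalized by body height) and feeds clip-level statistics to a linear SVM; the details are our assumptions, not the paper's exact definitions.

```python
# Hedged sketch: a simplified contraction index plus a linear SVM, standing in
# for the paper's full feature set and dictionary-learning layer.
import numpy as np
from sklearn.svm import LinearSVC

def contraction_index(joints: np.ndarray) -> np.ndarray:
    # joints: (T, J, 3) positions; returns a per-frame contraction value.
    centroid = joints.mean(axis=1, keepdims=True)
    spread = np.linalg.norm(joints - centroid, axis=2).mean(axis=1)
    height = joints[..., 1].max(axis=1) - joints[..., 1].min(axis=1)  # assumes y-up
    return spread / np.maximum(height, 1e-6)

def clip_features(joints: np.ndarray) -> np.ndarray:
    ci = contraction_index(joints)
    mean_speed = np.linalg.norm(np.diff(joints, axis=0), axis=2).mean()  # crude fluidity proxy
    return np.array([ci.mean(), ci.std(), mean_speed])

# Toy usage on random "clips"; a real system would use tracked motion data.
rng = np.random.default_rng(0)
X = np.stack([clip_features(rng.normal(size=(100, 20, 3))) for _ in range(40)])
y = rng.integers(0, 4, size=40)  # four emotion classes
clf = LinearSVC().fit(X, y)
```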

    Modeling Humans at Rest with Applications to Robot Assistance

    Humans spend a large part of their lives resting. Machine perception of this class of body poses would benefit numerous applications, but it is complicated by line-of-sight occlusion from bedding. Pressure-sensing mats are a promising alternative, but their data is challenging to collect at scale. To overcome this, we use modern physics engines to simulate bodies resting on a soft bed with a pressure-sensing mat. This method can efficiently generate data at scale for training deep neural networks. We present a deep model trained on this data that infers 3D human pose and body shape from a pressure image, and show that it transfers well to real-world data. We also present a model that infers pose, shape, and contact pressure from a depth image facing the person in bed, and it does so in the presence of blankets. This model similarly benefits from synthetic data, which is created by simulating blankets on the bodies in bed. We evaluate this model on real-world data and compare it to an existing method that requires RGB, depth, thermal, and pressure imagery as input. Our model requires only an input depth image, yet it is 12% more accurate. Our methods are relevant to applications in healthcare, including patient acuity monitoring and pressure injury prevention. We demonstrate this work in the context of robotic caregiving assistance by using it to control a robot to move to locations on a person’s body in bed.
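    As a hedged sketch of the pressure-to-pose interface (a toy network of our own, far simpler than the models the thesis trains on simulated data), consider a small CNN that regresses 3D joint positions from a pressure image; the 64 x 27 taxel resolution and the 24-joint output are assumptions for the example.

```python
# Hedged sketch: toy CNN mapping a pressure image to 3D joint positions.
# Architecture, input resolution, and joint count are illustrative assumptions.
import torch
import torch.nn as nn

class PressureToPose(nn.Module):
    def __init__(self, n_joints: int = 24):
        super().__init__()
        self.n_joints = n_joints
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.head = nn.Linear(64 * 4 * 4, n_joints * 3)

    def forward(self, pressure: torch.Tensor) -> torch.Tensor:
        # pressure: (B, 1, H, W) mat image -> (B, n_joints, 3) joint positions.
        z = self.encoder(pressure).flatten(1)
        return self.head(z).view(-1, self.n_joints, 3)

model = PressureToPose()
joints = model(torch.randn(2, 1, 64, 27))  # e.g. a 64 x 27 taxel mat
```

    Training such a regressor on simulated (pressure, pose) pairs before evaluating on real mats mirrors the sim-to-real strategy the abstract describes.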

    Fully Convolutional Networks for Semantic Segmentation from RGB-D images

    In recent years, new trends such as Industry 4.0 have boosted research and development in the field of autonomous systems and robotics. Robots collaborate with humans and even take over complete tasks. This high degree of automation requires high reliability even in complex and changing environments, conditions that make it hard to rely on static models of the real world. In addition to adaptable maps, mobile robots require a local and current understanding of the scene. The Bosch Start-Up Company is developing robots for intra-logistics systems, which could benefit greatly from such a detailed scene understanding. The aim of this work is to research and develop such a system for warehouse environments. While the possible field of application is very broad in general, this work focuses on the detection and localization of warehouse-specific objects such as pallets. To provide a meaningful perception of the surroundings, an RGB-D camera is used. A pre-trained convolutional network extracts scene understanding in the form of pixel-wise class labels. As this convolutional network is the core of the application, this work focuses on different network set-ups and learning strategies. One difficulty was the lack of annotated training data: since the creation of densely labeled images is a very time-consuming process, it was important to find good alternatives. One interesting finding was that learning can be transferred to a large extent from similar models pre-trained on thousands of RGB images, through selective interventions on the network parameters. By ensuring a good initialization, it is possible to train towards a well-performing model within a few iterations; in this way even branched networks can be trained at once, which can also be achieved by including certain normalization steps. Another important aspect was finding a suitable way to incorporate depth information into the existing model: by providing the height over ground as an additional input feature, the segmentation accuracy was further improved while keeping the extra computational cost low. Finally, the segmentation maps are refined by a conditional random field (CRF). The joint training of both parts results in accurate object segmentations comparable to recently published state-of-the-art models.
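    The weight-transfer trick described above (reusing RGB-pretrained filters while adding a height-over-ground channel) can be sketched concretely; the snippet below uses torchvision's fcn_resnet50 as a stand-in for the thesis' network and zero-initializes the new input channel so the RGB initialization is preserved.

```python
# Hedged sketch: extend an RGB-pretrained FCN with a height-over-ground channel.
# fcn_resnet50 is a stand-in; the thesis' exact architecture may differ.
import torch
import torch.nn as nn
from torchvision.models.segmentation import fcn_resnet50

model = fcn_resnet50(weights="DEFAULT")
old = model.backbone.conv1  # Conv2d(3, 64, kernel_size=7, stride=2, padding=3)
new = nn.Conv2d(4, old.out_channels, kernel_size=old.kernel_size,
                stride=old.stride, padding=old.padding, bias=False)
with torch.no_grad():
    new.weight[:, :3] = old.weight  # keep the pre-trained RGB filters
    new.weight[:, 3:].zero_()       # the height channel starts neutral
model.backbone.conv1 = new

# Forward pass on a 4-channel RGB + height tensor.
logits = model(torch.randn(1, 4, 224, 224))["out"]
```

    Zero-initializing the extra channel means the network initially behaves exactly like its RGB-pretrained counterpart and learns to use the height feature gradually, which matches the abstract's emphasis on a good initialization.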