9 research outputs found

    Real-time 3D human body pose estimation from monocular RGB input

    Human motion capture finds extensive application in movies, games, sports and biomechanical analysis. However, existing motion capture solutions require cumbersome external and/or on-body instrumentation, or use active sensors whose capture volume is limited by power consumption. The ubiquity and ease of deployment of RGB cameras make monocular RGB-based human motion capture an extremely useful problem to solve: it would lower the barrier to entry for content creators to employ motion capture tools, and enable new applications of human motion capture. This thesis demonstrates the first real-time monocular RGB-based motion capture solutions that work in general scene settings. They are based on neural-network-based approaches to the ill-posed problem of estimating 3D human pose from a single RGB image, combined with model-based fitting. In particular, the contributions of this work advance three key aspects of real-time monocular RGB-based motion capture, namely speed, accuracy, and the ability to work in general scenes. New training datasets are proposed for single-person and multi-person scenarios, which, together with the proposed transfer-learning-based training pipeline, allow learning-based approaches to be appearance invariant. The training datasets are accompanied by evaluation benchmarks that support multiple avenues of fine-grained evaluation. The evaluation benchmarks differ visually from the training datasets, so as to promote solutions that generalize to in-the-wild scenes. The proposed task formulations for the single-person and multi-person cases allow higher accuracy, and incorporate additional qualities, such as occlusion robustness, that are helpful in the context of a full motion capture solution. 
The multi-person formulations are designed to have a nearly constant inference time regardless of the number of subjects in the scene and, combined with contributions towards fast neural network inference, they enable real-time 3D pose estimation for multiple subjects. Combining the proposed learning-based approaches with a model-based kinematic skeleton fitting step provides temporally stable joint angle estimates, which can be readily employed for driving virtual characters.
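The abstract does not describe how the kinematic skeleton fitting step works. As a rough illustration of the general idea, the sketch below fits the joint angles of a fixed-bone-length skeleton to per-frame predicted joint positions, adding a temporal term that pulls each frame's angles toward the previous frame's result. Everything here is an assumption for illustration, not the thesis's method: the planar two-bone chain, the hypothetical `BONE_LENGTHS`, the damped Gauss-Newton solver with a numerical Jacobian, and the `smooth_weight` parameter.

```python
import numpy as np

# Hypothetical planar 2-bone chain (e.g. upper arm + forearm), root fixed at the origin.
BONE_LENGTHS = np.array([1.0, 0.8])


def forward_kinematics(angles):
    """Map relative joint angles (radians) to 2D positions of the two child joints."""
    absolute = np.cumsum(angles)  # absolute orientation of each bone
    bones = BONE_LENGTHS[:, None] * np.stack([np.cos(absolute), np.sin(absolute)], axis=1)
    return np.cumsum(bones, axis=0)  # shape (2, 2): one joint position per row


def residuals(angles, targets, prev_angles, smooth_weight):
    """Data term (match predicted joints) stacked with a temporal smoothness term."""
    data = (forward_kinematics(angles) - targets).ravel()
    smooth = np.sqrt(smooth_weight) * (angles - prev_angles)
    return np.concatenate([data, smooth])


def fit_frame(targets, prev_angles, smooth_weight=0.05, iters=20, damping=1e-3):
    """Damped Gauss-Newton fit of joint angles, warm-started from the previous frame."""
    angles = np.array(prev_angles, dtype=float)
    eps = 1e-6
    for _ in range(iters):
        r = residuals(angles, targets, prev_angles, smooth_weight)
        # Numerical Jacobian of the residuals via central differences.
        J = np.zeros((r.size, angles.size))
        for i in range(angles.size):
            d = np.zeros_like(angles)
            d[i] = eps
            J[:, i] = (residuals(angles + d, targets, prev_angles, smooth_weight)
                       - residuals(angles - d, targets, prev_angles, smooth_weight)) / (2 * eps)
        # Solve (J^T J + damping * I) step = -J^T r for the update.
        step = np.linalg.solve(J.T @ J + damping * np.eye(angles.size), -J.T @ r)
        angles = angles + step
    return angles


def track(predicted_joints, init_angles, smooth_weight=0.05):
    """Fit every frame in order; each fit is regularized toward the previous result."""
    angles = np.asarray(init_angles, dtype=float)
    out = []
    for targets in predicted_joints:
        angles = fit_frame(targets, angles, smooth_weight)
        out.append(angles)
    return np.array(out)
```

Fitting a skeleton with fixed bone lengths, rather than using raw per-frame joint predictions directly, is one common way to obtain joint angles that can drive a virtual character. Raising `smooth_weight` trades responsiveness for temporal stability.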

    The Virtual Video Camera: A System for Viewpoint Synthesis in Arbitrary, Dynamic Scenes

    The Virtual Video Camera project strives to create free-viewpoint video from casually captured multi-view data. Multiple video streams of a dynamic scene are captured with off-the-shelf camcorders, and the user can then re-render the scene from novel perspectives. This thesis presents the algorithmic core of the Virtual Video Camera, including the algorithm for image correspondence estimation and the image-based renderer. Furthermore, its application in an actual video production is showcased, and the rendering and image processing pipeline is extended to incorporate depth information.

    Image-Based Rendering Of Real Environments For Virtual Reality


    Hand shape estimation for South African sign language

    Magister Scientiae - MSc. Hand shape recognition is a pivotal part of any system that attempts to implement sign language recognition. This thesis presents a novel system that recognises hand shapes in 2D from a single camera view. By mapping the recognised hand shape from 2D to 3D, it is possible to obtain 3D coordinates for each joint of the hand using the kinematics embedded in a 3D hand avatar, and to smooth the transformation in 3D space between any two hand shapes. The novelty of this system is that it does not require a hand pose to be recognised at every frame; instead, hand shapes are detected at a given step size. This architecture allows for a more efficient system with better accuracy than other related systems. Moreover, a real-time hand tracking strategy was developed that works efficiently for any skin tone and against complex backgrounds.
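The abstract states that hand shapes are detected only at a given step size and that the 3D transformation between consecutive shapes is smoothed using the avatar's kinematics. A minimal sketch of one way to fill the frames in between is shown below. It assumes each recognised shape maps to a joint-angle template and blends consecutive templates with a smoothstep easing curve. The template names, the three-joint angle vectors, and the easing choice are all hypothetical and are not taken from the thesis.

```python
import numpy as np

# Hypothetical joint-angle templates (radians) for two hand shapes of a 3D hand avatar.
# A real avatar has many more joints; three flexion angles keep the example small.
HAND_SHAPES = {
    "fist": np.array([1.5, 1.6, 1.2]),
    "flat": np.array([0.0, 0.0, 0.0]),
}


def smoothstep(t):
    """Ease-in/ease-out weight in [0, 1] with zero slope at both ends."""
    return t * t * (3.0 - 2.0 * t)


def interpolate_shapes(keyframe_labels, step):
    """Expand hand-shape labels detected every `step` frames into per-frame joint angles."""
    poses = [HAND_SHAPES[label] for label in keyframe_labels]
    frames = []
    for start, end in zip(poses[:-1], poses[1:]):
        # Frames from one detected shape up to (but excluding) the next one.
        for k in range(step):
            w = smoothstep(k / step)
            frames.append((1.0 - w) * start + w * end)
    frames.append(poses[-1])  # the last detected shape itself
    return np.array(frames)
```

For example, `interpolate_shapes(["flat", "fist", "flat"], step=4)` yields nine frames: the recognised shapes sit at frames 0, 4 and 8, and the frames in between blend smoothly from one template to the next, so recognition only has to run on every fourth frame.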

    Interdisciplinarity in the Age of the Triple Helix: a Film Practitioner's Perspective

    This integrative chapter contextualises my research, including articles I have published as well as one of the creative artefacts developed from it, the feature film The Knife That Killed Me. I review my work in light of the ways in which technology, industry methods and academic practice have evolved, as well as how attitudes to interdisciplinarity have changed, linking these to Etzkowitz and Leydesdorff’s ‘Triple Helix’ model (1995). I explore my own experiences and observations of the opportunities and challenges posed by the intersection of different stakeholder needs and expectations, from both industry and academic perspectives, and argue that my work provides novel examples of the applicability of the ‘Triple Helix’ to the creative industries. The chapter concludes with a reflection on the evolution and direction of my work, the relevance of the ‘Triple Helix’ to creative practice, and ways in which this relationship could be investigated further.

    New HCI techniques for better living through technology

    In the Human Computer Interaction community, researchers work on many projects that investigate the efficacy of new technologies for better living, but unlike researchers in other fields, they must typically take a multi-disciplinary approach. Technology is constantly developing, improving our lives in many areas such as education, health and communication, because it is intended to make life easier. This dissertation explores three main aspects: first, learning with new technologies; second, improving everyday life through innovative devices; and third, using mobile devices in combination with image processing algorithms and computer graphics techniques. We first describe the progress on the state of the art and the related work that were necessary to implement such tools on commodity hardware and deploy them in both mobile and desktop settings. We propose the use of different technologies in different settings, comparing these solutions for enhancing the interaction experience by introducing virtual/augmented reality tools to support these kinds of activities. We also applied well-known gamification techniques from various mobile applications to demonstrate how users can be entertained and motivated during their workouts. We describe our design and prototypes of several integrated systems created to improve the educational process, enhance the shopping experience, provide new experiences for travellers, and improve fitness and wellness activities. Finally, we discuss our findings and frame them in the broader context of better living through technology, drawing the lessons learnt from each work and proposing related future work.

    8th. International congress on archaeology computer graphica. Cultural heritage and innovation

    The motto of the Congress is: 'Advanced 3D documentation, modelling and reconstruction of heritage objects, monuments and sites.' We invite researchers, lecturers, archaeologists, architects, engineers, art historians... who work on cultural heritage from the perspectives of archaeology, computer graphics and geomatics to share their knowledge and experience in the field of Virtual Archaeology. The participation of prestigious researchers and companies will be greatly appreciated. An attractive and interesting programme has been prepared for participants and visitors. Lerma García, JL. (2016). 8th. International congress on archaeology computer graphica. Cultural heritage and innovation. Editorial Universitat Politècnica de València. http://hdl.handle.net/10251/73708

    A technical view: progress and aesthetic changes in cinematography at the beginning of the 21st century

    Unpublished doctoral thesis, Universidad Complutense de Madrid, Facultad de Ciencias de la Información, defended on 20-09-2019. The main focus of this doctoral thesis is cinematography, a specialized branch of filmmaking that has received little attention within media studies. Cinematography is the speciality responsible for translating the director's ideas into the technical parameters and tools that will be used during both shooting and post-production. To do this, directors of photography use a diverse array of tools, such as lighting, the camera, lenses and different image-recording media. These have been deeply affected by the introduction of digital cinematography.
A Technical View: Progress and Aesthetic Changes in Cinematography at the Beginning of the 21st Century is a quantitative and qualitative study that examines the aesthetic consequences of the development of digital technology for cinematography between 2000 and 2015. This work addresses the interaction between technology and aesthetics through their practical use and chronological development during a transitional period in which there have been drastic changes in production and post-production models...