
    Learning-based 3D human motion capture and animation synthesis

    Realistic virtual human avatars are a crucial element in a wide range of applications, from 3D animated movies to emerging AR/VR technologies. However, producing believable 3D motion for such avatars is widely known to be a challenging task. A traditional 3D human motion generation pipeline consists of several stages, each requiring expensive equipment and skilled human labor to perform, limiting its usage beyond the entertainment industry despite its massive potential benefits. This thesis explores alternative solutions to reduce the complexity of the traditional 3D animation pipeline. To this end, it presents several novel ways to perform 3D human motion capture, synthesis, and control. Specifically, it focuses on using learning-based methods to bypass the critical bottlenecks of the classical animation approach. First, a new 3D pose estimation method from in-the-wild monocular images is proposed, eliminating the need for a multi-camera setup in the traditional motion capture system. Second, it explores several data-driven designs to achieve believable 3D human motion synthesis and control that can potentially reduce the need for manual animation. In particular, the problem of speech-driven 3D gesture synthesis is chosen as the case study due to its uniquely ambiguous nature. The improved motion generation quality is achieved by introducing a novel adversarial objective that rates the difference between real and synthetic data. A novel motion generation strategy is also introduced by combining a classical database search algorithm with a powerful deep learning method, resulting in greater motion control variation than purely predictive counterparts. Furthermore, this thesis contributes a new way of collecting a large-scale 3D motion dataset through the use of learning-based monocular estimation methods. This result demonstrates the promising capability of learning-based monocular approaches and shows the prospect of combining these learning-based modules into an integrated 3D animation framework. The presented learning-based solutions open the possibility of democratizing the traditional 3D animation pipeline, enabling it with low-cost equipment, e.g., a single RGB camera. Finally, this thesis discusses the potential for further integration of these learning-based approaches to enhance 3D animation technology.
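    As a rough illustration of the adversarial objective mentioned above (the thesis's actual architecture and loss formulation are not given here), the sketch below shows a PyTorch-style discriminator that rates motion clips as real or synthetic, together with the two loss terms a speech-to-gesture generator could be trained against. All module and variable names are hypothetical.

        import torch
        import torch.nn as nn

        class MotionDiscriminator(nn.Module):
            """Scores a motion clip of shape (batch, frames, pose_dim) as real vs. synthetic."""
            def __init__(self, pose_dim, hidden=256):
                super().__init__()
                self.gru = nn.GRU(pose_dim, hidden, batch_first=True)
                self.score = nn.Linear(hidden, 1)   # one realism logit per clip

            def forward(self, motion):
                _, h = self.gru(motion)             # h: (1, batch, hidden)
                return self.score(h[-1])            # (batch, 1) logits

        def adversarial_losses(disc, real_motion, fake_motion):
            bce = nn.BCEWithLogitsLoss()
            ones = torch.ones(real_motion.size(0), 1)
            zeros = torch.zeros(fake_motion.size(0), 1)
            # Discriminator: rate real clips high and generated clips low.
            d_loss = bce(disc(real_motion), ones) + bce(disc(fake_motion.detach()), zeros)
            # Generator: produce clips the discriminator rates as real.
            g_loss = bce(disc(fake_motion), torch.ones(fake_motion.size(0), 1))
            return d_loss, g_loss

        # Example with random tensors: 8 clips of 60 frames, 24 joints x 3 coordinates.
        disc = MotionDiscriminator(pose_dim=72)
        real = torch.randn(8, 60, 72)
        fake = torch.randn(8, 60, 72, requires_grad=True)   # stand-in for generator output
        d_loss, g_loss = adversarial_losses(disc, real, fake)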

    Generating anatomical substructures for physically-based facial animation.

    Physically-based facial animation techniques are capable of producing realistic facial deformations, but have failed to find meaningful use outside the academic community because they are notoriously difficult to create, reuse, and art-direct in comparison to other methods of facial animation. This thesis addresses these shortcomings and presents a series of methods for automatically generating a skull, the superficial musculoaponeurotic system (SMAS – a layer of fascia investing and interlinking the mimic muscle system), and mimic muscles for any given 3D face model. This is done toward the goal of a production-viable framework or rig-builder for physically-based facial animation. This workflow consists of three major steps. First, a generic skull is fitted to a given head model using thin-plate splines computed from the correspondence between landmarks placed on both models. Second, the SMAS is constructed as a variational implicit or radial basis function surface in the interface between the head model and the generic skull fitted to it. Lastly, muscle fibres are generated as boundary-value straightest geodesics, connecting muscle attachment regions defined on the surface of the SMAS. Each step of this workflow is developed with speed, realism and reusability in mind.
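    A minimal sketch of the first step, assuming NumPy and SciPy: a thin-plate spline warp is computed from the landmark correspondences and applied to every vertex of the generic skull. Array shapes and variable names are illustrative, not those used in the thesis.

        import numpy as np
        from scipy.interpolate import RBFInterpolator

        def fit_skull_to_head(skull_vertices, skull_landmarks, head_landmarks):
            """Warp the generic skull so its landmarks move onto the head's.

            skull_vertices  : (V, 3) vertices of the generic skull mesh
            skull_landmarks : (L, 3) landmark positions on the generic skull
            head_landmarks  : (L, 3) corresponding landmarks on the target head
            """
            # Thin-plate spline interpolating the landmark displacements.
            tps = RBFInterpolator(skull_landmarks,
                                  head_landmarks - skull_landmarks,
                                  kernel='thin_plate_spline')
            # Evaluate the displacement field at every skull vertex and apply it.
            return skull_vertices + tps(skull_vertices)

        # Example with random data: 2,000 skull vertices and 30 landmark pairs.
        rng = np.random.default_rng(0)
        skull_v = rng.random((2000, 3))
        skull_lm = rng.random((30, 3))
        head_lm = skull_lm + 0.05 * rng.standard_normal((30, 3))
        fitted = fit_skull_to_head(skull_v, skull_lm, head_lm)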

    Using genetic algorithms to uncover individual differences in how humans represent facial emotion

    Emotional facial expressions critically impact social interactions and cognition. However, emotion research to date has generally relied on the assumption that people represent categorical emotions in the same way, using standardized stimulus sets and overlooking important individual differences. To resolve this problem, we developed and tested a task using genetic algorithms to derive assumption-free, participant-generated emotional expressions. One hundred and five participants generated a subjective representation of happy, angry, fearful and sad faces. Population-level consistency was observed for happy faces, but fearful and sad faces showed a high degree of variability. High test-retest reliability was observed across all emotions. A separate group of 108 individuals accurately identified happy and angry faces from the first study, while fearful and sad faces were commonly misidentified. These findings are an important first step towards understanding individual differences in emotion representation, with the potential to reconceptualize the way we study atypical emotion processing in future research.
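    A minimal sketch of the underlying genetic-algorithm loop, in Python/NumPy: each face is encoded as a vector of appearance parameters, the participant's selections act as the fitness signal, and the selected faces are recombined and mutated to form the next generation. Population size, crossover, and mutation settings here are illustrative assumptions, not the study's actual parameters.

        import numpy as np

        rng = np.random.default_rng(0)

        def next_generation(population, chosen_idx, mutation_sd=0.05):
            """population : (N, D) face parameter vectors shown this generation
            chosen_idx : indices of the faces the participant judged closest to
                         their idea of, e.g., a 'happy' face (the fitness signal)."""
            parents = population[chosen_idx]
            children = []
            for _ in range(len(population)):
                a, b = parents[rng.choice(len(parents), size=2, replace=True)]
                mask = rng.random(a.shape) < 0.5                  # uniform crossover
                child = np.where(mask, a, b)
                child += rng.normal(0.0, mutation_sd, a.shape)    # Gaussian mutation
                children.append(np.clip(child, 0.0, 1.0))
            return np.stack(children)

        # One simulated generation: 20 faces with 30 parameters each, 4 selected.
        population = rng.random((20, 30))
        population = next_generation(population, chosen_idx=[3, 7, 11, 15])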

    What If (Dublin)

    Raby developed three ‘What If...’ exhibitions with Dunne (RCA), asking what role design can play in imagining possible futures and raising social, cultural and ethical questions, building on 20 years’ practice in Critical Design theorised inter alia in Dunne and Raby’s Design Noir (2001), Hertzian Tales (2005) and Speculative Everything (2013). Raby’s research included concept development, extended collaboration with exhibitors to develop their contributions, and devising the engagement strategy: all three required localised approaches to audiences, circumstances and commissioning hosts. Extensive investigation was needed in synthetic biology, nanotechnology, surveillance technologies and the domestication of natural phenomena, working with scientific partners at Imperial College and Cambridge University. ‘What If…’ Dublin (2009) comprised 29 projects envisioning hypothetical futures and was reviewed in Irish broadsheets (Examiner, Times, Independent), Wired and New Scientist: ‘the exhibits…address questions on scientific or medical ethics that must be asked in our bio-technological age’ (http://www.newscientist.com/blogs/culturelab/2009/12/post-2.html). Exhibits were also shown at the Art Institute of Chicago, Israel Museum, MoMA and Ars Electronica Center. About 1.8 million people pass the windows of the Wellcome Trust building in London annually, making them an important means of science communication. Wellcome commissioned a changing ‘What If…’ exhibition of 15 themes over 15 months (February 2010 – March 2011). Raby reconceived the design strategy with exhibits engaging at different distances, from passing buses to close-up study. The third exhibition, for the Beijing International Design Triennial (2011), explored the impact on future life of novel technologies through 58 projects in 130 exhibits from 36 designers (12 from China), for a diverse audience. The exhibition and related symposium at Tsinghua University were supported by the British Council. The Triennial was visited by approximately 500,000 visitors and featured widely, e.g. China Central Television, People's Daily, New York Times (all 2011) and Zhuangshi journal (2011 and 2012).

    Emotion based Facial Animation using Four Contextual Control Modes

    An Embodied Conversational Agent (ECA) is an intelligent agent that interacts with users through verbal and nonverbal expressions. When used as the interface of software applications, the presence of these agents creates a positive impact on user experience. Due to their potential for providing online assistance in areas such as E-Commerce, there is an increasing need to make ECAs more believable for the user, which has been achieved mainly by using realistic facial animation and emotions. This thesis presents a new approach to ECA modeling that empowers intelligent agents with synthesized emotions. This approach applies the Contextual Control Model to construct an emotion generator that uses information obtained from dialogue to select one of four control modes for the emotion, i.e., the Scrambled, Opportunistic, Tactical, or Strategic mode. The emotions are produced in the format of the Ortony, Clore and Collins (OCC) model of emotion expressions.
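    A minimal sketch of the mode-selection idea, assuming hypothetical dialogue features: a control score derived from the dialogue is mapped to one of the four contextual control modes, which in turn selects an OCC-style emotion label for the facial animation. The control-score formula, thresholds, and mode-to-emotion mapping are illustrative only, not the thesis's actual rules.

        from dataclasses import dataclass

        @dataclass
        class DialogueContext:
            goal_progress: float   # 0..1, how well the conversation is going
            time_pressure: float   # 0..1, urgency inferred from the dialogue

        def select_mode(ctx: DialogueContext) -> str:
            # More control (high progress, low pressure) -> more deliberate mode.
            control = ctx.goal_progress * (1.0 - ctx.time_pressure)
            if control < 0.25:
                return "scrambled"
            if control < 0.5:
                return "opportunistic"
            if control < 0.75:
                return "tactical"
            return "strategic"

        # Hypothetical mapping from control mode to an OCC emotion for the face rig.
        OCC_EMOTION = {
            "scrambled": "distress",
            "opportunistic": "hope",
            "tactical": "satisfaction",
            "strategic": "joy",
        }

        mode = select_mode(DialogueContext(goal_progress=0.8, time_pressure=0.2))
        print(mode, OCC_EMOTION[mode])   # control = 0.64 -> "tactical" / "satisfaction"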