
    Some consideration on expressive audiovisual speech corpus acquisition using a multimodal platform

    In this paper, we present a multimodal acquisition setup that combines different motion-capture systems. This setup is mainly aimed at recording expressive audiovisual corpora in the context of audiovisual speech synthesis. When dealing with speech recording, standard optical motion-capture systems fail to track the articulators finely, especially the inner mouth region, because certain markers disappear during articulation. Some systems also have limited frame rates and are not suitable for smooth speech tracking. In this work, we demonstrate how those limitations can be overcome by creating a heterogeneous system that takes advantage of different tracking systems. In the scope of this work, we recorded a prototypical corpus using our combined system for a single subject. This corpus was used to validate our multimodal data acquisition protocol and to assess the quality of the expressiveness before recording a large corpus. We conducted two evaluations of the recorded data: the first concerns the production aspect of speech and the second focuses on the perception aspect (both evaluations cover the visual and acoustic modalities). The production analysis allowed us to identify characteristics specific to each expressive context and showed that the expressive content of the recorded data is globally in line with what is commonly expected in the literature. The perceptual evaluation, conducted as a human emotion recognition task using different types of stimulus, confirmed that the recorded emotions were well perceived.
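    A hedged sketch of one practical issue such a heterogeneous setup raises: streams captured at different frame rates must be aligned on a common timeline before they can be fused frame by frame. The function, marker dimensions, and frame rates below are illustrative assumptions, not the authors' protocol.

```python
# Hypothetical sketch: aligning two motion-capture streams with different
# frame rates onto a common timeline by linear interpolation, so markers
# from heterogeneous systems can be fused into one feature vector per frame.
import numpy as np

def resample_stream(timestamps, samples, target_times):
    """Linearly interpolate each marker coordinate onto target_times.

    timestamps:   (N,) capture times of the source system, in seconds
    samples:      (N, D) flattened marker coordinates per frame
    target_times: (M,) common timeline shared by all systems
    """
    samples = np.asarray(samples, dtype=float)
    out = np.empty((len(target_times), samples.shape[1]))
    for d in range(samples.shape[1]):
        out[:, d] = np.interp(target_times, timestamps, samples[:, d])
    return out

# Example: a 60 fps optical system and a 200 fps articulatory stream
# resampled to a shared 100 fps timeline, then concatenated per frame.
t_optical = np.arange(0, 1, 1 / 60)
t_artic = np.arange(0, 1, 1 / 200)
common = np.arange(0, 1, 1 / 100)
optical = resample_stream(t_optical, np.random.rand(len(t_optical), 9), common)
artic = resample_stream(t_artic, np.random.rand(len(t_artic), 6), common)
fused = np.hstack([optical, artic])  # one synchronized feature vector per frame
```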

    Reactive Statistical Mapping: Towards the Sketching of Performative Control with Data

    This paper presents the results of our participation in the ninth eNTERFACE workshop on multimodal user interfaces. Our goal for this workshop was to bring technologies currently used in speech recognition and synthesis to a new level, namely to make them the core of a new HMM-based mapping system. We investigated the idea of statistical mapping, more precisely how to use Gaussian Mixture Models and Hidden Markov Models for realtime, reactive generation of new trajectories from input labels and for realtime regression in a continuous-to-continuous use case. As a result, we developed several proofs of concept, including an incremental speech synthesiser, software for exploring stylistic spaces for gait and facial motion in realtime, a reactive audiovisual laughter system, and a prototype demonstrating the realtime reconstruction of lower-body gait motion strictly from upper-body motion, with conservation of the stylistic properties. This project was also an opportunity to formalise HMM-based mapping, integrate several of these innovations into the Mage library, and explore the development of a realtime gesture recognition tool.
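    As a minimal illustration of the statistical mapping idea described above, the sketch below performs GMM-based continuous-to-continuous regression: the conditional expectation of the output given the input under a joint GMM. It is an assumption-level example of the general technique, not the Mage library or the workshop code.

```python
# Minimal sketch of GMM-based continuous-to-continuous mapping: fit a GMM on
# the joint [input, output] space, then map an input frame to an output frame
# by the mixture of per-component conditional means.
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_joint_gmm(X, Y, n_components=8):
    """Fit a full-covariance GMM on the joint [input, output] space."""
    return GaussianMixture(n_components=n_components,
                           covariance_type="full").fit(np.hstack([X, Y]))

def gmm_regress(gmm, x, dx):
    """Map one input frame x (dimension dx) to an output frame by regression."""
    means, covs, weights = gmm.means_, gmm.covariances_, gmm.weights_
    resp = np.empty(len(weights))
    y_hat = np.zeros(means.shape[1] - dx)
    # Responsibility of each component given the input part of the frame
    for k in range(len(weights)):
        mu_x, S_xx = means[k, :dx], covs[k, :dx, :dx]
        diff = x - mu_x
        resp[k] = weights[k] * np.exp(
            -0.5 * diff @ np.linalg.solve(S_xx, diff)
        ) / np.sqrt(np.linalg.det(S_xx))
    resp /= resp.sum()
    # Per-component conditional mean, blended by the responsibilities
    for k in range(len(weights)):
        mu_x, mu_y = means[k, :dx], means[k, dx:]
        S_xx, S_yx = covs[k, :dx, :dx], covs[k, dx:, :dx]
        y_hat += resp[k] * (mu_y + S_yx @ np.linalg.solve(S_xx, x - mu_x))
    return y_hat
```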

    Sculpting Unrealities: Using Machine Learning to Control Audiovisual Compositions in Virtual Reality

    This thesis explores the use of interactive machine learning (IML) techniques to control audiovisual compositions within the emerging medium of virtual reality (VR). Accompanying the text is a portfolio of original compositions and open-source software. These research outputs represent the practical elements of the project that help to shed light on the core research question: how can IML techniques be used to control audiovisual compositions in VR? In order to find some answers to this question, it was broken down into its constituent elements. To situate the research, an exploration of the contemporary field of audiovisual art locates the practice between the areas of visual music and generative AV. This exploration of the field results in a new method of categorising the constituent practices. The practice of audiovisual composition is then explored, focusing on the concept of equality. It is found that, throughout the literature, audiovisual artists aim to treat audio and visual material equally. This is interpreted as a desire for balance between the audio and visual material. This concept is then examined in the context of VR. A feeling of presence is found to be central to this new medium and is identified as an important consideration for the audiovisual composer in addition to the senses of sight and sound. Several new terms are formulated which provide the means by which the compositions within the portfolio are analysed. A control system based on IML techniques, called the Neural AV Mapper, is developed. This is used to develop a compositional methodology through the creation of several studies. The outcomes from these studies are incorporated into two live performance pieces, Ventriloquy I and Ventriloquy II. These pieces showcase the use of IML techniques to control audiovisual compositions in a live performance context. The lessons learned from these pieces are incorporated into the development of the ImmersAV toolkit. This open-source software toolkit was built specifically to allow for the exploration of the IML control paradigm within VR. The toolkit provides the means by which the immersive audiovisual compositions, Obj_#3 and Ag Fás Ar Ais Arís, are created. Obj_#3 takes the form of an immersive audiovisual sculpture that can be manipulated in real time by the user. The title of the thesis references the physical act of sculpting audiovisual material. It also refers to the ability of VR to create alternate realities that are not bound to the physics of real life. This exploration of unrealities emerges as an important aspect of the medium. The final piece in the portfolio, Ag Fás Ar Ais Arís, takes the knowledge gained from the earlier work and pushes the boundaries to maximise the potential of the medium and the material.
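    To illustrate the IML control paradigm the thesis builds on, the following sketch records example pairs of controller input and audiovisual parameters, trains a small regressor on them, and then maps live input to parameters each frame. The names, model choice, and dimensions are hypothetical; this is not the Neural AV Mapper or ImmersAV code.

```python
# Hypothetical sketch of the interactive machine learning (IML) control idea:
# pair example controller poses with chosen audiovisual parameters, train a
# small regressor, then query it once per frame during performance.
import numpy as np
from sklearn.neural_network import MLPRegressor

examples_in, examples_out = [], []  # training set built interactively

def record_example(controller_pose, av_params):
    """Pair the current controller pose with the parameters the artist chose."""
    examples_in.append(controller_pose)
    examples_out.append(av_params)

def train_mapper():
    """Train a small neural regressor on the recorded examples."""
    model = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000)
    model.fit(np.array(examples_in), np.array(examples_out))
    return model

# Illustrative use: three recorded examples, then one live query per frame.
record_example([0.1, 0.2, 0.0], [0.3, 0.9])
record_example([0.8, 0.1, 0.5], [0.7, 0.2])
record_example([0.4, 0.9, 0.3], [0.1, 0.6])
model = train_mapper()
live_pose = np.array([[0.5, 0.5, 0.2]])
av = model.predict(live_pose)[0]  # drives sound and visuals for this frame
```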

    Lip syncing method for realistic expressive three-dimensional face model

    Lip synchronization of 3D face models is now used in a multitude of important fields. It brings a more human and dramatic reality to computer games, films and interactive multimedia, and is growing in use and importance. A high level of realism is required in demanding applications such as computer games and cinema. Authoring lip syncing with complex and subtle expressions, however, is still difficult and fraught with problems in terms of realism. Thus, this study proposes a lip syncing method for a realistic, expressive 3D face model. Animated lips require a 3D face model capable of representing the movement of the face muscles during speech and a method to produce the correct lip shape at the correct time. The 3D face model is designed based on the MPEG-4 facial animation standard to support lip syncing aligned with an input audio file. It deforms using a Raised Cosine Deformation function grafted onto the input facial geometry. This study also proposes a method to animate the 3D face model over time to create animated lip syncing, using a canonical set of visemes for all pairwise combinations of a reduced phoneme set called ProPhone. Finally, this study integrates emotions by considering both the Ekman model and Plutchik’s wheel, together with emotive eye movements implemented through the Emotional Eye Movements Markup Language, to produce a realistic 3D face model. The experimental results show that the proposed model can generate visually satisfactory animations with a Mean Square Error of 0.0020 for the neutral expression, 0.0024 for happy, 0.0020 for angry, 0.0030 for fear, 0.0026 for surprise, 0.0010 for disgust, and 0.0030 for sad.
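    As an illustration, the sketch below implements one common form of a raised-cosine deformation, in which vertices near a control point are displaced with a smooth cosine falloff; the parameter names and radius-of-influence form are assumptions, not the paper's exact formulation.

```python
# Minimal sketch of a raised-cosine deformation: vertices within `radius` of a
# control point move by `displacement`, scaled by 0.5 * (1 + cos(pi * d / r)),
# so the influence fades smoothly to zero at the edge of the region.
import numpy as np

def raised_cosine_deform(vertices, center, displacement, radius):
    """Deform mesh vertices near a facial control point.

    vertices:     (N, 3) mesh vertex positions
    center:       (3,) control point on the face (e.g. a lip feature point)
    displacement: (3,) target offset of the control point for the current viseme
    radius:       scalar radius of influence
    """
    d = np.linalg.norm(vertices - center, axis=1)
    w = np.where(d < radius, 0.5 * (1.0 + np.cos(np.pi * d / radius)), 0.0)
    return vertices + w[:, None] * displacement

# Example: pull vertices around a lip corner 5 mm outward within a 20 mm radius.
verts = np.random.rand(1000, 3) * 0.1
deformed = raised_cosine_deform(verts, center=np.array([0.05, 0.02, 0.03]),
                                displacement=np.array([0.005, 0.0, 0.0]),
                                radius=0.02)
```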

    3D Virtual Worlds and the Metaverse: Current Status and Future Possibilities

    Moving from a set of independent virtual worlds to an integrated network of 3D virtual worlds, or Metaverse, rests on progress in four areas: immersive realism, ubiquity of access and identity, interoperability, and scalability. For each area, the current status and the developments needed to achieve a functional Metaverse are described. Factors that support the formation of a viable Metaverse, such as institutional and popular interest and ongoing improvements in hardware performance, and factors that constrain the achievement of this goal, including limits in computational methods and unrealized collaboration among virtual world stakeholders and developers, are also considered.

    Faces and hands: modeling and animating anatomical and photorealistic models with regard to the communicative competence of virtual humans

    In order to be believable, virtual human characters must be able to communicate realistically, in a human-like fashion. This dissertation contributes to improving and automating several aspects of virtual conversations. We have proposed techniques to add non-verbal, speech-related facial expressions to audiovisual speech, such as head nods for emphasis. During conversation, humans experience shades of emotions much more frequently than the strong Ekmanian basic emotions. This prompted us to develop a method that interpolates between facial expressions of emotions to create new ones based on an emotion model. In the area of facial modeling, we have presented a system to generate plausible 3D face models from vague mental images. It makes use of a morphable model of faces and exploits correlations among facial features. The hands also play a major role in human communication. Since the basis for every realistic animation of gestures must be a convincing model of the hand, we devised a physics-based anatomical hand model, in which a hybrid muscle model drives the animations. The model was used to visualize complex hand movement captured using multi-exposure photography.
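    As a rough illustration of interpolating between facial expressions of emotions, the sketch below blends expression vectors, assumed here to be blendshape or MPEG-4 FAP weight vectors; the weighting scheme is illustrative and is not the dissertation's emotion model.

```python
# Hedged sketch: blend stored expression vectors into a new "shade" of emotion
# by a weighted average; each expression is assumed to be a weight vector.
import numpy as np

def blend_expressions(expressions, weights):
    """Blend several expression vectors into a new one.

    expressions: dict mapping emotion name -> (D,) expression weight vector
    weights:     dict mapping emotion name -> contribution in [0, 1]
    """
    total = sum(weights.values())
    blended = sum(w * expressions[name] for name, w in weights.items())
    return blended / total if total > 0 else blended

# Example: a face that is mostly happy with a trace of surprise.
happy = np.array([0.8, 0.1, 0.0, 0.6])
surprise = np.array([0.2, 0.9, 0.7, 0.1])
shade = blend_expressions({"happy": happy, "surprise": surprise},
                          {"happy": 0.7, "surprise": 0.3})
```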

    Sonic Interactions in Virtual Environments

    This open access book tackles the design of 3D spatial interactions from an audio-centered and audio-first perspective, providing the fundamental notions related to the creation and evaluation of immersive sonic experiences. The key elements that enhance the sensation of place in a virtual environment (VE) are: immersive audio, the computational aspects of the acoustical-space properties of Virtual Reality (VR) technologies; sonic interaction, the human-computer interplay through auditory feedback in VEs; and VR systems, which naturally support multimodal integration and impact different application domains. Sonic Interactions in Virtual Environments features state-of-the-art research on real-time auralization, sonic interaction design in VR, quality of experience in multimodal scenarios, and applications. Contributors and editors include interdisciplinary experts from the fields of computer science, engineering, acoustics, psychology, design, humanities, and beyond. Their mission is to shape an emerging field of study at the intersection of sonic interaction design and immersive media, embracing an archipelago of existing research spread across different audio communities, and to increase awareness among VR communities, researchers, and practitioners of the importance of sonic elements when designing immersive environments.

    Improving User Involvement Through Live Collaborative Creation

    Creating an artifact - such as writing a book, developing software, or performing a piece of music - is often limited to those with domain-specific experience or training. As a consequence, effectively involving non-expert end users in such creative processes is challenging. This work explores how computational systems can facilitate collaboration, communication, and participation in the context of involving users in the process of creating artifacts while mitigating the challenges inherent to such processes. In particular, the interactive systems presented in this work support live collaborative creation, in which artifact users collaboratively participate in the artifact creation process with creators in real time. In the systems that I have created, I explored liveness, the extent to which the process of creating artifacts and the state of the artifacts are immediately and continuously perceptible, for applications such as programming, writing, music performance, and UI design. Liveness helps preserve natural expressivity, supports real-time communication, and facilitates participation in the creative process. Live collaboration is beneficial for users and creators alike: making the process of creation visible encourages users to engage in the process and better understand the final artifact. Additionally, creators can receive immediate feedback in a continuous, closed loop with users. Through these interactive systems, non-expert participants help create such artifacts as GUI prototypes, software, and musical performances. This dissertation explores three topics: (1) the challenges inherent to collaborative creation in live settings, and computational tools that address them; (2) methods for reducing the barriers of entry to live collaboration; and (3) approaches to preserving liveness in the creative process, affording creators more expressivity in making artifacts and affording users access to information traditionally only available in real-time processes. In this work, I showed that enabling collaborative, expressive, and live interactions in computational systems allows the broader population to take part in various creative practices.
    PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/145810/1/snaglee_1.pd