
    Lip syncing method for realistic expressive 3D face model

    Lip synchronization of 3D face models is now used in a multitude of important fields. It brings a more human, social and dramatic reality to computer games, films and interactive multimedia, and is growing in use and importance. A high level of realism is demanded in applications such as computer games and cinema, yet authoring lip syncing with complex and subtle expressions remains difficult and fraught with problems in terms of realism. This research proposes a lip syncing method for a realistic expressive 3D face model. Animating lips requires a 3D face model capable of representing the myriad shapes the human face assumes during speech, and a method to produce the correct lip shape at the correct time. The paper presents a 3D face model designed to support lip syncing aligned with an input audio file. The model deforms using a Raised Cosine Deformation (RCD) function that is grafted onto the input facial geometry, and is based on the MPEG-4 Facial Animation (FA) standard. The paper proposes a method to animate the 3D face model over time, creating lip syncing from a canonical set of visemes for all pairwise combinations of a reduced phoneme set called ProPhone. The proposed research integrates emotion, drawing on the Ekman model and Plutchik's wheel, with emotive eye movements implemented through the Emotional Eye Movements Markup Language (EEMML) to produce a realistic 3D face model. © 2017 Springer Science+Business Media New York
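    As an illustration of the kind of localized deformation an RCD function produces, here is a minimal sketch (Python/NumPy; the function signature and parameter names are illustrative assumptions, not the paper's implementation). Vertices within an influence radius of a control point, such as an MPEG-4 feature point on the lips, are displaced with a raised-cosine falloff:

    ```python
    import numpy as np

    def raised_cosine_deform(vertices, center, direction, amplitude, radius):
        """Displace mesh vertices with a raised-cosine falloff around a control point.

        vertices  : (N, 3) array of face-mesh vertex positions
        center    : (3,) control point, e.g. an MPEG-4 feature point on the lips
        direction : (3,) unit vector along which vertices are pushed
        amplitude : peak displacement, applied at the control point itself
        radius    : influence radius; vertices farther away are left untouched
        """
        d = np.linalg.norm(vertices - center, axis=1)            # distance to control point
        w = np.where(d < radius,
                     0.5 * (1.0 + np.cos(np.pi * d / radius)),   # raised-cosine window
                     0.0)
        return vertices + amplitude * w[:, None] * direction
    ```

    The window falls smoothly from 1 at the control point to 0 at the radius boundary, so the deformed region blends into the surrounding geometry without creases.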

    Enhancing Situational Awareness for Rotorcraft Pilots Using Virtual and Augmented Reality

    Rotorcraft pilots often face the challenge of processing a multitude of data, integrating it with prior experience, and making informed decisions in complex, rapidly changing multisensory environments. Virtual Reality (VR) and, more recently, Augmented Reality (AR) technologies have been applied to provide users with immersive, interactive and navigable experiences. The research work described in this thesis demonstrates that VR/AR are particularly effective in providing real-time information without detracting from the pilot's mission in both civilian and military engagements. Immersing the pilot inside the VR model provides enhanced realism. Interaction with the VR environment allows pilots to practice responding appropriately to simulated threats. Navigation allows the VR environment to change with varying parameters. In this thesis, VR/AR environments are applied to the design and development of a head-up display (HUD) for helicopter pilots. The usability of the HUD developed as part of this thesis is assessed using established frameworks for human systems engineering, incorporating best practices for user-centered design. The research work described in this thesis demonstrates that VR/AR environments can provide flexible, ergonomic, and user-focused interfaces for real-time operations in complex, multisensory environments.

    Multimedia Data Analysis using ImageTcl (Extended version)

    ImageTcl is a new system that provides powerful Tcl/Tk-based media scripting capabilities, similar to those of the ViewSystem and Rivl, in a unique environment that allows rapid prototyping and development of new components in C++. Powerful user tools automate the creation of new components as well as the addition of new data types and file formats. Applications using ImageTcl at the Dartmouth Experimental Visualization Laboratory (DEVLAB) include multiple-stream media data analysis, automatic image annotation, and image sequence motion analysis. ImageTcl combines the high speed of compiled languages with the testing and parameterization advantages of scripting languages.

    Design of an embedded speech-centric interface for applications in handheld terminals

    The embedded speech-centric interface for handheld wireless devices has been implemented on a commercially available PDA as part of an application that allows real-time access to stock prices through GPRS. In this article, we focus mainly on optimizing the ASR subsystem to minimize its use of the handheld's computational resources. This optimization has been accomplished through a fixed-point implementation of all the algorithms involved in the ASR subsystem and the use of PCA to reduce the feature vector dimensionality. The influence of several parameters, such as the Qn resolution of the fixed-point implementation and the number of PCA components retained, has been studied and evaluated in the ASR subsystem, obtaining word recognition rates of around 96% for the best configuration. Finally, a field evaluation of the system shows that our design of the speech-centric interface achieves good results in a real-life scenario. This work was supported in part by the Spanish Government grants TSI-020110-2009-103, IPT-120000-2010-24, and TEC2011-26807 and the Spanish Regional grant CCG08-UC3M/TIC-4457.
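    To make the fixed-point and PCA steps concrete, here is a minimal sketch (Python/NumPy, chosen for illustration; Q15 stands in as one example of a Qn format, and all names, shapes, and dimensions are assumptions rather than the paper's implementation):

    ```python
    import numpy as np

    def to_q15(x):
        """Quantize floats in [-1, 1) to Q15 fixed point (16-bit signed ints)."""
        q = np.round(np.asarray(x) * (1 << 15))
        return np.clip(q, -32768, 32767).astype(np.int16)

    def from_q15(q):
        """Recover approximate floats from Q15 values."""
        return q.astype(np.float64) / (1 << 15)

    def pca_reduce(features, k):
        """Project (frames, dims) feature vectors onto the top-k principal axes."""
        centered = features - features.mean(axis=0)
        _, _, vt = np.linalg.svd(centered, full_matrices=False)  # rows of vt = axes
        return centered @ vt[:k].T                               # (frames, k)

    # Example: reduce 39-dim acoustic features to 12 dims, then quantize.
    feats = np.random.randn(100, 39) * 0.1   # stand-in for real MFCC features
    reduced_q15 = to_q15(pca_reduce(feats, 12))
    ```

    Larger n in Qn gives finer resolution but a narrower representable range, which is exactly the trade-off the abstract reports studying.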

    Audio-visual speech processing system for Polish applicable to human-computer interaction

    This paper describes an audio-visual speech recognition (AVASR) system for the Polish language and a set of performance tests under various acoustic conditions. We first present the overall structure of AVASR systems, with three main areas: audio feature extraction, visual feature extraction, and audio-visual speech integration. We present MFCC features for the audio stream with standard HMM modeling, then describe appearance- and shape-based visual features. We then present two feature integration techniques: feature concatenation and model fusion. We also discuss the results of a set of experiments conducted to select the best system setup for Polish under noisy audio conditions. The experiments simulate human-computer interaction in a computer-control scenario with voice commands in difficult audio environments. With an Active Appearance Model (AAM) and a multistream Hidden Markov Model (HMM), the system reduces Word Error Rate by more than 30% compared to audio-only speech recognition when the Signal-to-Noise Ratio drops to 0 dB.
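    Of the two integration techniques mentioned, feature concatenation is the simpler: each frame's acoustic and visual vectors are stacked into a single observation vector modeled by one HMM. A minimal sketch (Python/NumPy; the shapes and names are illustrative assumptions):

    ```python
    import numpy as np

    def concat_fusion(mfcc, visual):
        """Frame-synchronous feature concatenation for audio-visual ASR.

        mfcc   : (T, Da) acoustic features, e.g. MFCCs with deltas
        visual : (T, Dv) visual features, e.g. AAM shape/appearance parameters
        Both streams must share the same frame rate; the slower visual stream
        is typically interpolated up to the audio frame rate beforehand.
        """
        assert mfcc.shape[0] == visual.shape[0], "streams must be frame-aligned"
        return np.hstack([mfcc, visual])   # (T, Da + Dv) joint observation vectors
    ```

    Model fusion, by contrast, keeps a separate stream per modality and combines their likelihoods, which lets the recognizer reweight the streams as acoustic noise increases.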

    Automatic speaker recognition

    In accordance with the "Law on Amendments to the Higher Education Law and Certain Laws and Decree Laws" published in the Official Gazette No. 30352 of 06.03.2018, and the directive of 18.06.2018 on the "Collection, Organization and Opening to Access of Graduate Theses in Electronic Format", the full text of this thesis has been made openly accessible.