204 research outputs found

    Motion compensation and very low bit rate video coding

    Recent activities of the International Telecommunication Union (ITU) and the International Organization for Standardization (ISO) are defining new standards for very low bit-rate video coding, such as H.263 and MPEG-4, following the successful application of the international standards H.261 and MPEG-1/2 to video coding above 64 kbps. At very low bit-rates, however, the classic block-matching-based DCT video coding scheme suffers seriously from blocking artifacts, which considerably degrade the quality of reconstructed video frames. To solve this problem, this dissertation presents a new technique in which motion compensation is based on a dense motion field, and proposes four efficient new video coding algorithms for very low bit-rates based on it. (1) After studying model-based video coding algorithms, we propose an optical-flow-based video coding algorithm with thresholding techniques. A statistical model is established for the distribution of intensity differences between two successive frames, and four thresholds are used to control the bit-rate and the quality of reconstructed frames. It outperforms typical model-based techniques in terms of complexity and quality of reconstructed frames. (2) An efficient algorithm using DCT-coded optical flow. Dense motion fields can be modeled by a first-order autoregressive model and compressed efficiently with the DCT, achieving very low bit-rates and higher visual quality than H.263/TMN5. (3) A region-based discrete wavelet transform (DWT) video coding algorithm. This algorithm uses a dense motion field, and regions are segmented according to the significance of their content. The DWT is applied to residual images region by region, and bits are allocated adaptively across regions. It improves the visual quality and PSNR of significant regions while maintaining a low bit-rate. (4) A segmentation-based video coding algorithm for stereo sequences. A correlation-feedback algorithm with a Kalman filter is used to improve the accuracy of the optical flow fields. Three criteria, associated with 3-D information, 2-D connectivity, and motion vector fields respectively, are defined for object segmentation. A chain code is used to encode the shapes of the segmented objects. The algorithm achieves very high compression ratios, up to several thousand.
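    As a hedged illustration of idea (2), the sketch below compresses one component of a smooth, synthetic dense motion field with a 2-D DCT and keeps only the low-frequency coefficients. It shows the general transform-coding principle, not the dissertation's actual codec; all names, sizes, and parameters are invented for the example.

        # Illustrative sketch: transform coding of a dense optical-flow
        # component with a 2-D DCT. The flow field is synthetic; a real codec
        # would also quantize and entropy-code the retained coefficients.
        import numpy as np
        from scipy.fft import dctn, idctn

        rng = np.random.default_rng(0)

        # Smooth, highly correlated field (random-walk-like along both axes),
        # mimicking the strong spatial correlation of dense motion fields.
        noise = rng.normal(scale=0.05, size=(64, 64))
        flow_x = np.cumsum(np.cumsum(noise, axis=0), axis=1)

        coeffs = dctn(flow_x, norm="ortho")

        # Keep only an 8x8 block of low-frequency coefficients (~1.5% of the data).
        kept = np.zeros_like(coeffs)
        kept[:8, :8] = coeffs[:8, :8]

        reconstructed = idctn(kept, norm="ortho")
        rmse = np.sqrt(np.mean((flow_x - reconstructed) ** 2))
        print(f"RMSE of reconstructed flow component: {rmse:.4f}")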

    Fusing face and body gesture for machine recognition of emotions

    Research shows that humans are more likely to consider computers to be human-like when those computers understand and display appropriate nonverbal communicative behavior. Most existing systems attempting to analyze human nonverbal behavior focus only on the face; research that aims to integrate gesture as an expressive means has emerged only recently. This paper presents an approach to automatic visual recognition of expressive face and upper-body action units (FAUs and BAUs) suitable for use in a vision-based affective multimodal framework. After describing the feature extraction techniques, classification results from three subjects are presented. Firstly, individual classifiers are trained separately on face and body features for classification into FAU and BAU categories. Secondly, the same procedure is applied for classification into labeled emotion categories. Finally, we fuse face and body information for classification into combined emotion categories. In our experiments, emotion classification using the two modalities achieved better recognition accuracy than classification using the face modality alone. © 2005 IEEE
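    A minimal sketch of the fusion step described above, using invented stand-in data (this is not the paper's implementation; its features are face and body action units, and its classifier may differ): train one classifier per modality, then concatenate the feature vectors for a combined, bimodal classifier.

        # Feature-level fusion sketch with synthetic stand-in data.
        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(42)
        n = 300
        face_feats = rng.normal(size=(n, 20))   # stand-in for FAU-derived features
        body_feats = rng.normal(size=(n, 12))   # stand-in for BAU-derived features
        labels = rng.integers(0, 6, size=n)     # stand-in emotion labels

        # Unimodal baselines: one classifier per modality.
        for name, feats in [("face", face_feats), ("body", body_feats)]:
            acc = cross_val_score(RandomForestClassifier(), feats, labels, cv=5).mean()
            print(f"{name}-only accuracy: {acc:.3f}")

        # Fusion: concatenate modalities and train a single bimodal classifier.
        fused = np.hstack([face_feats, body_feats])
        acc = cross_val_score(RandomForestClassifier(), fused, labels, cv=5).mean()
        print(f"fused accuracy: {acc:.3f}")

    With this random data all three scores sit at chance level; the point is only the shape of the unimodal-versus-fused pipeline.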

    Fitting and tracking of a scene model in very low bit rate video coding


    Automatic Visual Speech Recognition

    Intelligent Systems; Electrical Engineering, Mathematics and Computer Science

    King's speech: pronounce a foreign language with style

    Computer-assisted pronunciation training requires strategies that capture learners' attention and guide them along the learning pathway. In this paper, we introduce an immersive storytelling scenario for creating appropriate learning conditions. The proposed learning interaction is orchestrated by a spoken karaoke. We motivate the concept of the spoken karaoke and describe our design. Driven by the requirements of the proposed scenario, we suggest a modular architecture designed for immersive learning applications. We present our prototype system and our approach to the processing of spoken and visual interaction modalities. Finally, we discuss how technological challenges can be addressed in order to enable the learner's self-evaluation.

    Registration and statistical analysis of the tongue shape during speech production

    This thesis analyzes the human tongue shape during speech production. First, a semi-supervised approach is derived for estimating the tongue shape from volumetric magnetic resonance imaging data of the human vocal tract. The results of this extraction are used to derive parametric tongue models. Next, a framework is presented for registering sparse motion capture data of the tongue by means of such a model; this makes it possible to generate full three-dimensional animations of the tongue. Finally, a multimodal, statistical text-to-speech system is developed that is able to synthesize audio and synchronized tongue motion from text.
    German Research Foundation
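    As a hedged sketch of what a parametric shape model can look like in general (a generic PCA-based statistical shape model, not the thesis's actual pipeline; the data and all names are invented for illustration), see below: the model is a mean shape plus weighted principal modes of variation over registered 3-D point sets.

        # Generic PCA shape model: registered 3-D point sets, one flattened row
        # per training shape. Synthetic data stands in for real tongue scans.
        import numpy as np

        def fit_shape_model(shapes, n_modes):
            """shapes: (n_samples, 3 * n_points) array of registered shapes."""
            mean = shapes.mean(axis=0)
            centered = shapes - mean
            # SVD of the centered data yields the principal modes of variation.
            _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
            modes = vt[:n_modes]                      # (n_modes, 3 * n_points)
            stddevs = singular_values[:n_modes] / np.sqrt(len(shapes) - 1)
            return mean, modes, stddevs

        def synthesize(mean, modes, stddevs, weights):
            """Generate a new shape from per-mode weights given in std. devs."""
            return mean + (np.asarray(weights) * stddevs) @ modes

        rng = np.random.default_rng(1)
        shapes = rng.normal(size=(50, 600))           # 50 shapes, 200 points each
        mean, modes, stddevs = fit_shape_model(shapes, n_modes=5)
        tongue = synthesize(mean, modes, stddevs, [1.0, -0.5, 0, 0, 0]).reshape(-1, 3)
        print(tongue.shape)                           # (200, 3)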

    Articulatory features for robust visual speech recognition


    Seeing a talking face matters to infants, children and adults: behavioural and neurophysiological studies

    Everyday conversations typically occur face-to-face. Over and above auditory information, visual information from a speaker's face (e.g., the lips and eyebrows) contributes to speech perception and comprehension. The facilitation that visual speech cues bring, termed the visual speech benefit, is experienced by infants, children and adults. Even so, studies of speech perception have largely focused on auditory-only speech, leaving a relative paucity of research on the visual speech benefit. Central to this thesis are the behavioural and neurophysiological manifestations of the visual speech benefit. Because the visual speech benefit assumes that a listener is attending to a speaker's talking face, the investigations also consider the possible modulating effects of gaze behaviour. Three investigations were conducted. Collectively, these studies demonstrate that visual speech information facilitates speech perception, which has implications for individuals who do not have clear access to the auditory speech signal. The results, for instance the enhancement of 5-month-olds' cortical tracking by visual speech cues and the effect of idiosyncratic differences in gaze behaviour on speech processing, expand knowledge of auditory-visual speech processing and provide a firm basis for new directions in this burgeoning and important area of research.
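    Cortical tracking of speech is commonly quantified as the correlation between neural activity and the speech amplitude envelope at plausible neural lags. The sketch below illustrates that generic analysis on synthetic signals; it is an assumption-laden illustration of the measure, not the thesis's actual method or data.

        # Generic cortical-tracking measure on synthetic signals.
        import numpy as np
        from scipy.signal import butter, filtfilt, hilbert

        fs = 250                                     # sampling rate in Hz
        n = 60 * fs                                  # one minute of data
        rng = np.random.default_rng(7)

        speech = rng.normal(size=n)                  # stand-in speech waveform
        envelope = np.abs(hilbert(speech))           # amplitude envelope

        # Low-pass to the slow (<8 Hz) band where tracking is usually measured.
        b, a = butter(4, 8 / (fs / 2), btype="low")
        envelope = filtfilt(b, a, envelope)

        # Synthetic "EEG": the envelope, delayed ~100 ms, plus noise.
        eeg = np.roll(envelope, int(0.1 * fs)) + rng.normal(scale=0.5, size=n)

        # Tracking score: peak correlation over lags from 0 to 300 ms.
        lags = np.arange(0, int(0.3 * fs))
        scores = [np.corrcoef(np.roll(envelope, lag), eeg)[0, 1] for lag in lags]
        best = int(np.argmax(scores))
        print(f"peak tracking r = {max(scores):.3f} at {lags[best] / fs * 1000:.0f} ms")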