A Variability-Based Testing Approach for Synthesizing Video Sequences
A key problem when developing video processing software is
the difficulty of testing different input combinations. In this
paper, we present VANE, a variability-based testing
approach to derive video sequence variants. The ideas of
VANE are i) to encode in a variability model what can vary
within a video sequence; ii) to exploit the variability model to
generate testable configurations; iii) to synthesize variants of
video sequences corresponding to configurations. VANE
computes T-wise covering sets while optimizing a function
over attributes. We also present a preliminary validation of
the scalability and practicality of VANE in the context of an
industrial project involving the testing of video processing
algorithms.
Ministerio de Economía y Competitividad TIN2012-32273; Junta de Andalucía TIC-5906; Junta de Andalucía P12-TIC-186
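As a rough illustration of the T-wise covering computation mentioned above, the following Python sketch greedily builds a pairwise (2-wise) covering set over a toy video variability model. The feature names, their domains, and the greedy strategy are assumptions made for the sketch; VANE itself additionally optimizes an objective function over attributes.

```python
from itertools import combinations, product

# Hypothetical video variability model: each feature has a finite domain.
# Names and values are illustrative, not taken from the VANE paper.
model = {
    "illumination": ["day", "dusk", "night"],
    "vehicles":     [0, 1, 5],
    "camera":       ["static", "moving"],
}

def pairwise_covering_set(model):
    """Greedy 2-wise covering: pick configurations until every value pair
    of every feature pair appears in at least one chosen configuration."""
    features = sorted(model)
    # All (feature, value) pair interactions that must be covered.
    uncovered = {
        ((f, u), (g, v))
        for f, g in combinations(features, 2)
        for u, v in product(model[f], model[g])
    }
    all_configs = [dict(zip(features, vals))
                   for vals in product(*(model[f] for f in features))]
    suite = []
    while uncovered:
        # Choose the configuration covering the most still-uncovered pairs.
        def gain(cfg):
            return sum(((f, cfg[f]), (g, cfg[g])) in uncovered
                       for f, g in combinations(features, 2))
        best = max(all_configs, key=gain)
        uncovered -= {((f, best[f]), (g, best[g]))
                      for f, g in combinations(features, 2)}
        suite.append(best)
    return suite

for cfg in pairwise_covering_set(model):
    print(cfg)
```

A handful of configurations typically suffices to cover all pairs, which is why T-wise sampling scales far better than exhaustive enumeration of variants.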
ViViD: A Variability-Based Tool for Synthesizing Video Sequences
We present ViViD, a variability-based tool to synthesize
variants of video sequences. ViViD is developed and used in
the context of an industrial project involving consumers and
providers of video processing algorithms. The goal is to
synthesize video variants with a wide range of characteristics
to then test the algorithms. We describe the key
components of ViViD: (1) a variability language and an environment
to model what can vary within a video sequence; (2)
a reasoning back-end to generate relevant testing configurations;
(3) a video synthesizer in charge of producing variants
of video sequences corresponding to configurations. We
show how ViViD can synthesize realistic videos with different
characteristics, such as luminance, vehicles, and persons, that
cover a diversity of testing scenarios.
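To make component (3) concrete, here is a toy stand-in for a video synthesizer: it renders a clip whose luminance, number of moving objects, and camera motion follow a configuration. The configuration keys and the rendering scheme are hypothetical illustrations, not ViViD's actual synthesizer, which produces realistic footage.

```python
import numpy as np

def synthesize_variant(config, frames=30, h=120, w=160, seed=0):
    """Toy synthesizer sketch: renders grayscale frames whose global
    luminance and number of moving blobs follow the configuration."""
    rng = np.random.default_rng(seed)
    base = {"day": 0.8, "dusk": 0.5, "night": 0.2}[config["illumination"]]
    video = np.full((frames, h, w), base, dtype=np.float32)
    # One moving bright square stands in for each configured "vehicle".
    xs = rng.integers(0, w - 10, config["vehicles"])
    ys = rng.integers(0, h - 10, config["vehicles"])
    for t in range(frames):
        for i in range(config["vehicles"]):
            x = int((xs[i] + 2 * t) % (w - 10))   # constant horizontal motion
            video[t, ys[i]:ys[i] + 10, x:x + 10] = min(1.0, base + 0.4)
        if config["camera"] == "moving":
            video[t] = np.roll(video[t], t, axis=1)  # crude camera pan
    return video

clip = synthesize_variant({"illumination": "night", "vehicles": 3,
                           "camera": "moving"})
print(clip.shape, clip.min(), clip.max())
```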
Unsupervised Video Understanding by Reconciliation of Posture Similarities
Understanding human activity and being able to explain it in detail surpasses
mere action classification by far in both complexity and value. The challenge
is thus to describe an activity on the basis of its most fundamental
constituents, the individual postures and their distinctive transitions.
Supervised learning of such a fine-grained representation based on elementary
poses is very tedious and does not scale. Therefore, we propose a completely
unsupervised deep learning procedure based solely on video sequences, which
starts from scratch without requiring pre-trained networks, predefined body
models, or keypoints. A combinatorial sequence matching algorithm proposes
relations between frames from subsets of the training data, while a CNN is
reconciling the transitivity conflicts of the different subsets to learn a
single concerted pose embedding despite changes in appearance across sequences.
Without any manual annotation, the model learns a structured representation of
postures and their temporal development. The model not only enables retrieval
of similar postures but also temporal super-resolution. Additionally, based on
a recurrent formulation, next frames can be synthesized.
Comment: Accepted by ICCV 2017
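The interplay between proposed frame relations and transitivity can be illustrated with a small sketch. The frame identifiers and relation labels below are invented for the example; the paper's method resolves such conflicts while training a CNN embedding, whereas this sketch only detects them.

```python
from itertools import combinations

# Illustrative setup: each training subset independently matched frames
# and proposed pairwise relations (+1 similar posture, -1 dissimilar).
# Frame IDs and labels are made up for the sketch.
relations = {
    ("a", "b"): +1,   # proposed within subset 1
    ("b", "c"): +1,   # proposed within subset 2
    ("a", "c"): -1,   # proposed within subset 3 -- violates transitivity
}

def transitivity_conflicts(relations):
    """Return frame triplets where two 'similar' links imply a third link
    that another subset labeled 'dissimilar'."""
    frames = sorted({f for pair in relations for f in pair})
    def rel(x, y):
        return relations.get((x, y)) or relations.get((y, x))
    conflicts = []
    for a, b, c in combinations(frames, 3):
        links = [rel(a, b), rel(b, c), rel(a, c)]
        if None not in links and links.count(+1) == 2 and -1 in links:
            conflicts.append((a, b, c))
    return conflicts

print(transitivity_conflicts(relations))  # [('a', 'b', 'c')]
```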
Modeling Variability in the Video Domain: Language and Experience Report
This paper reports on a new domain-specific variability modeling language, called VM, resulting from a close collaboration with industrial partners in the video domain. We expose the requirements and advanced variability constructs required to characterize and realize variations of the physical properties of a video (such as objects' speed or scene illumination). The results of our experiments and industrial experience show that VM is effective for modeling complex variability information and can be exploited to synthesize video variants. We concluded that basic variability mechanisms are useful but not enough, that attributes and multi-features are of prime importance, and that meta-information is relevant for efficient variability analysis. In addition, we questioned the existence of a one-size-fits-all variability modeling solution applicable in any industry. Yet some common needs for modeling variability are becoming apparent, such as support for attributes and multi-features.
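As a sketch of why attributes and multi-features matter, the following hypothetical Python encoding models a scene with a real-valued attribute and a clonable multi-feature, plus a validity check over a configuration. This is not the VM language's syntax, only an illustration of the constructs the abstract argues for.

```python
# Hypothetical Python encoding of an attributed, multi-feature variability
# model in the spirit of VM; the real VM language has its own syntax.
video_model = {
    "scene": {
        "attributes": {"illumination": ("real", 0.0, 1.0)},  # (type, lo, hi)
    },
    "object": {
        "multi": True,                       # multi-feature: may be cloned
        "clones": (0, 5),                    # between 0 and 5 instances
        "attributes": {"speed": ("real", 0.0, 30.0)},
    },
}

def valid(config, model):
    """Check a configuration against attribute domains and clone counts."""
    for feature, spec in model.items():
        instances = config.get(feature, [])
        if spec.get("multi"):
            lo, hi = spec["clones"]
            if not lo <= len(instances) <= hi:
                return False
        for inst in instances:
            for attr, (_, lo, hi) in spec["attributes"].items():
                if not lo <= inst[attr] <= hi:
                    return False
    return True

cfg = {"scene": [{"illumination": 0.3}],
       "object": [{"speed": 12.5}, {"speed": 4.0}]}
print(valid(cfg, video_model))  # True
```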
Towards Automatic Speech Identification from Vocal Tract Shape Dynamics in Real-time MRI
Vocal tract configurations play a vital role in generating distinguishable
speech sounds, by modulating the airflow and creating different resonant
cavities in speech production. They contain abundant information that can be
utilized to better understand the underlying speech production mechanism. As a
step towards automatic mapping of vocal tract shape geometry to acoustics, this
paper employs effective video action recognition techniques, like Long-term
Recurrent Convolutional Networks (LRCN) models, to identify different
vowel-consonant-vowel (VCV) sequences from dynamic shaping of the vocal tract.
Such a model typically combines a CNN based deep hierarchical visual feature
extractor with Recurrent Networks, that ideally makes the network
spatio-temporally deep enough to learn the sequential dynamics of a short video
clip for video classification tasks. We use a database consisting of 2D
real-time MRI of vocal tract shaping during VCV utterances by 17 speakers. The
comparative performances of this class of algorithms under various parameter
settings and for various classification tasks are discussed. Interestingly, the
results show a marked difference in the model performance in the context of
speech classification with respect to generic sequence or video classification
tasks.
Comment: To appear in the INTERSPEECH 2018 Proceedings
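A minimal PyTorch sketch of the LRCN pattern the abstract describes: a small CNN encodes each frame, an LSTM integrates the per-frame features over time, and the final hidden state is classified. Layer sizes and the toy input are assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class MiniLRCN(nn.Module):
    """Minimal LRCN: a per-frame CNN feature extractor followed by an LSTM
    over time; the last hidden state is classified. Sizes are illustrative."""
    def __init__(self, n_classes):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),   # -> (batch, 32)
        )
        self.lstm = nn.LSTM(32, 64, batch_first=True)
        self.head = nn.Linear(64, n_classes)

    def forward(self, clips):                  # clips: (B, T, 1, H, W)
        b, t = clips.shape[:2]
        feats = self.cnn(clips.flatten(0, 1))  # (B*T, 32), frames in a batch
        feats = feats.view(b, t, -1)           # (B, T, 32), back to sequences
        _, (h, _) = self.lstm(feats)           # h: (1, B, 64)
        return self.head(h[-1])                # (B, n_classes)

# Toy VCV classification: 2 clips of 8 MRI-like 64x64 frames each.
logits = MiniLRCN(n_classes=10)(torch.randn(2, 8, 1, 64, 64))
print(logits.shape)  # torch.Size([2, 10])
```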
Capture, Learning, and Synthesis of 3D Speaking Styles
Audio-driven 3D facial animation has been widely explored, but achieving
realistic, human-like performance is still unsolved. This is due to the lack of
available 3D datasets, models, and standard evaluation metrics. To address
this, we introduce a unique 4D face dataset with about 29 minutes of 4D scans
captured at 60 fps and synchronized audio from 12 speakers. We then train a
neural network on our dataset that factors identity from facial motion. The
learned model, VOCA (Voice Operated Character Animation) takes any speech
signal as input - even speech in languages other than English - and
realistically animates a wide range of adult faces. Conditioning on subject
labels during training allows the model to learn a variety of realistic
speaking styles. VOCA also provides animator controls to alter speaking style,
identity-dependent facial shape, and pose (i.e. head, jaw, and eyeball
rotations) during animation. To our knowledge, VOCA is the only realistic 3D
facial animation model that is readily applicable to unseen subjects without
retargeting. This makes VOCA suitable for tasks like in-game video, virtual
reality avatars, or any scenario in which the speaker, speech, or language is
not known in advance. We make the dataset and model available for research
purposes at http://voca.is.tue.mpg.de.
Comment: To appear in CVPR 2019
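The conditioning mechanism the abstract describes, a subject label steering speaking style, can be sketched as follows: audio features concatenated with a one-hot subject label are decoded into per-vertex offsets added to a template mesh. The dimensions, the simple MLP decoder, and the vertex count are illustrative assumptions, not VOCA's actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpeechToOffsets(nn.Module):
    """VOCA-style conditioning sketch: audio features plus a one-hot subject
    label are decoded into per-vertex offsets added to a template mesh.
    All dimensions are illustrative assumptions."""
    def __init__(self, audio_dim=29, n_subjects=12, n_vertices=5023):
        super().__init__()
        self.n_subjects, self.n_vertices = n_subjects, n_vertices
        self.decoder = nn.Sequential(
            nn.Linear(audio_dim + n_subjects, 128), nn.ReLU(),
            nn.Linear(128, n_vertices * 3),
        )

    def forward(self, audio_feat, subject_id, template):
        # audio_feat: (B, audio_dim); subject_id: (B,); template: (V, 3)
        one_hot = F.one_hot(subject_id, self.n_subjects).float()
        offsets = self.decoder(torch.cat([audio_feat, one_hot], dim=1))
        # Changing subject_id alters the decoded "speaking style".
        return template + offsets.view(-1, self.n_vertices, 3)

model = SpeechToOffsets()
faces = model(torch.randn(4, 29), torch.tensor([0, 3, 3, 7]),
              torch.zeros(5023, 3))
print(faces.shape)  # torch.Size([4, 5023, 3])
```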
Speech-Driven 3D Face Animation with Composite and Regional Facial Movements
Speech-driven 3D face animation poses significant challenges due to the
intricacy and variability inherent in human facial movements. This paper
emphasizes the importance of considering both the composite and regional
natures of facial movements in speech-driven 3D face animation. The composite
nature pertains to how speech-independent factors globally modulate
speech-driven facial movements along the temporal dimension. Meanwhile, the
regional nature alludes to the notion that facial movements are not globally
correlated but are actuated by local musculature along the spatial dimension.
It is thus indispensable to incorporate both natures for engendering vivid
animation. To address the composite nature, we introduce an adaptive modulation
module that employs arbitrary facial movements to dynamically adjust
speech-driven facial movements across frames on a global scale. To accommodate
the regional nature, our approach ensures that each constituent of the facial
features for every frame focuses on the local spatial movements of 3D faces.
Moreover, we present a non-autoregressive backbone for translating audio to 3D
facial movements, which maintains high-frequency nuances of facial movements
and facilitates efficient inference. Comprehensive experiments and user studies
demonstrate that our method surpasses contemporary state-of-the-art approaches
both qualitatively and quantitatively.Comment: Accepted by MM 2023, 9 pages, 7 figures. arXiv admin note: text
overlap with arXiv:2303.0979
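The two natures the abstract distinguishes can be sketched as a FiLM-style global scale and shift applied per frame (composite nature) plus separate decoding heads for disjoint vertex regions (regional nature). The region split, dimensions, and modulation form are assumptions made for illustration, not the paper's modules.

```python
import torch
import torch.nn as nn

class CompositeRegionalDecoder(nn.Module):
    """Sketch of the two properties named in the abstract: a global per-frame
    modulation of speech-driven features (composite) and separate decoding
    heads for local face regions (regional). FiLM-style scale/shift and the
    region split are illustrative assumptions."""
    def __init__(self, feat_dim=64, style_dim=16,
                 region_sizes=(1000, 1500, 2523)):   # e.g. mouth/eyes/rest
        super().__init__()
        self.film = nn.Linear(style_dim, 2 * feat_dim)  # -> scale, shift
        self.heads = nn.ModuleList(
            nn.Linear(feat_dim, 3 * n) for n in region_sizes)
        self.region_sizes = region_sizes

    def forward(self, speech_feat, style):
        # speech_feat: (B, T, feat_dim); style: (B, style_dim)
        scale, shift = self.film(style).unsqueeze(1).chunk(2, dim=-1)
        h = speech_feat * (1 + scale) + shift         # global modulation
        parts = [head(h).view(*h.shape[:2], n, 3)     # per-region vertices
                 for head, n in zip(self.heads, self.region_sizes)]
        return torch.cat(parts, dim=2)                # (B, T, sum(n), 3)

out = CompositeRegionalDecoder()(torch.randn(2, 10, 64), torch.randn(2, 16))
print(out.shape)  # torch.Size([2, 10, 5023, 3])
```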
- …