Speech Separation Using Partially Asynchronous Microphone Arrays Without Resampling

Authors: Corey Ryan M.; Singer Andrew C.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 31/07/2018

We consider the problem of separating speech sources captured by multiple spatially separated devices, each of which has multiple microphones and samples its signals at a slightly different rate. Most asynchronous array processing methods rely on sample rate offset estimation and resampling, but these offsets can be difficult to estimate if the sources or microphones are moving. We propose a source separation method that does not require offset estimation or signal resampling. Instead, we divide the distributed array into several synchronous subarrays. All arrays are used jointly to estimate the time-varying signal statistics, and those statistics are used to design separate time-varying spatial filters in each array. We demonstrate the method for speech mixtures recorded on both stationary and moving microphone arrays.
Comment: To appear at the International Workshop on Acoustic Signal Enhancement (IWAENC 2018)
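As a rough illustration of why slightly different sample rates matter, the sketch below computes how far two nominally identical recordings drift apart over time. The function name and the example values (16 kHz, 50 ppm, 60 s) are assumptions chosen for illustration; they do not come from the paper, and the paper's method specifically avoids estimating this offset.

```python
def sample_drift(nominal_rate_hz: float, offset_ppm: float, duration_s: float) -> float:
    """Samples of misalignment accumulated between two devices.

    One device is assumed to sample at nominal_rate_hz and the other at
    nominal_rate_hz * (1 + offset_ppm * 1e-6); both values are hypothetical.
    """
    return nominal_rate_hz * offset_ppm * 1e-6 * duration_s


# Hypothetical example: 50 ppm offset at 16 kHz over one minute.
drift = sample_drift(16_000.0, 50.0, 60.0)
print(f"drift after 60 s: {drift:.1f} samples")
```

Even a small offset therefore accumulates to many samples within a minute, which is why fully coherent processing across devices normally needs resampling.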

arXiv.org e-Print Archive

Crossref

The Visual Centrifuge: Model-Free Layered Video Representations

Authors: Alayrac Jean-Baptiste; Carreira João; Zisserman Andrew
Publication date: 04/04/2019

True video understanding requires making sense of non-Lambertian scenes where the color of light arriving at the camera sensor encodes information about not just the last object it collided with, but about multiple mediums: colored windows, dirty mirrors, smoke or rain. Layered video representations have the potential of accurately modelling realistic scenes but have so far required stringent assumptions on motion, lighting and shape. Here we propose a learning-based approach for multi-layered video representation: we introduce novel uncertainty-capturing 3D convolutional architectures and train them to separate blended videos. We show that these models then generalize to single videos, where they exhibit interesting abilities: color constancy, factoring out shadows and separating reflections. We present quantitative and qualitative results on real world videos.
Comment: Appears in: 2019 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2019). This arXiv contains the CVPR Camera Ready version of the paper (although we have included larger figures) as well as an appendix detailing the model architecture

arXiv.org e-Print Archive

Oxford University Research Archive

Capture, Learning, and Synthesis of 3D Speaking Styles

Authors: Black Michael J.; Bolkart Timo; Cudeiro Daniel; Laidlaw Cassidy; Ranjan Anurag
Publication date: 01/01/2019

Audio-driven 3D facial animation has been widely explored, but achieving realistic, human-like performance is still unsolved. This is due to the lack of available 3D datasets, models, and standard evaluation metrics. To address this, we introduce a unique 4D face dataset with about 29 minutes of 4D scans captured at 60 fps and synchronized audio from 12 speakers. We then train a neural network on our dataset that factors identity from facial motion. The learned model, VOCA (Voice Operated Character Animation), takes any speech signal as input, even speech in languages other than English, and realistically animates a wide range of adult faces. Conditioning on subject labels during training allows the model to learn a variety of realistic speaking styles. VOCA also provides animator controls to alter speaking style, identity-dependent facial shape, and pose (i.e. head, jaw, and eyeball rotations) during animation. To our knowledge, VOCA is the only realistic 3D facial animation model that is readily applicable to unseen subjects without retargeting. This makes VOCA suitable for tasks like in-game video, virtual reality avatars, or any scenario in which the speaker, speech, or language is not known in advance. We make the dataset and model available for research purposes at http://voca.is.tue.mpg.de.
Comment: To appear in CVPR 201

arXiv.org e-Print Archive

Crossref

MPG.PuRe

Geometric Wavelet Scattering Networks on Compact Riemannian Manifolds

Authors: Gao Feng; Hirn Matthew; Perlmutter Michael; Wolf Guy
Publication date: 16/07/2020

The Euclidean scattering transform was introduced nearly a decade ago to improve the mathematical understanding of convolutional neural networks. Inspired by recent interest in geometric deep learning, which aims to generalize convolutional neural networks to manifold and graph-structured domains, we define a geometric scattering transform on manifolds. Similar to the Euclidean scattering transform, the geometric scattering transform is based on a cascade of wavelet filters and pointwise nonlinearities. It is invariant to local isometries and stable to certain types of diffeomorphisms. Empirical results demonstrate its utility on several geometric learning tasks. Our results generalize the deformation stability and local translation invariance of Euclidean scattering, and demonstrate the importance of linking the used filter structures to the underlying geometry of the data.
Comment: 35 pages; 3 figures; 2 tables; v3: Revisions based on reviewer comment
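The "cascade of wavelet filters and pointwise nonlinearities" can be sketched on the simplest compact manifold, the circle, where the Laplacian eigenbasis is the Fourier basis. This is a toy illustration, not the authors' implementation: the filter family (differences of heat kernels), the scale parameter `t_min`, the global averaging, and all function names are assumptions made here.

```python
import numpy as np


def heat_wavelets(n, num_scales, t_min=1e-3):
    """Band-pass spectral filters on the circle (assumed construction).

    Each filter is a difference of two heat kernels exp(-t * lambda),
    evaluated on the Laplacian eigenvalues lambda = k**2.
    """
    k = np.fft.fftfreq(n, d=1.0 / n)           # integer frequencies
    lam = k ** 2                                # Laplacian eigenvalues
    times = t_min * 2.0 ** np.arange(num_scales + 1)
    heat = np.exp(-np.outer(times, lam))        # shape (num_scales + 1, n)
    return heat[:-1] - heat[1:]                 # zero response at lambda = 0


def spectral_filter(x, h):
    """Apply a real, even spectral multiplier h to a real signal x."""
    return np.fft.ifft(np.fft.fft(x) * h).real


def scattering(x, num_scales=4):
    """Zeroth-, first- and second-order coefficients with global averaging."""
    psis = heat_wavelets(len(x), num_scales)
    coeffs = [x.mean()]
    for j1, psi1 in enumerate(psis):
        u1 = np.abs(spectral_filter(x, psi1))   # wavelet, then modulus
        coeffs.append(u1.mean())
        for psi2 in psis[j1 + 1:]:              # cascade to coarser scales only
            coeffs.append(np.abs(spectral_filter(u1, psi2)).mean())
    return np.array(coeffs)


rng = np.random.default_rng(0)
signal = rng.standard_normal(256)
s_original = scattering(signal)
s_rotated = scattering(np.roll(signal, 17))     # rotate the circle
print(np.max(np.abs(s_original - s_rotated)))
```

Because the filters depend only on the Laplacian spectrum and the averaging is global, rotating the signal around the circle leaves the coefficients unchanged up to floating-point error, a small-scale analogue of the isometry invariance the abstract describes.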

arXiv.org e-Print Archive

Vortex Shedding and Modal Behavior of a Circular Cylinder Equipped with Flexible Flaps

Authors: Brücker C.; Geyer T. F.; Kamps L.; Sarradj E.
Publication venue: 'S. Hirzel Verlag'
Publication date: 01/01/2019

When a cylinder is subject to a flow, vortices will be shed that can lead to strong tonal noise. The modification of the cylinder with soft, flexible flaps made of silicone rubber has been shown to affect the vortex shedding cycle in a way that the Strouhal number associated with the vortex shedding suddenly jumps to a higher value at a certain Reynolds number. In the present study, the effect of the flexible flaps on the vortex shedding is further examined by successively reducing the number of flaps and additionally shortening their length. Acoustic measurements and camera recordings of the flap motion, performed in an aeroacoustic wind tunnel, suggest that the sudden jump of the Strouhal number is caused by the movement of the outer flaps. A comparison with the eigenfrequencies obtained from a numerical modal analysis of the different flap rings revealed that the cause of the Strouhal number jump is most likely a lock-in of the natural vortex shedding cycle with the next higher eigenfrequency of the outer flaps.
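The quantities in this abstract follow the standard definitions St = f D / U and Re = U D / nu. The sketch below evaluates them and picks the "next higher" eigenfrequency above a shedding frequency. All numerical values (diameter, flow speed, viscosity, eigenfrequencies) are made-up placeholders, not data from the study, and the helper names are hypothetical.

```python
def strouhal(shedding_freq_hz, diameter_m, flow_speed_ms):
    """Strouhal number St = f * D / U."""
    return shedding_freq_hz * diameter_m / flow_speed_ms


def reynolds(flow_speed_ms, diameter_m, kinematic_viscosity_m2s=1.5e-5):
    """Reynolds number Re = U * D / nu (default nu is roughly that of air)."""
    return flow_speed_ms * diameter_m / kinematic_viscosity_m2s


def next_higher_eigenfrequency(shedding_freq_hz, eigenfrequencies_hz):
    """Smallest eigenfrequency above the shedding frequency, or None."""
    higher = [f for f in eigenfrequencies_hz if f > shedding_freq_hz]
    return min(higher) if higher else None


# Placeholder values for illustration only.
diameter = 0.02          # m
speed = 10.0             # m/s
f_shed = 100.0           # Hz, corresponds to St = 0.2 for these values
flap_modes = [80.0, 120.0, 300.0]  # Hz, hypothetical flap eigenfrequencies

print(strouhal(f_shed, diameter, speed))
print(reynolds(speed, diameter))
print(next_higher_eigenfrequency(f_shed, flap_modes))
```

If the shedding locked in to the next higher mode in this made-up example, the shedding frequency would move from 100 Hz to 120 Hz at the same flow speed, which shows up as a jump in the Strouhal number.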

City Research Online

Crossref

Attractor reconstruction of an impact oscillator for parameter identification

Authors: Baptista Murilo Da Silva; Ing James; Sayah Mukthar; Wiercigroch Marian
Publication venue: 'Elsevier BV'
Publication date: 05/09/2015

Peer reviewed; Preprint

Aberdeen University Research

An Investigation of the Sound Field Above a Surface With Periodically-Spaced Roughness

Author: Stronach Alex
Publication date: 31/03/2020

Outdoor audio-frequency acoustic signals can be amplified passively at selected frequencies by exploiting the interaction of incident sound with surfaces composed of periodically-spaced rectangular strips on an acoustically-hard base. When sound is incident near grazing on acoustically-rigid ground with roughness composed of elements with periodic sub-wavelength spacing, air-borne acoustic surface waves are generated due to a high imaginary component of the surface impedance as well as the formation and coupling of quarter-wavelength resonances in the gaps. This allows for passive amplification of acoustic signals at the surface wave frequency. This thesis provides a detailed, systematic study of the total sound field generated above surfaces with periodic roughness and how the topography and geometry affect the generation of air-borne acoustic surface waves. Surfaces with a high number of scattering edges per wavelength result in strong surface wave generation due to a high reactive component of the impedance. As the gap is increased, thereby reducing the number of edges per wavelength, the gap resonances couple less strongly and the surface behaves as a rough surface. As the number of edges per wavelength approaches one, the signal enhancement is provided by Bragg diffraction. Through measurements and predictions, it is found that surface wave enhancement is not detected by a collocated geophone in sand via acoustic-seismic coupling, since the sand is sufficiently absorbing that no surface wave is detected. This systematic study provides detailed insight into the formation of audio-frequency surface waves generated over periodically-rough surfaces.

Open Research Online (The Open University)