Speech Separation Using Partially Asynchronous Microphone Arrays Without Resampling
We consider the problem of separating speech sources captured by multiple
spatially separated devices, each of which has multiple microphones and samples
its signals at a slightly different rate. Most asynchronous array processing
methods rely on sample rate offset estimation and resampling, but these offsets
can be difficult to estimate if the sources or microphones are moving. We
propose a source separation method that does not require offset estimation or
signal resampling. Instead, we divide the distributed array into several
synchronous subarrays. All arrays are used jointly to estimate the time-varying
signal statistics, and those statistics are used to design separate
time-varying spatial filters in each array. We demonstrate the method for
speech mixtures recorded on both stationary and moving microphone arrays.

Comment: To appear at the International Workshop on Acoustic Signal Enhancement (IWAENC 2018).
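The subarray idea can be sketched as follows: activity statistics shared across devices drive a separate time-varying spatial filter inside each synchronous subarray, with no resampling of any signal. This is a minimal illustration only; the function name and the mask-based covariance estimates are assumptions, not the authors' exact method.

```python
import numpy as np

def subarray_filter(stft_sub, target_mask, eps=1e-8):
    """Time-varying multichannel Wiener-style filter for one synchronous
    subarray (an illustrative sketch, not the paper's implementation).

    stft_sub:    (M, F, T) complex STFT of the subarray's M microphones.
    target_mask: (F, T) soft target-activity estimate in [0, 1]; in the
                 spirit of the paper, this would be estimated jointly
                 from ALL subarrays (e.g. from magnitude spectra, which
                 tolerate small sample-rate offsets between devices).
    """
    M, F, T = stft_sub.shape
    X = stft_sub.transpose(1, 2, 0)                      # (F, T, M)
    out = np.empty((F, T), dtype=complex)
    for f in range(F):
        m = target_mask[f][:, None]                      # (T, 1) weights
        Rs = (m * X[f]).T @ X[f].conj() / T              # target covariance
        Rx = X[f].T @ X[f].conj() / T + eps * np.eye(M)  # mixture covariance
        w = np.linalg.solve(Rx, Rs[:, 0])                # Wiener filter, ref mic 0
        out[f] = X[f] @ w.conj()                         # w^H x per frame
    return out
```

Because each filter only ever mixes channels from one synchronous subarray, no cross-device phase coherence (and hence no offset estimation) is required.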
The Visual Centrifuge: Model-Free Layered Video Representations
True video understanding requires making sense of non-Lambertian scenes where
the color of light arriving at the camera sensor encodes information about not
just the last object it collided with, but about multiple mediums -- colored
windows, dirty mirrors, smoke or rain. Layered video representations have the
potential of accurately modelling realistic scenes but have so far required
stringent assumptions on motion, lighting and shape. Here we propose a
learning-based approach for multi-layered video representation: we introduce
novel uncertainty-capturing 3D convolutional architectures and train them to
separate blended videos. We show that these models then generalize to single
videos, where they exhibit interesting abilities: color constancy, factoring
out shadows and separating reflections. We present quantitative and qualitative
results on real world videos.

Comment: Appears in: 2019 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2019). This arXiv version contains the CVPR camera-ready version of the paper (although we have included larger figures) as well as an appendix detailing the model architecture.
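The training signal described above, separating synthetically blended videos, can be sketched in a few lines. This is a hypothetical illustration of the data setup, not the paper's pipeline; the uniform blend and the two-layer permutation-invariant loss are assumptions.

```python
import numpy as np

def blend_pair(clip_a, clip_b, alpha=0.5):
    """Make a synthetic training example by blending two videos
    (a sketch of the self-supervised setup, not the paper's code).
    clip_*: (T, H, W, 3) float arrays in [0, 1]."""
    mix = alpha * clip_a + (1 - alpha) * clip_b   # uniform convex blend
    return mix, (clip_a, clip_b)

def permutation_invariant_l2(pred_layers, true_layers):
    """Loss that does not care which predicted layer matches which
    source: take the cheaper of the two assignments (two layers only)."""
    p0, p1 = pred_layers
    t0, t1 = true_layers
    direct = np.mean((p0 - t0) ** 2) + np.mean((p1 - t1) ** 2)
    swapped = np.mean((p0 - t1) ** 2) + np.mean((p1 - t0) ** 2)
    return min(direct, swapped)
```

At test time a single (unblended) video is fed in, and the layered outputs are read off directly, which is where the color-constancy and reflection-separation behaviors reported above appear.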
Capture, Learning, and Synthesis of 3D Speaking Styles
Audio-driven 3D facial animation has been widely explored, but achieving
realistic, human-like performance is still unsolved. This is due to the lack of
available 3D datasets, models, and standard evaluation metrics. To address
this, we introduce a unique 4D face dataset with about 29 minutes of 4D scans
captured at 60 fps and synchronized audio from 12 speakers. We then train a
neural network on our dataset that factors identity from facial motion. The
learned model, VOCA (Voice Operated Character Animation), takes any speech
signal as input - even speech in languages other than English - and
realistically animates a wide range of adult faces. Conditioning on subject
labels during training allows the model to learn a variety of realistic
speaking styles. VOCA also provides animator controls to alter speaking style,
identity-dependent facial shape, and pose (i.e. head, jaw, and eyeball
rotations) during animation. To our knowledge, VOCA is the only realistic 3D
facial animation model that is readily applicable to unseen subjects without
retargeting. This makes VOCA suitable for tasks like in-game video, virtual
reality avatars, or any scenario in which the speaker, speech, or language is
not known in advance. We make the dataset and model available for research
purposes at http://voca.is.tue.mpg.de.

Comment: To appear in CVPR 2019.
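The factoring of identity from motion described above can be sketched as a conditioned regression: speech features plus a one-hot subject code predict per-frame offsets from a neutral template mesh. Every name, weight, and layer here is hypothetical; VOCA's actual architecture is not reproduced.

```python
import numpy as np

def animate(speech_feats, subject_id, n_subjects, W_enc, W_dec, template):
    """Minimal sketch of identity-conditioned speech-to-mesh regression
    (illustrative only; these are NOT VOCA's actual layers or weights).

    speech_feats: (T, D) per-frame audio features.
    subject_id:   int, selects a one-hot speaking-style code; changing it
                  at test time alters speaking style, as described above.
    template:     (V, 3) neutral face mesh; the net predicts offsets, so
                  identity-dependent shape is factored from motion.
    """
    one_hot = np.eye(n_subjects)[subject_id]
    T = speech_feats.shape[0]
    cond = np.concatenate([speech_feats, np.tile(one_hot, (T, 1))], axis=1)
    h = np.tanh(cond @ W_enc)                           # (T, hidden)
    offsets = (h @ W_dec).reshape(T, *template.shape)   # per-frame offsets
    return template[None] + offsets                     # animated vertices
```

Because identity enters only through the template and the style code, the same motion model applies to unseen subjects by swapping in a new template, which is the property the abstract highlights.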
Geometric Wavelet Scattering Networks on Compact Riemannian Manifolds
The Euclidean scattering transform was introduced nearly a decade ago to
improve the mathematical understanding of convolutional neural networks.
Inspired by recent interest in geometric deep learning, which aims to
generalize convolutional neural networks to manifold and graph-structured
domains, we define a geometric scattering transform on manifolds. Similar to
the Euclidean scattering transform, the geometric scattering transform is based
on a cascade of wavelet filters and pointwise nonlinearities. It is invariant
to local isometries and stable to certain types of diffeomorphisms. Empirical
results demonstrate its utility on several geometric learning tasks. Our
results generalize the deformation stability and local translation invariance
of Euclidean scattering, and demonstrate the importance of linking the used
filter structures to the underlying geometry of the data.

Comment: 35 pages; 3 figures; 2 tables; v3: Revisions based on reviewer comments.
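The cascade of wavelet filters and pointwise nonlinearities can be illustrated on a graph standing in for a discretized domain (the paper itself works on compact Riemannian manifolds). The diffusion-wavelet construction below, Psi_j = P^(2^(j-1)) - P^(2^j) with a lazy random walk P, is one common geometric-scattering choice and is an assumption here, not necessarily the paper's exact operator.

```python
import numpy as np

def diffusion_scattering(x, A, J=3):
    """First-order geometric scattering on a graph (sketch of the
    wavelet cascade: filter, take modulus, then average).

    x: (N,) signal on N nodes; A: (N, N) symmetric adjacency matrix.
    Wavelets Psi_j = P^(2^(j-1)) - P^(2^j), lazy random walk P.
    """
    d = A.sum(axis=1)
    P = 0.5 * (np.eye(len(d)) + A / d[:, None])   # lazy random walk
    powers = {0: np.eye(len(d))}
    for k in range(1, 2 ** J + 1):
        powers[k] = powers[k - 1] @ P             # P^k by repeated product
    feats = [x.mean()]                            # zeroth order: plain average
    for j in range(1, J + 1):
        psi_j = powers[2 ** (j - 1)] - powers[2 ** j]
        feats.append(np.abs(psi_j @ x).mean())    # |wavelet response|, averaged
    return np.array(feats)
```

The modulus-then-average structure is what yields invariance to relabelings (the graph analogue of local isometries) while retaining multiscale information.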
Vortex Shedding and Modal Behavior of a Circular Cylinder Equipped with Flexible Flaps
When a cylinder is subject to a flow, vortices are shed that can lead to strong tonal noise. Modifying the cylinder with soft, flexible flaps made of silicone rubber has been shown to affect the vortex shedding cycle such that the Strouhal number associated with the vortex shedding suddenly jumps to a higher value at a certain Reynolds number. In the present study, the effect of the flexible flaps on the vortex shedding is examined further by successively reducing the number of flaps and shortening their length. Acoustic measurements and camera recordings of the flap motion, performed in an aeroacoustic wind tunnel, suggest that the sudden jump of the Strouhal number is caused by the movement of the outer flaps. A comparison with the eigenfrequencies obtained from a numerical modal analysis of the different flap rings revealed that the cause of the Strouhal number jump is most likely a lock-in of the natural vortex shedding cycle with the next higher eigenfrequency of the outer flaps.
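The quantities above are tied together by the standard Strouhal relation St = f D / U, relating shedding frequency f, cylinder diameter D, and flow speed U. The sketch below uses that textbook relation with illustrative numbers, not measurements from the study.

```python
def shedding_frequency(strouhal, flow_speed, diameter):
    """Vortex-shedding frequency from the Strouhal relation
    St = f * D / U (standard relation; the numbers used below are
    illustrative, not values measured in the study above)."""
    return strouhal * flow_speed / diameter

# e.g. a bare circular cylinder near St ~ 0.2 at 20 m/s, 30 mm diameter:
f_shed = shedding_frequency(0.2, flow_speed=20.0, diameter=0.03)  # Hz
```

A jump in Strouhal number at fixed U and D therefore corresponds directly to a jump in the shed-vortex (and tonal-noise) frequency, which is what lock-in with a flap eigenfrequency would produce.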
Attractor reconstruction of an impact oscillator for parameter identification
Peer-reviewed preprint.
An Investigation of the Sound Field Above a Surface With Periodically-Spaced Roughness
Outdoor audio-frequency acoustic signals can be passively amplified at selected frequencies by exploiting the interaction of incident sound with surfaces composed of periodically-spaced rectangular strips on an acoustically-hard base.
When sound is incident near grazing on acoustically rigid ground whose roughness is composed of elements with periodic sub-wavelength spacing, air-borne acoustic surface waves are generated, owing to a large imaginary component of the surface impedance together with the formation and coupling of quarter-wavelength resonances in the gaps. This allows passive amplification of acoustic signals at the surface wave frequency. This thesis provides a detailed, systematic study of the total sound field generated above surfaces with periodic roughness and of how the topography and geometry affect the generation of air-borne acoustic surface waves.
Surfaces with a high number of scattering edges per wavelength produce strong surface wave generation because of the large reactive component of the impedance. As the gap width is increased, thereby reducing the number of edges per wavelength, the gap resonances couple less strongly and the surface behaves as a generically rough surface. As the number of edges per wavelength approaches one, the signal enhancement is instead provided by Bragg diffraction. Measurements and predictions show that the surface wave enhancement is not transmitted to a collocated geophone in sand via acoustic-seismic coupling, since the sand is sufficiently absorbing that no surface wave survives. This systematic study provides detailed insight into the formation of audio-frequency surface waves generated over periodically-rough surfaces.
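The quarter-wavelength resonances mentioned above follow the textbook relation f = c / (4 d) for a rigid-backed gap of depth d. The sketch below uses that standard formula with an illustrative depth, not a geometry taken from the thesis.

```python
def quarter_wave_resonance(depth, c=343.0):
    """Fundamental quarter-wavelength resonance of a rigid-backed gap,
    f = c / (4 * d). Depth d in metres, speed of sound c in m/s.
    (Standard acoustics relation; the example depth is illustrative,
    not one of the thesis geometries.)"""
    return c / (4.0 * depth)

f0 = quarter_wave_resonance(0.02)   # 20 mm deep gaps
```

Deeper gaps thus lower the surface-wave (amplification) frequency, which is one way the topography controls where in the spectrum the passive enhancement appears.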