Acoustic Space Learning for Sound Source Separation and Localization on Binaural Manifolds
In this paper we address the problems of modeling the acoustic space
generated by a full-spectrum sound source and of using the learned model for
the localization and separation of multiple sources that simultaneously emit
sparse-spectrum sounds. We lay theoretical and methodological grounds in order
to introduce the binaural manifold paradigm. We perform an in-depth study of
the latent low-dimensional structure of the high-dimensional interaural
spectral data, based on a corpus recorded with a human-like audiomotor robot
head. A non-linear dimensionality reduction technique is used to show that
these data lie on a two-dimensional (2D) smooth manifold parameterized by the
motor states of the listener, or equivalently, the sound source directions. We
propose a probabilistic piecewise affine mapping model (PPAM) specifically
designed to deal with high-dimensional data exhibiting an intrinsic piecewise
linear structure. We derive a closed-form expectation-maximization (EM)
procedure for estimating the model parameters, followed by Bayes inversion for
obtaining the full posterior density function of a sound source direction. We
extend this solution to deal with missing data and redundancy in real world
spectrograms, and hence for 2D localization of natural sound sources such as
speech. We further generalize the model to the challenging case of multiple
sound sources and we propose a variational EM framework. The associated
algorithm, referred to as variational EM for source separation and localization
(VESSL), yields a Bayesian estimation of the 2D locations and time-frequency
masks of all the sources. Comparisons of the proposed approach with several
existing methods reveal that the combination of acoustic-space learning with
Bayesian inference enables our method to outperform state-of-the-art methods.
Comment: 19 pages, 9 figures, 3 tables
Access to recorded interviews: A research agenda
Recorded interviews form a rich basis for scholarly inquiry. Examples include oral histories, community memory projects, and interviews conducted for broadcast media. Emerging technologies offer the potential to radically transform the way in which recorded interviews are made accessible, but realizing this vision will demand substantial investments from a broad range of research communities. This article reviews the present state of practice for making recorded interviews available and the state of the art for key component technologies. A large number of important research issues are identified, and from that set of issues, a coherent research agenda is proposed.
ToScA North America (6 – 8 June 2017, The University of Texas, Austin, TX) Program
ToScA North America will address key areas of science,
including Multi-modal Imaging, Geosciences, Forensics, Increasing Contrast,
Educational Outreach, Data, Materials Science and Medical and Biological
Science.
University of Texas High-Resolution X-ray CT Facility (UTCT);
Jackson School of Geosciences, The University of Texas at Austin;
Natural History Museum (London);
Royal Microscopical Society (Oxford, UK)
Geological Science
Expressive Body Capture: 3D Hands, Face, and Body from a Single Image
To facilitate the analysis of human actions, interactions and emotions, we
compute a 3D model of human body pose, hand pose, and facial expression from a
single monocular image. To achieve this, we use thousands of 3D scans to train
a new, unified, 3D model of the human body, SMPL-X, that extends SMPL with
fully articulated hands and an expressive face. Learning to regress the
parameters of SMPL-X directly from images is challenging without paired images
and 3D ground truth. Consequently, we follow the approach of SMPLify, which
estimates 2D features and then optimizes model parameters to fit the features.
We improve on SMPLify in several significant ways: (1) we detect 2D features
corresponding to the face, hands, and feet and fit the full SMPL-X model to
these; (2) we train a new neural network pose prior using a large MoCap
dataset; (3) we define a new interpenetration penalty that is both fast and
accurate; (4) we automatically detect gender and the appropriate body models
(male, female, or neutral); (5) our PyTorch implementation achieves a speedup
of more than 8x over Chumpy. We use the new method, SMPLify-X, to fit SMPL-X to
both controlled images and images in the wild. We evaluate 3D accuracy on a new
curated dataset comprising 100 images with pseudo ground-truth. This is a step
towards automatic expressive human capture from monocular RGB data. The models,
code, and data are available for research purposes at
https://smpl-x.is.tue.mpg.de.
Comment: To appear in CVPR 201
Accelerated High-Resolution Photoacoustic Tomography via Compressed Sensing
Current 3D photoacoustic tomography (PAT) systems offer either high image
quality or high frame rates but are not able to deliver high spatial and
temporal resolution simultaneously, which limits their ability to image dynamic
processes in living tissue. A particular example is the planar Fabry-Perot (FP)
scanner, which yields high-resolution images but takes several minutes to
sequentially map the photoacoustic field on the sensor plane, point-by-point.
However, as the spatio-temporal complexity of many absorbing tissue structures
is rather low, the data recorded in such a conventional, regularly sampled
fashion is often highly redundant. We demonstrate that combining variational
image reconstruction methods using spatial sparsity constraints with the
development of novel PAT acquisition systems capable of sub-sampling the
acoustic wave field can dramatically increase the acquisition speed while
maintaining a good spatial resolution. First, we describe and model two general
spatial sub-sampling schemes. Then, we discuss how to implement them using the
FP scanner and demonstrate the potential of these novel compressed sensing PAT
devices through simulated data from a realistic numerical phantom and through
measured data from a dynamic experimental phantom as well as from in-vivo
experiments. Our results show that images with good spatial resolution and
contrast can be obtained from highly sub-sampled PAT data if variational image
reconstruction methods that describe the tissue structures with suitable
sparsity constraints are used. In particular, we examine the use of total
variation regularization enhanced by Bregman iterations. These novel
reconstruction strategies offer new opportunities to dramatically increase the
acquisition speed of PAT scanners that employ point-by-point sequential
scanning as well as reducing the channel count of parallelized schemes that use
detector arrays.
Comment: submitted to "Physics in Medicine and Biology"
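
As a rough illustration of the idea in the abstract above (recovering a signal from far fewer measurements than unknowns by adding a sparsity constraint to the reconstruction), the following minimal Python sketch runs the iterative shrinkage-thresholding algorithm (ISTA) on a toy 1D problem. It is not the authors' method: the abstract describes total variation regularization enhanced by Bregman iterations applied to PAT data, whereas this sketch swaps in a plain L1 penalty and a random Gaussian measurement matrix. All function names, sizes, and the choice of regularization weight are assumptions made for illustration.

```python
import numpy as np


def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1: shrink each entry towards zero."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def ista(A, y, lam, n_iter=200):
    """Minimize 0.5 * ||A x - y||^2 + lam * ||x||_1 with ISTA.

    Returns the estimate and the objective value after every iteration.
    """
    # Step size 1/L, where L is the largest eigenvalue of A^T A.
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    history = []
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)  # gradient of the data-fit term
        x = soft_threshold(x - step * grad, step * lam)
        objective = 0.5 * np.sum((A @ x - y) ** 2) + lam * np.sum(np.abs(x))
        history.append(objective)
    return x, history


# Toy problem (assumed sizes): 100 unknowns, 40 measurements, 5 non-zeros.
rng = np.random.default_rng(0)
n, m, k = 100, 40, 5
x_true = np.zeros(n)
x_true[rng.choice(n, size=k, replace=False)] = rng.normal(size=k)
A = rng.normal(size=(m, n)) / np.sqrt(m)  # hypothetical sub-sampling operator
y = A @ x_true                            # noise-free sub-sampled data

lam = 0.1 * np.max(np.abs(A.T @ y))       # heuristic regularization weight
x_hat, history = ista(A, y, lam)
print("objective at start:", 0.5 * np.sum(y ** 2))
print("objective at end:  ", history[-1])
```

With a step size of 1/L the objective cannot increase from one iteration to the next, which makes the printed values a simple sanity check. How closely `x_hat` matches `x_true` depends on the assumed sizes and on `lam`.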
Image-based Lagrangian Particle Tracking in bed-load experiments
Image analysis is increasingly used to measure river flows because it can provide detailed quantitative descriptions
at a relatively low cost. This manuscript describes an application of particle tracking velocimetry (PTV) to a bed-load experiment with lightweight
sediment. The key characteristics of the investigated sediment transport conditions were the presence of a covered flow and of a fixed rough
bed above which particles were released in limited numbers at the flume inlet. Under the applied flow conditions, the motion of the individual
bed-load particles was intermittent, alternating between periods of movement and rest. The flow pattern was preliminarily characterized by acoustic
measurements of vertical profiles of the stream-wise velocity. During process visualization, a large field of view was obtained using two action cameras
placed at different locations along the flume. The experimental protocol is described in terms of channel calibration, experiment
realization, image pre-processing, automatic particle tracking, and post-processing of particle track data from the two cameras. The presented
proof-of-concept results include probability distributions of the particle hop length and duration. The results of this work are compared with
those in the existing literature to demonstrate the validity of the protocol.
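
To make the notions of hop length and hop duration concrete, the short Python sketch below segments a single particle track into hops, taking a hop to be a run of consecutive time steps in which the particle moves faster than a threshold. This is only one plausible definition and is not taken from the manuscript: the function name `extract_hops`, the speed threshold, and the sample track are hypothetical.

```python
def extract_hops(times, positions, speed_threshold):
    """Return a list of (hop_length, hop_duration) tuples for one track.

    A hop is assumed to be a maximal run of consecutive steps whose
    stream-wise speed exceeds speed_threshold (an illustrative criterion).
    """
    hops = []
    start = None  # index of the sample at which the current hop began
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        speed = abs(positions[i] - positions[i - 1]) / dt
        moving = speed > speed_threshold
        if moving and start is None:
            start = i - 1  # the hop starts at the last resting sample
        elif not moving and start is not None:
            # The hop ended at sample i - 1; record its length and duration.
            hops.append((positions[i - 1] - positions[start],
                         times[i - 1] - times[start]))
            start = None
    if start is not None:
        # The track ended mid-hop, so this last hop is truncated.
        hops.append((positions[-1] - positions[start],
                     times[-1] - times[start]))
    return hops


# Hypothetical track: time stamps and stream-wise positions in arbitrary units.
times = [0, 1, 2, 3, 4, 5, 6, 7]
positions = [0, 0, 2, 5, 5, 5, 6, 6]
hops = extract_hops(times, positions, speed_threshold=0.5)
print(hops)  # [(5, 2), (1, 1)]
```

Collecting these tuples over many tracks would give the samples from which empirical distributions of hop length and duration can be built.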