Probabilistic Modeling Paradigms for Audio Source Separation
This is the author's final version of the article, first published as E. Vincent, M. G. Jafari, S. A. Abdallah, M. D. Plumbley, M. E. Davies. Probabilistic Modeling Paradigms for Audio Source Separation. In W. Wang (Ed.), Machine Audition: Principles, Algorithms and Systems. Chapter 7, pp. 162-185. IGI Global, 2011. ISBN 978-1-61520-919-4. DOI: 10.4018/978-1-61520-919-4.ch007
Most sound scenes result from the superposition of several sources, which human listeners can perceive and analyze separately. Source separation aims to give machine listeners similar skills by extracting the sounds of individual sources from a given scene. Existing separation systems operate either by emulating the human auditory system or by inferring the parameters of probabilistic sound models. In this chapter, the authors focus on the latter approach and provide a joint overview of established and recent models, including independent component analysis, local time-frequency models and spectral template-based models. They show that most models are instances of one of two general paradigms: linear modeling or variance modeling. They compare the merits of the two paradigms and report objective performance figures. They also conclude by discussing promising combinations of probabilistic priors and inference algorithms that could form the basis of future state-of-the-art systems.
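In the variance modeling paradigm, each source's short-time Fourier transform (STFT) coefficients are typically modeled as zero-mean Gaussian with a time-frequency-dependent variance, and once those variances are known the minimum mean-square error source estimate is a Wiener filter. The sketch below assumes a single-channel mixture and variances that are simply given (estimating them, e.g. from spectral templates, is the hard part the chapter surveys); the function name `wiener_separate` is hypothetical and this is not the chapter's code.

```python
import numpy as np

def wiener_separate(mix_stft, source_vars, eps=1e-12):
    """Split a single-channel mixture STFT into source estimates.

    mix_stft:    complex array (F, T), the mixture STFT X(f, t).
    source_vars: nonnegative array (J, F, T), one variance v_j(f, t) per
                 source (assumed known here; normally it must be estimated).
    Returns a complex array (J, F, T) with S_j = v_j / sum_k v_k * X.
    """
    source_vars = np.asarray(source_vars, dtype=float)
    total = source_vars.sum(axis=0) + eps   # sum_k v_k(f, t), guarded against zero
    masks = source_vars / total             # Wiener gains, each in [0, 1]
    return masks * mix_stft[np.newaxis, :, :]

# Toy example: 2 sources, 3 frequency bins, 2 frames.
rng = np.random.default_rng(0)
X = rng.standard_normal((3, 2)) + 1j * rng.standard_normal((3, 2))
V = rng.uniform(0.1, 1.0, size=(2, 3, 2))
S = wiener_separate(X, V)
print(np.allclose(S.sum(axis=0), X))  # True: the estimates add back up to the mixture
```

Because the masks sum to one in every time-frequency bin (up to the `eps` guard), the source estimates always add back up to the mixture.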
Real Time Dense Depth Estimation by Fusing Stereo with Sparse Depth Measurements
We present an approach to depth estimation that fuses information from a
stereo pair with sparse range measurements derived from a LIDAR sensor or a
range camera. The goal of this work is to exploit the complementary strengths
of the two sensor modalities, the accurate but sparse range measurements and
the ambiguous but dense stereo information. These two sources are effectively
and efficiently fused by combining ideas from anisotropic diffusion and
semi-global matching.
We evaluate our approach on the KITTI 2015 and Middlebury 2014 datasets,
using randomly sampled ground truth range measurements as our sparse depth
input. We achieve significant performance improvements with a small fraction of
range measurements on both datasets. We also provide qualitative results from
our platform using the PMDTec Monstar sensor. Our entire pipeline runs on an
NVIDIA TX-2 platform at 5 Hz on 1280×1024 stereo images with 128 disparity
levels.
Comment: 7 pages, 5 figures, 2 tables
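The abstract does not spell out the fusion rule, so the following is only a toy illustration of the general idea under our own assumptions: sparse range measurements are converted to disparities with the pinhole stereo relation d = fB/Z, added as a penalty to the matching cost wherever a measurement exists, and the cost is then aggregated along a single left-to-right semi-global matching (SGM) path on one scanline. The anisotropic diffusion step the authors use to propagate measurements is omitted, and all function names and parameter values (`weight`, `p1`, `p2`) are hypothetical.

```python
import numpy as np

def depth_to_disparity(depth_m, focal_px, baseline_m):
    # Pinhole stereo: disparity = f * B / Z (NaN depth stays NaN = no measurement).
    return focal_px * baseline_m / depth_m

def scanline_cost(left_row, right_row, max_disp):
    # Absolute-difference matching cost C[x, d] = |L[x] - R[x - d]|.
    width = len(left_row)
    cost = np.full((width, max_disp), 1e3)  # large cost where x - d falls outside the image
    for d in range(max_disp):
        cost[d:, d] = np.abs(left_row[d:] - right_row[:width - d])
    return cost

def add_sparse_prior(cost, sparse_disp, weight):
    # Hypothetical fusion: penalize disparities far from a sparse measurement.
    disps = np.arange(cost.shape[1])
    out = cost.copy()
    for x, d_meas in enumerate(sparse_disp):
        if not np.isnan(d_meas):
            out[x] += weight * np.abs(disps - d_meas)
    return out

def sgm_left_to_right(cost, p1=1.0, p2=8.0):
    # One SGM path: L[x,d] = C[x,d] + min(L[x-1,d], L[x-1,d+-1] + P1, min L[x-1] + P2) - min L[x-1].
    agg = np.empty_like(cost)
    agg[0] = cost[0]
    for x in range(1, cost.shape[0]):
        prev = agg[x - 1]
        prev_min = prev.min()
        from_lower = np.concatenate(([np.inf], prev[:-1])) + p1   # neighbor d - 1
        from_upper = np.concatenate((prev[1:], [np.inf])) + p1    # neighbor d + 1
        jump = np.full_like(prev, prev_min + p2)                   # any larger jump
        best = np.minimum.reduce([prev, from_lower, from_upper, jump])
        agg[x] = cost[x] + best - prev_min
    return agg

# Synthetic scanline with a constant true disparity of 3 pixels.
rng = np.random.default_rng(0)
width, max_disp, true_disp = 64, 8, 3
right_row = rng.uniform(0.0, 1.0, width)
left_row = np.roll(right_row, true_disp)          # left[x] = right[x - true_disp]
focal_px, baseline_m = 100.0, 0.3
depth = np.full(width, np.nan)                    # sparse "range sensor": every 4th pixel
depth[::4] = focal_px * baseline_m / true_disp    # 10 m, i.e. disparity 3
cost = add_sparse_prior(scanline_cost(left_row, right_row, max_disp),
                        depth_to_disparity(depth, focal_px, baseline_m), weight=1.0)
disparity = sgm_left_to_right(cost, p1=0.5, p2=2.0).argmin(axis=1)
print(disparity[16:])  # all 3s once the path has locked on away from the left border
```

A full SGM implementation aggregates over 8 or 16 path directions and sums them before the winner-take-all step; one direction is kept here for brevity.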
Event-based Vision: A Survey
Event cameras are bio-inspired sensors that differ from conventional frame
cameras: Instead of capturing images at a fixed rate, they asynchronously
measure per-pixel brightness changes, and output a stream of events that encode
the time, location and sign of the brightness changes. Event cameras offer
attractive properties compared to traditional cameras: high temporal resolution
(on the order of microseconds), very high dynamic range (140 dB vs. 60 dB), low
power consumption, and high pixel bandwidth (on the order of kHz) resulting in
reduced motion blur. Hence, event cameras have a large potential for robotics
and computer vision in challenging scenarios for traditional cameras, such as
low-latency, high speed, and high dynamic range. However, novel methods are
required to process the unconventional output of these sensors in order to
unlock their potential. This paper provides a comprehensive overview of the
emerging field of event-based vision, with a focus on the applications and the
algorithms developed to unlock the outstanding properties of event cameras. We
cover their working principle, the sensors that are currently available, and
the tasks they have been used for, from low-level vision
(feature detection and tracking, optic flow, etc.) to high-level vision
(reconstruction, segmentation, recognition). We also discuss the techniques
developed to process events, including learning-based techniques, as well as
specialized processors for these novel sensors, such as spiking neural
networks. Additionally, we highlight the challenges that remain to be tackled
and the opportunities that lie ahead in the search for a more efficient,
bio-inspired way for machines to perceive and interact with the world.
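To make the event output concrete: a common idealized model (assumed here, not taken from the survey) is that a pixel fires an event whenever its log intensity has changed by a fixed contrast threshold C since its last event, with the sign of the change as the event's polarity. The sketch below simulates this for a single pixel, so the location is implicit; `generate_events` and the threshold value are hypothetical, and real sensors add noise, refractory periods and per-pixel threshold mismatch.

```python
import math

def generate_events(timestamps, intensities, threshold=0.2):
    """Idealized single-pixel event generation from a sampled brightness signal.

    Emits (t, polarity) each time the log intensity has moved by at least
    `threshold` since the last event: +1 for brighter, -1 for darker.
    Several events triggered within one sampling interval all get that
    sample's timestamp (a simplification of the asynchronous sensor).
    """
    events = []
    ref = math.log(intensities[0])            # log intensity at the last event
    for t, intensity in zip(timestamps[1:], intensities[1:]):
        log_i = math.log(intensity)
        while abs(log_i - ref) >= threshold:   # a large change can fire several events
            polarity = 1 if log_i > ref else -1
            ref += polarity * threshold
            events.append((t, polarity))
    return events

# Brightness rises by 0.5 in log units, then falls to 0.1 below the start.
events = generate_events([0.0, 1.0, 2.0], [1.0, math.exp(0.5), math.exp(-0.1)])
print(events)  # [(1.0, 1), (1.0, 1), (2.0, -1), (2.0, -1)]
```

Working in log intensity is what gives the idealized sensor its relative (contrast) sensitivity and, in part, its high dynamic range.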
Depth Fields: Extending Light Field Techniques to Time-of-Flight Imaging
A variety of techniques such as light field, structured illumination, and
time-of-flight (TOF) are commonly used for depth acquisition in consumer
imaging, robotics and many other applications. Unfortunately, each technique
has its own limitations that prevent robust depth sensing. In
this paper, we explore the strengths and weaknesses of combining light field
and time-of-flight imaging, particularly the feasibility of an on-chip
implementation as a single hybrid depth sensor. We refer to this combination as
depth field imaging. Depth fields combine light field advantages such as
synthetic aperture refocusing with TOF imaging advantages such as high depth
resolution and coded signal processing to resolve multipath interference. We
show applications including synthesizing virtual apertures for TOF imaging,
improved depth mapping through partial and scattering occluders, and single
frequency TOF phase unwrapping. Utilizing space, angle, and temporal coding,
depth fields can improve depth sensing in the wild and generate new insights
into the dimensions of light's plenoptic function.
Comment: 9 pages, 8 figures, Accepted to 3DV 201
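For background on the TOF side, assuming amplitude-modulated continuous-wave TOF (the abstract does not name the TOF variant): depth is recovered from the phase shift of the modulation, and a single modulation frequency f only fixes depth modulo the ambiguity range c/(2f), which is the ambiguity that single-frequency phase unwrapping must resolve. The sketch uses the common four-sample (0°, 90°, 180°, 270°) phase estimate; the function names and the 30 MHz example are illustrative, and it does not implement the depth field method itself.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s

def tof_phase_from_samples(c0, c1, c2, c3):
    # Correlation samples at offsets 0, 90, 180, 270 degrees,
    # assuming c_k = B + A * cos(phi + k * pi / 2).
    phi = math.atan2(c3 - c1, c0 - c2)
    return phi % (2.0 * math.pi)          # wrap into [0, 2*pi)

def depth_from_phase(phi, mod_freq_hz):
    # Round trip: phi = 4*pi*f*d / c, hence d = c*phi / (4*pi*f).
    return SPEED_OF_LIGHT * phi / (4.0 * math.pi * mod_freq_hz)

def depth_candidates(phi, mod_freq_hz, max_depth_m):
    # One frequency determines depth only modulo the ambiguity range c / (2f).
    ambiguity = SPEED_OF_LIGHT / (2.0 * mod_freq_hz)
    d = depth_from_phase(phi, mod_freq_hz)
    out = []
    while d <= max_depth_m:
        out.append(d)
        d += ambiguity
    return out

f_mod = 30e6                 # 30 MHz: ambiguity range of about 5 m
true_depth = 6.0             # deliberately beyond one ambiguity interval
true_phase = 4.0 * math.pi * f_mod * true_depth / SPEED_OF_LIGHT
samples = [2.0 + math.cos(true_phase + k * math.pi / 2) for k in range(4)]
phi = tof_phase_from_samples(*samples)
cands = depth_candidates(phi, f_mod, max_depth_m=12.0)
print([round(d, 3) for d in cands])  # [1.003, 6.0, 10.997]
```

The measured phase alone reports about 1.003 m; choosing 6.0 m among the candidates needs extra information, such as a second frequency or, as the abstract suggests, the additional dimensions captured by depth fields.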
Optical techniques for 3D surface reconstruction in computer-assisted laparoscopic surgery
One of the main challenges for computer-assisted surgery (CAS) is to determine the intra-operative morphology and motion of soft tissues. This information is a prerequisite both for registering multi-modal patient-specific data, which enhances the surgeon's navigation capabilities by allowing observation beyond exposed tissue surfaces, and for providing intelligent control of robotic-assisted instruments. In minimally invasive surgery (MIS), optical techniques are an increasingly attractive approach for in vivo 3D reconstruction of the soft-tissue surface geometry. This paper reviews the state-of-the-art methods for optical intra-operative 3D reconstruction in laparoscopic surgery and discusses the technical challenges and future perspectives towards clinical translation. With the recent paradigm shift of surgical practice towards MIS and new developments in 3D optical imaging, this is a timely discussion about technologies that could facilitate complex CAS procedures in dynamic and deformable anatomical regions.
Learning Rank Reduced Interpolation with Principal Component Analysis
In computer vision, most iterative optimization algorithms, both sparse and
dense, rely on a coarse and reliable dense initialization to bootstrap their
optimization procedure. For example, dense optical flow algorithms benefit
greatly in speed and robustness if they are initialized well within the basin of
convergence of the loss function used. The same holds true for methods such as
sparse feature tracking when initial flow or depth information for new features
at arbitrary positions is needed. This makes it extremely important to have
techniques at hand that can obtain, from only very few available
measurements, a dense but still approximate sketch of a desired 2D structure
(e.g. depth maps, optical flow, disparity maps, etc.). The 2D map is regarded
as a sample from a 2D random process. The method presented here exploits the
complete information given by the principal component analysis (PCA) of that
process: the principal basis and its prior distribution. The method is able to
determine a dense reconstruction from sparse measurements. When only very
sparse measurements are available, the number of principal components is
typically reduced further, which results in a loss of expressiveness of the
basis. We overcome this problem by injecting prior knowledge in a maximum a
posteriori (MAP) approach. We test our approach on the KITTI and the virtual
KITTI datasets and focus on the interpolation of depth maps for driving scenes.
The results show good agreement with the ground truth and are clearly better
than those of nearest-neighbor interpolation, which disregards statistical
information.
Comment: Accepted at Intelligent Vehicles Symposium (IV), Los Angeles, USA, June 201
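The PCA-plus-prior idea can be sketched as a generic Gaussian MAP reconstruction, under assumptions of our own that may differ from the paper's formulation: a map x (flattened to n pixels) is modeled as x = μ + Wc with orthonormal principal basis W, the coefficients have prior c ~ N(0, diag(λ)) with λ the PCA variances, and the measurements are y = x[idx] plus Gaussian noise of variance σ². The MAP coefficients are then ĉ = (AᵀA/σ² + diag(λ)⁻¹)⁻¹ Aᵀ(y − μ[idx])/σ², where A holds the rows of W at the measured pixels. The names `fit_pca` and `map_interpolate` are hypothetical.

```python
import numpy as np

def fit_pca(samples, num_components):
    """samples: (N, n) training maps, each flattened to a vector of n pixels."""
    mean = samples.mean(axis=0)
    centered = samples - mean
    # Right singular vectors are the principal directions; variances = s^2 / (N - 1).
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:num_components].T                         # (n, k), orthonormal columns
    variances = s[:num_components] ** 2 / (samples.shape[0] - 1)
    return mean, basis, variances

def map_interpolate(mean, basis, variances, idx, values, noise_var):
    """MAP estimate of a dense map from sparse `values` measured at pixels `idx`.

    Assumed model: x = mean + basis @ c, prior c ~ N(0, diag(variances)),
    y = x[idx] + Gaussian noise with variance noise_var.
    """
    a = basis[idx]                                        # basis rows at measured pixels
    lhs = a.T @ a / noise_var + np.diag(1.0 / variances)  # posterior precision of c
    rhs = a.T @ (values - mean[idx]) / noise_var
    coeffs = np.linalg.solve(lhs, rhs)
    return mean + basis @ coeffs

# Synthetic data drawn from an exactly rank-3 model over 100 "pixels".
rng = np.random.default_rng(1)
n_pixels, rank = 100, 3
true_basis, _ = np.linalg.qr(rng.standard_normal((n_pixels, rank)))
scales = np.array([3.0, 2.0, 1.0])
train = (rng.standard_normal((500, rank)) * scales) @ true_basis.T
mean, basis, variances = fit_pca(train, rank)
truth = (rng.standard_normal(rank) * scales) @ true_basis.T
idx = rng.choice(n_pixels, size=10, replace=False)        # 10 sparse measurements
dense = map_interpolate(mean, basis, variances, idx, truth[idx], noise_var=1e-8)
print(np.abs(dense - truth).max())                        # small reconstruction error
```

The prior term diag(λ)⁻¹ keeps the system positive definite even with fewer measurements than components, which is the regime where a plain least-squares fit to the basis would break down.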