Simulations for Validation of Vision Systems
As computer vision matures into a systems science and engineering
discipline, there is a trend toward leveraging the latest advances in computer
graphics simulations for performance evaluation, learning, and inference.
However, there is an open question about the utility of graphics simulations
for vision, with apparently contradictory views in the literature. In this
paper, we place the results from the recent literature in the context of the
performance characterization methodology outlined in the 1990s and note that insights
derived from simulations can be qualitative or quantitative depending on the
degree of fidelity of models used in simulation and the nature of the question
posed by the experimenter. We describe a simulation platform that incorporates
the latest graphics advances and use it for systematic performance characterization
and trade-off analysis in vision system design. We verify the utility of the
platform in a case study validating a generative-model-inspired vision
hypothesis, the Rank-Order consistency model, in the contexts of global and
local illumination changes, bad weather, and high-frequency noise. Our approach
establishes the link between alternative viewpoints, involving models with
physics-based semantics and with signal and perturbation semantics, and
confirms insights in the literature on robust change detection.
Image Restoration Using Joint Statistical Modeling in Space-Transform Domain
This paper presents a novel strategy for high-fidelity image restoration by
characterizing both local smoothness and nonlocal self-similarity of natural
images in a unified statistical manner. The main contributions are threefold.
First, from the perspective of image statistics, a joint statistical modeling
(JSM) in an adaptive hybrid space-transform domain is established, which offers
a powerful mechanism of combining local smoothness and nonlocal self-similarity
simultaneously to ensure a more reliable and robust estimation. Second, a new
minimization functional for solving the image inverse problem is formulated
using JSM under a regularization-based framework. Finally, in order to make JSM
tractable and robust, a new Split-Bregman-based algorithm is developed to
efficiently solve the resulting severely underdetermined inverse problem,
together with a theoretical proof of convergence. Extensive experiments on image
inpainting, image deblurring and mixed Gaussian plus salt-and-pepper noise
removal applications verify the effectiveness of the proposed algorithm.
Comment: 14 pages, 18 figures, 7 tables; to be published in IEEE Transactions
on Circuits and Systems for Video Technology (TCSVT). A high-resolution PDF
version and code can be found at: http://idm.pku.edu.cn/staff/zhangjian/IRJSM
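The Split-Bregman scheme mentioned in the abstract can be illustrated on a generic L1-regularized inverse problem. This is a minimal sketch under simplifying assumptions, not the paper's JSM implementation; the operator A, the parameters lam and mu, and the toy signal are all illustrative:

```python
import numpy as np

def shrink(v, t):
    """Soft-thresholding operator, the proximal map of the L1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def split_bregman_l1(A, y, lam=0.1, mu=1.0, n_iter=200):
    """Split-Bregman iterations for min_x 0.5*||Ax - y||^2 + lam*||x||_1.

    Introduces the splitting d = x and a Bregman variable b, alternating a
    quadratic x-update with a shrinkage d-update."""
    n = A.shape[1]
    x = np.zeros(n); d = np.zeros(n); b = np.zeros(n)
    AtA = A.T @ A + mu * np.eye(n)
    Aty = A.T @ y
    for _ in range(n_iter):
        x = np.linalg.solve(AtA, Aty + mu * (d - b))  # quadratic subproblem
        d = shrink(x + b, lam / mu)                   # L1 subproblem
        b = b + x - d                                 # Bregman update
    return d  # the exactly sparse estimate

# toy underdetermined inverse problem: 20 measurements of a 3-sparse signal
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 50))
x_true = np.zeros(50); x_true[[3, 17, 41]] = [1.5, -2.0, 1.0]
y = A @ x_true
x_hat = split_bregman_l1(A, y, lam=0.05, mu=1.0, n_iter=200)
```

The shrinkage step is what keeps the estimate sparse; the quadratic step can reuse a cached factorization of AtA across iterations when the operator is fixed.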
A sparsity-driven approach to multi-camera tracking in visual sensor networks
In this paper, a sparsity-driven approach is presented for multi-camera tracking in visual sensor networks (VSNs). VSNs consist of image sensors, embedded processors and wireless transceivers, all powered by batteries. Since energy and bandwidth resources are limited, setting up a tracking system in VSNs is a challenging problem. Motivated by the goal of tracking in a bandwidth-constrained environment, we present a sparsity-driven method to compress the features extracted by the camera nodes, which are then transmitted across the network for distributed inference. We have designed special overcomplete dictionaries that match the structure of the features, leading to very parsimonious yet accurate representations. We have tested our method in indoor and outdoor people-tracking scenarios. Our experimental results demonstrate that our approach leads to communication savings without significant loss in tracking performance.
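The dictionary-based compression step can be sketched generically with Orthogonal Matching Pursuit as a stand-in sparse coder. The dictionary here is random rather than the specially designed, feature-matched ones the abstract describes, and all sizes are illustrative assumptions:

```python
import numpy as np

def omp(D, f, k):
    """Orthogonal Matching Pursuit: greedily select k dictionary atoms and
    least-squares fit the feature vector f onto the selected atoms."""
    residual = f.copy()
    support = []
    for _ in range(k):
        # pick the atom most correlated with the current residual
        support.append(int(np.argmax(np.abs(D.T @ residual))))
        coef, *_ = np.linalg.lstsq(D[:, support], f, rcond=None)
        residual = f - D[:, support] @ coef
    return support, coef

rng = np.random.default_rng(1)
D = rng.standard_normal((30, 90))
D /= np.linalg.norm(D, axis=0)            # unit-norm overcomplete dictionary
truth = rng.choice(90, size=4, replace=False)
f = D[:, truth] @ np.array([1.0, -0.5, 2.0, 0.8])   # synthetic feature vector
support, coef = omp(D, f, k=4)
# only the (index, value) pairs need to cross the bandwidth-limited network
```

Transmitting k index/value pairs instead of the full 30-dimensional feature is where the communication saving comes from.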
Infinite Sparse Structured Factor Analysis
Matrix factorisation methods decompose multivariate observations as linear
combinations of latent feature vectors. The Indian Buffet Process (IBP)
provides a way to model the number of latent features required for a good
approximation in terms of regularised reconstruction error. Previous work has
focussed on latent feature vectors with independent entries. We extend the
model to include nondiagonal latent covariance structures representing
characteristics such as smoothness. Using simulations, we demonstrate that,
under appropriate conditions, a smoothness prior helps to
recover the true latent features, while denoising more accurately. We
demonstrate our method on a real neuroimaging dataset, where computational
tractability is enough of a challenge that the efficient strategy presented
here is essential.
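A nondiagonal latent covariance encoding smoothness can be sketched as a squared-exponential prior over the entries of a feature vector, in contrast to the independent-entry prior of earlier work. The length scale, dimension, and seed are illustrative assumptions, not the paper's model:

```python
import numpy as np

def smooth_feature_prior(d, length_scale=5.0, jitter=1e-8):
    """Squared-exponential covariance over feature entries: a nondiagonal
    prior under which neighbouring entries are strongly correlated."""
    idx = np.arange(d)
    K = np.exp(-0.5 * ((idx[:, None] - idx[None, :]) / length_scale) ** 2)
    return K + jitter * np.eye(d)          # jitter for numerical stability

rng = np.random.default_rng(3)
d = 50
K = smooth_feature_prior(d)
smooth_feature = rng.multivariate_normal(np.zeros(d), K)  # correlated entries
iid_feature = rng.standard_normal(d)                      # independent entries
# neighbouring entries of smooth_feature vary slowly; iid_feature's do not
```

Draws from the nondiagonal prior look like smooth curves, which is what lets such a prior favour spatially coherent latent features during inference.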
Optimization Methods for Convolutional Sparse Coding
Sparse and convolutional constraints form a natural prior for many
optimization problems that arise from physical processes. Detecting motifs in
speech and musical passages, super-resolving images, compressing videos, and
reconstructing harmonic motions can all leverage redundancies introduced by
convolution. Solving problems involving sparse and convolutional constraints
remains a difficult computational problem, however. In this paper we present an
overview of convolutional sparse coding in a consistent framework. The
objective involves iteratively optimizing a convolutional least-squares term
for the basis functions, followed by an L1-regularized least squares term for
the sparse coefficients. We discuss a range of optimization methods for solving
the convolutional sparse coding objective, and the properties that make each
method suitable for different applications. In particular, we concentrate on
computational complexity, speed to ε-convergence, memory usage, and
the effect of implied boundary conditions. We present a broad suite of examples
covering different signal and application domains to illustrate the general
applicability of convolutional sparse coding, and the efficacy of the available
optimization methods.
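The coefficient subproblem of the convolutional sparse coding objective, an L1-regularized least-squares fit of sparse maps convolved with fixed filters, can be sketched with ISTA and FFT-domain convolutions. This is a minimal 1-D sketch under circular boundary conditions, not one of the surveyed solvers; the filter, signal, and parameters are illustrative assumptions:

```python
import numpy as np

def csc_ista(x, filters, beta=0.05, n_iter=200):
    """ISTA for the coefficient update of convolutional sparse coding:
    min_Z 0.5*||sum_m d_m * z_m - x||^2 + beta * sum_m ||z_m||_1,
    with circular convolution evaluated in the Fourier domain."""
    n = len(x)
    D = np.fft.fft(filters, n=n, axis=1)   # filters zero-padded to length n
    X = np.fft.fft(x)
    Z = np.zeros((filters.shape[0], n))
    # Lipschitz constant of the data term's gradient (worst frequency)
    L = np.max(np.sum(np.abs(D) ** 2, axis=0))
    for _ in range(n_iter):
        R = np.sum(D * np.fft.fft(Z, axis=1), axis=0) - X    # residual spectrum
        grad = np.real(np.fft.ifft(np.conj(D) * R, axis=1))  # correlation with residual
        Z = Z - grad / L                                     # gradient step
        Z = np.sign(Z) * np.maximum(np.abs(Z) - beta / L, 0.0)  # shrinkage step
    return Z

# toy 1-D signal: one short filter convolved with a 2-sparse map
filters = np.array([[1.0, 0.8, 0.4]])
z_true = np.zeros(64); z_true[10], z_true[40] = 2.0, -1.5
x = np.real(np.fft.ifft(np.fft.fft(filters, n=64) * np.fft.fft(z_true)))[0]
Z = csc_ista(x, filters, beta=0.01, n_iter=300)
recon = np.real(np.fft.ifft(np.fft.fft(filters, n=64, axis=1) * np.fft.fft(Z, axis=1))).sum(axis=0)
```

Working in the Fourier domain turns each convolution into a pointwise product, which is the main trick most of the surveyed methods rely on; it also makes the circular boundary condition explicit.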
VLSI Friendly Framework for Scalable Video Coding based on Compressed Sensing
This paper presents a new VLSI friendly framework for scalable video coding
based on Compressed Sensing (CS). It achieves scalability through 3-Dimensional
Discrete Wavelet Transform (3-D DWT) and better compression ratio by exploiting
the inherent sparsity of the high-frequency wavelet sub-bands through CS. By
using 3-D DWT and a proposed adaptive measurement scheme called AMS at the
encoder, the framework improves the compression ratio and reduces the
complexity of the decoder. The proposed video codec uses only 7% of the total
number of multipliers needed in a conventional CS-based video coding system. A
codebook of Bernoulli matrices with different sizes corresponding to the
predefined sparsity levels is maintained at both the encoder and the decoder.
Based on the calculated l0-norm of the input vector, one of the sixteen
possible Bernoulli matrices will be selected for taking the CS measurements and
its index will be transmitted along with the measurements. Based on this index,
the corresponding Bernoulli matrix is used in the CS reconstruction algorithm
to recover the high-frequency wavelet sub-bands at the decoder, where a new
Enhanced Approximate Message Passing (EAMP) algorithm is proposed to
reconstruct the wavelet coefficients and apply the inverse wavelet transform to
restore the video frames. Simulation results establish the superiority of the
proposed framework over existing schemes and its suitability for VLSI
implementation. Moreover, the coded video is found to be scalable with an
increasing number of levels of wavelet decomposition.
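The codebook-based measurement step can be sketched as follows. The sixteen sparsity levels, the 4k-measurements rule, and the vector length are illustrative assumptions for the sketch, not the AMS design from the paper:

```python
import numpy as np

rng = np.random.default_rng(42)
N = 256                                                # sub-band vector length
sparsity_levels = np.linspace(8, 128, 16, dtype=int)   # 16 predefined levels
# codebook shared by encoder and decoder: one +/-1 Bernoulli matrix per level,
# with more measurement rows allotted to denser inputs
codebook = [rng.choice([-1.0, 1.0], size=(4 * k, N)) for k in sparsity_levels]

def encode(v):
    """Pick the smallest codebook matrix whose level covers ||v||_0, take the
    CS measurements, and return (index, measurements); only the index and y
    need to be transmitted."""
    k = np.count_nonzero(v)                            # calculated l0-norm
    idx = int(np.searchsorted(sparsity_levels, k))
    idx = min(idx, len(codebook) - 1)                  # clamp to densest level
    y = codebook[idx] @ v
    return idx, y

v = np.zeros(N)
v[rng.choice(N, size=20, replace=False)] = rng.standard_normal(20)
idx, y = encode(v)
# the decoder looks up codebook[idx] and runs a CS recovery (e.g., AMP) on y
```

Because both sides hold the same codebook, the encoder only spends a few bits on the index while still adapting the measurement count to the sub-band's sparsity.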
Multivariate Cryptosystems for Secure Processing of Multidimensional Signals
Multidimensional signals like 2-D and 3-D images or videos are inherently
sensitive signals which require privacy-preserving solutions when processed in
untrustworthy environments, but their efficient encrypted processing is
particularly challenging due to their structure, dimensionality and size. This
work introduces a new cryptographic hard problem denoted m-RLWE (multivariate
Ring Learning with Errors) which generalizes RLWE, and proposes several
relinearization-based techniques to efficiently convert signals with different
structures and dimensionalities. The proposed hard problem and the developed
techniques give support to lattice cryptosystems that enable encrypted
processing of multidimensional signals and efficient conversion between
different structures. We show an example cryptosystem and prove that it
outperforms its RLWE counterpart in terms of security against basis-reduction
attacks, efficiency and cipher expansion for encrypted image processing, and we
exemplify some of the proposed transformation techniques in critical and
ubiquitous block-based processing applications.
Merge Frame Design for Video Stream Switching using Piecewise Constant Functions
The ability to efficiently switch from one pre-encoded video stream to
another (e.g., for bitrate adaptation or view switching) is important for many
interactive streaming applications. Recently, stream-switching mechanisms based
on distributed source coding (DSC) have been proposed. In order to reduce the
overall transmission rate, these approaches provide a "merge" mechanism, where
information is sent to the decoder such that the exact same frame can be
reconstructed given that any one of a known set of side information (SI) frames
is available at the decoder (e.g., each SI frame may correspond to a different
stream from which we are switching). However, the use of bit-plane coding and
channel coding in many DSC approaches leads to complex coding and decoding. In
this paper, we propose an alternative approach for merging multiple SI frames,
using a piecewise constant (PWC) function as the merge operator. In our
approach, for each block to be reconstructed, a series of parameters of these
PWC merge functions are transmitted in order to guarantee identical
reconstruction given the known side information blocks. We consider two
different scenarios. In the first case, a target frame is first given, and then
merge parameters are chosen so that this frame can be reconstructed exactly at
the decoder. In contrast, in the second scenario, the reconstructed frame and
merge parameters are jointly optimized to meet a rate-distortion criterion.
Experiments show that, for both scenarios, our proposed merge techniques can
outperform both a recent approach based on DSC and the SP-frame approach in
H.264 in terms of compression efficiency and decoder complexity.
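The piecewise constant merge idea can be sketched per coefficient: choose the step size of a staircase function wide enough that every side-information value falls on the same step, so all SI versions reconstruct identically. The specific parameterization below (step size W and shift c) is an illustrative assumption, not the paper's exact construction:

```python
import numpy as np

def merge_params(si_values, target):
    """Choose step size W and shift c of a piecewise constant merge function
    f(s) = round((s - c)/W)*W + c so that every side-information value maps
    to the same reconstruction, the target."""
    c = float(target)
    spread = max(abs(s - c) for s in si_values)
    W = 2.0 * spread + 1e-9        # one PWC step covers all SI values
    return W, c

def merge(s, W, c):
    """Apply the PWC merge operator to a side-information value s."""
    return np.round((s - c) / W) * W + c

si = [103.0, 96.5, 99.2]           # same block coefficient from three streams
W, c = merge_params(si, target=100.0)
recon = {merge(s, W, c) for s in si}
# every SI value lands on the same step, so reconstruction is identical
```

Only the parameters (W, c) are transmitted; whichever SI frame the decoder happens to hold, applying the same staircase yields the same merged value, which is the identical-reconstruction guarantee the scheme needs.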
A new adaptive interframe transform coding using directional classification
Wavelet Video Coding Algorithm Based on Energy Weighted Significance Probability Balancing Tree
This work presents a 3-D wavelet video coding algorithm. By analyzing the
contribution of each biorthogonal wavelet basis to reconstructed signal's
energy, we weight each wavelet subband according to its basis energy. Based on
distribution of weighted coefficients, we further discuss a 3-D wavelet tree
structure named the significance probability balancing tree, which places
the coefficients with similar probabilities of being significant on the same
layer. It is implemented by using hybrid spatial orientation tree and
temporal-domain block tree. Subsequently, a novel 3-D wavelet video coding
algorithm is proposed based on the energy-weighted significance probability
balancing tree. Experimental results illustrate that our algorithm always
achieves good reconstruction quality for different classes of video sequences.
Compared with the asymmetric 3-D orientation tree, the average peak
signal-to-noise ratio (PSNR) gains of our algorithm are 1.24 dB, 2.54 dB and
2.57 dB for the luminance (Y) and chrominance (U, V) components, respectively.
Compared with the temporal-spatial orientation tree algorithm, our algorithm
gains 0.38 dB, 2.92 dB and 2.39 dB in PSNR for the Y, U, and V components,
respectively. In addition, the proposed algorithm requires lower computational
cost than the above two algorithms. Comment: 17 pages, 2 figures; submitted to
Multimedia Tools and Applications