
    Simulations for Validation of Vision Systems

    Full text link
    As computer vision matures into a systems science and engineering discipline, there is a trend toward leveraging the latest advances in computer graphics simulations for performance evaluation, learning, and inference. However, the utility of graphics simulations for vision remains an open question, with apparently contradictory views in the literature. In this paper, we place results from the recent literature in the context of the performance characterization methodology outlined in the 1990s and note that insights derived from simulations can be qualitative or quantitative, depending on the degree of fidelity of the models used in simulation and the nature of the question posed by the experimenter. We describe a simulation platform that incorporates the latest graphics advances and use it for systematic performance characterization and trade-off analysis in vision system design. We verify the utility of the platform in a case study validating a generative-model-inspired vision hypothesis, the Rank-Order consistency model, in the contexts of global and local illumination changes, bad weather, and high-frequency noise. Our approach establishes the link between alternative viewpoints, involving models with physics-based semantics and with signal-and-perturbation semantics, and confirms insights in the literature on robust change detection.

    Image Restoration Using Joint Statistical Modeling in Space-Transform Domain

    Full text link
    This paper presents a novel strategy for high-fidelity image restoration by characterizing both local smoothness and nonlocal self-similarity of natural images in a unified statistical manner. The main contributions are threefold. First, from the perspective of image statistics, a joint statistical modeling (JSM) in an adaptive hybrid space-transform domain is established, which offers a powerful mechanism for combining local smoothness and nonlocal self-similarity simultaneously to ensure a more reliable and robust estimation. Second, a new form of minimization functional for solving image inverse problems is formulated using JSM under a regularization-based framework. Finally, in order to make JSM tractable and robust, a new Split-Bregman-based algorithm is developed to efficiently solve the above severely underdetermined inverse problem, with a theoretical proof of convergence. Extensive experiments on image inpainting, image deblurring, and mixed Gaussian plus salt-and-pepper noise removal applications verify the effectiveness of the proposed algorithm.
    Comment: 14 pages, 18 figures, 7 tables; to be published in IEEE Transactions on Circuits and Systems for Video Technology (TCSVT). A high-resolution PDF version and code can be found at: http://idm.pku.edu.cn/staff/zhangjian/IRJSM

    A sparsity-driven approach to multi-camera tracking in visual sensor networks

    Get PDF
    In this paper, a sparsity-driven approach is presented for multi-camera tracking in visual sensor networks (VSNs). VSNs consist of image sensors, embedded processors, and wireless transceivers, all powered by batteries. Since energy and bandwidth resources are limited, setting up a tracking system in VSNs is a challenging problem. Motivated by the goal of tracking in a bandwidth-constrained environment, we present a sparsity-driven method to compress the features extracted by the camera nodes, which are then transmitted across the network for distributed inference. We have designed special overcomplete dictionaries that match the structure of the features, leading to very parsimonious yet accurate representations. We have tested our method in indoor and outdoor people-tracking scenarios. Our experimental results demonstrate how our approach leads to communication savings without significant loss in tracking performance.
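    The abstract does not specify the sparse coding algorithm used over its overcomplete dictionaries, but the general idea of obtaining a parsimonious representation of a feature vector can be sketched with greedy matching pursuit; the function and parameter names below are illustrative, not from the paper:

    ```python
    import numpy as np

    def matching_pursuit(f, D, n_atoms=5):
        """Greedily encode feature vector f over a dictionary D
        (columns = unit-norm atoms), keeping only a few coefficients."""
        r = f.astype(float).copy()          # residual to be explained
        coeffs = np.zeros(D.shape[1])
        for _ in range(n_atoms):
            corr = D.T @ r                  # correlation with every atom
            k = int(np.argmax(np.abs(corr)))  # best-matching atom
            coeffs[k] += corr[k]
            r -= corr[k] * D[:, k]          # remove its contribution
        return coeffs                       # sparse code to transmit
    ```

    Only the few nonzero entries of `coeffs` (index plus value) would need to cross the network, which is the source of the communication savings the abstract describes.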

    Infinite Sparse Structured Factor Analysis

    Full text link
    Matrix factorisation methods decompose multivariate observations as linear combinations of latent feature vectors. The Indian Buffet Process (IBP) provides a way to model the number of latent features required for a good approximation in terms of regularised reconstruction error. Previous work has focussed on latent feature vectors with independent entries. We extend the model to include nondiagonal latent covariance structures representing characteristics such as smoothness. Using simulations, we demonstrate that under appropriate conditions a smoothness prior helps to recover the true latent features while denoising more accurately. We demonstrate our method on a real neuroimaging dataset, where computational tractability is a sufficient challenge that the efficient strategy presented here is essential.

    Optimization Methods for Convolutional Sparse Coding

    Full text link
    Sparse and convolutional constraints form a natural prior for many optimization problems that arise from physical processes. Detecting motifs in speech and musical passages, super-resolving images, compressing videos, and reconstructing harmonic motions can all leverage redundancies introduced by convolution. Solving problems involving sparse and convolutional constraints remains computationally difficult, however. In this paper we present an overview of convolutional sparse coding in a consistent framework. The objective involves iteratively optimizing a convolutional least-squares term for the basis functions, followed by an L1-regularized least-squares term for the sparse coefficients. We discuss a range of optimization methods for solving the convolutional sparse coding objective, and the properties that make each method suitable for different applications. In particular, we concentrate on computational complexity, speed to ε-convergence, memory usage, and the effect of implied boundary conditions. We present a broad suite of examples covering different signal and application domains to illustrate the general applicability of convolutional sparse coding, and the efficacy of the available optimization methods.
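    The coefficient-update half of the objective described above, an L1-regularized least-squares problem with fixed filters, can be sketched as iterative soft-thresholding (ISTA) with the convolutions computed in the Fourier domain; this is a minimal 1-D sketch under circular boundary conditions, not any specific method surveyed in the paper:

    ```python
    import numpy as np

    def soft_threshold(v, t):
        return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

    def csc_ista(x, filters, lam=0.1, n_iter=200):
        """Solve min_z 0.5*||x - sum_k d_k * z_k||^2 + lam*sum_k||z_k||_1
        for the sparse maps z_k, with circular convolution done via FFT."""
        N = len(x)
        D = np.fft.rfft(np.array([np.pad(d, (0, N - len(d))) for d in filters]),
                        axis=1)                       # filter spectra
        X = np.fft.rfft(x)
        # step size from the Lipschitz constant of the gradient:
        # the squared operator norm is max over frequencies of sum_k |D_k|^2
        step = 1.0 / np.max(np.sum(np.abs(D) ** 2, axis=0))
        Z = np.zeros((len(filters), N))
        for _ in range(n_iter):
            Zf = np.fft.rfft(Z, axis=1)
            R = np.sum(D * Zf, axis=0) - X            # residual spectrum
            grad = np.fft.irfft(np.conj(D) * R, n=N, axis=1)
            Z = soft_threshold(Z - step * grad, step * lam)
        return Z
    ```

    Working in the frequency domain turns each convolution into an elementwise product, which is what makes the per-iteration cost manageable; the implied circular boundary condition is exactly the kind of modeling choice the paper's discussion of boundary effects concerns.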

    VLSI Friendly Framework for Scalable Video Coding based on Compressed Sensing

    Full text link
    This paper presents a new VLSI-friendly framework for scalable video coding based on Compressed Sensing (CS). It achieves scalability through the 3-Dimensional Discrete Wavelet Transform (3-D DWT) and a better compression ratio by exploiting the inherent sparsity of the high-frequency wavelet sub-bands through CS. By using the 3-D DWT and a proposed adaptive measurement scheme, called AMS, at the encoder, one can improve the compression ratio and reduce the complexity of the decoder. The proposed video codec uses only 7% of the total number of multipliers needed in a conventional CS-based video coding system. A codebook of Bernoulli matrices with different sizes, corresponding to predefined sparsity levels, is maintained at both the encoder and the decoder. Based on the calculated l0-norm of the input vector, one of the sixteen possible Bernoulli matrices is selected for taking the CS measurements, and its index is transmitted along with the measurements. Based on this index, the corresponding Bernoulli matrix is used in the CS reconstruction algorithm to recover the high-frequency wavelet sub-bands at the decoder. At the decoder, a new Enhanced Approximate Message Passing (EAMP) algorithm is proposed to reconstruct the wavelet coefficients and apply the inverse wavelet transform to restore the video frames. Simulation results establish the superiority of the proposed framework over existing schemes and its suitability for VLSI implementation. Moreover, the coded video is found to be scalable with an increase in the number of levels of wavelet decomposition.
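    The encoder-side selection logic described above, pick a Bernoulli matrix from a shared codebook based on the l0-norm of the sub-band vector, can be sketched as follows; the codebook sizes and the measurement-count rule (roughly four measurements per nonzero) are illustrative assumptions, since the abstract does not give the paper's actual sizing:

    ```python
    import numpy as np

    def build_codebook(n, sparsity_levels, seed=0):
        """One ±1 Bernoulli measurement matrix per predefined sparsity level.
        The row count min(4*s, n) is an illustrative choice, not the paper's."""
        rng = np.random.default_rng(seed)
        return [rng.choice([-1.0, 1.0], size=(min(4 * s, n), n)) / np.sqrt(min(4 * s, n))
                for s in sparsity_levels]

    def adaptive_measure(x, codebook, sparsity_levels):
        """Select the smallest matrix whose sparsity level covers ||x||_0,
        take the CS measurements, and return (index, measurements)."""
        s = int(np.count_nonzero(x))              # l0-norm of the input vector
        idx = next((i for i, lvl in enumerate(sparsity_levels) if s <= lvl),
                   len(sparsity_levels) - 1)      # fall back to the densest matrix
        y = codebook[idx] @ x                     # CS measurements to transmit
        return idx, y
    ```

    Because both sides hold the same codebook, only the small index and the measurement vector need to be sent; the decoder looks up the matching Bernoulli matrix for reconstruction, as the abstract describes.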

    Multivariate Cryptosystems for Secure Processing of Multidimensional Signals

    Full text link
    Multidimensional signals like 2-D and 3-D images or videos are inherently sensitive signals which require privacy-preserving solutions when processed in untrustworthy environments, but their efficient encrypted processing is particularly challenging due to their structure, dimensionality, and size. This work introduces a new cryptographic hard problem, denoted m-RLWE (multivariate Ring Learning with Errors), which generalizes RLWE, and proposes several relinearization-based techniques to efficiently convert signals with different structures and dimensionalities. The proposed hard problem and the developed techniques support lattice cryptosystems that enable encrypted processing of multidimensional signals and efficient conversion between different structures. We show an example cryptosystem and prove that it outperforms its RLWE counterpart in terms of security against basis-reduction attacks, efficiency, and cipher expansion for encrypted image processing, and we exemplify some of the proposed transformation techniques in critical and ubiquitous block-based processing applications.

    Merge Frame Design for Video Stream Switching using Piecewise Constant Functions

    Full text link
    The ability to efficiently switch from one pre-encoded video stream to another (e.g., for bitrate adaptation or view switching) is important for many interactive streaming applications. Recently, stream-switching mechanisms based on distributed source coding (DSC) have been proposed. In order to reduce the overall transmission rate, these approaches provide a "merge" mechanism, where information is sent to the decoder such that the exact same frame can be reconstructed given that any one of a known set of side information (SI) frames is available at the decoder (e.g., each SI frame may correspond to a different stream from which we are switching). However, the use of bit-plane coding and channel coding in many DSC approaches leads to complex coding and decoding. In this paper, we propose an alternative approach for merging multiple SI frames, using a piecewise constant (PWC) function as the merge operator. In our approach, for each block to be reconstructed, a series of parameters of these PWC merge functions are transmitted in order to guarantee identical reconstruction given the known side information blocks. We consider two different scenarios. In the first, a target frame is given, and merge parameters are chosen so that this frame can be reconstructed exactly at the decoder. In the second, the reconstructed frame and merge parameters are jointly optimized to meet a rate-distortion criterion. Experiments show that for both scenarios, our proposed merge techniques can outperform both a recent approach based on DSC and the SP-frame approach in H.264, in terms of compression efficiency and decoder complexity.
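    The core idea of a PWC merge operator is that a floor-based staircase function can map any of several SI values into the same bin, so every decoder reconstructs the identical value regardless of which SI frame it holds. The toy sketch below only illustrates that identical-reconstruction property for one coefficient; the paper's actual parameter selection is rate-distortion optimized, and the step-size rule and names here are purely illustrative:

    ```python
    import math

    def pwc_merge_params(si_values):
        """Choose step W and shift c of a floor-based PWC function so that
        every side-information value lands in the same bin."""
        lo, hi = min(si_values), max(si_values)
        W = (hi - lo) + 1   # step wider than the spread of SI values
        c = lo              # shift so all SI values share one bin
        return W, c

    def pwc_apply(x, W, c):
        """Staircase merge function: map x to the center of its bin."""
        return math.floor((x - c) / W) * W + c + W / 2.0
    ```

    Transmitting (W, c) per block is the "series of parameters" the abstract refers to: any SI value in [c, c + W) maps to the single reconstruction c + W/2, so all decoders agree.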

    A new adaptive interframe transform coding using directional classification

    Get PDF

    Wavelet Video Coding Algorithm Based on Energy Weighted Significance Probability Balancing Tree

    Full text link
    This work presents a 3-D wavelet video coding algorithm. By analyzing the contribution of each biorthogonal wavelet basis to the reconstructed signal's energy, we weight each wavelet subband according to its basis energy. Based on the distribution of weighted coefficients, we further discuss a 3-D wavelet tree structure, named the significance probability balancing tree, which places coefficients with similar probabilities of being significant on the same layer. It is implemented using a hybrid of a spatial orientation tree and a temporal-domain block tree. Subsequently, a novel 3-D wavelet video coding algorithm is proposed based on the energy-weighted significance probability balancing tree. Experimental results illustrate that our algorithm consistently achieves good reconstruction quality for different classes of video sequences. Compared with the asymmetric 3-D orientation tree, the average peak signal-to-noise ratio (PSNR) gains of our algorithm are 1.24 dB, 2.54 dB, and 2.57 dB for the luminance (Y) and chrominance (U, V) components, respectively. Compared with the temporal-spatial orientation tree algorithm, our algorithm gains 0.38 dB, 2.92 dB, and 2.39 dB higher PSNR for the Y, U, and V components, respectively. In addition, the proposed algorithm requires lower computation cost than the above two algorithms.
    Comment: 17 pages, 2 figures; submitted to Multimedia Tools and Applications