21 research outputs found

    Appropriate kernels for Divisive Normalization explained by Wilson-Cowan equations

    Cascades of standard Linear+NonLinear-Divisive Normalization transforms [Carandini&Heeger12] can be easily fitted using the appropriate formulation introduced in [Martinez17a] to reproduce the perception of image distortion in naturalistic environments. However, consistent with [Rust&Movshon05], training the model in naturalistic environments does not guarantee the prediction of well-known phenomena illustrated by artificial stimuli. For example, the cascade of Divisive Normalizations fitted with image quality databases has to be modified to include a variety of aspects of masking of simple patterns. Specifically, the standard Gaussian kernels of [Watson&Solomon97] have to be augmented with extra weights [Martinez17b]. These can be introduced ad hoc, guided by intuition, to fix the empirical failures found in the original model, but a better justification for this hack would be desirable. In this work we give a theoretical justification of such empirical modification of the Watson&Solomon kernel based on the Wilson-Cowan [WilsonCowan73] model of cortical interactions. Specifically, we show that the analytical relation between the Divisive Normalization model and the Wilson-Cowan model proposed here leads to the kind of extra factors that have to be included and to their qualitative dependence on frequency.
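
    For reference, a minimal sketch of the two models being related, written in generic textbook notation (not necessarily the paper's own):

```latex
% Canonical Divisive Normalization [Carandini&Heeger12]: response x_i to a
% linear drive e_i, saturated by the pooled activity of the neighbours
% through the interaction kernel H (the kernel at issue in this work):
x_i = \frac{\operatorname{sign}(e_i)\,|e_i|^{\gamma}}{b_i + \sum_j H_{ij}\,|e_j|^{\gamma}}

% Canonical Wilson-Cowan dynamics [WilsonCowan73]: leaky integration of the
% drive minus recurrent inhibition through the cortical coupling W:
\frac{dx_i}{dt} = e_i - \alpha_i\, x_i - \sum_j W_{ij}\, f(x_j)

% Setting dx_i/dt = 0 and solving the steady state for x_i is the route by
% which the coupling W can be related to the kernel H; the specific algebra
% (and the resulting extra factors) is the subject of the paper.
```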

    Derivatives and Inverse of a Linear-Nonlinear Multi-Layer Spatial Vision Model

    Analyzing the mathematical properties of perceptually meaningful linear-nonlinear transforms is interesting because this computation is at the core of many vision models. Here we carry out such an analysis in detail using a specific model [Malo & Simoncelli, SPIE Human Vision Electr. Imag. 2015], which is illustrative because it consists of a cascade of standard linear-nonlinear modules. The analytic results and the numerical methods involved transcend the particular model because of the ubiquity of the linear-nonlinear structure. Here we extend [Malo&Simoncelli 15] by considering 4 layers: (1) linear spectral integration and nonlinear brightness response, (2) definition of local contrast by using linear filters and divisive normalization, (3) linear CSF filter and nonlinear local contrast masking, and (4) linear wavelet-like decomposition and nonlinear divisive normalization to account for orientation and scale-dependent masking. The extra layers were measured using Maximum Differentiation [Malo et al. VSS 2016]. First, we describe the general architecture using a unified notation in which every module is composed of isomorphic linear and nonlinear transforms. The chain rule simplifies the analysis of systems with this modular architecture, and invertibility is related to the non-singularity of the Jacobian matrices. Second, we consider the details of the four layers in our particular model, and how they improve the original version of the model. Third, we explicitly list the derivatives of every module, which are relevant for the definition of perceptual distances, perceptual gradient descent, and characterization of the deformation of space. Fourth, we address the inverse, and we find different analytical and numerical problems in each specific module. Solutions are proposed for all of them. Finally, we describe through examples how to use the toolbox to apply and check the above theory. In summary, the formulation and toolbox are ready to explore the geometric and perceptual issues addressed in the introductory section (giving all the technical information that was missing in [Malo&Simoncelli 15]).
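
    A minimal numerical sketch of the modular idea, with hypothetical toy layers (the paper's actual layers are brightness, contrast, CSF+masking and wavelet+divisive normalization): each module is a linear map followed by an elementwise nonlinearity, the Jacobian of the cascade is the chain-rule product of the per-layer Jacobians, and invertibility can be monitored through their non-singularity.

```python
import numpy as np

def layer(x, L, g=0.6):
    """Toy module: linear transform + elementwise power nonlinearity."""
    y = L @ x
    return np.sign(y) * np.abs(y) ** g

def layer_jacobian(x, L, g=0.6):
    """Chain rule inside one module: J = diag(f'(Lx)) @ L."""
    y = L @ x
    return np.diag(g * np.abs(y) ** (g - 1.0)) @ L

rng = np.random.default_rng(0)
Ls = [rng.standard_normal((4, 4)) for _ in range(3)]  # three toy modules

x = rng.standard_normal(4)
J = np.eye(4)
for L in Ls:
    J = layer_jacobian(x, L) @ J   # chain rule across modules
    x = layer(x, L)

# Invertibility of the cascade relates to non-singularity of the Jacobians:
print("det(J) =", np.linalg.det(J))
```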

    Vision models for wide color gamut imaging in cinema

    Gamut mapping is the problem of transforming the colors of image or video content so as to fully exploit the color palette of the display device where the content will be shown, while preserving the artistic intent of the original content's creator. In particular, in the cinema industry, the rapid advancement in display technologies has created a pressing need to develop automatic and fast gamut mapping algorithms. In this article, we propose a novel framework that is based on vision science models, performs both gamut reduction and gamut extension, is of low computational complexity, produces results that are free from artifacts, and outperforms state-of-the-art methods according to psychophysical tests. Our experiments also highlight the limitations of existing objective metrics for the gamut mapping problem.
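
    To make the problem concrete, here is a generic illustration of the gamut-reduction side (hue-preserving chroma compression with a soft knee); this is a common baseline strategy, not the vision-model-based algorithm of the article, and the function and parameter names are ours.

```python
import numpy as np

def compress_chroma(lab, c_max, knee=0.8):
    """Hue-preserving chroma compression toward a smaller target gamut.
    lab: (N, 3) array of CIELAB-like (L*, a*, b*) coordinates; the target
    gamut is approximated by a maximum chroma c_max."""
    L, a, b = lab[:, 0], lab[:, 1], lab[:, 2]
    C = np.hypot(a, b)        # chroma (polar radius)
    h = np.arctan2(b, a)      # hue angle, preserved exactly
    # Soft knee: colors below knee*c_max are untouched; the rest are
    # smoothly compressed into the remaining [knee, 1]*c_max band.
    t = np.clip((C / c_max - knee) / (1 - knee), 0, None)
    C_out = np.where(C <= knee * c_max, C,
                     c_max * (knee + (1 - knee) * t / (1 + t)))
    return np.stack([L, C_out * np.cos(h), C_out * np.sin(h)], axis=1)

lab = np.array([[50.0, 60.0, 20.0], [70.0, 10.0, 5.0]])
print(compress_chroma(lab, c_max=50.0))  # first color is pulled inward
```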

    Contrast Sensitivity Functions in Autoencoders

    Three decades ago, Atick et al. suggested that human frequency sensitivity may emerge from the enhancement required for a more efficient analysis of retinal images. Here we reassess the relevance of low-level vision tasks in the explanation of the Contrast Sensitivity Functions (CSFs) in light of (1) the current trend of using artificial neural networks for studying vision, and (2) the current knowledge of retinal image representations. As a first contribution, we show that a very popular type of convolutional neural networks (CNNs), called autoencoders, may develop human-like CSFs in the spatio-temporal and chromatic dimensions when trained to perform some basic low-level vision tasks (like retinal noise and optical blur removal), but not others (like chromatic adaptation or pure reconstruction after simple bottlenecks). As an illustrative example, the best CNN (in the considered set of simple architectures for enhancement of the retinal signal) reproduces the CSFs with an RMSE of 11% of the maximum sensitivity. As a second contribution, we provide experimental evidence of the fact that, for some functional goals (at low abstraction level), deeper CNNs that are better in reaching the quantitative goal are actually worse in replicating human-like phenomena (such as the CSFs). This low-level result (for the explored networks) is not necessarily in contradiction with other works that report advantages of deeper nets in modeling higher-level vision goals. However, in line with a growing body of literature, our results suggest another word of caution about CNNs in vision science, since the use of simplified units or unrealistic architectures in goal optimization may limit the modeling and understanding of human vision.
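
    A minimal sketch of the experimental logic, under our own simplifying assumptions (a toy grayscale architecture and a peak-to-peak contrast-gain probe; the paper's networks, tasks, and CSF measurement are more elaborate): train a small convolutional autoencoder on a low-level task such as denoising, then estimate its sensitivity from the contrast gain for sinusoidal gratings of varying frequency.

```python
import torch
import torch.nn as nn

class AE(nn.Module):
    """Small convolutional autoencoder (illustrative architecture only)."""
    def __init__(self, ch=16):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(1, ch, 5, padding=2), nn.ReLU(),
                                 nn.Conv2d(ch, ch, 5, padding=2), nn.ReLU())
        self.dec = nn.Conv2d(ch, 1, 5, padding=2)
    def forward(self, x):
        return self.dec(self.enc(x))

def csf_probe(net, freqs, size=64, contrast=0.05):
    """Feed low-contrast gratings and measure output/input contrast gain.
    After training on, e.g., retinal-noise removal, this gain curve is
    what one would compare against the human CSF."""
    x = torch.arange(size, dtype=torch.float32)
    gains = []
    with torch.no_grad():
        for f in freqs:  # f in cycles per image
            row = 0.5 + contrast * torch.sin(2 * torch.pi * f * x / size)
            grating = row.expand(size, size)[None, None]  # (1,1,H,W)
            out = net(grating)
            gains.append(((out.max() - out.min()) / (2 * contrast)).item())
    return gains

net = AE()   # untrained here; in the paper the nets are first trained on
             # low-level tasks (denoising, deblurring) and then probed
print(csf_probe(net, freqs=[2, 4, 8, 16]))
```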

    Angular-Based Preprocessing for Image Denoising

    No full text

    Video inpainting of occluding and occluded objects

    No full text
    We present a basic technique to fill in missing parts of a video sequence taken from a static camera. Two important cases are considered. The first case is concerned with the removal of non-stationary objects that occlude stationary background. We use a priority based spatio-temporal synthesis scheme for inpainting the stationary background. The second and more difficult case involves filling in moving objects when they are partially occluded. For this, we propose a priority scheme to first inpaint the occluded moving objects and then fill in the remaining area with stationary background using the method proposed for the first case. We use as input an optical-flow based mask, which indicates whether an undamaged pixel is moving or stationary. The moving object is inpainted by copying patches from undamaged frames, and this copying is independent of the background of the moving object in either frame. This work has applications in a variety of different areas, including video special effects and restoration and enhancement of damaged videos. The examples shown in the paper illustrate these ideas.
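
    A minimal single-frame sketch of one step of priority-based patch synthesis, using only a confidence term in the spirit of exemplar-based inpainting (an assumption of ours; the paper's scheme additionally uses the optical-flow mask to order the filling of moving objects versus static background):

```python
import numpy as np

def fill_step(img, mask, patch=7):
    """One fill step. img: (H, W) float array; mask: True where missing."""
    r, (H, W) = patch // 2, img.shape
    conf = (~mask).astype(float)
    # Fill front: missing pixels with at least one known 4-neighbour.
    front = mask & (np.roll(~mask, 1, 0) | np.roll(~mask, -1, 0) |
                    np.roll(~mask, 1, 1) | np.roll(~mask, -1, 1))
    ys, xs = np.nonzero(front)
    keep = (ys >= r) & (ys < H - r) & (xs >= r) & (xs < W - r)
    ys, xs = ys[keep], xs[keep]
    # Priority = mean confidence of the surrounding patch; the most
    # confident front pixel is filled first.
    pri = [conf[y-r:y+r+1, x-r:x+r+1].mean() for y, x in zip(ys, xs)]
    i = int(np.argmax(pri))
    y, x = ys[i], xs[i]
    tgt = img[y-r:y+r+1, x-r:x+r+1]
    known = ~mask[y-r:y+r+1, x-r:x+r+1]
    # Copy from the best-matching fully-known source patch (SSD over the
    # known pixels of the target patch).
    best, best_d = None, np.inf
    for sy in range(r, H - r):
        for sx in range(r, W - r):
            if mask[sy-r:sy+r+1, sx-r:sx+r+1].any():
                continue
            d = ((img[sy-r:sy+r+1, sx-r:sx+r+1] - tgt)[known] ** 2).sum()
            if d < best_d:
                best, best_d = (sy, sx), d
    sy, sx = best
    img[y-r:y+r+1, x-r:x+r+1][~known] = \
        img[sy-r:sy+r+1, sx-r:sx+r+1][~known]
    mask[y-r:y+r+1, x-r:x+r+1] = False
    return img, mask
```

    Iterating fill_step until the mask is empty fills the hole from its most confident border inward; in the video setting the copying would draw from undamaged frames rather than from the same frame.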