Information Flow in Color Appearance Neural Networks
Color Appearance Models are biological networks that consist of a cascade of
linear+nonlinear layers that modify the linear measurements at the retinal
photo-receptors leading to an internal (nonlinear) representation of color that
correlates with psychophysical experience. The basic layers of these networks
include: (1) chromatic adaptation (normalization of the mean and covariance of
the color manifold), (2) change to opponent color channels (PCA-like rotation
in the color space), and (3) saturating nonlinearities to get perceptually
Euclidean color representations (similar to dimensionwise equalization). The
Efficient Coding Hypothesis argues that these transforms should emerge from
information-theoretic goals. If this hypothesis holds in color vision, the
question is: what is the coding gain due to the different layers of the color
appearance networks?
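The three layers listed above can be sketched in a minimal, purely illustrative numpy cascade; here ZCA whitening, an SVD-based rotation, and a signed power law stand in for the adaptation, opponent, and saturation stages of any particular Color Appearance Model:

```python
import numpy as np

def chromatic_adaptation(X):
    # Layer 1: normalize the mean and covariance of the color manifold
    # (ZCA whitening as an illustrative stand-in for chromatic adaptation)
    Xc = X - X.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    W = evecs @ np.diag(1.0 / np.sqrt(evals)) @ evecs.T
    return Xc @ W

def opponent_rotation(X):
    # Layer 2: PCA-like rotation of the color space into opponent-style axes
    _, _, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
    return X @ Vt.T

def saturating_nonlinearity(X, gamma=0.5):
    # Layer 3: dimension-wise saturating (equalizing) response
    return np.sign(X) * np.abs(X) ** gamma

rng = np.random.default_rng(0)
tristimulus = rng.lognormal(size=(1000, 3))  # toy stand-in for retinal measurements
response = saturating_nonlinearity(opponent_rotation(chromatic_adaptation(tristimulus)))
```

The cascade maps skewed, correlated photoreceptor-like measurements to a roughly equalized internal representation, which is the structure whose coding gain the abstract asks about.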
In this work, a representative family of Color Appearance Models is analyzed
in terms of how the redundancy among the chromatic components is modified along
the network and how much information is transferred from the input data to the
noisy response. The proposed analysis is done using data and methods that were
not available before: (1) new colorimetrically calibrated scenes in different
CIE illuminations for proper evaluation of chromatic adaptation, and (2) new
statistical tools to estimate (multivariate) information-theoretic quantities
between multidimensional sets based on Gaussianization. Results confirm that
the Efficient Coding Hypothesis holds for current color vision models, and
identify the psychophysical mechanisms critically responsible for gains in
information transference: opponent channels and their nonlinear nature are more
important than chromatic adaptation at the retina.
Appropriate kernels for Divisive Normalization explained by Wilson-Cowan equations
Cascades of standard Linear+NonLinear-Divisive Normalization transforms [Carandini&Heeger12] can be easily fitted, using the appropriate formulation introduced in [Martinez17a], to reproduce the perception of image distortion in naturalistic environments. However, consistent with [Rust&Movshon05], training the model on naturalistic environments does not guarantee the prediction of well-known phenomena illustrated by artificial stimuli. For example, the cascade of Divisive Normalizations fitted with image quality databases has to be modified to include a variety of aspects of the masking of simple patterns. Specifically, the standard Gaussian kernels of [Watson&Solomon97] have to be augmented with extra weights [Martinez17b]. These can be introduced ad hoc, guided by intuition, to fix the empirical failures found in the original model, but a more principled justification for this modification would be desirable. In this work we give a theoretical justification of this empirical modification of the Watson&Solomon kernel based on the Wilson-Cowan model of cortical interactions [WilsonCowan73]. Specifically, we show that the analytical relation proposed here between the Divisive Normalization model and the Wilson-Cowan model leads to exactly the kind of extra factors that have to be included, and to their qualitative dependence on frequency.
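The Divisive Normalization stage at the heart of this abstract can be sketched as follows; the Gaussian interaction kernel and the exponents are illustrative placeholders, not the fitted values of [Martinez17b]:

```python
import numpy as np

def gaussian_kernel(n, sigma):
    # Gaussian interaction kernel H among n channels, in the spirit of the
    # [Watson&Solomon97] kernels discussed above (illustrative parameters)
    idx = np.arange(n)
    return np.exp(-0.5 * ((idx[:, None] - idx[None, :]) / sigma) ** 2)

def divisive_normalization(z, H, b=0.1, g=2.0):
    # One nonlinear stage: r_i = sign(z_i) |z_i|^g / (b + sum_j H_ij |z_j|^g)
    e = np.abs(z) ** g
    return np.sign(z) * e / (b + H @ e)

z = np.linspace(-1.0, 1.0, 32)   # toy outputs of the preceding linear stage
r = divisive_normalization(z, gaussian_kernel(32, sigma=3.0))
```

Each response is attenuated by a pooled activity of its neighbors, which is the mechanism the extra kernel weights modulate.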
Functional Connectome of the Human Brain with Total Correlation
Recent studies proposed the use of Total Correlation to describe functional connectivity
among brain regions as a multivariate alternative to conventional pairwise measures such as correlation or mutual information. In this work, we build on this idea to infer a large-scale (whole-brain)
connectivity network based on Total Correlation and show the possibility of using this kind of
network as biomarkers of brain alterations. In particular, this work uses Correlation Explanation
(CorEx) to estimate Total Correlation. First, we show that the CorEx estimates of Total
Correlation and the associated clustering results are reliable when compared to ground-truth
values. Second, the large-scale connectivity network inferred from the more extensive open
fMRI datasets is consistent with existing neuroscience studies but, interestingly, can capture
relations beyond those between pairs of regions. Finally, we show that connectivity graphs
based on Total Correlation can also be an effective tool to aid in the discovery of brain diseases.
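As a minimal illustration of why Total Correlation captures multivariate redundancy, note that for Gaussian variables it reduces to the log-determinant of the correlation matrix (CorEx itself uses a different, latent-variable estimator; this closed form is only a sanity-check sketch):

```python
import numpy as np

def gaussian_total_correlation(X):
    # TC(X) = sum_i H(X_i) - H(X); for jointly Gaussian variables this
    # reduces to -0.5 * log det(R), with R the correlation matrix.
    R = np.corrcoef(X, rowvar=False)
    _, logdet = np.linalg.slogdet(R)
    return -0.5 * logdet

rng = np.random.default_rng(1)
indep = rng.normal(size=(5000, 4))       # independent "regions": TC near zero
shared = rng.normal(size=(5000, 1))
coupled = indep * 0.3 + shared           # a shared signal induces redundancy
```

A shared latent signal drives the Total Correlation of the coupled set well above that of the independent set, which is the kind of beyond-pairwise structure the connectivity networks exploit.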
Towards a Functional Explanation of the Connectivity LGN - V1
The principles behind the connectivity between LGN and V1 are not well understood. Models have to explain two basic experimental trends: (i) the combination of thalamic responses is local and it gives rise to a variety of oriented Gabor-like receptive fields in V1 [1], and (ii) these filters are spatially organized in orientation maps [2]. Competing explanations of orientation maps use purely geometrical arguments such as optimal wiring or packing from LGN [3-5], but they make no explicit reference to visual function. On the other hand, explanations based on functional arguments such as maximum information transference (infomax) [6,7] usually neglect a potential contribution from LGN local circuitry. In this work we explore the ability of the conventional functional arguments (infomax and variants) to derive both trends simultaneously, assuming a plausible sampling model linking the retina to the LGN [8], as opposed to previous attempts operating from the retina.
Consistent with other aspects of human vision [14-16], additional constraints should be added to plain infomax to understand the second trend of the LGN-V1 connectivity. Possibilities include energy budget [11], wiring constraints [8], or error minimization in noisy systems, either linear [16] or nonlinear [14, 15]. In particular, consideration of high noise (neglected here) would favor redundancy in the prediction (which would be required to match the relations between spatially neighboring neurons in the same orientation domain).
What You Hear Is What You See: Audio Quality Metrics From Image Quality Metrics
In this study, we investigate the feasibility of utilizing state-of-the-art
image perceptual metrics for evaluating audio signals by representing them as
spectrograms. The encouraging outcome of the proposed approach is based on the
similarity between the neural mechanisms in the auditory and visual pathways.
Furthermore, we customise one of the metrics which has a psychoacoustically
plausible architecture to account for the peculiarities of sound signals. We
evaluate the effectiveness of our proposed metric and several baseline metrics
using a music dataset, with promising results in terms of the correlation
between the metrics and the perceived quality of audio as rated by human
evaluators.
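The core idea can be sketched with a numpy short-time FFT and a simple pixel-domain score; PSNR is used here only as a stand-in for the perceptual image metrics actually evaluated in the study:

```python
import numpy as np

def log_spectrogram(x, win=256, hop=128):
    # Magnitude spectrogram via a short-time FFT with a Hann window, log-scaled
    w = np.hanning(win)
    frames = np.stack([x[i:i + win] * w for i in range(0, len(x) - win, hop)])
    return np.log1p(np.abs(np.fft.rfft(frames, axis=1)))

def psnr(a, b):
    # Simple image-domain score; a perceptual metric would be applied
    # to the same spectrogram "images" in the actual approach
    mse = np.mean((a - b) ** 2)
    return 10.0 * np.log10(a.max() ** 2 / mse)

t = np.linspace(0.0, 1.0, 8000)
clean = np.sin(2 * np.pi * 440.0 * t)                       # toy audio signal
noisy = clean + 0.05 * np.random.default_rng(2).normal(size=t.size)
score = psnr(log_spectrogram(clean), log_spectrogram(noisy))
```

Once audio is rendered as a spectrogram image, any image quality metric can score the degradation between a reference and a distorted signal.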
PerceptNet: A Human Visual System Inspired Neural Network for Estimating Perceptual Distance
Traditionally, the vision community has devised algorithms to estimate the
distance between an original image and images that have been subjected to
perturbations. Inspiration was usually taken from the human visual system and
how it processes different perturbations, in order to replicate the extent to
which they determine our ability to judge image quality.
While recent works have presented deep neural networks trained to predict human
perceptual quality, very few borrow any intuitions from the human visual
system. To address this, we present PerceptNet, a convolutional neural network
where the architecture has been chosen to reflect the structure and various
stages in the human visual system. We evaluate PerceptNet on various
traditional perception datasets and note strong performance on a number of them
as compared with traditional image quality metrics. We also show that including
a nonlinearity inspired by the human visual system in classical deep neural
networks architectures can increase their ability to judge perceptual
similarity. Compared to similar deep learning methods, the performance is
comparable, although our network has several orders of magnitude fewer
parameters.
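A divisive-normalization-style activation of the kind referred to here can be sketched as follows; the channel-pooling form and parameters are illustrative, not PerceptNet's actual layer:

```python
import numpy as np

def hvs_nonlinearity(x, beta=1.0, gamma=0.1):
    # Divisive-normalization-style activation (channels on the last axis):
    # each channel is divided by a pooled energy across channels, so strong
    # joint activity suppresses individual responses, as in visual cortex.
    energy = beta + gamma * np.sum(x ** 2, axis=-1, keepdims=True)
    return x / np.sqrt(energy)

features = np.random.default_rng(4).normal(size=(8, 8, 16))  # toy conv activations
out = hvs_nonlinearity(features)
```

Dropping such a layer into a classical architecture, in place of a pointwise ReLU, is the kind of HVS-inspired modification the abstract reports as improving perceptual judgments.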
Derivatives and Inverse of a Linear-Nonlinear Multi-Layer Spatial Vision Model
Analyzing the mathematical properties of perceptually meaningful linear-nonlinear transforms is interesting because this computation is at the core of many vision models. Here we carry out such an analysis in detail using a specific model [Malo & Simoncelli, SPIE Human Vision Electr. Imag. 2015], which is illustrative because it consists of a cascade of standard linear-nonlinear modules. The analytic results and the numerical methods involved are of interest beyond this particular model because of the ubiquity of the linear-nonlinear structure.
Here we extend [Malo&Simoncelli 15] by considering 4 layers: (1) linear spectral integration and nonlinear brightness response, (2) definition of local contrast by using linear filters and divisive normalization, (3) linear CSF filter and nonlinear local contrast masking, and (4) linear wavelet-like decomposition and nonlinear divisive normalization to account for orientation and scale-dependent masking. The extra layers were measured using Maximum Differentiation [Malo et al. VSS 2016].
First, we describe the general architecture using a unified notation in which every module is composed of isomorphic linear and nonlinear transforms. The chain rule simplifies the analysis of systems with this modular architecture, and invertibility is related to the non-singularity of the Jacobian matrices. Second, we consider the details of the four layers in our particular model and how they improve the original version of the model. Third, we explicitly list the derivatives of every module, which are relevant for the definition of perceptual distances, perceptual gradient descent, and the characterization of the deformation of space. Fourth, we address the inverse, and we find different analytical and numerical problems in each specific module. Solutions are proposed for all of them. Finally, we describe through examples how to use the toolbox to apply and check the above theory.
In summary, the formulation and toolbox are ready to explore the geometric and perceptual issues addressed in the introductory section (giving all the technical information that was missing in [Malo&Simoncelli 15]).
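The within-module and across-module chain rule described above can be sketched for a toy two-stage cascade; the signed power law is an illustrative stand-in for the model's actual nonlinearities, and the near-identity linear stages are chosen only to keep responses away from zero:

```python
import numpy as np

def module(x, L, gamma=0.6):
    # One linear+nonlinear stage: y = f(L @ x), with f a dimension-wise
    # signed power law standing in for the saturating responses
    z = L @ x
    return np.sign(z) * np.abs(z) ** gamma

def module_jacobian(x, L, gamma=0.6):
    # Within-stage chain rule: J = diag(f'(z)) @ L
    z = L @ x
    return np.diag(gamma * np.abs(z) ** (gamma - 1.0)) @ L

rng = np.random.default_rng(5)
L1 = np.eye(4) + 0.1 * rng.normal(size=(4, 4))
L2 = np.eye(4) + 0.1 * rng.normal(size=(4, 4))
x = 1.0 + rng.uniform(size=4)

y1 = module(x, L1)
J = module_jacobian(y1, L2) @ module_jacobian(x, L1)   # cascade chain rule
invertible = abs(np.linalg.det(J)) > 1e-9              # non-singular Jacobian
```

The Jacobian of the full cascade is the product of the per-module Jacobians, and its non-singularity is exactly the invertibility condition discussed above.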