43,232 research outputs found
Perceptually-Driven Video Coding with the Daala Video Codec
The Daala project is a royalty-free video codec that attempts to compete with
the best patent-encumbered codecs. Part of our strategy is to replace core
tools of traditional video codecs with alternative approaches, many of them
designed to take perceptual aspects into account, rather than optimizing for
simple metrics like PSNR. This paper documents some of our experiences with
these tools, which ones worked and which did not. We evaluate which tools are
easy to integrate into a more traditional codec design, and show results in the
context of the codec being developed by the Alliance for Open Media.Comment: 19 pages, Proceedings of SPIE Workshop on Applications of Digital
Image Processing (ADIP), 201
An efficient rate control algorithm for a wavelet video codec
Rate control plays an essential role in video coding and transmission to provide the best video quality at the receiver's end given the constraint of certain network conditions. In this paper, a rate control algorithm using the Quality Factor (QF) optimization method is proposed for the wavelet-based video codec and implemented on an open source Dirac video encoder. A mathematical model which we call Rate-QF (R - QF) model is derived to generate the optimum QF for the current coding frame according to the target bitrate. The proposed algorithm is a complete one pass process and does not require complex mathematical calculation. The process of calculating the QF is quite simple and further calculation is not required for each coded frame. The experimental results show that the proposed algorithm can control the bitrate precisely (within 1% of target bitrate in average). Moreover, the variation of bitrate over each Group of Pictures (GOPs) is lower than that of H.264. This is an advantage in preventing the buffer overflow and underflow for real-time multimedia data streaming
A New Fast Motion Estimation and Mode Decision algorithm for H.264 Depth Maps encoding in Free Viewpoint TV
In this paper, we consider a scenario where 3D scenes are modeled through a View+Depth representation. This representation is to be used at the rendering side to generate synthetic views for free viewpoint video. The encoding of both type of data (view and depth) is carried out using two H.264/AVC encoders. In this scenario we address the reduction of the encoding complexity of depth data. Firstly, an analysis of the Mode Decision and Motion Estimation processes has been conducted for both view and depth sequences, in order to capture the correlation between them. Taking advantage of this correlation, we propose a fast mode decision and motion estimation algorithm for the depth encoding. Results show that the proposed algorithm reduces the computational burden with a negligible loss in terms of quality of the rendered synthetic views. Quality measurements have been conducted using the Video Quality Metric
Perception of categories: from coding efficiency to reaction times
Reaction-times in perceptual tasks are the subject of many experimental and
theoretical studies. With the neural decision making process as main focus,
most of these works concern discrete (typically binary) choice tasks, implying
the identification of the stimulus as an exemplar of a category. Here we
address issues specific to the perception of categories (e.g. vowels, familiar
faces, ...), making a clear distinction between identifying a category (an
element of a discrete set) and estimating a continuous parameter (such as a
direction). We exhibit a link between optimal Bayesian decoding and coding
efficiency, the latter being measured by the mutual information between the
discrete category set and the neural activity. We characterize the properties
of the best estimator of the likelihood of the category, when this estimator
takes its inputs from a large population of stimulus-specific coding cells.
Adopting the diffusion-to-bound approach to model the decisional process, this
allows to relate analytically the bias and variance of the diffusion process
underlying decision making to macroscopic quantities that are behaviorally
measurable. A major consequence is the existence of a quantitative link between
reaction times and discrimination accuracy. The resulting analytical expression
of mean reaction times during an identification task accounts for empirical
facts, both qualitatively (e.g. more time is needed to identify a category from
a stimulus at the boundary compared to a stimulus lying within a category), and
quantitatively (working on published experimental data on phoneme
identification tasks)
Temporal Dynamics of Decision-Making during Motion Perception in the Visual Cortex
How does the brain make decisions? Speed and accuracy of perceptual decisions covary with certainty in the input, and correlate with the rate of evidence accumulation in parietal and frontal cortical "decision neurons." A biophysically realistic model of interactions within and between Retina/LGN and cortical areas V1, MT, MST, and LIP, gated by basal ganglia, simulates dynamic properties of decision-making in response to ambiguous visual motion stimuli used by Newsome, Shadlen, and colleagues in their neurophysiological experiments. The model clarifies how brain circuits that solve the aperture problem interact with a recurrent competitive network with self-normalizing choice properties to carry out probablistic decisions in real time. Some scientists claim that perception and decision-making can be described using Bayesian inference or related general statistical ideas, that estimate the optimal interpretation of the stimulus given priors and likelihoods. However, such concepts do not propose the neocortical mechanisms that enable perception, and make decisions. The present model explains behavioral and neurophysiological decision-making data without an appeal to Bayesian concepts and, unlike other existing models of these data, generates perceptual representations and choice dynamics in response to the experimental visual stimuli. Quantitative model simulations include the time course of LIP neuronal dynamics, as well as behavioral accuracy and reaction time properties, during both correct and error trials at different levels of input ambiguity in both fixed duration and reaction time tasks. Model MT/MST interactions compute the global direction of random dot motion stimuli, while model LIP computes the stochastic perceptual decision that leads to a saccadic eye movement.National Science Foundation (SBE-0354378, IIS-02-05271); Office of Naval Research (N00014-01-1-0624); National Institutes of Health (R01-DC-02852
- …