Predictive Encoding of Contextual Relationships for Perceptual Inference, Interpolation and Prediction
We propose a new neurally-inspired model that can learn to encode the global
relationship context of visual events across time and space and to use the
contextual information to modulate the analysis by synthesis process in a
predictive coding framework. The model learns latent contextual representations
by maximizing the predictability of visual events based on local and global
contextual information through both top-down and bottom-up processes. In
contrast to standard predictive coding models, the prediction error in this
model is used to update the contextual representation but does not alter the
feedforward input for the next layer, and is thus more consistent with
neurophysiological observations. We establish the computational feasibility of
this model by demonstrating several of its capabilities. We show that our
model can outperform state-of-the-art gated Boltzmann machines
(GBMs) in estimating contextual information. Our model can also interpolate
missing events or predict future events in image sequences while simultaneously
estimating contextual information. We show that it achieves state-of-the-art
prediction accuracy in a variety of tasks and
can interpolate missing frames, a function that is lacking
in GBMs.
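The idea of estimating a contextual relationship between frames and reusing it for prediction and interpolation can be illustrated with a deliberately small toy. The sketch below is not the model from the abstract: it assumes the only "context" is a circular shift between 1-D frames, and the helper names (`shift`, `estimate_shift`, `predict_next`, `interpolate_missing`) are hypothetical. The shift is chosen by minimizing squared prediction error, then applied to predict a future frame or fill in a missing one.

```python
def shift(frame, s):
    """Circularly shift a 1-D frame to the right by s positions."""
    n = len(frame)
    return [frame[(i - s) % n] for i in range(n)]


def squared_error(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def estimate_shift(prev, cur):
    """Toy context estimate: the shift that best predicts cur from prev."""
    return min(range(len(prev)), key=lambda s: squared_error(shift(prev, s), cur))


def predict_next(prev, cur):
    """Estimate the shift from two frames and apply it once more."""
    s = estimate_shift(prev, cur)
    return s, shift(cur, s)


def interpolate_missing(before, after):
    """Fill in the frame between two observed frames.

    Looks for a per-frame shift s such that applying it twice maps
    `before` onto `after`; ties resolve to the smallest such shift.
    """
    n = len(before)
    s = min(range(n), key=lambda k: squared_error(shift(before, 2 * k), after))
    return s, shift(before, s)


frame0 = [3, 1, 0, 0, 0, 0, 0, 0]
frame1 = shift(frame0, 1)
frame2 = shift(frame0, 2)

print(predict_next(frame0, frame1))         # estimated shift and predicted frame
print(interpolate_missing(frame0, frame2))  # estimated shift and filled-in frame
```

A learned model would replace the exhaustive search over shifts with latent variables updated from prediction error; the toy only shows how one estimated context can serve both prediction and interpolation.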
Towards visualization and searching: a dual-purpose video coding approach
In modern video applications, the role of the decoded video is much more than filling a screen for visualization. To offer powerful video-enabled applications, it is increasingly critical not only to visualize the decoded video but also to provide efficient searching capabilities for similar content. Video surveillance and personal communication applications are critical examples of these dual visualization and searching requirements. However, current video coding solutions are strongly biased towards the visualization needs. In this context, the goal of this work is to propose a dual-purpose video coding solution targeting both visualization and searching needs by adopting a hybrid coding framework where the usual pixel-based coding approach is combined with a novel feature-based coding approach. In this novel dual-purpose video coding solution, some frames are coded using a set of keypoint matches, which not only allow decoding for visualization, but also provide the decoder with valuable feature-related information, extracted at the encoder from the original frames, instrumental for efficient searching. The proposed solution is based on a flexible joint Lagrangian optimization framework where pixel-based and feature-based processing are combined to find the most appropriate trade-off between the visualization and searching performances. Extensive experimental results for the assessment of the proposed dual-purpose video coding solution under meaningful test conditions are presented. The results show the flexibility of the proposed coding solution to achieve different optimization trade-offs, notably competitive performance regarding the state-of-the-art HEVC standard both in terms of visualization and searching performance.
In modern video applications, the role of decoded video is much more than simply filling a screen for visualization. To offer more powerful applications based on video signals, it is increasingly critical not only to consider content quality for visualization purposes, but also to provide means of searching for similar content. Visualization and searching requirements are considered, for example, in modern video surveillance and personal communication applications. However, current video coding solutions are strongly oriented towards visualization requirements. In this context, the goal of this work is to propose a dual-purpose video coding solution targeting both visualization and searching requirements. To this end, a coding framework is proposed in which the usual pixel-based coding approach is combined with a novel coding approach based on visual features. In this solution, some frames are coded using a set of matched keypoint pairs, enabling not only visualization but also providing the decoder with valuable visual feature information, extracted at the encoder from the original content, which is instrumental in searching applications. The proposed solution employs a flexible Lagrangian optimization scheme in which pixel-based processing is combined with visual-feature-based processing to find an appropriate trade-off between visualization and searching performance. The experimental results show the flexibility of the proposed solution in achieving different optimization trade-offs, notably competitive performance relative to the HEVC standard in terms of both visualization and searching.
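A joint Lagrangian trade-off of this kind can be sketched as a per-frame mode decision. The abstract does not give the cost function, so the form below, J = D_visual + alpha * D_search + lambda * R, is an assumption, as are the `CodingOption` fields, the weights `lam` and `alpha`, and all the numbers, which are purely illustrative.

```python
from dataclasses import dataclass


@dataclass
class CodingOption:
    name: str                 # hypothetical mode label
    rate_bits: float          # bits spent on the frame
    visual_distortion: float  # e.g. a pixel-domain error (lower is better)
    search_loss: float        # e.g. 1 - retrieval score (lower is better)


def joint_cost(opt, lam, alpha):
    """Assumed joint cost: visual distortion + alpha * search loss + lam * rate."""
    return opt.visual_distortion + alpha * opt.search_loss + lam * opt.rate_bits


def choose_mode(options, lam, alpha):
    """Pick the coding option with the lowest joint Lagrangian cost."""
    return min(options, key=lambda o: joint_cost(o, lam, alpha))


# Illustrative numbers only, not measurements from the paper.
options = [
    CodingOption("pixel", rate_bits=1000, visual_distortion=2.0, search_loss=0.5),
    CodingOption("feature", rate_bits=400, visual_distortion=5.0, search_loss=0.1),
]

print(choose_mode(options, lam=0.001, alpha=1.0).name)   # visualization-leaning weights
print(choose_mode(options, lam=0.001, alpha=10.0).name)  # search-leaning weights
```

Raising `alpha` shifts the decision towards the option that better preserves searching performance, which is one simple way to expose different optimization trade-offs.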
Coding local and global binary visual features extracted from video sequences
Binary local features represent an effective alternative to real-valued
descriptors, leading to comparable results for many visual analysis tasks,
while being characterized by significantly lower computational complexity and
memory requirements. When dealing with large collections, a more compact
representation based on global features is often preferred, which can be
obtained from local features by means of, e.g., the Bag-of-Visual-Word (BoVW)
model. Several applications, including for example visual sensor networks and
mobile augmented reality, require visual features to be transmitted over a
bandwidth-limited network, thus calling for coding techniques that aim at
reducing the required bit budget, while attaining a target level of efficiency.
In this paper we investigate a coding scheme tailored to both local and global
binary features, which aims at exploiting both spatial and temporal redundancy
by means of intra- and inter-frame coding. In this respect, the proposed coding
scheme can be conveniently adopted to support the Analyze-Then-Compress (ATC)
paradigm. That is, visual features are extracted from the acquired content,
encoded at remote nodes, and finally transmitted to a central controller that
performs visual analysis. This is in contrast with the traditional approach, in
which visual content is acquired at a node, compressed and then sent to a
central unit for further processing, according to the Compress-Then-Analyze
(CTA) paradigm. In this paper we experimentally compare ATC and CTA by means of
rate-efficiency curves in the context of two different visual analysis tasks:
homography estimation and content-based retrieval. Our results show that the
novel ATC paradigm based on the proposed coding primitives can be competitive
with CTA, especially in bandwidth-limited scenarios.
Comment: submitted to IEEE Transactions on Image Processing
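Intra- versus inter-frame coding of a binary descriptor can be sketched as follows. This is a simplified illustration, not the coding scheme from the paper: descriptors are plain Python integers, the inter mode stores an XOR residual against the closest descriptor of the previous frame, and the mode switch uses a hypothetical `max_flips` threshold on the residual's Hamming weight as a stand-in for a real rate estimate.

```python
def popcount(x):
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def encode_descriptor(desc, refs, n_bits, max_flips=None):
    """Encode one binary descriptor against reference descriptors.

    Returns ("inter", ref_index, residual) when the closest reference
    differs in at most max_flips bits, otherwise ("intra", desc).
    """
    if max_flips is None:
        max_flips = n_bits // 4  # assumed threshold, not from the source
    if refs:
        idx = min(range(len(refs)), key=lambda i: popcount(desc ^ refs[i]))
        residual = desc ^ refs[idx]
        if popcount(residual) <= max_flips:
            return ("inter", idx, residual)
    return ("intra", desc)


def decode_descriptor(symbol, refs):
    """Invert encode_descriptor."""
    if symbol[0] == "inter":
        _, idx, residual = symbol
        return refs[idx] ^ residual
    return symbol[1]


previous_frame = [0b10110010, 0b01001101]  # descriptors from frame t-1
current = 0b10110110                       # descriptor from frame t

symbol = encode_descriptor(current, previous_frame, n_bits=8, max_flips=2)
print(symbol)
print(decode_descriptor(symbol, previous_frame) == current)
```

A sparse residual has few set bits, so an entropy coder can represent it in fewer bits than the raw descriptor; that is the temporal redundancy the inter mode exploits.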
Predictive coding: A Possible Explanation of Filling-in at the blind spot
Filling-in at the blind-spot is a perceptual phenomenon in which the visual
system fills the informational void, which arises due to the absence of retinal
input corresponding to the optic disc, with surrounding visual attributes.
Though there is enough evidence to conclude that some kind of neural
computation is involved in filling-in at the blind spot especially in the early
visual cortex, the knowledge of the actual computational mechanism is far from
complete. We have investigated the bar experiments and the associated
filling-in phenomenon in the light of the hierarchical predictive coding
framework, where the blind-spot was represented by the absence of early
feed-forward connection. We recorded the responses of predictive estimator
neurons at the blind-spot region in the V1 area of our three-level (LGN-V1-V2)
model network. These responses are in agreement with the results of earlier
physiological studies and using the generative model we also showed that these
response profiles indeed represent filling-in completion. This demonstrates
that the predictive coding framework could account for the filling-in phenomena
observed in several psychophysical and physiological experiments involving bar
stimuli. These results suggest that filling-in could naturally arise from
the computational principle of hierarchical predictive coding (HPC) of natural
images.
Comment: 23 pages, 9 figures
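How a predictive estimator can fill in a region that receives no feed-forward input can be shown with a minimal single-layer sketch. It is not the three-level LGN-V1-V2 network from the abstract: the generative weights `W` (a long bar and a short left-hand bar), the step size, and the function names are all assumptions chosen for illustration. The blind spot is modelled by zeroing the prediction error at the unobserved position, so the causes are inferred from the surround only.

```python
def generate(W, r):
    """Top-down prediction of the input from the causes r."""
    return [sum(w * rj for w, rj in zip(row, r)) for row in W]


def infer_causes(x, W, observed, steps=500, lr=0.1):
    """Gradient descent on squared prediction error over observed inputs only."""
    r = [0.0] * len(W[0])
    for _ in range(steps):
        pred = generate(W, r)
        # No feed-forward signal at the blind spot: its error is forced to zero.
        err = [(xi - pi) if seen else 0.0 for xi, pi, seen in zip(x, pred, observed)]
        r = [rj + lr * sum(W[i][j] * err[i] for i in range(len(x)))
             for j, rj in enumerate(r)]
    return r


# Columns: cause 0 = long bar over all 5 positions, cause 1 = short bar on the left.
W = [[1, 1], [1, 1], [1, 0], [1, 0], [1, 0]]
observed = [True, True, False, True, True]  # position 2 is the blind spot

both_sides = [1.0, 1.0, 0.0, 1.0, 1.0]  # bar visible on both sides of the blind spot
one_side = [1.0, 1.0, 0.0, 0.0, 0.0]    # bar visible on the left only

for stimulus in (both_sides, one_side):
    causes = infer_causes(stimulus, W, observed)
    print(round(generate(W, causes)[2], 3))  # predicted value at the blind spot
```

In this toy, the value at the blind spot is never read; whatever appears there in the reconstruction comes entirely from the top-down prediction of the inferred causes.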