3,461 research outputs found

Spatial intra-prediction based on mixtures of sparse representations

Author: Angelique Dremeau
Cedric Herzet
Christine Guillemot
Jean-Jacques Fuchs
Mehmet Turkan
Publication date: 03/04/2020

Abstract: In this paper, we consider the problem of spatial prediction based on sparse representations. Several algorithms dealing with this problem can be found in the literature. We propose a novel method involving a mixture of sparse representations. We first place this approach into a probabilistic framework and then derive a practical procedure to solve it. Comparisons of the rate-distortion performance show the superiority of the proposed algorithm over other state-of-the-art algorithms.

CiteSeerX

Neural Expectation Maximization

Author: Greff Klaus
Schmidhuber Jürgen
van Steenkiste Sjoerd
Publication date: 04/11/2017

Many real-world tasks such as reasoning and physical interaction require identification and manipulation of conceptual entities. A first step towards solving these tasks is the automated discovery of distributed symbol-like representations. In this paper, we explicitly formalize this problem as inference in a spatial mixture model where each component is parametrized by a neural network. Based on the Expectation Maximization framework, we then derive a differentiable clustering method that simultaneously learns how to group and represent individual entities. We evaluate our method on the (sequential) perceptual grouping task and find that it is able to accurately recover the constituent objects. We demonstrate that the learned representations are useful for next-step prediction. Comment: Accepted to NIPS 201

arXiv.org e-Print Archive

FigShare

Steered mixture-of-experts for light field images and video : representation and coding

Author: Lambert Peter
Sikora Thomas
Van Wallendael Glenn
Verhack Ruben
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2020

Research in light field (LF) processing has increased heavily over the last decade. This is largely driven by the desire to achieve the same level of immersion and navigational freedom for camera-captured scenes as is currently available for CGI content. Standardization organizations such as MPEG and JPEG continue to follow conventional coding paradigms in which viewpoints are discretely represented on 2-D regular grids. These grids are then further decorrelated through hybrid DPCM/transform techniques. However, these 2-D regular grids are less suited for high-dimensional data, such as LFs. We propose a novel coding framework for higher-dimensional image modalities, called Steered Mixture-of-Experts (SMoE). Coherent areas in the higher-dimensional space are represented by single higher-dimensional entities, called kernels. These kernels hold spatially localized information about light rays at any angle arriving at a certain region. The global model thus consists of a set of kernels that define a continuous approximation of the underlying plenoptic function. We introduce the theory of SMoE and illustrate its application for 2-D images, 4-D LF images, and 5-D LF video. We also propose an efficient coding strategy to convert the model parameters into a bitstream. Even without provisions for high-frequency information, the proposed method performs comparably to the state of the art for low-to-mid range bitrates with respect to subjective visual quality of 4-D LF images. In the case of 5-D LF video, we observe superior decorrelation and coding performance, with coding gains of a factor of 4x in bitrate for the same quality. At least equally important is the fact that our method inherently has desired functionality for LF rendering that is lacking in other state-of-the-art techniques: (1) full zero-delay random access, (2) light-weight pixel-parallel view reconstruction, and (3) intrinsic view interpolation and super-resolution.

Ghent University Academic Bibliography

Adapting Computer Vision Models To Limitations On Input Dimensionality And Model Complexity

Author: Abbas Alhabib
Publication venue: UCL (University College London)
Publication date: 28/02/2020

When considering instances of distributed systems where visual sensors communicate with remote predictive models, data traffic is limited to the capacity of communication channels, and hardware limits the processing of collected data prior to transmission. We study novel methods of adapting visual inference to limitations on complexity and data availability at test time, wherever the aforementioned limitations exist. Our contributions detailed in this thesis consider both task-specific and task-generic approaches to reducing the data requirement for inference, and evaluate our proposed methods on a wide range of computer vision tasks. This thesis makes four distinct contributions: (i) We investigate multi-class action classification via two-stream convolutional neural networks that directly ingest information extracted from compressed video bitstreams. We show that selective access to macroblock motion vector information provides a good low-dimensional approximation of the underlying optical flow in visual sequences. (ii) We devise a bitstream cropping method by which AVC/H.264 and H.265 bitstreams are reduced to the minimum amount of necessary elements for optical flow extraction, while maintaining compliance with codec standards. We additionally study the effect of codec rate-quality control on the sparsity and noise incurred on optical flow derived from resulting bitstreams, and do so for multiple coding standards. (iii) We demonstrate degrees of variability in the amount of data required for action classification, and leverage this to reduce the dimensionality of input volumes by inferring the required temporal extent for accurate classification prior to processing via learnable machines. (iv) We extend the Mixtures-of-Experts (MoE) paradigm to adapt the data cost of inference for any set of constituent experts. 
We postulate that the minimum acceptable data cost of inference varies for different input space partitions, and consider mixtures where each expert is designed to meet a different set of constraints on input dimensionality. To take advantage of the flexibility of such mixtures in processing different input representations and modalities, we train biased gating functions such that experts requiring less information to make their inferences are favoured over others. Finally, we note that our proposed data utility optimization solutions include a learnable component which considers specified priorities on the amount of information to be used prior to inference, and can be realized for any combination of tasks, modalities, and constraints on available data.

UCL Discovery

Persistent Evidence of Local Image Properties in Generic ConvNets

Author: Azizpour Hossein
Carlsson Stefan
Ek Carl Henrik
Maki Atsuto
Razavian Ali Sharif
Sullivan Josephine
Publication date: 24/11/2014

Supervised training of a convolutional network for object classification should make explicit any information related to the class of objects and disregard any auxiliary information associated with the capture of the image or the variation within the object class. Does this happen in practice? Although this seems to pertain to the very final layers in the network, if we look at earlier layers we find that this is not the case. Surprisingly, strong spatial information is implicit. This paper addresses this, in particular, exploiting the image representation at the first fully connected layer, i.e. the global image descriptor which has recently been shown to be most effective in a range of visual recognition tasks. We empirically demonstrate evidence for the finding in the contexts of four different tasks: 2D landmark detection, 2D object keypoint prediction, estimation of the RGB values of the input image, and recovery of the semantic label of each pixel. We base our investigation on a simple framework using ridge regression across all of these tasks, and show results which all support our insight. Such spatial information can be used for computing correspondence of landmarks to a good accuracy, but should also potentially be useful for improving the training of convolutional nets for classification purposes.

arXiv.org e-Print Archive

Publikationer från KTH

CiteSeerX

Crossref

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Explore Bristol Research

Steered Mixture-of-Experts for image and light field representation, processing and coding : a universal approach for immersive experiences of camera-captured scenes

Author: Verhack Ruben
Publication venue: Universiteit Gent. Faculteit Ingenieurswetenschappen en Architectuur
Publication date: 01/01/2020

Ghent University Academic Bibliography

Self-organization in the olfactory system: one shot odor recognition in insects

Author: A Gelperin
A Whitehead
B Cazelles
B Ehmer
B Ermentrout
C Cortes
C Pelz
CD Brody
CG Galizia
DA Wilson
DG Wüstenberg
EC Marin
FT Sommer
G Laurent
G-Q Bi
H Ikeno
H Markram
H Zhu
HB Treloar
HDI Abarbanel
Henry D. I. Abarbanel
J Joerges
J Kauer
J Perez-Orive
J White
JS Belle de
JS Hosler
LB Vosshall
M Barth
M Garcia-Sanchez
M Heisenberg
M Stopfer
M Wehr
Mikhail I. Rabinovich
N Uchida
NF Rulkov
NK Tanaka
O Hendin
P Mombaerts
R Huerta
R Malinov
Ramón Huerta
RC O’Reilly
RD Traub
RW Friedrich
S Sachse
T Cover
T Komiyama
T Nowotny
Thomas Nowotny
W Gerstner
W Maas
Y Wang
Z Li
Publication venue: Springer Science and Business Media LLC
Publication date: 01/12/2005

We show in a model of spiking neurons that synaptic plasticity in the mushroom bodies in combination with the general fan-in, fan-out properties of the early processing layers of the olfactory system might be sufficient to account for its efficient recognition of odors. For a large variety of initial conditions the model system consistently finds a working solution without any fine-tuning, and is, therefore, inherently robust. We demonstrate that gain control through the known feedforward inhibition of lateral horn interneurons increases the capacity of the system but is not essential for its general function. We also predict an upper limit for the number of odor classes Drosophila can discriminate based on the number and connectivity of its olfactory neurons.

Crossref

eScholarship - University of California

Sussex Research Online