Deep Markov Random Field for Image Modeling
Markov Random Fields (MRFs), a formulation widely used in generative image
modeling, have long been plagued by the lack of expressive power. This issue is
primarily due to the fact that conventional MRFs formulations tend to use
simplistic factors to capture local patterns. In this paper, we move beyond
such limitations, and propose a novel MRF model that uses fully-connected
neurons to express the complex interactions among pixels. Through theoretical
analysis, we reveal an inherent connection between this model and recurrent
neural networks, and thereon derive an approximated feed-forward network that
couples multiple RNNs along opposite directions. This formulation combines the
expressive power of deep neural networks and the cyclic dependency structure of
MRF in a unified model, bringing the modeling capability to a new level. The
feed-forward approximation also allows it to be efficiently learned from data.
Experimental results on a variety of low-level vision tasks show notable
improvement over the state of the art. Comment: Accepted at ECCV 201
To "Sketch-a-Scratch"
A surface can be harsh and raspy, or smooth and silky, and everything in between. We are used to sensing these features with our fingertips as well as with our eyes and ears: the exploration of a surface is a multisensory experience. Tools, too, are often employed in the interaction with surfaces, since they augment our manipulation capabilities. "Sketch-a-Scratch" is a tool for the multisensory exploration and sketching of surface textures. The user's actions drive a physical sound model of real materials' response to interactions such as scraping, rubbing or rolling. Moreover, different input signals can be converted into 2D visual surface profiles, thus enabling users to experience them visually, aurally and haptically.
Still Moving
Here is something puzzling. Still Lifes can be expressive. Expression involves movement. Hence, (some) Still Lifes move. This seems odd. I consider a novel explanation of this "static-dynamic" puzzle from Mitchell Green (2007). Green defends an analysis of artistic expressivity that is heavily indebted to work on intermodal perception. He says visual stimuli, like colours and shapes, can elicit experienced resemblances to sounds, smells and feelings. This enables viewers to know how an emotion feels by looking at the picture. The hypothesis is intriguing, but I show that his suggestion that we empathize with the pictorial content is implausible, and that this exposes a flaw in the way his argument moves from experiential mappings to experiential-affective mappings. Consequently, I register some reservations about the way Green supposes we detect these cross-modal qualities.
What is Holding Back Convnets for Detection?
Convolutional neural networks have recently shown excellent results in
general object detection and many other tasks. Albeit very effective, they
involve many user-defined design choices. In this paper we want to better
understand these choices by inspecting two key aspects "what did the network
learn?", and "what can the network learn?". We exploit new annotations
(Pascal3D+), to enable a new empirical analysis of the R-CNN detector. Despite
common belief, our results indicate that existing state-of-the-art convnet
architectures are not invariant to various appearance factors. In fact, all
considered networks have similar weak points which cannot be mitigated by
simply increasing the training data (architectural changes are needed). We show
that overall performance can improve when using image renderings for data
augmentation. We report the best known results on the Pascal3D+ detection and
viewpoint estimation tasks.
- …