Unsupervised Discovery of Parts, Structure, and Dynamics
Humans easily recognize object parts and their hierarchical structure by
watching how they move; they can then predict how each part moves in the
future. In this paper, we propose a novel formulation that simultaneously
learns a hierarchical, disentangled object representation and a dynamics model
for object parts from unlabeled videos. Our Parts, Structure, and Dynamics
(PSD) model learns to, first, recognize the object parts via a layered image
representation; second, predict hierarchy via a structural descriptor that
composes low-level concepts into a hierarchical structure; and third, model the
system dynamics by predicting the future. Experiments on multiple real and
synthetic datasets demonstrate that our PSD model works well on all three
tasks: segmenting object parts, building their hierarchical structure, and
capturing their motion distributions.
Comment: ICLR 2019. The first two authors contributed equally to this work.
Action Recognition in Videos: from Motion Capture Labs to the Web
This paper presents a survey of human action recognition approaches based on
visual data recorded from a single video camera. We propose an organizing
framework that highlights the evolution of the area, with techniques
moving from heavily constrained motion capture scenarios towards more
challenging, realistic, "in the wild" videos. The proposed organization is
based on the representation used as input for the recognition task, emphasizing
the hypotheses assumed and thus the constraints imposed on the type of video
that each technique is able to address. Making these hypotheses and
constraints explicit makes the framework particularly useful for selecting a
method for a given application. Another advantage of the proposed organization
is that it allows the newest approaches to be categorized seamlessly alongside
traditional ones, while providing an insightful perspective on the evolution of
the action recognition task up to now. That perspective is the basis for the
discussion at the end of
the paper, where we also present the main open issues in the area.
Comment: Preprint submitted to CVIU, survey paper, 46 pages, 2 figures, 4
tables
A segmentation-based coding system allowing manipulation of objects (sesame)
We present a coding scheme that achieves, for each image in the sequence, the best segmentation in terms of rate-distortion theory. It is obtained from a set of initial regions and a set of available coding techniques. The segmentation combines spatial and motion criteria. For each area of the image, it selects the most adequate criterion for defining a partition, in order to obtain the best compromise between cost and quality. In addition, the proposed scheme is well suited to addressing content-based functionalities.
Peer Reviewed. Postprint (published version)
Segmentation-based video coding: temporal links
This paper analyzes the main elements on which a segmentation-based video coding approach should be based so that it can address coding efficiency and content-based functionalities. These elements are temporal linking and rate control. The basic features of both elements are discussed and, in both cases, a specific implementation is proposed.
Peer Reviewed. Postprint (published version)
STV-based Video Feature Processing for Action Recognition
In comparison to still image-based processes, video features can provide rich and intuitive information about dynamic events that occur over a period of time, such as human actions, crowd behaviours, and other subject pattern changes. Although substantial progress has been made in image processing over the last decade, with successful applications in face matching and object recognition, video-based event detection remains one of the most difficult challenges in computer vision research due to its complex continuous or discrete input signals, arbitrary dynamic feature definitions, and often ambiguous analytical methods. In this paper, a Spatio-Temporal Volume (STV) and region intersection (RI) based 3D shape-matching method is proposed to facilitate the definition and recognition of human actions recorded in videos. The distinctive characteristics and the performance gain of the devised approach stem from a coefficient factor-boosted 3D region intersection and matching mechanism developed in this research. The paper also reports an investigation into techniques for efficient STV data filtering to reduce the number of voxels (volumetric pixels) that need to be processed in each operational cycle of the implemented system. The encouraging features and the improvements in operational performance registered in the experiments are discussed at the end.
Multiresolution hierarchy co-clustering for semantic segmentation in sequences with small variations
This paper presents a co-clustering technique that, given a collection of
images and their hierarchies, clusters nodes from these hierarchies to obtain a
coherent multiresolution representation of the image collection. We formalize
the co-clustering as a Quadratic Semi-Assignment Problem and solve it with a
linear programming relaxation approach that makes effective use of information
from hierarchies. Initially, we address the problem of generating an optimal,
coherent partition per image and, afterwards, we extend this method to a
multiresolution framework. Finally, we particularize this framework to an
iterative multiresolution video segmentation algorithm in sequences with small
variations. We evaluate the algorithm on the Video Occlusion/Object Boundary
Detection Dataset, showing that it produces state-of-the-art results in these
scenarios.
Comment: International Conference on Computer Vision (ICCV) 201