Click Carving: Segmenting Objects in Video with Point Clicks
We present a novel form of interactive video object segmentation where a few
clicks by the user help the system produce a full spatio-temporal segmentation
of the object of interest. Whereas conventional interactive pipelines take the
user's initialization as a starting point, we show the value in the system
taking the lead even in initialization. In particular, for a given video frame,
the system precomputes a ranked list of thousands of possible segmentation
hypotheses (also referred to as object region proposals) using image and motion
cues. Then, the user looks at the top ranked proposals, and clicks on the
object boundary to carve away erroneous ones. This process iterates (typically
2-3 times), and each time the system revises the top ranked proposal set, until
the user is satisfied with a resulting segmentation mask. Finally, the mask is
propagated across the video to produce a spatio-temporal object tube. On three
challenging datasets, we provide extensive comparisons with both existing work
and simpler alternative methods. In all, the proposed Click Carving approach
strikes an excellent balance of accuracy and human effort. It outperforms all
similarly fast methods, and is competitive with or better than those requiring 2 to
12 times the effort.
Comment: A preliminary version of the material in this document was filed as
University of Texas technical report no. UT AI16-0
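The click-and-revise loop described in the abstract above can be illustrated with a minimal sketch. It assumes each proposal is a binary mask and that a click is a (row, column) point near the true object boundary. The function names `boundary_pixels` and `rerank_by_clicks`, and the sum-of-distances cost, are hypothetical choices for illustration and are not taken from the paper.

```python
import numpy as np

def boundary_pixels(mask):
    """Return (row, col) coordinates of mask pixels that touch the background."""
    m = np.asarray(mask, dtype=bool)
    padded = np.pad(m, 1, constant_values=False)
    # A pixel is interior if all four of its neighbours are also inside the mask.
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1]
                & padded[1:-1, :-2] & padded[1:-1, 2:])
    return np.argwhere(m & ~interior)

def rerank_by_clicks(proposals, clicks):
    """Order proposal indices so masks whose boundaries pass closest to the clicks come first.

    The cost (sum over clicks of the distance to the nearest boundary pixel)
    is an assumed stand-in for whatever scoring the actual system uses.
    """
    clicks = np.asarray(clicks, dtype=float)
    costs = []
    for mask in proposals:
        edge = boundary_pixels(mask)
        if len(edge) == 0:
            costs.append(np.inf)  # an empty proposal can never match a click
            continue
        # Distance from every click to every boundary pixel of this proposal.
        d = np.linalg.norm(clicks[:, None, :] - edge[None, :, :], axis=2)
        costs.append(d.min(axis=1).sum())
    return [int(i) for i in np.argsort(costs, kind="stable")]
```

Each additional click would be appended to `clicks` and the ranking recomputed, which mirrors the iteration the abstract describes.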
An Appearance-Based Framework for 3D Hand Shape Classification and Camera Viewpoint Estimation
An appearance-based framework for 3D hand shape classification and simultaneous camera viewpoint estimation is presented. Given an input image of a segmented hand, the most similar matches from a large database of synthetic hand images are retrieved. The ground truth labels of those matches, containing hand shape and camera viewpoint information, are returned by the system as estimates for the input image. Database retrieval is done hierarchically, by first quickly rejecting the vast majority of all database views, and then ranking the remaining candidates in order of similarity to the input. Four different similarity measures are employed, based on edge location, edge orientation, finger location and geometric moments.
National Science Foundation (IIS-9912573, EIA-9809340
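The hierarchical retrieval described above (quick rejection, then ranking of the survivors) might be sketched as a two-stage nearest-neighbour search. The descriptors here are plain vectors compared by Euclidean distance, which is an assumption for illustration. The abstract's four similarity measures are not reproduced, and the function `retrieve` and its parameters `keep` and `top` are hypothetical.

```python
import numpy as np

def retrieve(query_cheap, query_full, db_cheap, db_full, labels, keep=100, top=5):
    """Two-stage lookup: cheap filter over the whole database, costly ranking of survivors.

    db_cheap / db_full hold one row per database view: a low-cost descriptor
    and a more discriminative descriptor (both assumed, for illustration).
    """
    db_cheap = np.asarray(db_cheap, dtype=float)
    db_full = np.asarray(db_full, dtype=float)
    # Stage 1: reject most views using only the cheap descriptor.
    coarse = np.linalg.norm(db_cheap - np.asarray(query_cheap, dtype=float), axis=1)
    survivors = np.argsort(coarse, kind="stable")[:keep]
    # Stage 2: rank the remaining candidates with the full descriptor.
    fine = np.linalg.norm(db_full[survivors] - np.asarray(query_full, dtype=float), axis=1)
    order = survivors[np.argsort(fine, kind="stable")][:top]
    # The labels of the best matches serve as the estimates for the query.
    return [labels[i] for i in order]
```

The design trade-off is that a small `keep` makes stage 2 cheap but risks discarding the true best match during the coarse pass.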
Coherent multi-dimensional segmentation of multiview images using a variational framework and applications to image based rendering
Image Based Rendering (IBR), and in particular light field rendering, has attracted a lot of
attention for interpolating new viewpoints from a set of multiview images. New images of
a scene are interpolated directly from nearby available ones, thus enabling a photorealistic
rendering. Sampling theory for light fields has shown that exact geometric information
in the scene is often unnecessary for rendering new views. Indeed, the light field
is approximately bandlimited, so new views can be rendered using classical interpolation
methods. However, IBR using undersampled light fields suffers from aliasing effects and
is particularly difficult when the scene has large depth variations and occlusions. In order
to deal with these cases, we study two approaches:
New sampling schemes have recently emerged that are able to perfectly reconstruct
certain classes of parametric signals that are not bandlimited but characterized by a finite
number of parameters. In this context, we derive novel sampling schemes for piecewise
sinusoidal and polynomial signals. In particular, we show that a piecewise sinusoidal signal
with arbitrarily high frequencies can be exactly recovered given certain conditions. These
results are applied to parametric multiview data that are not bandlimited.
We also focus on the problem of extracting regions (or layers) in multiview images
that can be individually rendered free of aliasing. The problem is posed in a multidimensional
variational framework using region competition. Extending previous
methods, layers are treated as multi-dimensional hypervolumes. Therefore the segmentation
is done jointly over all the images and coherence is imposed throughout the
data. However, instead of propagating active hypersurfaces, we derive a semi-parametric
methodology that takes into account the constraints imposed by the camera setup and the
occlusion ordering. The resulting framework is a global multi-dimensional region competition that is consistent in all the images and efficiently handles occlusions. We show the
validity of the approach with captured light fields. Other special effects such as augmented
reality and disocclusion of hidden objects are also demonstrated.
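The sampling argument in this abstract, that an approximately bandlimited function can be rendered at new positions by classical interpolation, can be illustrated in one dimension with Whittaker-Shannon (sinc) interpolation. This is only an analogy under stated assumptions: uniformly spaced samples of a 1-D signal, with the hypothetical helper `sinc_interpolate`. It is not the rendering or sampling scheme developed in the work itself.

```python
import numpy as np

def sinc_interpolate(samples, sample_times, query_times):
    """Reconstruct a uniformly sampled, bandlimited 1-D signal at new positions.

    Assumes sample_times are evenly spaced and the signal's bandwidth is
    below half the sampling rate; a finite sample set gives an approximation.
    """
    samples = np.asarray(samples, dtype=float)
    sample_times = np.asarray(sample_times, dtype=float)
    query_times = np.atleast_1d(np.asarray(query_times, dtype=float))
    step = sample_times[1] - sample_times[0]  # uniform spacing assumed
    # np.sinc(x) computes sin(pi * x) / (pi * x), the ideal low-pass kernel.
    kernel = np.sinc((query_times[:, None] - sample_times[None, :]) / step)
    return kernel @ samples
```

If the signal is sampled too sparsely for its bandwidth, the same formula produces aliased output, which is the 1-D counterpart of the artifacts the abstract attributes to undersampled light fields.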