
    Click Carving: Segmenting Objects in Video with Point Clicks

    We present a novel form of interactive video object segmentation in which a few clicks by the user help the system produce a full spatio-temporal segmentation of the object of interest. Whereas conventional interactive pipelines take the user's initialization as a starting point, we show the value of having the system take the lead, even in initialization. In particular, for a given video frame, the system precomputes a ranked list of thousands of possible segmentation hypotheses (also referred to as object region proposals) using image and motion cues. The user then looks at the top-ranked proposals and clicks on the object boundary to carve away erroneous ones. This process iterates (typically 2-3 times), with the system revising the top-ranked proposal set each time, until the user is satisfied with the resulting segmentation mask. Finally, the mask is propagated across the video to produce a spatio-temporal object tube. On three challenging datasets, we provide extensive comparisons with both existing work and simpler alternative methods. Overall, the proposed Click Carving approach strikes an excellent balance between accuracy and human effort: it outperforms all similarly fast methods and is competitive with or better than those requiring 2 to 12 times the effort.
    Comment: A preliminary version of the material in this document was filed as University of Texas technical report no. UT AI16-0
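
The abstract does not state how proposals are re-scored after each round of clicks. The sketch below assumes one simple rule for illustration: each proposal is a binary mask stored as a set of (row, col) pixels, and a proposal's cost is the mean distance from the user's boundary clicks to its nearest boundary pixel, so proposals whose outline passes through the clicks rise to the top. The names `boundary`, `click_cost`, `carve`, and `square` are hypothetical and are not the authors' code.

```python
import math
from typing import List, Set, Tuple

Pixel = Tuple[int, int]


def boundary(mask: Set[Pixel]) -> Set[Pixel]:
    """Mask pixels that have at least one 4-neighbour outside the mask."""
    edge = set()
    for (r, c) in mask:
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            if (r + dr, c + dc) not in mask:
                edge.add((r, c))
                break
    return edge


def click_cost(mask: Set[Pixel], clicks: List[Pixel]) -> float:
    """Assumed score: mean distance from each click to the nearest boundary pixel."""
    edge = boundary(mask)
    if not edge:
        return math.inf
    total = 0.0
    for (cr, cc) in clicks:
        total += min(math.hypot(cr - r, cc - c) for (r, c) in edge)
    return total / len(clicks)


def carve(proposals: List[Set[Pixel]], clicks: List[Pixel], top_k: int = 5) -> List[int]:
    """Indices of the top_k proposals after re-ranking by click cost.

    sorted() is stable, so ties keep the system's original proposal order.
    """
    order = sorted(range(len(proposals)), key=lambda i: click_cost(proposals[i], clicks))
    return order[:top_k]


def square(r0: int, c0: int, size: int) -> Set[Pixel]:
    """Hypothetical helper: a filled square mask for the usage example."""
    return {(r, c) for r in range(r0, r0 + size) for c in range(c0, c0 + size)}


# Usage: two candidate masks; the clicks lie on the small square's outline.
big = square(0, 0, 10)
small = square(2, 2, 4)
print(carve([big, small], [(2, 3), (5, 5)], top_k=1))  # [1]
```

Because each round only re-sorts a precomputed list, the sketch matches the abstract's point that the system proposes and the user merely prunes; a real system would also need an efficient distance transform rather than this brute-force nearest-pixel search.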

    An Appearance-Based Framework for 3D Hand Shape Classification and Camera Viewpoint Estimation

    An appearance-based framework for 3D hand shape classification and simultaneous camera viewpoint estimation is presented. Given an input image of a segmented hand, the most similar matches from a large database of synthetic hand images are retrieved. The ground truth labels of those matches, containing hand shape and camera viewpoint information, are returned by the system as estimates for the input image. Database retrieval is done hierarchically, by first quickly rejecting the vast majority of database views and then ranking the remaining candidates in order of similarity to the input. Four different similarity measures are employed, based on edge location, edge orientation, finger location, and geometric moments.
    National Science Foundation (IIS-9912573, EIA-9809340
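
The hierarchical retrieval described above (fast rejection of most views, then ranking of the survivors) can be sketched as a two-stage cascade. This is an illustration under assumptions, not the paper's method: each database view carries a cheap coarse feature vector (for example, the scale-normalized second-order central moments computed by `normalized_moments`, since geometric moments are one of the cues the abstract names) and a costlier fine feature vector, both compared with Euclidean distance. The `View` class, the `keep` fraction, and all function names are hypothetical, and the paper's four similarity measures are not reproduced.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Set, Tuple


def normalized_moments(mask: Set[Tuple[int, int]]) -> Tuple[float, float, float]:
    """Scale-normalized 2nd-order central moments (eta20, eta02, eta11) of a silhouette.

    eta_pq = mu_pq / m00 ** (1 + (p + q) / 2); for p + q = 2 the exponent is 2.
    """
    m00 = len(mask)
    xbar = sum(c for _, c in mask) / m00
    ybar = sum(r for r, _ in mask) / m00

    def mu(p: int, q: int) -> float:
        return sum((c - xbar) ** p * (r - ybar) ** q for r, c in mask)

    return (mu(2, 0) / m00 ** 2, mu(0, 2) / m00 ** 2, mu(1, 1) / m00 ** 2)


@dataclass
class View:
    label: Tuple[str, int]    # hypothetical ground truth: (hand shape, viewpoint id)
    coarse: Sequence[float]   # cheap feature used for rejection
    fine: Sequence[float]     # costlier feature used for final ranking


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve(query_coarse, query_fine, db: List[View], keep: float = 0.1,
             top_n: int = 3) -> List[Tuple[str, int]]:
    # Stage 1: keep only the fraction of views closest under the cheap distance.
    n_keep = max(1, int(len(db) * keep))
    survivors = sorted(db, key=lambda v: dist(query_coarse, v.coarse))[:n_keep]
    # Stage 2: rank the survivors with the more discriminative distance.
    ranked = sorted(survivors, key=lambda v: dist(query_fine, v.fine))
    # Return the ground-truth labels of the best matches as the estimate.
    return [v.label for v in ranked[:top_n]]
```

The design trade-off is the usual one for cascades: a view rejected in stage 1 is never reconsidered, even if its fine feature would match well, so `keep` controls the balance between speed and the risk of discarding the correct answer.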

    Coherent multi-dimensional segmentation of multiview images using a variational framework and applications to image based rendering

    Image Based Rendering (IBR), and in particular light field rendering, has attracted a lot of attention as a way to interpolate new viewpoints from a set of multiview images. New images of a scene are interpolated directly from nearby available ones, enabling photorealistic rendering. Sampling theory for light fields has shown that exact geometric information about the scene is often unnecessary for rendering new views: the light field is approximately bandlimited, so new views can be rendered using classical interpolation methods. However, IBR from undersampled light fields suffers from aliasing, particularly when the scene has large depth variations and occlusions. To deal with these cases, we study two approaches. First, new sampling schemes have recently emerged that can perfectly reconstruct certain classes of parametric signals that are not bandlimited but are characterized by a finite number of parameters. In this context, we derive novel sampling schemes for piecewise sinusoidal and polynomial signals. In particular, we show that a piecewise sinusoidal signal with arbitrarily high frequencies can be exactly recovered under certain conditions. These results are applied to parametric multiview data that are not bandlimited. Second, we address the problem of extracting regions (or layers) in multiview images that can each be rendered free of aliasing. The problem is posed in a multidimensional variational framework using region competition. Extending previous methods, layers are treated as multidimensional hypervolumes, so the segmentation is performed jointly over all the images and coherence is enforced throughout the data. However, instead of propagating active hypersurfaces, we derive a semi-parametric methodology that takes into account the constraints imposed by the camera setup and the occlusion ordering.
The resulting framework is a global multidimensional region competition that is consistent across all the images and handles occlusions efficiently. We show the validity of the approach with captured light fields. Other special effects, such as augmented reality and disocclusion of hidden objects, are also demonstrated.
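
The abstract does not spell out the sampling schemes for piecewise sinusoidal signals. As a hedged illustration of the principle they build on (exact recovery of a non-bandlimited signal described by finitely many parameters), the sketch below implements the classic annihilating-filter (Prony) step used in finite-rate-of-innovation sampling: a sum of K complex exponentials is recovered from 2K noise-free uniform samples. This is not the thesis's scheme. It assumes NumPy, noise-free samples, and distinct exponentials, and because the samples are uniform the frequencies are only identified modulo 2π.

```python
import numpy as np


def annihilating_filter(x, K):
    """Recover u_k, a_k from x[n] = sum_k a_k * u_k**n, n = 0..N-1, with N >= 2K."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    # Annihilation equations with h0 = 1:
    #   x[n] + h1*x[n-1] + ... + hK*x[n-K] = 0   for n = K..N-1
    A = np.array([[x[n - j] for j in range(1, K + 1)] for n in range(K, N)])
    b = -x[K:N]
    h_tail, *_ = np.linalg.lstsq(A, b, rcond=None)
    h = np.concatenate(([1.0], h_tail))
    # The u_k are the roots of z^K + h1*z^(K-1) + ... + hK.
    u = np.roots(h)
    # Amplitudes from the Vandermonde system V[n, k] = u_k**n.
    V = np.vander(u, N, increasing=True).T
    a, *_ = np.linalg.lstsq(V, x, rcond=None)
    return u, a


# Usage: two real sinusoids = four complex exponentials, so K = 4 and 2K = 8 samples.
n = np.arange(8)
x = np.cos(0.7 * n) + 0.5 * np.cos(2.1 * n)
u, a = annihilating_filter(x, K=4)
print(np.sort(np.angle(u)))  # approximately [-2.1, -0.7, 0.7, 2.1]
```

The signal here has only eight unknowns (four exponentials and four amplitudes), which is why eight samples suffice regardless of bandwidth; with noisy samples one would use more than 2K samples and a denoising step before the least-squares solve.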