88 research outputs found

    Consistent Video Filtering for Camera Arrays

    Visual formats have advanced beyond single-view images and videos: 3D movies are commonplace, researchers have developed multi-view navigation systems, and VR is helping to push light field cameras to the mass market. However, editing tools for these media are still nascent, and even simple filtering operations like color correction or stylization are problematic: naively applying image filters per frame or per view rarely produces satisfying results due to temporal and spatial inconsistencies. Our method preserves and stabilizes filter effects while remaining agnostic to the inner workings of the filter. It captures filter effects in the gradient domain, then uses the gradients of the input frames as a reference to impose temporal and spatial consistency. Our least-squares formulation adds minimal overhead compared to naive per-frame processing. Further, when the filter is expensive, we introduce a filter transfer strategy that reduces the number of per-frame filtering computations by an order of magnitude, with only a small reduction in visual quality. We demonstrate our algorithm on several camera array formats including stereo videos, light fields, and wide-baseline arrays.
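    The consistency idea can be illustrated in one dimension. Below is a minimal sketch, assuming a per-pixel time series: the output stays close to the flickering filtered values while its temporal gradients are pulled toward those of the input. The function name and the weight lam are illustrative; the paper's actual energy also involves spatial image gradients and neighboring views.

```python
# Minimal 1-D sketch of gradient-domain temporal stabilization.
# filtered: per-frame filter outputs (may flicker); inp: input values.
# Solves  min_o ||o - filtered||^2 + lam * ||D o - D inp||^2,
# where D is the forward temporal-difference operator.
import numpy as np
from scipy.sparse import diags, identity
from scipy.sparse.linalg import spsolve

def stabilize(filtered, inp, lam=10.0):
    T = len(filtered)
    D = diags([-1.0, 1.0], [0, 1], shape=(T - 1, T))  # (D x)[t] = x[t+1] - x[t]
    A = (identity(T) + lam * (D.T @ D)).tocsc()       # normal equations
    b = filtered + lam * (D.T @ (D @ inp))
    return spsolve(A, b)

# Example: a noisy per-frame filter around a smooth input signal.
t = np.linspace(0.0, 1.0, 100)
inp = np.sin(2.0 * np.pi * t)
filtered = 0.8 * inp + 0.05 * np.random.randn(100)   # simulated flicker
out = stabilize(filtered, inp)                       # temporally consistent
```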

    Faithful completion of images of scenic landmarks using internet images

    Previous works on image completion typically aim to produce visually plausible results rather than factually correct ones. In this paper, we propose an approach to faithfully complete the missing regions of an image. We assume that the input image is taken at a well-known landmark, so similar images taken at the same location can easily be found on the Internet. We first download thousands of images from the Internet using a text label provided by the user. Next, we apply two-step filtering to reduce them to a small set of candidate images for use as source images for completion. For each candidate image, a co-matching algorithm is used to find correspondences of both points and lines between the candidate image and the input image. These are used to find an optimal warp relating the two images. A completion result is obtained by blending the warped candidate image into the missing region of the input image. The completion results are ranked according to a combination score, which considers both warping and blending energy, and the highest-ranked ones are shown to the user. Experiments demonstrate that our method can faithfully complete images.
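    As a rough sketch of the per-candidate warp-and-blend step, assuming OpenCV: the paper co-matches both points and lines and solves for an optimal warp, whereas this simplification uses only SIFT point matches with a single RANSAC homography, and Poisson blending stands in for the paper's energy-based blending. All names are illustrative.

```python
# Simplified candidate warp-and-blend: match features, estimate a
# homography, warp the candidate, and blend it into the hole region.
import cv2
import numpy as np

def complete_from_candidate(target, mask, candidate):
    """target: BGR image with a hole; mask: uint8 hole mask; candidate: BGR source."""
    gray_t = cv2.cvtColor(target, cv2.COLOR_BGR2GRAY)
    gray_c = cv2.cvtColor(candidate, cv2.COLOR_BGR2GRAY)
    sift = cv2.SIFT_create()
    kt, dt = sift.detectAndCompute(gray_t, None)
    kc, dc = sift.detectAndCompute(gray_c, None)
    pairs = cv2.BFMatcher().knnMatch(dc, dt, k=2)
    good = [m for m, n in (p for p in pairs if len(p) == 2)
            if m.distance < 0.75 * n.distance]          # Lowe's ratio test
    src = np.float32([kc[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kt[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    warped = cv2.warpPerspective(candidate, H, (target.shape[1], target.shape[0]))
    # Poisson blending of the warped candidate into the masked region.
    x, y, w, h = cv2.boundingRect(mask)
    return cv2.seamlessClone(warped, target, mask, (x + w // 2, y + h // 2),
                             cv2.NORMAL_CLONE)
```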

    Multi-View Inpainting for Image-Based Scene Editing and Rendering

    We propose a method to remove objects such as people and cars from multi-view urban image datasets, enabling free-viewpoint Image-Based Rendering (IBR) in the edited scenes. Our method combines information from multi-view 3D reconstruction with image inpainting techniques, by formulating the problem as an optimization of a global patch-based objective function. We use IBR techniques to reproject information from neighboring views, and 3D multi-view stereo reconstruction to perform multi-view coherent initialization for inpainting of pixels not filled by reprojection. Our algorithm performs multi-view consistent inpainting for color and 3D by blending reprojections with patch-based image inpainting. We run our algorithm on casually captured datasets and Google Street View data, removing objects such as cars, people, and pillars, showing that our approach produces results of sufficient quality for free-viewpoint IBR on "cleaned up" scenes, as well as IBR scene editing, such as limited displacement of real objects.
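    A minimal sketch of the reprojection step, assuming pinhole cameras with known intrinsics and extrinsics and a reconstructed depth map for the target view; the nearest-neighbor sampling and all names are mine, not the paper's.

```python
# Reproject neighbor-view colors into the target view: lift target pixels
# to 3D with the reconstructed depth, then project into the neighbor camera.
import numpy as np

def reproject(depth, K_t, E_t, K_n, E_n, neighbor_img):
    """depth: (H, W) target-view depth; K_*: 3x3 intrinsics;
    E_*: 4x4 world-to-camera extrinsics; neighbor_img: (H, W, 3)."""
    H, W = depth.shape
    v, u = np.mgrid[0:H, 0:W]
    pix = np.stack([u, v, np.ones_like(u)], -1).reshape(-1, 3).T  # (3, H*W)
    cam = (np.linalg.inv(K_t) @ pix) * depth.reshape(1, -1)       # target camera coords
    world = np.linalg.inv(E_t) @ np.vstack([cam, np.ones((1, H * W))])
    proj = K_n @ (E_n @ world)[:3]                                # into neighbor view
    x, y = proj[0] / proj[2], proj[1] / proj[2]
    xi, yi = np.round(x).astype(int), np.round(y).astype(int)
    valid = (xi >= 0) & (xi < W) & (yi >= 0) & (yi < H) & (proj[2] > 0)
    out = np.zeros((H * W, 3), neighbor_img.dtype)
    out[valid] = neighbor_img[yi[valid], xi[valid]]               # nearest-neighbor fetch
    return out.reshape(H, W, 3), valid.reshape(H, W)
```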

    Visual analysis and synthesis with physically grounded constraints

    The past decade has witnessed remarkable progress in image-based, data-driven vision and graphics. However, existing approaches often treat images as pure 2D signals rather than as 2D projections of the physical 3D world. As a result, they require many training examples to cover sufficiently diverse appearances, and they inevitably suffer from limited generalization. In this thesis, I propose "inference-by-composition" approaches that overcome these limitations by modeling and interpreting visual signals in terms of physical surfaces, objects, and scenes. I show how we can incorporate physically grounded constraints, such as scene-specific geometry, in a non-parametric optimization framework for (1) revealing the missing parts of an image after removal of a foreground or background element, and (2) recovering high spatial frequency details that are not resolvable in low-resolution observations. I then extend the framework from 2D images to spatio-temporal visual data (videos). I demonstrate that we can convincingly fill spatio-temporal holes in a temporally coherent fashion by jointly reconstructing the appearance and motion. Compared to existing approaches, our technique can synthesize physically plausible content even in challenging videos. For visual analysis, I apply stereo camera constraints to discover multiple approximately linear structures in extremely noisy videos, with an ecological application to monitoring bird migration at night. The resulting algorithms are simple and intuitive while achieving state-of-the-art performance without the need for training on an exhaustive set of visual examples.
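    For context, a common form of the global patch-based objective behind such spatio-temporal completion, jointly measuring appearance and motion similarity, is shown below; this is a standard formulation from the literature, not necessarily the exact energy used in the thesis.

```latex
% Global patch-based energy over the spatio-temporal hole H:
% each target patch p should have a source patch q that matches it
% both in color (C) and in flow (M); \lambda balances the two terms.
E = \sum_{p \in H} \min_{q \in S}
    \Big( \lVert C(p) - C(q) \rVert^{2}
        + \lambda \, \lVert M(p) - M(q) \rVert^{2} \Big)
```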