85 research outputs found
Where and Who? Automatic Semantic-Aware Person Composition
Image compositing is a method used to generate realistic yet fake imagery by
inserting content from one image into another. Previous work in compositing has
focused on improving the appearance compatibility of a user-selected foreground
segment and a background image (i.e., color and illumination consistency). In
this work, we instead develop a fully automated compositing model that
additionally learns to select and transform compatible foreground segments from
a large collection given only an input image background. To simplify the task,
we restrict our problem by focusing on human instance composition, because
human segments exhibit strong correlations with their background and because of
the availability of large annotated datasets. We develop a novel branching
Convolutional Neural Network (CNN) that jointly predicts candidate person
locations given a background image. We then use pre-trained deep feature
representations to retrieve person instances from a large segment database.
Experimental results show that our model can generate composite images that
look visually convincing. We also develop a user interface to demonstrate the
potential application of our method.
Comment: 10 pages, 9 figures
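The final step the abstract describes, placing a retrieved person segment onto the background, amounts to standard alpha blending at the predicted location. The sketch below shows only that blending step; the function name and the fixed placement are illustrative, and the paper's location-prediction CNN and feature-based retrieval are not shown.

```python
import numpy as np

def composite_segment(background, segment_rgba, top_left):
    """Alpha-blend an RGBA foreground segment onto a background image.

    `top_left` is the (row, col) where the segment is placed; in the paper
    this location comes from the branching CNN, here it is given directly.
    """
    out = background.astype(np.float64).copy()
    h, w = segment_rgba.shape[:2]
    y, x = top_left
    fg = segment_rgba[..., :3].astype(np.float64)
    alpha = segment_rgba[..., 3:4].astype(np.float64) / 255.0  # per-pixel opacity
    region = out[y:y + h, x:x + w]
    out[y:y + h, x:x + w] = alpha * fg + (1.0 - alpha) * region
    return out.astype(np.uint8)

# Toy example: a half-transparent red 4x4 segment on a gray 8x8 background.
bg = np.full((8, 8, 3), 200, dtype=np.uint8)
seg = np.zeros((4, 4, 4), dtype=np.uint8)
seg[..., 0] = 255   # red foreground
seg[..., 3] = 128   # ~50% opacity
result = composite_segment(bg, seg, (2, 2))
```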
Deep Image Matting: A Comprehensive Survey
Image matting refers to extracting a precise alpha matte from natural images,
and it plays a critical role in various downstream applications, such as image
editing. Despite being an ill-posed problem, traditional methods have been
trying to solve it for decades. The emergence of deep learning has
revolutionized the field of image matting and given birth to multiple new
techniques, including automatic, interactive, and referring image matting. This
paper presents a comprehensive review of recent advancements in image matting
in the era of deep learning. We focus on two fundamental sub-tasks: auxiliary
input-based image matting, which involves user-defined input to predict the
alpha matte, and automatic image matting, which generates results without any
manual intervention. We systematically review the existing methods for these
two tasks according to their task settings and network structures and provide a
summary of their advantages and disadvantages. Furthermore, we introduce the
commonly used image matting datasets and evaluate the performance of
representative matting methods both quantitatively and qualitatively. Finally,
we discuss relevant applications of image matting and highlight existing
challenges and potential opportunities for future research. We also maintain a
public repository to track the rapid development of deep image matting at
https://github.com/JizhiziLi/matting-survey
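The matting problem the survey covers is built on the standard compositing equation I = alpha * F + (1 - alpha) * B: a method estimates the alpha matte (and often the foreground F), after which the subject can be re-composited onto any background. A minimal sketch of that re-compositing step, with illustrative names not taken from any surveyed method:

```python
import numpy as np

def recomposite(foreground, alpha, new_background):
    """Place an extracted foreground onto a new background via its alpha matte."""
    a = alpha[..., None]  # broadcast the single-channel matte over RGB
    return a * foreground + (1.0 - a) * new_background

F = np.full((2, 2, 3), 0.8)       # extracted foreground colors
B = np.zeros((2, 2, 3))           # new black background
alpha = np.array([[1.0, 0.5],
                  [0.5, 0.0]])    # estimated per-pixel opacity
I = recomposite(F, alpha, B)
```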
FactorMatte: Redefining Video Matting for Re-Composition Tasks
We propose "factor matting", an alternative formulation of the video matting
problem in terms of counterfactual video synthesis that is better suited for
re-composition tasks. The goal of factor matting is to separate the contents of
video into independent components, each visualizing a counterfactual version of
the scene where contents of other components have been removed. We show that
factor matting maps well to a more general Bayesian framing of the matting
problem that accounts for complex conditional interactions between layers.
Based on this observation, we present a method for solving the factor matting
problem that produces useful decompositions even for video with complex
cross-layer interactions like splashes, shadows, and reflections. Our method is
trained per-video and requires neither pre-training on external large datasets,
nor knowledge about the 3D structure of the scene. We conduct extensive
experiments, and show that our method not only can disentangle scenes with
complex interactions, but also outperforms top methods on existing tasks such
as classical video matting and background subtraction. In addition, we
demonstrate the benefits of our approach on a range of downstream tasks. Please
refer to our project webpage for more details: https://factormatte.github.io
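The re-composition use case the abstract targets recombines the separated components back-to-front with the standard "over" operator; each component contributes its own color and opacity. The sketch below shows only this recombination, not the paper's per-video decomposition method, and all names are illustrative.

```python
import numpy as np

def compose_layers(background, layers):
    """Recombine independent (rgb, alpha) components back-to-front ('over')."""
    out = background.astype(np.float64).copy()
    for rgb, alpha in layers:          # ordered back-to-front
        a = alpha[..., None]           # broadcast matte over color channels
        out = a * rgb + (1.0 - a) * out
    return out

# One white component over a black background, with a spatially varying matte.
bg = np.zeros((2, 2, 3))
component = (np.ones((2, 2, 3)), np.array([[1.0, 0.5],
                                           [0.0, 0.25]]))
frame = compose_layers(bg, [component])
```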
Virtual Occlusions Through Implicit Depth
For augmented reality (AR), it is important that virtual assets appear to 'sit among' real world objects. The virtual element should variously occlude and be occluded by real matter, based on a plausible depth ordering. This occlusion should be consistent over time as the viewer's camera moves. Unfortunately, small mistakes in the estimated scene depth can ruin the downstream occlusion mask, and thereby the AR illusion. Especially in real-time settings, depths inferred near boundaries or across time can be inconsistent. In this paper, we challenge the need for depth regression as an intermediate step. We instead propose an implicit model for depth and use that to predict the occlusion mask directly. The inputs to our network are one or more color images, plus the known depths of any virtual geometry. We show how our occlusion predictions are more accurate and more temporally stable than predictions derived from traditional depth-estimation models. We obtain state-of-the-art occlusion results on the challenging ScanNetv2 dataset and superior qualitative results on real scenes.
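The depth-regression baseline the paper replaces computes the occlusion mask by a per-pixel depth comparison: the real scene occludes the virtual asset wherever the estimated real depth is smaller than the known virtual depth. A minimal sketch of that baseline (the paper instead predicts the mask directly; names here are illustrative):

```python
import numpy as np

def occlusion_mask(real_depth, virtual_depth):
    """Per-pixel mask: True where real geometry is in front of the virtual asset.

    Small errors in `real_depth` near this comparison's decision boundary flip
    mask pixels, which is the instability the paper's direct prediction avoids.
    """
    return real_depth < virtual_depth

real = np.array([[1.0, 3.0],
                 [2.0, 0.5]])        # estimated scene depth (meters)
virtual = np.full((2, 2), 2.0)       # virtual object 2 m from the camera
mask = occlusion_mask(real, virtual)
```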