Fast Preprocessing for Robust Face Sketch Synthesis
Exemplar-based face sketch synthesis methods usually meet the challenging
problem that input photos are captured in different lighting conditions from
training photos. The critical step causing the failure is the search for similar
patch candidates for an input photo patch. Conventional illumination-invariant
patch distances are adopted rather than relying directly on pixel intensity
difference, but they will fail when local contrast within a patch changes. In
this paper, we propose a fast preprocessing method named Bidirectional
Luminance Remapping (BLR), which interactively adjusts the lighting of training
and input photos. Our method can be directly integrated into state-of-the-art
exemplar-based methods to improve their robustness with negligible computational
cost.
Comment: IJCAI 2017. Project page:
http://www.cs.cityu.edu.hk/~yibisong/ijcai17_sketch/index.htm
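The core idea of remapping luminance so that training and input photos share comparable lighting can be sketched as follows. This is a minimal, one-directional illustration that matches the mean and standard deviation of grayscale intensities; the paper's actual BLR scheme is bidirectional (it adjusts both the training and input photos) and the function name and tolerance below are assumptions for illustration only.

```python
import numpy as np

def remap_luminance(src, ref):
    """Shift and scale src's luminance so its mean/std match ref's.

    A simple global remapping for illustration; the paper's BLR
    adjusts both the training and input photos rather than only one.
    """
    src = src.astype(np.float64)
    ref = ref.astype(np.float64)
    scale = ref.std() / (src.std() + 1e-8)       # match contrast
    out = (src - src.mean()) * scale + ref.mean()  # match brightness
    return np.clip(out, 0.0, 255.0)

# Example: a dark input photo remapped toward a brighter reference.
dark = np.random.default_rng(0).uniform(0, 100, (64, 64))
bright = np.random.default_rng(1).uniform(100, 255, (64, 64))
remapped = remap_luminance(dark, bright)
```

After remapping, patch comparisons between the two photos operate at a common brightness level, which is what makes the subsequent patch-candidate search robust.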
Visual Object Tracking: The Initialisation Problem
Model initialisation is an important component of object tracking. Tracking
algorithms are generally provided with the first frame of a sequence and a
bounding box (BB) indicating the location of the object. This BB may contain a
large number of background pixels in addition to the object and can lead to
parts-based tracking algorithms initialising their object models in background
regions of the BB. In this paper, we tackle this as a missing labels problem,
marking pixels sufficiently away from the BB as belonging to the background and
learning the labels of the unknown pixels. Three techniques, One-Class SVM
(OC-SVM), Sampled-Based Background Model (SBBM) (a novel background model based
on pixel samples), and Learning Based Digital Matting (LBDM), are adapted to
the problem. These are evaluated with leave-one-video-out cross-validation on
the VOT2016 tracking benchmark. Our evaluation shows both OC-SVMs and SBBM are
capable of providing a good level of segmentation accuracy but are too
parameter-dependent to be used in real-world scenarios. We show that LBDM
achieves significantly increased performance with parameters selected by
cross-validation, and we show that it is robust to parameter variation.
Comment: 15th Conference on Computer and Robot Vision (CRV 2018). Source code
available at https://github.com/georgedeath/initialisation-proble
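The missing-labels idea can be illustrated with a toy sample-based background model in the spirit of SBBM: pixels sufficiently far outside the bounding box are taken as known background samples, and each pixel inside the box is labelled foreground only if its intensity is far from every sample. The function name, margin, and threshold below are illustrative assumptions, not the paper's actual model.

```python
import numpy as np

def sbbm_labels(image, bb, margin=5, tau=20.0):
    """Toy sample-based background model (illustrative only).

    Pixels at least `margin` pixels outside the bounding box `bb`
    (x0, y0, x1, y1) become background samples; a pixel inside the
    box is labelled foreground if its intensity differs from all
    samples by more than `tau`.
    """
    h, w = image.shape
    x0, y0, x1, y1 = bb
    ys, xs = np.mgrid[0:h, 0:w]
    outside = (xs < x0 - margin) | (xs > x1 + margin) | \
              (ys < y0 - margin) | (ys > y1 + margin)
    samples = image[outside]
    inside = (xs >= x0) & (xs <= x1) & (ys >= y0) & (ys <= y1)
    # Distance from each inside pixel to its nearest background sample.
    dist = np.abs(image[inside][:, None] - samples[None, :]).min(axis=1)
    labels = np.zeros_like(image, dtype=bool)  # True = foreground
    labels[inside] = dist > tau
    return labels

# Example: a bright object centred in a BB that also covers background.
img = np.full((40, 40), 10.0)
img[15:25, 15:25] = 200.0                 # the object
fg = sbbm_labels(img, (10, 10, 30, 30))   # BB larger than the object
```

Only the bright object pixels inside the BB are labelled foreground; the background pixels the BB also encloses are correctly rejected, which is exactly the initialisation failure mode the paper targets.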
Where and Who? Automatic Semantic-Aware Person Composition
Image compositing is a method used to generate realistic yet fake imagery by
inserting contents from one image to another. Previous work in compositing has
focused on improving appearance compatibility of a user selected foreground
segment and a background image (i.e. color and illumination consistency). In
this work, we instead develop a fully automated compositing model that
additionally learns to select and transform compatible foreground segments from
a large collection given only an input image background. To simplify the task,
we restrict our problem by focusing on human instance composition, because
human segments exhibit strong correlations with their background and because of
the availability of large annotated data. We develop a novel branching
Convolutional Neural Network (CNN) that jointly predicts candidate person
locations given a background image. We then use pre-trained deep feature
representations to retrieve person instances from a large segment database.
Experimental results show that our model can generate composite images that
look visually convincing. We also develop a user interface to demonstrate the
potential application of our method.
Comment: 10 pages, 9 figures
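The retrieval step described above, finding compatible person instances in a large segment database via pre-trained deep features, reduces to nearest-neighbour search in feature space. A minimal sketch with cosine similarity follows; the feature extractor itself is not shown, and the function name and dimensions are assumptions.

```python
import numpy as np

def retrieve_top_k(query, database, k=3):
    """Rank database segment features by cosine similarity to a query.

    `query` and each row of `database` stand in for the pre-trained
    deep feature representations of a background location and of
    candidate person segments, respectively.
    """
    q = query / np.linalg.norm(query)
    db = database / np.linalg.norm(database, axis=1, keepdims=True)
    sims = db @ q                      # cosine similarity per segment
    return np.argsort(-sims)[:k], sims

rng = np.random.default_rng(0)
db = rng.normal(size=(100, 128))       # features of 100 candidate segments
db[42] = np.ones(128)                  # plant a perfect match at index 42
query = np.ones(128)
top, sims = retrieve_top_k(query, db, k=3)
```

The planted entry is retrieved first, since its cosine similarity with the query is exactly 1.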
Towards Ghost-free Shadow Removal via Dual Hierarchical Aggregation Network and Shadow Matting GAN
Shadow removal is an essential task for scene understanding. Many studies
consider only matching the image contents, which often causes two types of
ghosts: color inconsistencies in shadow regions or artifacts on shadow
boundaries. In this paper, we tackle these issues in two ways. First, to
produce images free of boundary artifacts, we propose a novel network
structure named the dual hierarchical aggregation network (DHAN). It contains
a series of growing dilated convolutions as the backbone without any
down-sampling, and we hierarchically aggregate multi-context features for
attention and prediction, respectively. Second, we argue that training on a
limited dataset restricts the textural understanding of the network, which
leads to color inconsistencies in shadow regions. Currently, the largest
dataset contains 2k+ shadow/shadow-free image pairs. However, it has only 0.1k+
unique scenes since many samples share exactly the same background with
different shadow positions. Thus, we design a shadow matting generative
adversarial network (SMGAN) to synthesize realistic shadow mattes from a
given shadow mask and shadow-free image. With the help of novel masks or
scenes, we enhance the current datasets using synthesized shadow images.
Experiments show that our DHAN can erase the shadows and produce high-quality
ghost-free images. After training on the synthesized and real datasets, our
network outperforms other state-of-the-art methods by a large margin. The code
is available: http://github.com/vinthony/ghost-free-shadow-removal/
Comment: Accepted by AAAI 202
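The data-augmentation idea behind SMGAN, compositing a shadow image from a shadow-free image and a soft shadow matte, can be sketched with a simple linear darkening model. This model and the `darkening` parameter are illustrative stand-ins; the paper's generator learns the matting rather than applying a fixed formula.

```python
import numpy as np

def composite_shadow(shadow_free, matte, darkening=0.5):
    """Synthesize a shadow image from a shadow-free image and a soft
    shadow matte in [0, 1]. A fixed linear darkening model stands in
    for the matting that the SMGAN generator learns."""
    return shadow_free * (1.0 - darkening * matte)

free = np.full((8, 8), 200.0)             # shadow-free image
matte = np.zeros((8, 8))
matte[2:6, 2:6] = 1.0                     # shadow mask region
shadowed = composite_shadow(free, matte)
```

Pairing each synthesized `shadowed` image with its `free` original yields new shadow/shadow-free training pairs, which is how the augmented dataset grows beyond the limited set of unique scenes.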
FactorMatte: Redefining Video Matting for Re-Composition Tasks
We propose "factor matting", an alternative formulation of the video matting
problem in terms of counterfactual video synthesis that is better suited for
re-composition tasks. The goal of factor matting is to separate the contents of
video into independent components, each visualizing a counterfactual version of
the scene where contents of other components have been removed. We show that
factor matting maps well to a more general Bayesian framing of the matting
problem that accounts for complex conditional interactions between layers.
Based on this observation, we present a method for solving the factor matting
problem that produces useful decompositions even for video with complex
cross-layer interactions like splashes, shadows, and reflections. Our method is
trained per-video and requires neither pre-training on external large datasets,
nor knowledge about the 3D structure of the scene. We conduct extensive
experiments, and show that our method not only can disentangle scenes with
complex interactions, but also outperforms top methods on existing tasks such
as classical video matting and background subtraction. In addition, we
demonstrate the benefits of our approach on a range of downstream tasks. Please
refer to our project webpage for more details: https://factormatte.github.io
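The decomposition described above inverts a simple forward model: the frame is a back-to-front composite of independent components. The sketch below shows that forward model only; the paper's actual formulation is Bayesian and also accounts for conditional cross-layer interactions (splashes, shadows, reflections), which this sketch omits, and the function name is an assumption.

```python
import numpy as np

def compose_layers(layers, alphas):
    """Composite independent scene components back-to-front.

    `layers[0]` is the opaque background; each subsequent layer comes
    with a soft alpha matte in [0, 1]. Factor matting aims to recover
    such components from the composited video.
    """
    out = np.asarray(layers[0], dtype=float)
    for layer, alpha in zip(layers[1:], alphas):
        out = alpha * layer + (1.0 - alpha) * out
    return out

bg = np.full((4, 4), 50.0)                 # background component
fg = np.full((4, 4), 250.0)                # foreground component
alpha = np.zeros((4, 4))
alpha[1:3, 1:3] = 1.0                      # foreground matte
frame = compose_layers([bg, fg], [alpha])
```

Where the matte is 1 the frame shows the foreground component, elsewhere the background; re-composition tasks then amount to editing a component and re-running this compositing step.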