Semantic Human Matting
Human matting, the high quality extraction of humans from natural images, is
crucial for a wide variety of applications. Since the matting problem is
severely under-constrained, most previous methods require user interactions to
take user designated trimaps or scribbles as constraints. This user-in-the-loop
nature makes them difficult to apply to large scale data or time-sensitive
scenarios. In this paper, instead of using explicit user input constraints, we
employ implicit semantic constraints learned from data and propose an automatic
human matting algorithm (SHM). SHM is the first algorithm that learns to
jointly fit both semantic information and high quality details with deep
networks. In practice, simultaneously learning both coarse semantics and fine
details is challenging. We propose a novel fusion strategy which naturally
gives a probabilistic estimation of the alpha matte. We also construct a very
large dataset with high quality annotations consisting of 35,513 unique
foregrounds to facilitate the learning and evaluation of human matting.
Extensive experiments on this dataset and plenty of real images show that SHM
achieves comparable results with state-of-the-art interactive matting methods. Comment: ACM Multimedia 201
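
The abstract does not spell out the fusion strategy. A minimal sketch of one plausible reading follows, assuming the semantic stage outputs per-pixel foreground and unknown-region probabilities and the detail stage outputs a raw alpha, so that the fused matte is alpha = F_s + U_s * alpha_r. The function name `fuse_alpha` and its arguments are hypothetical and not taken from the source.

```python
def fuse_alpha(fg_prob, unknown_prob, raw_alpha):
    """Fuse semantic probabilities with a raw alpha, pixel by pixel.

    Assumed form (not stated in the abstract): alpha = F_s + U_s * alpha_r.
    All inputs are 2D lists of floats in [0, 1] with the same shape.
    """
    fused = []
    for f_row, u_row, a_row in zip(fg_prob, unknown_prob, raw_alpha):
        fused.append([
            # Clamp to [0, 1] so the result stays a valid opacity.
            min(1.0, max(0.0, f + u * a))
            for f, u, a in zip(f_row, u_row, a_row)
        ])
    return fused


# Sure foreground, unknown, and sure background pixels, in that order.
print(fuse_alpha([[1.0, 0.0, 0.0]], [[0.0, 1.0, 0.0]], [[0.3, 0.3, 0.3]]))
```

Under this assumption the raw alpha only matters where the semantic stage is uncertain; confident foreground and background pixels are decided by the semantic probabilities alone.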
Boosting Semantic Human Matting with Coarse Annotations
Semantic human matting aims to estimate the per-pixel opacity of the
foreground human regions. It is quite challenging and usually requires user
interactive trimaps and plenty of high quality annotated data. Annotating this
kind of data is labor intensive and requires skills beyond those of ordinary users,
especially given the very fine detail of human hair. In contrast,
coarsely annotated human data is much easier to acquire and collect from
public datasets. In this paper, we propose to use coarse annotated data coupled
with fine annotated data to boost end-to-end semantic human matting without
trimaps as extra input. Specifically, we train a mask prediction network to
estimate the coarse semantic mask using the hybrid data, and then propose a
quality unification network to unify the quality of the previous coarse mask
outputs. A matting refinement network takes in the unified mask and the input
image to predict the final alpha matte. The collected coarse annotated dataset
enriches our dataset significantly and allows generating high quality alpha mattes
for real images. Experimental results show that the proposed method performs
comparably against state-of-the-art methods. Moreover, the proposed method can
be used for refining coarse annotated public datasets, as well as semantic
segmentation methods, which reduces the cost of annotating high quality human
data to a great extent.
Inductive Guided Filter: Real-time Deep Image Matting with Weakly Annotated Masks on Mobile Devices
Recently, significant progress has been achieved in deep image matting. Most
of the classical image matting methods are time-consuming and require an ideal
trimap which is difficult to attain in practice. A highly efficient image matting
method based on a weakly annotated mask is in demand for mobile applications.
In this paper, we propose a novel method based on Deep Learning and Guided
Filter, called Inductive Guided Filter, which can tackle the real-time general
image matting task on mobile devices. We design a lightweight hourglass network
to parameterize the original Guided Filter method that takes an image and a
weakly annotated mask as input. Further, the use of Gabor loss is proposed for
training networks for complicated textures in image matting. Moreover, we
create an image matting dataset MAT-2793 with a variety of foreground objects.
Experimental results demonstrate that our proposed method massively reduces
running time with robust accuracy.
Salient Image Matting
In this paper, we propose an image matting framework called Salient Image
Matting to estimate the per-pixel opacity value of the most salient foreground
in an image. To deal with a large amount of semantic diversity in images, a
trimap is conventionally required as it provides important guidance about
object semantics to the matting process. However, creating a good trimap is
often expensive and time-consuming. The SIM framework simultaneously deals with
the challenge of learning a wide range of semantics and salient object types in
a fully automatic, end-to-end manner. Specifically, our framework is able
to produce accurate alpha mattes, directly from an RGB input, for a wide range
of foreground objects and for cases where the foreground class, such as human,
appears in a very different context than in the training data. This is done by
employing a salient object detection model to produce a trimap of the most
salient object in the image in order to guide the matting model about
higher-level object semantics. Our framework leverages large amounts of coarse
annotations coupled with a heuristic trimap generation scheme to train the
trimap prediction network so it can produce trimaps for arbitrary foregrounds.
Moreover, we introduce a multi-scale fusion architecture for the task of
matting to better capture finer, low-level opacity semantics. With high-level
guidance provided by the trimap network, our framework requires only a fraction
of expensive matting data as compared to other automatic methods while being
able to produce alpha mattes for a diverse range of inputs. We demonstrate our
framework on a range of diverse images and experimental results show our
framework compares favourably against state-of-the-art matting methods without the
need for a trimap.
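
The abstract mentions a heuristic trimap generation scheme applied to coarse annotations but gives no details. A common heuristic, assumed here and not confirmed by the source, is to erode the coarse mask to get sure foreground, dilate it to bound the sure background, and mark the band in between as unknown. The function `trimap_from_mask` and its `radius` parameter are hypothetical.

```python
def trimap_from_mask(mask, radius=1):
    """Build a trimap (0.0 background, 0.5 unknown, 1.0 foreground)
    from a binary 2D mask using a square window of the given radius.

    A pixel is sure foreground if its whole window is foreground (erosion),
    sure background if its whole window is background (outside the dilation),
    and unknown otherwise. Windows are clipped at the image border.
    """
    height, width = len(mask), len(mask[0])
    trimap = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            window = [
                mask[j][i]
                for j in range(max(0, y - radius), min(height, y + radius + 1))
                for i in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            if all(window):
                trimap[y][x] = 1.0
            elif any(window):
                trimap[y][x] = 0.5
    return trimap


print(trimap_from_mask([[0, 0, 1, 1, 1]], radius=1))
```

A larger `radius` widens the unknown band, which tolerates sloppier mask boundaries at the cost of leaving more pixels for the matting stage to resolve.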
Boundary-Aware Network for Fast and High-Accuracy Portrait Segmentation
Compared with other semantic segmentation tasks, portrait segmentation
requires both higher precision and faster inference speed. However, this
problem has not been well studied in previous works. In this paper, we propose
a lightweight network architecture, called Boundary-Aware Network (BANet) which
selectively extracts detail information in the boundary area to produce high-quality
segmentation output at real-time (>25 FPS) speed. In addition, we design a new
loss function called refine loss which supervises the network with image level
gradient information. Our model is able to produce finer segmentation results
that have richer details than the annotations.
Improved Image Matting via Real-time User Clicks and Uncertainty Estimation
Image matting is a fundamental and challenging problem in computer vision and
graphics. Most existing matting methods leverage a user-supplied trimap as an
auxiliary input to produce a good alpha matte. However, obtaining a high-quality
trimap is itself arduous, thus restricting the application of these methods.
Recently, some trimap-free methods have emerged; however, their matting quality
is still far behind the trimap-based methods. The main reason is that, without
the trimap guidance in some cases, the target network is ambiguous about which
is the foreground target. In fact, choosing the foreground is a subjective
procedure and depends on the user's intention. To this end, this paper proposes
an improved deep image matting framework which is trimap-free and only needs
several user click interactions to eliminate the ambiguity. Moreover, we
introduce a new uncertainty estimation module that can predict which parts need
polishing and a following local refinement module. Based on the computation
budget, users can choose how many local parts to improve with the uncertainty
guidance. Quantitative and qualitative results show that our method performs
better than existing trimap-free methods and comparably to state-of-the-art
trimap-based methods with minimal user effort. Comment: Accepted by IEEE Conference on Computer Vision and Pattern
Recognition (CVPR), 202
Alpha Matte Generation from Single Input for Portrait Matting
Portrait matting is an important research problem with a wide range of
applications, such as video conferencing apps, image/video editing, and
post-production. The goal is to predict an alpha matte that identifies the
effect of each pixel on the foreground subject. Traditional approaches and most
existing works use an additional input, e.g., a trimap or background
image, to predict the alpha matte. However, providing additional input is not
always practical. Besides, models are too sensitive to these additional inputs.
In this paper, we introduce an additional input-free approach to perform
portrait matting using Generative Adversarial Nets (GANs). We divide the main
task into two subtasks. For this, we propose a segmentation network for
person segmentation and an alpha generation network for alpha matte
prediction. While the segmentation network takes an input image and produces a
coarse segmentation map, the alpha generation network utilizes the same input
image as well as a coarse segmentation map that is produced by the segmentation
network to predict the alpha matte. Besides, we present a segmentation encoding
block to downsample the coarse segmentation map and provide feature
representation to the residual block. Furthermore, we propose a border loss that
separately penalizes only the borders of the subject, which are more likely to be
challenging, and we also adapt perceptual loss for portrait matting. To train
the proposed system, we combine two different popular training datasets to
increase the amount and diversity of data and to address domain shift
problems at inference time. We tested our model on three different
benchmark datasets, namely the Adobe Image Matting dataset, the Portrait Matting
dataset, and the Distinctions dataset. The proposed method outperformed the MODNet
method, which also takes a single input.
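
The abstract says the border loss penalizes only the borders of the subject but does not give its form. The sketch below assumes a mean absolute error restricted to pixels flagged by a border mask; both the L1 form and the way the mask is supplied are assumptions, and `border_l1_loss` is a hypothetical name.

```python
def border_l1_loss(pred, target, border_mask):
    """Mean absolute error between predicted and target alpha,
    computed only over pixels where border_mask is truthy.

    Returns 0.0 when the mask selects no pixels.
    """
    total, count = 0.0, 0
    for p_row, t_row, m_row in zip(pred, target, border_mask):
        for p, t, m in zip(p_row, t_row, m_row):
            if m:
                total += abs(p - t)
                count += 1
    return total / count if count else 0.0


# Only the first pixel lies on the border, so only its error counts.
print(border_l1_loss([[0.2, 0.9]], [[0.0, 1.0]], [[1, 0]]))
```

Restricting the average to border pixels keeps the large, easy interior from diluting the error signal on hair and other fine edges.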
Automatically Extract the Semi-transparent Motion-blurred Hand from a Single Image
When we use video chat, video games, or other video applications,
motion-blurred hands often appear. Accurately extracting these hands is very
useful for video editing and behavior analysis. However, existing
motion-blurred object extraction methods either need user interactions, such as
user supplied trimaps and scribbles, or need additional information, such as
background images. In this paper, we propose a novel method that automatically
extracts the semi-transparent motion-blurred hand from the original
RGB image alone. The proposed method separates the extraction task into
two subtasks: alpha matte prediction and foreground prediction. These two
subtasks are implemented by Xception-based encoder-decoder networks. The
extracted motion-blurred hand images are computed by multiplying the
predicted alpha mattes and foreground images. Experiments on synthetic and real
datasets show that the proposed method has promising performance.
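
The final step described above, multiplying the predicted alpha matte by the predicted foreground, can be sketched as a per-pixel product. The function `extract_foreground` and the data layout (2D lists, RGB tuples) are illustrative assumptions, not the authors' implementation.

```python
def extract_foreground(alpha, foreground):
    """Multiply each foreground RGB pixel by its alpha value.

    alpha: 2D list of floats in [0, 1].
    foreground: 2D list of (r, g, b) tuples with the same height and width.
    """
    return [
        [tuple(a * channel for channel in pixel)
         for a, pixel in zip(alpha_row, fg_row)]
        for alpha_row, fg_row in zip(alpha, foreground)
    ]


# A half-transparent pixel and a fully transparent pixel.
print(extract_foreground([[0.5, 0.0]], [[(200, 100, 50), (10, 20, 30)]]))
```

The result is an alpha-premultiplied foreground, which can be placed over a new background B as result = alpha * F + (1 - alpha) * B.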
Bridging Composite and Real: Towards End-to-end Deep Image Matting
Extracting accurate foregrounds from natural images benefits many downstream
applications such as film production and augmented reality. However, the furry
characteristics and varied appearance of foregrounds, e.g., animals and
portraits, challenge existing matting methods, which usually require extra user
inputs such as trimap or scribbles. To resolve these problems, we study the
distinct roles of semantics and details for image matting and decompose the
task into two parallel sub-tasks: high-level semantic segmentation and
low-level details matting. Specifically, we propose a novel Glance and Focus
Matting network (GFM), which employs a shared encoder and two separate decoders
to learn both tasks in a collaborative manner for end-to-end natural image
matting. Besides, due to the limited availability of natural images for the
matting task, previous methods typically adopt composite images for training
and evaluation, which results in limited generalization ability on real-world
images. In this paper, we investigate the domain gap issue between composite
images and real-world images systematically by conducting comprehensive
analyses of various discrepancies between the foreground and background images.
We find that a carefully designed composition route RSSN that aims to reduce
the discrepancies can lead to a better model with remarkable generalization
ability. Furthermore, we provide a benchmark containing 2,000 high-resolution
real-world animal images and 10,000 portrait images along with their manually
labeled alpha mattes to serve as a test bed for evaluating matting models'
generalization ability on real-world images. Comprehensive empirical studies
have demonstrated that GFM outperforms state-of-the-art methods and effectively
reduces the generalization error. The code and the datasets will be released at
https://github.com/JizhiziLi/GFM. Comment: Accepted by the International Journal of Computer Vision (IJCV). Both
the datasets and source code are available at
https://github.com/JizhiziLi/GF
PP-Matting: High-Accuracy Natural Image Matting
Natural image matting is a fundamental and challenging computer vision task.
It has many applications in image editing and composition. Recently, deep
learning-based approaches have achieved great improvements in image matting.
However, most of them require a user-supplied trimap as an auxiliary input,
which limits the matting applications in the real world. Although some
trimap-free approaches have been proposed, the matting quality is still
unsatisfactory compared to trimap-based ones. Without the trimap guidance,
matting models easily suffer from foreground-background ambiguity and also
generate blurry details in the transition area. In this work, we propose
PP-Matting, a trimap-free architecture that can achieve high-accuracy natural
image matting. Our method applies a high-resolution detail branch (HRDB) that
extracts fine-grained details of the foreground while keeping the feature resolution
unchanged. Also, we propose a semantic context branch (SCB) that adopts a
semantic segmentation subtask. It protects the detail prediction from local
ambiguity caused by missing semantic context. In addition, we conduct extensive
experiments on two well-known benchmarks: Composition-1k and Distinctions-646.
The results demonstrate the superiority of PP-Matting over previous methods.
Furthermore, we provide a qualitative evaluation of our method on human matting
which shows its outstanding performance in practical applications. The code
and pre-trained models will be available at PaddleSeg:
https://github.com/PaddlePaddle/PaddleSeg