172 research outputs found
Fast Deep Matting for Portrait Animation on Mobile Phone
Image matting plays an important role in image and video editing, but its formulation is inherently ill-posed. Traditional methods rely on user interaction, such as trimaps and strokes, and cannot run on mobile phones in real time. In this paper, we propose a real-time, automatic deep matting approach for mobile devices. By leveraging densely connected blocks and dilated convolution, a lightweight fully convolutional network is designed to predict a coarse binary mask for portrait images. A feathering block, which is edge-preserving and matting-adaptive, is then developed to learn a guided filter and transform the binary mask into an alpha matte. Finally, an automatic portrait animation system based on fast deep matting is built on mobile devices; it requires no interaction and achieves real-time matting at 15 fps. Experiments show that the proposed approach achieves results comparable to state-of-the-art matting solvers.
Comment: ACM Multimedia Conference (MM) 2017 camera-ready
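The feathering block above plays the role of a learned guided filter. As a rough illustration of what such a filter does (not the paper's learned block), here is a minimal NumPy sketch of the classical guided filter refining a coarse binary mask with the image as guidance; the window radius and epsilon are illustrative choices:

```python
import numpy as np

def box_filter(x, r):
    """Mean over a (2r+1)x(2r+1) window via an integral image."""
    n = 2 * r + 1
    p = np.pad(x, r, mode="edge")
    c = np.pad(p, ((1, 0), (1, 0))).cumsum(0).cumsum(1)
    return (c[n:, n:] - c[:-n, n:] - c[n:, :-n] + c[:-n, :-n]) / (n * n)

def guided_filter(guide, mask, r=3, eps=1e-3):
    """Classical guided filter: fit q = a*I + b in each local window,
    so the refined mask inherits the guide image's edges."""
    m_i, m_p = box_filter(guide, r), box_filter(mask, r)
    cov_ip = box_filter(guide * mask, r) - m_i * m_p
    var_i = box_filter(guide * guide, r) - m_i * m_i
    a = cov_ip / (var_i + eps)
    b = m_p - a * m_i
    # averaging a and b feathers the hard mask into soft alpha values
    return box_filter(a, r) * guide + box_filter(b, r)
```

Feeding a hard 0/1 mask through the filter yields soft, edge-aligned alpha values; the paper's feathering block learns such a mapping end to end rather than fixing r and eps.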
Edge-enhancing Filters with Negative Weights
In [DOI:10.1109/ICMEW.2014.6890711], graph-based denoising is performed by projecting the noisy image onto a lower-dimensional Krylov subspace of the graph Laplacian, constructed using nonnegative weights determined by distances between image data corresponding to image pixels. We extend the construction of the graph Laplacian to the case where some graph weights can be negative. Removing the positivity constraint provides a more accurate inference of the graph model behind the data, and can thus improve the quality of filters for graph-based signal processing, e.g., denoising, compared to the standard construction, without affecting the costs.
Comment: 5 pages, 6 figures. Accepted to IEEE GlobalSIP 2015 conference
Object Discovery via Cohesion Measurement
Color and intensity are two important components of an image. Groups of image pixels that are similar in color or intensity usually form an informative representation of an object, and are therefore particularly suitable for computer vision tasks such as saliency detection and object proposal generation. However, image pixels that share a similar real-world color may appear quite different, since colors are often distorted by intensity. In this paper, we reinvestigate the affinity matrices originally used in spectral-clustering-based image segmentation. A new affinity matrix, which is robust to color distortions, is formulated for object discovery, and a Cohesion Measurement (CM) for object regions is derived from it. Based on this Cohesion Measurement, a novel object discovery method is proposed that discovers objects latent in an image by utilizing the eigenvectors of the affinity matrix. We then apply the proposed method to both saliency detection and object proposal generation. Experimental results on several evaluation benchmarks demonstrate that the proposed CM-based method achieves promising performance on both tasks.
Comment: 14 pages, 14 figures
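The pipeline of the abstract (affinity matrix, then eigenvectors) can be caricatured in a few lines. This sketch uses a plain Gaussian affinity and a naive leading-eigenvector grouping with a crude per-pixel cohesion score; it is not the paper's distortion-robust affinity or its CM formula:

```python
import numpy as np

def affinity_matrix(feats, sigma):
    """Gaussian affinity from pairwise feature distances (illustrative)."""
    d2 = ((feats[:, None, :] - feats[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def discover(A):
    """Group points via the leading eigenvector of the affinity matrix;
    a cohesive region shows up as the eigenvector's dominant support.
    Returns a crude cohesion score and a boolean object mask."""
    vals, vecs = np.linalg.eigh(A)
    lead = vecs[:, -1]
    lead = lead * np.sign(lead.sum() or 1.0)   # fix the sign ambiguity
    return vals[-1] / len(lead), lead > lead.mean()
```

A tight cluster of similar features dominates the top eigenvector, so thresholding it separates the cohesive "object" from a diffuse background.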
3D Depth Reconstruction and Depth Refinement from a Focal Stack
Doctoral dissertation, Seoul National University Graduate School, College of Engineering, Department of Electrical and Computer Engineering, February 2021. Advisor: 신영길.
Three-dimensional (3D) depth recovery from two-dimensional images is a fundamental and challenging objective in computer vision, and one of the most important prerequisites for applications such as 3D measurement, robot localization and navigation, and self-driving. Depth-from-focus (DFF) is an important method for reconstructing 3D depth from focus information. Reconstructing depth in texture-less regions is a typical issue for conventional DFF; furthermore, it is difficult for conventional DFF reconstruction techniques to preserve depth edges and fine details while maintaining spatial consistency. In this dissertation, we address these problems and propose a DFF depth recovery framework that is robust in texture-less regions and can reconstruct a depth image with clear edges and fine details.
The proposed depth recovery framework is composed of two processes: depth reconstruction and depth refinement. To recover an accurate 3D depth, we first formulate depth reconstruction as a maximum a posteriori (MAP) estimation problem with a matting Laplacian prior. The nonlocal principle is adopted when constructing the matting Laplacian matrix in order to preserve depth edges and fine details. Additionally, a depth-variance-based confidence measure, combined with a reliability measure of the focus measure, is proposed to maintain spatial smoothness, so that smooth regions of the initial depth receive high confidence values and the reconstructed depth draws more heavily on the initial depth there. Because the nonlocal principle breaks spatial consistency, the reconstructed depth image is spatially inconsistent and suffers from texture-copy artifacts. To smooth the noise and suppress these texture-copy artifacts, we propose a closed-form edge-preserving depth refinement algorithm that formulates refinement as a MAP estimation problem using Markov random fields (MRFs). By incorporating pre-estimated depth edges and mutual structure information into the energy function, together with a specially designed smoothness weight, the proposed refinement method can effectively suppress noise and texture-copy artifacts while preserving depth edges. Furthermore, by constructing an undirected weighted graph representing the energy function, a closed-form solution is obtained using the Laplacian matrix of that graph.
The proposed framework presents a novel method of 3D depth recovery from a focal stack. The proposed algorithm is superior at depth recovery in texture-less regions owing to the effective variance-based confidence computation and the matting Laplacian prior, and it obtains a depth image with clear edges and fine details thanks to the adoption of the nonlocal principle in the construction of the matting Laplacian matrix. The proposed closed-form depth refinement removes noise while preserving object structure through the use of common edges, and effectively suppresses texture-copy artifacts by utilizing mutual structure information. The proposed depth refinement also provides a general recipe for edge-preserving image smoothing, especially for depth-related refinement such as stereo vision.
Both quantitative and qualitative experimental results show the superiority of the proposed method in terms of robustness in texture-less regions, accuracy, and the ability to preserve object structure while maintaining spatial smoothness.
Chapter 1 Introduction
1.1 Overview
1.2 Motivation
1.3 Contribution
1.4 Organization
Chapter 2 Related Works
2.1 Overview
2.2 Principle of depth-from-focus
2.2.1 Focus measure operators
2.3 Depth-from-focus reconstruction
2.4 Edge-preserving image denoising
Chapter 3 Depth-from-Focus Reconstruction using Nonlocal Matting Laplacian Prior
3.1 Overview
3.2 Image matting and matting Laplacian
3.3 Depth-from-focus
3.4 Depth reconstruction
3.4.1 Problem statement
3.4.2 Likelihood model
3.4.3 Nonlocal matting Laplacian prior model
3.5 Experimental results
3.5.1 Overview
3.5.2 Data configuration
3.5.3 Reconstruction results
3.5.4 Comparison between reconstruction using local and nonlocal matting Laplacian
3.5.5 Spatial consistency analysis
3.5.6 Parameter setting and analysis
3.6 Summary
Chapter 4 Closed-form MRF-based Depth Refinement
4.1 Overview
4.2 Problem statement
4.3 Closed-form solution
4.4 Edge preservation
4.5 Texture-copy artifacts suppression
4.6 Experimental results
4.7 Summary
Chapter 5 Evaluation
5.1 Overview
5.2 Evaluation metrics
5.3 Evaluation on synthetic datasets
5.4 Evaluation on real scene datasets
5.5 Limitations
5.6 Computational performance
Chapter 6 Conclusion
Bibliography
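For context, the conventional depth-from-focus baseline the dissertation improves on is a per-pixel winner-take-all over a focus measure. The Laplacian-energy measure and the synthetic focal stack below are illustrative choices, not the dissertation's implementation:

```python
import numpy as np

def laplacian_energy(img):
    """Squared discrete Laplacian response: large where the image is in focus."""
    lap = (-4.0 * img
           + np.roll(img, 1, 0) + np.roll(img, -1, 0)
           + np.roll(img, 1, 1) + np.roll(img, -1, 1))
    return lap ** 2

def depth_from_focus(stack):
    """Winner-take-all baseline: per pixel, the index of the sharpest slice.
    (The dissertation replaces this with a MAP estimate under a nonlocal
    matting Laplacian prior; this sketch is only the conventional baseline.)"""
    focus = np.stack([laplacian_energy(s) for s in stack])
    return focus.argmax(axis=0)
```

Winner-take-all fails in texture-less regions, where every slice scores near zero; that is precisely the failure mode the confidence measure and matting Laplacian prior described above are designed to address.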
CAP-VSTNet: Content Affinity Preserved Versatile Style Transfer
The loss of content affinity, including feature and pixel affinity, is a main cause of artifacts in photorealistic and video style transfer. This paper proposes a new framework named CAP-VSTNet, consisting of a new reversible residual network and an unbiased linear transform module, for versatile style transfer. The reversible residual network preserves content affinity without introducing the redundant information of traditional reversible networks, and hence facilitates better stylization. Empowered by a matting Laplacian training loss that addresses the pixel affinity loss caused by the linear transform, the proposed framework is applicable and effective for versatile style transfer. Extensive experiments show that CAP-VSTNet produces better qualitative and quantitative results than state-of-the-art methods.
Comment: CVPR 2023
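The "unbiased linear transform module" mentioned above belongs to the family of whitening-coloring transforms, which match content feature statistics to style statistics. As a rough sketch of what such a linear transform does (not CAP-VSTNet's learned module):

```python
import numpy as np

def whiten_color(content_feats, style_feats, eps=1e-5):
    """Whitening-coloring transform on (channels x pixels) feature matrices:
    whiten the content covariance to identity, then color it with the style
    covariance. A generic WCT, not CAP-VSTNet's actual module."""
    def center(f):
        mu = f.mean(axis=1, keepdims=True)
        return f - mu, mu

    def cov_root(f, power):
        # (C + eps*I)^power via the eigendecomposition of the covariance
        c = f @ f.T / (f.shape[1] - 1) + eps * np.eye(f.shape[0])
        vals, vecs = np.linalg.eigh(c)
        return vecs @ np.diag(vals ** power) @ vecs.T

    fc, _ = center(content_feats)
    fs, mu_s = center(style_feats)
    whitened = cov_root(fc, -0.5) @ fc          # identity covariance
    return cov_root(fs, 0.5) @ whitened + mu_s  # style covariance and mean
```

The transform changes second-order statistics only; the paper's matting Laplacian loss then penalizes the pixel-affinity damage such a global linear map can cause.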
Context-Aware Image Matting for Simultaneous Foreground and Alpha Estimation
Natural image matting is an important problem in computer vision and
graphics. It is an ill-posed problem when only an input image is available
without any external information. While recent deep learning approaches
have shown promising results, they estimate only the alpha matte. This paper
presents a context-aware natural image matting method for simultaneous
foreground and alpha matte estimation. Our method employs two encoder networks
to extract essential information for matting. Particularly, we use a matting
encoder to learn local features and a context encoder to obtain more global
context information. We concatenate the outputs from these two encoders and
feed them into decoder networks to simultaneously estimate the foreground and
alpha matte. To train this whole deep neural network, we employ both the
standard Laplacian loss and the feature loss: the former helps to achieve high
numerical performance while the latter leads to more perceptually plausible
results. We also report several data augmentation strategies that greatly
improve the network's generalization performance. Our qualitative and
quantitative experiments show that our method enables high-quality matting for
a single natural image. Our inference code and models are publicly available at https://github.com/hqqxyy/Context-Aware-Matting.
Comment: This is the camera-ready version of the ICCV 2019 paper
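The "standard Laplacian loss" mentioned above is commonly computed over a Laplacian pyramid of the predicted and ground-truth alpha mattes. A minimal NumPy sketch, where the blur kernel, level count, and level weights are illustrative assumptions rather than the paper's exact choices:

```python
import numpy as np

def gauss_down(img):
    """2x downsample after a small separable binomial blur (circular edges)."""
    k = np.array([1.0, 2.0, 1.0]) / 4.0
    for ax in (0, 1):
        img = sum(np.roll(img, s, ax) * w for s, w in zip((-1, 0, 1), k))
    return img[::2, ::2]

def laplacian_pyramid(img, levels=3):
    """Band-pass levels plus a low-resolution residual."""
    pyr = []
    for _ in range(levels):
        down = gauss_down(img)
        up = np.kron(down, np.ones((2, 2)))  # nearest-neighbour upsample
        pyr.append(img - up)
        img = down
    pyr.append(img)
    return pyr

def laplacian_loss(pred, target, levels=3):
    """Sum of per-level L1 gaps, weighted by scale (weights are assumptions)."""
    pp, tp = laplacian_pyramid(pred, levels), laplacian_pyramid(target, levels)
    return sum((2 ** i) * np.abs(p - t).mean()
               for i, (p, t) in enumerate(zip(pp, tp)))
```

Comparing mattes level by level penalizes both fine edge errors and coarse region errors, which is why such a loss tends to improve numerical matting metrics.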
Segmentation by transduction
This paper addresses the problem of segmenting an image into regions consistent with user-supplied seeds (e.g., a sparse set of broad brush strokes). We view this task as statistical transductive inference, in which some pixels are already associated with given zones and the remaining ones need to be classified. Our method relies on the graph Laplacian regularizer, a powerful manifold learning tool that is based on estimating variants of the Laplace-Beltrami operator and is tightly related to diffusion processes. Segmentation is modeled as the task of finding matting coefficients for unclassified pixels given known matting coefficients for the seed pixels. The proposed algorithm essentially relies on a high-margin assumption in the space of pixel characteristics. It is simple, fast, and accurate, as demonstrated by qualitative results on natural images and a quantitative comparison with state-of-the-art methods on the Microsoft GrabCut segmentation database.
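The seeded formulation above, with known matting coefficients on seed pixels and Laplacian regularization elsewhere, has a closed-form harmonic solution. A toy sketch on a 5-node path graph (the actual method builds the Laplacian from pixel characteristics via a Laplace-Beltrami estimate):

```python
import numpy as np

def transduce(L, seeds):
    """Harmonic transduction: minimize f^T L f subject to fixed seed values.
    `seeds` maps node index -> known value (e.g. a 0/1 matting coefficient).
    Unlabeled values solve L_uu f_u = -L_ul f_l."""
    n = L.shape[0]
    labeled = np.array(sorted(seeds))
    unlabeled = np.array([i for i in range(n) if i not in seeds])
    f_l = np.array([seeds[i] for i in labeled])
    f = np.zeros(n)
    f[labeled] = f_l
    f[unlabeled] = np.linalg.solve(L[np.ix_(unlabeled, unlabeled)],
                                   -L[np.ix_(unlabeled, labeled)] @ f_l)
    return f
```

On a uniform path graph the harmonic solution interpolates linearly between seeds; on an image graph with data-dependent weights, it snaps the 0/1 boundary to low-affinity transitions instead.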