High-Resolution Deep Image Matting
Image matting is a key technique for image and video editing and composition.
Conventionally, deep learning approaches take the whole input image and an
associated trimap to infer the alpha matte using convolutional neural networks.
Such approaches have set the state of the art in image matting; however, they
may fail in real-world matting applications due to hardware limitations, since
real-world input images for matting are mostly of very high resolution. In this
paper, we propose HDMatt, the first deep-learning-based image matting approach
for high-resolution inputs. More concretely, HDMatt runs matting in a
patch-based crop-and-stitch manner for high-resolution inputs with a novel
module design to address the contextual dependency and consistency issues
between different patches. Compared with vanilla patch-based inference which
computes each patch independently, we explicitly model the cross-patch
contextual dependency with a newly-proposed Cross-Patch Contextual module (CPC)
guided by the given trimap. Extensive experiments demonstrate the effectiveness
of the proposed method and its necessity for high-resolution inputs. Our HDMatt
approach also sets new state-of-the-art performance on Adobe Image Matting and
AlphaMatting benchmarks and produces impressive visual results on more
real-world high-resolution images. Comment: AAAI 202
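To make the patch-based setting concrete, the following is a minimal sketch of vanilla crop-and-stitch inference, the baseline that HDMatt improves on by adding cross-patch context. All names are illustrative, and `predict_patch` stands in for any per-patch matting network; overlapping predictions are simply averaged at the seams, which is exactly where the consistency issues the paper addresses arise.

```python
def crop_and_stitch(image, predict_patch, patch=4, stride=2):
    """Run a per-patch predictor over a 2-D image (list of lists of floats)
    and stitch the results back, averaging where patches overlap."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    cnt = [[0] * w for _ in range(h)]
    for y in range(0, h - patch + 1, stride):
        for x in range(0, w - patch + 1, stride):
            tile = [row[x:x + patch] for row in image[y:y + patch]]
            pred = predict_patch(tile)  # each patch is inferred independently
            for dy in range(patch):
                for dx in range(patch):
                    out[y + dy][x + dx] += pred[dy][dx]
                    cnt[y + dy][x + dx] += 1
    # average overlapping predictions to soften seams between patches
    return [[o / c if c else 0.0 for o, c in zip(ro, rc)]
            for ro, rc in zip(out, cnt)]
```

Because each `predict_patch` call sees only its own crop, boundary pixels get no information from neighbouring patches; HDMatt's trimap-guided CPC module is what restores that cross-patch dependency.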
Deep Image Matting: A Comprehensive Survey
Image matting refers to extracting a precise alpha matte from natural images,
and it plays a critical role in various downstream applications, such as image
editing. Despite being an ill-posed problem, traditional methods have been
trying to solve it for decades. The emergence of deep learning has
revolutionized the field of image matting and given birth to multiple new
techniques, including automatic, interactive, and referring image matting. This
paper presents a comprehensive review of recent advancements in image matting
in the era of deep learning. We focus on two fundamental sub-tasks: auxiliary
input-based image matting, which involves user-defined input to predict the
alpha matte, and automatic image matting, which generates results without any
manual intervention. We systematically review the existing methods for these
two tasks according to their task settings and network structures and provide a
summary of their advantages and disadvantages. Furthermore, we introduce the
commonly used image matting datasets and evaluate the performance of
representative matting methods both quantitatively and qualitatively. Finally,
we discuss relevant applications of image matting and highlight existing
challenges and potential opportunities for future research. We also maintain a
public repository to track the rapid development of deep image matting at
https://github.com/JizhiziLi/matting-survey
A Novel Solution of Using Mixed Reality in Bowel and Oral and Maxillofacial Surgical Telepresence: 3D Mean Value Cloning algorithm
Background and aim: Most of the Mixed Reality models used in surgical
telepresence suffer from discrepancies in the boundary area and
spatial-temporal inconsistency due to illumination variation across the video
frames. The aim of this work is to propose a new solution that helps
produce the composite video by merging the augmented video of the surgery site
and the virtual hand of the remote expert surgeon. The purpose of the
proposed solution is to decrease the processing time and enhance the accuracy
of merged video by decreasing the overlay and visualization error and removing
occlusion and artefacts. Methodology: The proposed system enhanced the mean
value cloning algorithm that helps to maintain the spatial-temporal consistency
of the final composite video. The enhanced algorithm incorporates 3D mean value
coordinates and an improved mean value interpolant in the image cloning process,
which helps to reduce the sawtooth, smudging and discolouration artefacts
around the blending region. Results: Compared to the state-of-the-art
solution, the overlay error of the proposed solution is improved from 1.01 mm
to 0.80 mm, while the visualization accuracy is improved from 98.8% to 99.4%.
The processing time is reduced to 0.173
seconds from 0.211 seconds. Conclusion: Our solution helps make the object of
interest consistent with the light intensity of the target image by adding a
space-distance term that helps maintain spatial consistency in the final merged
video. Comment: 27 page
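As background for the cloning step, here is a small sketch of classical 2-D mean value coordinates, the interpolant that the paper extends to 3-D. A point inside a polygon receives one weight per vertex; mean-value cloning blends the boundary colour difference into the interior with such weights, which is what keeps the pasted region consistent with the target image's light intensity. This is the textbook 2-D construction, not the paper's enhanced 3-D algorithm.

```python
import math

def mean_value_weights(p, poly):
    """Mean value coordinates of point p w.r.t. a simple polygon
    (list of (x, y) vertices, consistently oriented, p strictly inside)."""
    n = len(poly)
    # distance from p to each vertex
    d = [math.hypot(vx - p[0], vy - p[1]) for vx, vy in poly]
    # signed angle at p subtended by each polygon edge (v_i, v_{i+1})
    ang = []
    for i in range(n):
        ax, ay = poly[i][0] - p[0], poly[i][1] - p[1]
        bx, by = poly[(i + 1) % n][0] - p[0], poly[(i + 1) % n][1] - p[1]
        ang.append(math.atan2(ax * by - ay * bx, ax * bx + ay * by))
    # w_i combines the half-angle tangents of the two edges meeting at v_i
    w = [(math.tan(ang[i - 1] / 2) + math.tan(ang[i] / 2)) / d[i]
         for i in range(n)]
    s = sum(w)
    return [wi / s for wi in w]
```

For the centre of a unit square the four weights come out equal, so an interpolated boundary value there is simply the average of the four vertex values.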
How do automated image analysis techniques help scientists in species identification and classification?
Identification of taxonomy at a specific level is time consuming and reliant upon expert ecologists. Hence, the demand for automated species identification has increased over the last two decades. Automation of data classification is primarily focused on images, while incorporating and analysing image data has recently become easier due to developments in computational technology. Research efforts on identification of species include specimen image processing and extraction of identifying features, followed by classifying them into correct categories. In this paper, we discuss recent automated species identification systems, mainly categorising and evaluating their methods. We reviewed and compared different methods in a step-by-step scheme of automated identification and classification systems for species images. The selection of methods is influenced by many variables, such as the level of classification, the amount of training data and the complexity of the images. The aim of this paper is to provide researchers and scientists with an extensive background study on work related to automated species identification, focusing on pattern recognition techniques in building such systems for biodiversity studies. (Folia Morphol 2018; 77, 2: 179–193)
Efficient Deep Networks for Image Matting
Image matting is a fundamental technology serving downstream image editing tasks such as composition and harmonization. Given an image, its goal is to predict an accurate alpha matte with minimal manual effort. Since matting applications usually run on PCs or mobile devices, a high standard for efficient computation and storage is set. Thus, lightweight and efficient models are in demand. However, it is non-trivial to balance computation and performance. We therefore investigate efficient model designs for image matting. We first look into the common encoder-decoder architecture with a lightweight backbone and explore the skipped information and downsampling-upsampling operations, from which we notice the importance of indices kept in the encoder and recovered in the decoder. Based on these observations, we design data-dependent downsampling and upsampling operators conditioned on features from the encoder, which learn to index and show significant improvement over the baseline model while retaining a lightweight structure. Then, considering that affinity is widely used in both traditional and deep matting methods, we propose upsampling operators conditioned on second-order affinity information, termed affinity-aware upsampling. Instead of modeling affinity in an additional module, we include it in the unavoidable upsampling stages for a compact architecture. By implementing the operator with a low-rank bilinear model, we achieve significantly better results with only a negligible increase in parameters. Further, we explore the robustness of matting algorithms and propose a more generalizable method. It includes designing a new framework assembling multilevel context information and studying strong data augmentation strategies targeting matting. This method shows significantly higher robustness across various benchmarks, real-world images, and coarse-to-fine trimap precision compared with other methods while using less computation.
Besides studying trimap-based image matting, we extend our lightweight matting architecture to portrait matting. Targeting portrait images, we propose a multi-task parameter sharing framework, where trimap generation and matting are treated as parallel tasks that help optimize each other. Compared with the conventional cascaded architecture, this design not only reduces the model capacity by a large margin but also yields more precise predictions. Thesis (Ph.D.) -- University of Adelaide, School of Computer Science, 202
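To illustrate the "indices kept in the encoder and recovered in the decoder" observation, here is a minimal sketch of the fixed-index baseline: 2x2 max-pooling that records where each maximum came from, and an unpooling step that places values back at those recorded positions (zeros elsewhere). The thesis's learned, data-dependent index functions replace this hard argmax; the code below is only the hand-crafted baseline it generalizes, with all names illustrative.

```python
def max_pool_with_indices(x):
    """2x2 max-pool a 2-D list with even dimensions.
    Returns (pooled values, per-window (dy, dx) argmax positions)."""
    h, w = len(x), len(x[0])
    pooled, idx = [], []
    for y in range(0, h, 2):
        prow, irow = [], []
        for xx in range(0, w, 2):
            window = [(x[y + dy][xx + dx], (dy, dx))
                      for dy in (0, 1) for dx in (0, 1)]
            val, pos = max(window)   # keep the value AND where it was
            prow.append(val)
            irow.append(pos)
        pooled.append(prow)
        idx.append(irow)
    return pooled, idx

def unpool_with_indices(pooled, idx):
    """Invert the pooling spatially: restore each value to its recorded
    position in the upsampled grid, leaving the other slots at zero."""
    h, w = 2 * len(pooled), 2 * len(pooled[0])
    out = [[0.0] * w for _ in range(h)]
    for i, row in enumerate(pooled):
        for j, v in enumerate(row):
            dy, dx = idx[i][j]
            out[2 * i + dy][2 * j + dx] = v
    return out
```

Carrying the indices across the encoder-decoder bottleneck preserves fine spatial detail that plain bilinear upsampling discards, which is why matting boundaries benefit from index-guided recovery.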