
    High-Resolution Deep Image Matting

    Image matting is a key technique for image and video editing and composition. Conventionally, deep learning approaches take the whole input image and an associated trimap to infer the alpha matte using convolutional neural networks. Such approaches set the state of the art in image matting; however, they may fail in real-world matting applications due to hardware limitations, since real-world input images for matting are mostly of very high resolution. In this paper, we propose HDMatt, the first deep-learning-based image matting approach for high-resolution inputs. More concretely, HDMatt runs matting in a patch-based crop-and-stitch manner for high-resolution inputs, with a novel module design to address the contextual dependency and consistency issues between different patches. Compared with vanilla patch-based inference, which computes each patch independently, we explicitly model the cross-patch contextual dependency with a newly proposed Cross-Patch Contextual module (CPC) guided by the given trimap. Extensive experiments demonstrate the effectiveness of the proposed method and its necessity for high-resolution inputs. Our HDMatt approach also sets new state-of-the-art performance on the Adobe Image Matting and AlphaMatting benchmarks and produces impressive visual results on more real-world high-resolution images.
    Comment: AAAI 202
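    The crop-and-stitch inference the abstract describes can be sketched as follows. This is a minimal illustration, not HDMatt's actual pipeline: `infer_patchwise` and the `predict` callback are hypothetical names, overlapping patches are blended by simple averaging, and the cross-patch CPC module is omitted entirely.

    ```python
    import numpy as np

    def infer_patchwise(image, predict, patch=32, overlap=8):
        """Run `predict` on overlapping crops and blend overlaps by averaging.

        image:   2D array (H, W), H and W assumed >= `patch`
        predict: callable mapping a (patch, patch) crop to a (patch, patch) map
        """
        h, w = image.shape
        stride = patch - overlap
        acc = np.zeros((h, w), dtype=np.float64)   # summed predictions
        cnt = np.zeros((h, w), dtype=np.float64)   # how many patches hit each pixel
        for y in range(0, max(h - overlap, 1), stride):
            for x in range(0, max(w - overlap, 1), stride):
                # Clamp so the last patch stays inside the image.
                y0, x0 = min(y, h - patch), min(x, w - patch)
                crop = image[y0:y0 + patch, x0:x0 + patch]
                acc[y0:y0 + patch, x0:x0 + patch] += predict(crop)
                cnt[y0:y0 + patch, x0:x0 + patch] += 1
        return acc / cnt
    ```

    With a real matting network as `predict`, this is the "vanilla patch-based inference" the abstract contrasts against; HDMatt's contribution is to make each patch's prediction depend on context from other patches.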

    Deep Image Matting: A Comprehensive Survey

    Image matting refers to extracting a precise alpha matte from natural images, and it plays a critical role in various downstream applications, such as image editing. Despite being an ill-posed problem, traditional methods have been trying to solve it for decades. The emergence of deep learning has revolutionized the field of image matting and given birth to multiple new techniques, including automatic, interactive, and referring image matting. This paper presents a comprehensive review of recent advancements in image matting in the era of deep learning. We focus on two fundamental sub-tasks: auxiliary input-based image matting, which involves user-defined input to predict the alpha matte, and automatic image matting, which generates results without any manual intervention. We systematically review the existing methods for these two tasks according to their task settings and network structures and provide a summary of their advantages and disadvantages. Furthermore, we introduce the commonly used image matting datasets and evaluate the performance of representative matting methods both quantitatively and qualitatively. Finally, we discuss relevant applications of image matting and highlight existing challenges and potential opportunities for future research. We also maintain a public repository to track the rapid development of deep image matting at https://github.com/JizhiziLi/matting-survey
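    The alpha matte the survey refers to is defined by the standard compositing equation I = αF + (1 − α)B, where F is the foreground, B the background, and α per-pixel opacity. A minimal sketch of using a predicted matte for composition (the function name `composite` is illustrative, not from the survey):

    ```python
    import numpy as np

    def composite(fg, bg, alpha):
        """Alpha-composite foreground over background: I = alpha*F + (1-alpha)*B.

        fg, bg: (H, W, C) color images; alpha: (H, W) matte in [0, 1].
        """
        a = alpha[..., None]  # broadcast the matte across color channels
        return a * fg + (1.0 - a) * bg
    ```

    Matting is the inverse problem: given only I (and possibly a trimap), recover α, which is why it is ill-posed — F, B, and α are all unknown per pixel.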

    A Unified Query-based Paradigm for Point Cloud Understanding


    A Novel Solution of Using Mixed Reality in Bowel and Oral and Maxillofacial Surgical Telepresence: 3D Mean Value Cloning algorithm

    Background and aim: Most of the Mixed Reality models used in surgical telepresence suffer from discrepancies in the boundary area and spatial-temporal inconsistency due to illumination variation in the video frames. The aim of this work is to propose a new solution that helps produce a composite video by merging the augmented video of the surgery site with the virtual hand of the remote expert surgeon. The purpose of the proposed solution is to decrease the processing time and enhance the accuracy of the merged video by decreasing the overlay and visualization error and removing occlusion and artefacts. Methodology: The proposed system enhances the mean value cloning algorithm, which helps maintain the spatial-temporal consistency of the final composite video. The enhanced algorithm includes 3D mean value coordinates and an improvised mean value interpolant in the image cloning process, which helps reduce the sawtooth, smudging and discolouration artefacts around the blending region. Results: Compared to the state-of-the-art solution, the accuracy of the proposed solution in terms of overlay error is improved from 1.01 mm to 0.80 mm, whereas the accuracy in terms of visualization error is improved from 98.8% to 99.4%. The processing time is reduced from 0.211 seconds to 0.173 seconds. Conclusion: Our solution helps make the object of interest consistent with the light intensity of the target image by adding the space distance, which helps maintain the spatial consistency in the final merged video.
    Comment: 27 page
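    The mean value coordinates underlying mean value cloning can be illustrated in 2D: a point inside a polygon is expressed as a weighted average of the boundary vertices, and cloning interpolates boundary color differences with those weights. A minimal 2D sketch (the paper's 3D coordinates and improvised interpolant are out of scope; `mean_value_coords` is a hypothetical helper):

    ```python
    import math

    def mean_value_coords(x, poly):
        """Mean value coordinates of interior point x w.r.t. CCW polygon `poly`."""
        def half_tan(a, b):
            # tan of half the angle a-x-b, via tan(t/2) = (|a||b| - a.b) / (a x b)
            ax, ay = a[0] - x[0], a[1] - x[1]
            bx, by = b[0] - x[0], b[1] - x[1]
            dot = ax * bx + ay * by
            cross = ax * by - ay * bx
            return (math.hypot(ax, ay) * math.hypot(bx, by) - dot) / cross

        n = len(poly)
        w = []
        for i in range(n):
            prev_v, v, next_v = poly[i - 1], poly[i], poly[(i + 1) % n]
            r = math.hypot(v[0] - x[0], v[1] - x[1])
            w.append((half_tan(prev_v, v) + half_tan(v, next_v)) / r)
        total = sum(w)
        return [wi / total for wi in w]
    ```

    A useful sanity check is linear precision: the coordinates reproduce the point itself when applied to the vertex positions, which is what makes the interpolated membrane smooth across the blending region.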

    How automated image analysis techniques help scientists in species identification and classification?

    Identification of taxonomy at a specific level is time consuming and reliant upon expert ecologists. Hence the demand for automated species identification has increased over the last two decades. Automation of data classification is primarily focussed on images, while incorporating and analysing image data has recently become easier due to developments in computational technology. Research efforts on identification of species include processing specimens' images, extracting identical features, and then classifying them into correct categories. In this paper, we discuss recent automated species identification systems, mainly by categorising and evaluating their methods. We reviewed and compared different methods in a step-by-step scheme of automated identification and classification systems for species images. The selection of methods is influenced by many variables, such as the level of classification, the amount of training data and the complexity of the images. The aim of this paper is to provide researchers and scientists with an extensive background study on work related to automated species identification, focusing on pattern recognition techniques for building such systems for biodiversity studies. (Folia Morphol 2018; 77, 2: 179–193)

    Efficient Deep Networks for Image Matting

    Get PDF
    Image matting is a fundamental technology serving downstream image editing tasks such as composition and harmonization. Given an image, its goal is to predict an accurate alpha matte with minimum manual effort. Since matting applications usually run on PCs or mobile devices, a high standard for efficient computation and storage is set. Thus, lightweight and efficient models are in demand. However, it is non-trivial to balance computation and performance. We therefore investigate efficient model designs for image matting. We first look into the common encoder-decoder architecture with a lightweight backbone and explore the skipped information and downsampling-upsampling operations, from which we notice the importance of indices kept in the encoder and recovered in the decoder. Based on these observations, we design data-dependent downsampling and upsampling operators conditioned on features from the encoder, which learn to index and show significant improvement over the baseline model while promising a lightweight structure. Then, considering affinity is widely used in both traditional and deep matting methods, we propose upsampling operators conditioned on second-order affinity information, termed affinity-aware upsampling. Instead of modeling affinity in an additional module, we include it in the unavoidable upsampling stages for a compact architecture. By implementing the operator with a low-rank bilinear model, we achieve significantly better results with only negligible parameter increases. Further, we explore the robustness of matting algorithms and propose a more generalizable method. It includes designing a new framework assembling multilevel context information and studying strong data augmentation strategies targeting matting. This method shows significantly higher robustness on various benchmarks, real-world images, and coarse-to-fine trimap precision compared with other methods, while using less computation.
Besides studying trimap-based image matting, we extend our lightweight matting architecture to portrait matting. Targeting portrait images, we propose a multi-task parameter sharing framework, where trimap generation and matting are treated as parallel tasks that help optimize each other. Compared with the conventional cascaded architecture, this design not only reduces the model capacity by a large margin but also produces more precise predictions.
    Thesis (Ph.D.) -- University of Adelaide, School of Computer Science, 202
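    The "indices kept in the encoder and recovered in the decoder" idea can be illustrated with plain max pooling: record where each maximum came from, then place decoder values back at exactly those positions. This is a toy NumPy sketch of index-guided pooling/unpooling, not the thesis's learned data-dependent operators; the function names are illustrative.

    ```python
    import numpy as np

    def maxpool2x2_with_indices(x):
        """2x2 max pooling over a 2D array, recording flat argmax positions."""
        h, w = x.shape
        pooled = np.zeros((h // 2, w // 2), dtype=x.dtype)
        idx = np.zeros((h // 2, w // 2), dtype=np.int64)  # flat index into x
        for i in range(h // 2):
            for j in range(w // 2):
                block = x[2 * i:2 * i + 2, 2 * j:2 * j + 2]
                k = int(np.argmax(block))
                pooled[i, j] = block.flat[k]
                idx[i, j] = (2 * i + k // 2) * w + (2 * j + k % 2)
        return pooled, idx

    def unpool2x2(pooled, idx, shape):
        """Index-guided upsampling: place each value back where its max came from."""
        out = np.zeros(shape, dtype=pooled.dtype)
        out.flat[idx.ravel()] = pooled.ravel()
        return out
    ```

    Preserving the argmax locations keeps boundary detail that a fixed bilinear upsampling would smear, which is the intuition behind the learned indexing operators described above.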