
    Deep Image Matting: A Comprehensive Survey

    Image matting refers to extracting a precise alpha matte from natural images, and it plays a critical role in various downstream applications, such as image editing. Despite being an ill-posed problem, traditional methods have been trying to solve it for decades. The emergence of deep learning has revolutionized the field of image matting and given birth to multiple new techniques, including automatic, interactive, and referring image matting. This paper presents a comprehensive review of recent advancements in image matting in the era of deep learning. We focus on two fundamental sub-tasks: auxiliary input-based image matting, which involves user-defined input to predict the alpha matte, and automatic image matting, which generates results without any manual intervention. We systematically review the existing methods for these two tasks according to their task settings and network structures and provide a summary of their advantages and disadvantages. Furthermore, we introduce the commonly used image matting datasets and evaluate the performance of representative matting methods both quantitatively and qualitatively. Finally, we discuss relevant applications of image matting and highlight existing challenges and potential opportunities for future research. We also maintain a public repository to track the rapid development of deep image matting at https://github.com/JizhiziLi/matting-survey
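The survey's subject rests on the standard compositing equation I = αF + (1 − α)B, which the abstract leaves implicit. A minimal sketch of the forward model (array shapes, value ranges, and the function name are illustrative assumptions, not from the paper):

```python
import numpy as np

def composite(fg, bg, alpha):
    """Standard compositing equation I = alpha*F + (1 - alpha)*B.

    fg, bg: float arrays of shape (H, W, 3) in [0, 1]
    alpha:  float array of shape (H, W) in [0, 1]
    """
    a = alpha[..., None]          # broadcast alpha over color channels
    return a * fg + (1.0 - a) * bg

# A matting method solves the inverse problem: given I (and possibly a
# trimap), recover alpha (and F, B). This is ill-posed: per pixel there
# are 7 unknowns but only 3 observations.
```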

    User-assisted intrinsic images

    For many computational photography applications, the lighting and materials in the scene are critical pieces of information. We seek to obtain intrinsic images, which decompose a photo into the product of an illumination component that represents lighting effects and a reflectance component that is the color of the observed material. This is an under-constrained problem and automatic methods are challenged by complex natural images. We describe a new approach that enables users to guide an optimization with simple indications such as regions of constant reflectance or illumination. Based on a simple assumption on local reflectance distributions, we derive a new propagation energy that enables a closed-form solution using linear least-squares. We achieve fast performance by introducing a novel downsampling that preserves local color distributions. We demonstrate intrinsic image decomposition on a variety of images and show applications.
    Funding: National Science Foundation (U.S.) (NSF CAREER Award 0447561); Institut national de recherche en informatique et en automatique (France) (Associate Research Team "Flexible Rendering"); Microsoft Research (New Faculty Fellowship); Alfred P. Sloan Foundation (Research Fellowship); Quanta Computer, Inc. (MIT-Quanta T Party)
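As a toy illustration of the multiplicative intrinsic-image model I = R · S described above, the sketch below estimates a single shared reflectance inside a user-marked constant-reflectance region. This is a drastically simplified stand-in for the paper's propagation energy and closed-form solve; the function name, grayscale setting, and scale-fixing convention are our assumptions:

```python
import numpy as np

def decompose_constant_reflectance(image, mask):
    """Toy intrinsic decomposition inside a user-marked region.

    Assumes the pixels under `mask` share one (unknown) reflectance, so
    I = R * S reduces to estimating a single scalar R; the remaining
    variation is attributed to shading S.
    image: (H, W) grayscale luminance, strictly positive
    mask:  (H, W) boolean, the user's constant-reflectance scribble
    """
    log_i = np.log(image)
    # Fix the global scale ambiguity by assuming log-shading averages to
    # zero over the region; the shared log-reflectance is then simply the
    # mean log intensity (the least-squares estimate under this model).
    log_r = log_i[mask].mean()
    reflectance = np.exp(log_r)
    shading = image / reflectance     # S = I / R inside the region
    return reflectance, shading
```

The real method propagates such local constraints over the whole image via linear least-squares; here the "propagation" is collapsed to one region.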

    ImageSpirit: Verbal Guided Image Parsing

    Humans describe images in terms of nouns and adjectives, while algorithms operate on images represented as sets of pixels. Bridging this gap between how humans would like to access images and their typical representation is the goal of image parsing, which involves assigning object and attribute labels to each pixel. In this paper we propose treating nouns as object labels and adjectives as visual attribute labels. This allows us to formulate the image parsing problem as one of jointly estimating per-pixel object and attribute labels from a set of training images. We propose an efficient (interactive-time) solution. Using the extracted labels as handles, our system empowers a user to verbally refine the results. This enables hands-free parsing of an image into pixel-wise object/attribute labels that correspond to human semantics. Verbally selecting objects of interest enables a novel and natural interaction modality that could be used to interact with new-generation devices (e.g., smartphones, Google Glass, living room devices). We demonstrate our system on a large number of real-world images with varying complexity. To help understand the tradeoffs compared to traditional mouse-based interactions, results are reported for both a large-scale quantitative evaluation and a user study.
    Comment: http://mmcheng.net/imagespirit

    Perceptually inspired image estimation and enhancement

    Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Brain and Cognitive Sciences, 2009. Includes bibliographical references (p. 137-144).
    In this thesis, we present three image estimation and enhancement algorithms inspired by human vision. In the first part of the thesis, we propose an algorithm for mapping one image to another based on the statistics of a training set. Many vision problems can be cast as image mapping problems, such as estimating reflectance from luminance, estimating shape from shading, and separating signal from noise. Such problems are typically under-constrained, and yet humans are remarkably good at solving them. Classic computational theories about the ability of the human visual system to solve such under-constrained problems attribute this feat to the use of intuitive regularities of the world, e.g., surfaces tend to be piecewise constant. In recent years, there has been considerable interest in deriving more sophisticated statistical constraints from natural images, but because of the high-dimensional nature of images, representing and utilizing the learned models remains a challenge. Our techniques produce models that are very easy to store and to query. We show these techniques to be effective for a number of applications: removing noise from images, estimating a sharp image from a blurry one, decomposing an image into reflectance and illumination, and interpreting lightness illusions. In the second part of the thesis, we present an algorithm for compressing the dynamic range of an image while retaining important visual detail. The human visual system confronts a serious challenge with dynamic range, in that the physical world has an extremely high dynamic range, while neurons have low dynamic ranges. The human visual system performs dynamic range compression by applying automatic gain control, in both the retina and the visual cortex. Taking inspiration from that, we designed techniques that involve multi-scale subband transforms and smooth gain control on subband coefficients, and resemble the contrast gain control mechanism in the visual cortex. We show our techniques to be successful in producing dynamic-range-compressed images without compromising the visibility of detail or introducing artifacts. We also show that the techniques can be adapted for the related problem of "companding", in which a high dynamic range image is converted to a low dynamic range image and saved using fewer bits, and later expanded back to high dynamic range with minimal loss of visual quality. In the third part of the thesis, we propose a technique that enables a user to easily localize image and video editing by drawing a small number of rough scribbles. Image segmentation, usually treated as an unsupervised clustering problem, is extremely difficult to solve. With a minimal degree of user supervision, however, we are able to generate selection masks with good quality. Our technique learns a classifier using the user-scribbled pixels as training examples, and uses the classifier to classify the rest of the pixels into distinct classes. It then uses the classification results as per-pixel data terms, combines them with a smoothness term that respects color discontinuities, and generates better results than state-of-the-art algorithms for interactive segmentation.
    by Yuanzhen Li. Ph.D.
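The third part's scribble-driven pipeline (learn a classifier from scribbled pixels, then classify the rest) can be caricatured in a few lines. A nearest-class-mean color model stands in for the learned classifier, and the thesis's edge-respecting smoothness term is omitted; all names and shapes are illustrative:

```python
import numpy as np

def scribble_segment(image, fg_scribble, bg_scribble):
    """Toy scribble-driven selection: fit a color model to the user's
    foreground/background scribbles, then classify every pixel.

    image:       (H, W, 3) float RGB
    fg_scribble: (H, W) bool mask of foreground-scribbled pixels
    bg_scribble: (H, W) bool mask of background-scribbled pixels
    Returns a (H, W) bool selection mask.
    """
    fg_mean = image[fg_scribble].mean(axis=0)   # "train" on scribbles
    bg_mean = image[bg_scribble].mean(axis=0)
    d_fg = ((image - fg_mean) ** 2).sum(axis=-1)
    d_bg = ((image - bg_mean) ** 2).sum(axis=-1)
    return d_fg < d_bg      # True where a pixel looks like the foreground
```

In the thesis these per-pixel scores become data terms that are combined with a color-discontinuity-aware smoothness term; this sketch stops at the data term.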

    κ°•μΈν•œ λŒ€ν™”ν˜• μ˜μƒ λΆ„ν•  μ•Œκ³ λ¦¬μ¦˜μ„ μœ„ν•œ μ‹œλ“œ 정보 ν™•μž₯ 기법에 λŒ€ν•œ 연ꡬ

    Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Dept. of Electrical and Computer Engineering, February 2021. Advisor: Kyoung Mu Lee.
    Segmentation of an area corresponding to a desired object in an image is essential to computer vision problems, because most algorithms operate on semantic units when interpreting or analyzing images. However, segmenting the desired object from a given image is an ambiguous task: the target object varies depending on the user and the purpose. To solve this problem, interactive segmentation techniques have been proposed, in which segmentation is performed in the desired direction according to interaction with the user. In this setting, the seed information provided by the user plays an important role. If the seed provided by a user contains abundant information, the accuracy of segmentation increases. However, providing rich seed information places a heavy burden on users. Therefore, the main goal of the present study was to obtain satisfactory segmentation results from simple seed information. We primarily focused on converting the provided sparse seed information to a rich state so that accurate segmentation results can be derived. To this end, we took minimal user input and enriched it through various seed enrichment techniques. A total of three interactive segmentation techniques were proposed, based on: (1) seed expansion, (2) seed generation, and (3) seed attention. These enrichment strategies comprise expanding the area around a seed, generating a new seed at a new position, and attending to semantic information. First, in seed expansion, we expanded the scope of the seed: we integrated reliable pixels around the initial seed into the seed set through a two-stage expansion step. With the extended seed covering a wider area than the initial seed, the seed's scarcity and imbalance problems were resolved. Next, in seed generation, we created a seed at a new point rather than around the existing seed. We trained the system to imitate user behavior by providing a new seed point in the erroneous region. By learning the user's intention, our model could efficiently create a new seed point. The generated seed helped segmentation and could be used as additional information for weakly supervised learning. Finally, through seed attention, we put semantic information into the seed. Unlike the previous models, we integrated the segmentation process and the seed enrichment process: we reinforced the seed by adding semantic information instead of spatial expansion, enriching it through mutual attention with feature maps generated during the segmentation process. The proposed models show superiority over existing techniques in various experiments. Notably, even with sparse seed information, our proposed seed enrichment techniques gave far more accurate segmentation results than existing methods.
    Contents: 1 Introduction; 2 Interactive Segmentation with Seed Expansion; 3 Interactive Segmentation with Seed Generation; 4 Interactive Segmentation with Seed Attention; 5 Conclusions.
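The seed-expansion idea of chapter 2, growing a sparse seed over similar nearby pixels, can be illustrated with a single-stage toy version. The thesis uses a two-stage expansion on top of a pyramidal random walk with restart; the 4-neighbourhood, tolerance, and names below are our assumptions:

```python
import numpy as np

def expand_seed(image, seed, tol=0.1, iters=5):
    """Toy seed expansion: repeatedly absorb 4-neighbours of the current
    seed whose color lies within `tol` of the seed's mean color.

    image: (H, W, 3) float RGB
    seed:  (H, W) bool, the sparse user-provided seed
    Returns the enriched (grown) seed mask.
    """
    seed = seed.copy()
    for _ in range(iters):
        mean = image[seed].mean(axis=0)
        # 4-neighbourhood dilation of the current seed mask
        grown = seed.copy()
        grown[1:, :] |= seed[:-1, :]
        grown[:-1, :] |= seed[1:, :]
        grown[:, 1:] |= seed[:, :-1]
        grown[:, :-1] |= seed[:, 1:]
        frontier = grown & ~seed
        similar = ((image - mean) ** 2).sum(axis=-1) < tol ** 2
        new = frontier & similar
        if not new.any():
            break               # nothing reliable left to absorb
        seed |= new
    return seed
```

The enriched seed then feeds the segmentation stage, mitigating the scarcity and imbalance of the original clicks.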

    ROAM: a Rich Object Appearance Model with Application to Rotoscoping

    Rotoscoping, the detailed delineation of scene elements through a video shot, is a painstaking task of tremendous importance in professional post-production pipelines. While pixel-wise segmentation techniques can help for this task, professional rotoscoping tools rely on parametric curves that offer artists much better interactive control over the definition, editing and manipulation of the segments of interest. Sticking to this prevalent rotoscoping paradigm, we propose a novel framework to capture and track the visual aspect of an arbitrary object in a scene, given a first closed outline of this object. This model combines a collection of local foreground/background appearance models spread along the outline, a global appearance model of the enclosed object and a set of distinctive foreground landmarks. The structure of this rich appearance model allows simple initialization, efficient iterative optimization with exact minimization at each step, and on-line adaptation in videos. We demonstrate qualitatively and quantitatively the merit of this framework through comparisons with tools based on either dynamic segmentation with a closed curve or pixel-wise binary labelling.

    Towards Generalizable Deep Image Matting: Decomposition, Interaction, and Merging

    Image matting refers to extracting the precise alpha mattes from images, playing a critical role in many downstream applications. Despite extensive attention, key challenges persist and motivate the research presented in this thesis. One major challenge is previous methods' reliance on auxiliary inputs, which hinders real-time practicality. To address this, we introduce fully automatic image matting by decomposing the task into high-level semantic segmentation and low-level details matting. We then incorporate plug-in modules to enhance the interaction between the sub-tasks through feature integration. Furthermore, we propose an attention-based mechanism to guide the matting process through collaborative merging. Another challenge lies in limited matting datasets, resulting in reliance on composite images and inferior performance on images in the wild. In response, our research proposes a composition route to mitigate the discrepancies and achieve remarkable generalization ability. Additionally, we construct numerous large datasets of high-quality real-world images with manually labeled alpha mattes, providing a solid foundation for training and evaluation. Moreover, our research uncovers new observations that warrant further investigation. Firstly, we systematically analyze and address privacy issues that have been neglected in previous portrait matting research. Secondly, we explore the adaptation of automatic matting methods to non-salient or transparent categories beyond salient ones. Furthermore, we incorporate the language modality to achieve a more controllable matting process, enabling specific target selection at a low cost. To validate our studies, we conduct extensive experiments and provide all codes and datasets through the link (https://github.com/JizhiziLi/). We believe that the analyses, methods, and datasets presented in this thesis will offer valuable insights for future research endeavors in the field of image matting.
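The decomposition-and-merging idea, where a high-level semantic branch provides confident foreground/background regions and a low-level detail branch supplies alpha only in the uncertain transition band, can be sketched as follows. The 0.9/0.1 thresholds and the hard merging rule are illustrative stand-ins, not the thesis's learned attention-based mechanism:

```python
import numpy as np

def merge_semantic_and_detail(semantic_prob, detail_alpha):
    """Toy merge of a semantic branch and a detail matting branch.

    semantic_prob: (H, W) float in [0, 1], coarse foreground probability
    detail_alpha:  (H, W) float in [0, 1], fine alpha from a detail branch
    Where the semantic branch is confident, use hard 0/1 alpha; in the
    uncertain transition band, trust the detail branch.
    """
    return np.where(semantic_prob > 0.9, 1.0,
           np.where(semantic_prob < 0.1, 0.0, detail_alpha))
```

In the thesis proper, plug-in modules let the two branches exchange features and an attention mechanism performs the merging, rather than this fixed threshold rule.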