74 research outputs found

    Weakly- and Self-Supervised Learning for Content-Aware Deep Image Retargeting

    This paper proposes a weakly- and self-supervised deep convolutional neural network (WSSDCNN) for content-aware image retargeting. Our network takes a source image and a target aspect ratio, and then directly outputs a retargeted image. Retargeting is performed through a shift map, which is a pixel-wise mapping from the source to the target grid. Our method implicitly learns an attention map, which leads to a content-aware shift map for image retargeting. As a result, discriminative parts in an image are preserved, while background regions are adjusted seamlessly. In the training phase, pairs of an image and its image-level annotation are used to compute content and structure losses. We demonstrate the effectiveness of our proposed method for a retargeting application with insightful analyses. Comment: 10 pages, 11 figures. To appear in ICCV 2017, Spotlight Presentation.
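    The shift-map idea can be illustrated with a minimal NumPy sketch of horizontal retargeting, assuming the attention map is already given; in the paper both the attention map and the pixel-wise shift map are learned end to end by the CNN, so the function and names below are only an illustrative stand-in.

```python
import numpy as np

def retarget_width(image: np.ndarray, attn: np.ndarray, target_w: int) -> np.ndarray:
    """Shrink `image` (H, W, C) to width `target_w` using an attention map (H, W).

    Columns with high attention keep their width; low-attention (background)
    columns absorb most of the reduction, mimicking a content-aware shift map.
    """
    h, w, _ = image.shape
    imp = attn.mean(axis=0) + 1e-6              # per-column importance
    budget = imp / imp.sum() * target_w          # target width allotted to each source column
    tgt_pos = np.cumsum(budget)                  # cumulative target coordinate of each source column
    # Shift map: for every target column, the (fractional) source column it reads from.
    src_for_tgt = np.interp(np.arange(target_w) + 0.5, tgt_pos, np.arange(w))
    src_idx = np.clip(np.round(src_for_tgt).astype(int), 0, w - 1)
    return image[:, src_idx, :]
```

    Because unimportant columns are traversed faster, discriminative regions are preserved while background regions are compressed, which is the behavior the abstract describes.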

    Graph Spectral Image Processing

    The recent advent of graph signal processing (GSP) has spurred intensive studies of signals that live naturally on irregular data kernels described by graphs (e.g., social networks, wireless sensor networks). Though a digital image contains pixels that reside on a regularly sampled 2D grid, if one can design an appropriate underlying graph connecting pixels with weights that reflect the image structure, then one can interpret the image (or image patch) as a signal on a graph and apply GSP tools for processing and analysis of the signal in the graph spectral domain. In this article, we overview recent graph spectral techniques in GSP specifically for image/video processing. The topics covered include image compression, image restoration, image filtering, and image segmentation.
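    As a concrete illustration of the pipeline this article surveys, the sketch below builds a 4-connected pixel graph for a small grayscale patch with bilateral-style edge weights, forms the combinatorial Laplacian, and low-pass filters the patch in the graph spectral domain. The weighting scheme and cutoff are common textbook choices and are not taken from the article itself.

```python
import numpy as np

def graph_lowpass(patch: np.ndarray, sigma: float = 0.1, cutoff: int = 20) -> np.ndarray:
    """Smooth a small grayscale patch by keeping only low graph frequencies."""
    h, w = patch.shape
    n = h * w
    x = patch.reshape(-1).astype(float)
    W = np.zeros((n, n))
    for i in range(h):                          # 4-connected grid graph over pixels
        for j in range(w):
            p = i * w + j
            for di, dj in ((0, 1), (1, 0)):
                ii, jj = i + di, j + dj
                if ii < h and jj < w:
                    q = ii * w + jj
                    wgt = np.exp(-((x[p] - x[q]) ** 2) / sigma ** 2)  # photometric edge weight
                    W[p, q] = W[q, p] = wgt
    L = np.diag(W.sum(axis=1)) - W              # combinatorial graph Laplacian
    evals, evecs = np.linalg.eigh(L)            # graph Fourier basis (Laplacian eigenvectors)
    xhat = evecs.T @ x                          # graph Fourier transform of the patch
    xhat[cutoff:] = 0.0                         # discard high graph frequencies
    return (evecs @ xhat).reshape(h, w)         # inverse GFT: spectrally filtered patch
```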

    An environment for the development of sampling and denoising algorithms

    In the context of Monte Carlo rendering, although many sampling and denoising techniques have been proposed in the last few years, deciding which one should be used for a specific scene remains difficult. Moreover, developing a new technique has required selecting a particular rendering system, which makes the technique tightly coupled to the chosen renderer and limits the number of scenes it can be tested on. In this work, we propose a renderer-agnostic framework for developing and benchmarking sampling and denoising techniques for Monte Carlo rendering. It decouples techniques from rendering systems by hiding the renderer details behind a general API. This improves productivity and allows for direct comparisons among techniques using scenes from different rendering systems. The proposed framework contains two main parts: a software development kit that helps users develop and test their techniques locally, and an online system that allows users to submit their techniques and have them automatically benchmarked on our servers. We demonstrate its effectiveness by using our API to instrument four rendering systems and a variety of Monte Carlo denoising techniques (including recent learning-based ones) and performing a benchmark across different rendering systems.
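    The abstract does not name the framework's API, so the sketch below is purely hypothetical: it shows the kind of thin, renderer-agnostic interface the text describes, in which a technique never talks to a specific renderer and only receives sample buffers through a generic wrapper. All class and method names are illustrative.

```python
from abc import ABC, abstractmethod
import numpy as np

class RendererAPI(ABC):
    """Hides the concrete rendering system behind a general interface."""

    @abstractmethod
    def render_samples(self, spp: int) -> np.ndarray:
        """Return per-pixel color samples with shape (H, W, spp, 3)."""

class DenoisingTechnique(ABC):
    """A technique only depends on RendererAPI, never on a specific renderer."""

    @abstractmethod
    def run(self, renderer: RendererAPI) -> np.ndarray:
        """Request samples through the API and return an (H, W, 3) image."""

class BoxFilterBaseline(DenoisingTechnique):
    """Trivial baseline: average the samples within each pixel."""

    def __init__(self, spp: int = 32):
        self.spp = spp

    def run(self, renderer: RendererAPI) -> np.ndarray:
        samples = renderer.render_samples(self.spp)
        return samples.mean(axis=2)
```

    A concrete rendering system would be instrumented once by subclassing RendererAPI, after which any technique written against the interface can be benchmarked on that renderer's scenes without modification.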

    Underwater image restoration: super-resolution and deblurring via sparse representation and denoising by means of marine snow removal

    Underwater imaging has been widely used as a tool in many fields; however, a major issue is the quality of the resulting images/videos. Due to the light's interaction with water and its constituents, acquired underwater images/videos often suffer from a significant amount of scatter (blur, haze) and noise. In light of these issues, this thesis considers the problems of low-resolution, blurred and noisy underwater images and proposes several approaches to improve the quality of such images/video frames. Quantitative and qualitative experiments validate the success of the proposed algorithms.

    A study on moving object detection and dust image restoration

    Doctoral dissertation (Ph.D.), Seoul National University, Department of Mathematical Sciences, College of Natural Sciences, February 2021. Advisor: Myungjoo Kang. Robust principal component analysis (RPCA), a method used to decompose a matrix into the sum of a low-rank matrix and a sparse matrix, has been proven effective in modeling the static background of videos. However, because a dynamic background cannot be represented by a low-rank matrix, measures additional to RPCA are required. In this thesis, we propose masked RPCA to process backgrounds containing moving textures. A first-order Markov random field (MRF) is used to generate a mask that roughly labels moving objects and backgrounds. To estimate the background, the rank minimization process is then applied with the mask multiplied. During the iteration, the background rank increases as the object mask expands, and the weight of the rank constraint term decreases, which increases the accuracy of the background. We compared the proposed method with state-of-the-art, end-to-end methods to demonstrate its advantages. Subsequently, we suggest a novel dedusting method based on a dust-optimized transmission map and deep image prior. This method consists of estimating atmospheric light and transmission in that order, which is similar to dark channel prior-based dehazing methods. However, existing atmospheric light estimation methods widely used in dehazing schemes give an overly bright estimate, which results in unrealistically dark dedusting results. To address this problem, we propose a segmentation-based method that gives a new estimate of the atmospheric light. A dark channel prior-based transmission map with the new atmospheric light gives an unnatural intensity ordering and zero values in low-transmission regions. Therefore, the transmission map is refined by a scattering-model-based transformation and dark channel adaptive non-local total variation (NLTV) regularization. A parameter optimization step with deep image prior (DIP) gives the final dedusting result. Contents: 1 Introduction; 2 Preliminaries; 3 Dynamic Background Subtraction With Masked RPCA; 4 Deep Image Dedusting With Dust-Optimized Transmission Map; 5 Conclusion.
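    For background, the sketch below implements standard RPCA by principal component pursuit with a simple inexact augmented Lagrangian loop: singular-value thresholding recovers the low-rank background and entrywise soft-thresholding recovers the sparse foreground. The thesis's masked variant, which multiplies in an MRF-generated object mask and adapts the rank constraint weight, is not reproduced here.

```python
import numpy as np

def soft(x: np.ndarray, t: float) -> np.ndarray:
    """Entrywise soft-thresholding operator."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def rpca(D: np.ndarray, n_iter: int = 200):
    """Decompose D into a low-rank part L (background) plus a sparse part S (moving objects)."""
    m, n = D.shape
    lam = 1.0 / np.sqrt(max(m, n))              # standard sparsity weight
    mu = 0.25 * m * n / np.abs(D).sum()         # common penalty-parameter heuristic
    Y = np.zeros_like(D)                        # Lagrange multipliers
    S = np.zeros_like(D)
    for _ in range(n_iter):
        # Low-rank update: singular-value thresholding of D - S + Y/mu.
        U, sig, Vt = np.linalg.svd(D - S + Y / mu, full_matrices=False)
        L = U @ np.diag(soft(sig, 1.0 / mu)) @ Vt
        # Sparse update: entrywise soft-thresholding.
        S = soft(D - L + Y / mu, lam / mu)
        # Dual ascent on the constraint D = L + S.
        Y += mu * (D - L - S)
    return L, S
```

    Stacking vectorized video frames as the columns of D makes L the static background estimate and S the moving-object residual, which is the setting the thesis starts from.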

    Perceptually inspired image estimation and enhancement

    Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Brain and Cognitive Sciences, 2009. Includes bibliographical references (p. 137-144). In this thesis, we present three image estimation and enhancement algorithms inspired by human vision. In the first part of the thesis, we propose an algorithm for mapping one image to another based on the statistics of a training set. Many vision problems can be cast as image mapping problems, such as estimating reflectance from luminance, estimating shape from shading, separating signal and noise, etc. Such problems are typically under-constrained, and yet humans are remarkably good at solving them. Classic computational theories about the ability of the human visual system to solve such under-constrained problems attribute this feat to the use of some intuitive regularities of the world, e.g., surfaces tend to be piecewise constant. In recent years, there has been considerable interest in deriving more sophisticated statistical constraints from natural images, but because of the high-dimensional nature of images, representing and utilizing the learned models remains a challenge. Our techniques produce models that are very easy to store and to query. We show these techniques to be effective for a number of applications: removing noise from images, estimating a sharp image from a blurry one, decomposing an image into reflectance and illumination, and interpreting lightness illusions. In the second part of the thesis, we present an algorithm for compressing the dynamic range of an image while retaining important visual detail. The human visual system confronts a serious challenge with dynamic range, in that the physical world has an extremely high dynamic range, while neurons have low dynamic ranges. The human visual system performs dynamic range compression by applying automatic gain control, in both the retina and the visual cortex. Taking inspiration from that, we designed techniques that involve multi-scale subband transforms and smooth gain control on subband coefficients, and resemble the contrast gain control mechanism in the visual cortex. We show our techniques to be successful in producing dynamic-range-compressed images without compromising the visibility of detail or introducing artifacts. We also show that the techniques can be adapted for the related problem of "companding", in which a high dynamic range image is converted to a low dynamic range image and saved using fewer bits, and later expanded back to high dynamic range with minimal loss of visual quality. In the third part of the thesis, we propose a technique that enables a user to easily localize image and video editing by drawing a small number of rough scribbles. Image segmentation, usually treated as an unsupervised clustering problem, is extremely difficult to solve. With a minimal degree of user supervision, however, we are able to generate selection masks with good quality. Our technique learns a classifier using the user-scribbled pixels as training examples, and uses the classifier to classify the rest of the pixels into distinct classes. It then uses the classification results as per-pixel data terms, combines them with a smoothness term that respects color discontinuities, and generates better results than state-of-the-art algorithms for interactive segmentation. by Yuanzhen Li. Ph.D.
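    The scribble-driven selection described in the third part can be sketched as follows, using an off-the-shelf classifier on per-pixel colors; the thesis additionally combines the per-pixel predictions with an edge-respecting smoothness term (not reproduced in this simplified sketch), and the classifier choice here is an assumption for illustration only.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def scribble_segment(image: np.ndarray, scribbles: np.ndarray) -> np.ndarray:
    """image: (H, W, 3) float colors; scribbles: (H, W) ints, -1 = unlabeled.

    Returns an (H, W) label map: user-scribbled pixels keep their labels, and
    every other pixel receives the label predicted by a classifier trained on
    the scribbled pixels' colors.
    """
    h, w, _ = image.shape
    feats = image.reshape(-1, 3)
    labels = scribbles.reshape(-1)
    known = labels >= 0
    clf = RandomForestClassifier(n_estimators=50)
    clf.fit(feats[known], labels[known])     # scribbled pixels are the training set
    pred = clf.predict(feats)                # per-pixel data term (hard labels here)
    pred[known] = labels[known]              # keep the user's labels fixed
    return pred.reshape(h, w)
```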

    Grounding semantics in robots for Visual Question Answering

    In this thesis I describe an operational implementation of an object detection and description system, incorporate it into an end-to-end Visual Question Answering system, and evaluate it on two visual question answering datasets for compositional language and elementary visual reasoning.