17 research outputs found

    Patch-based graphical models for image restoration

    Adaptive Representations for Image Restoration

    In the field of image processing, building good representation models for natural images is crucial for various applications, such as image restoration, sampling, and segmentation. Adaptive image representation models are designed to describe the intrinsic structures of natural images. In classical Bayesian inference, this representation is often known as the prior of the intensity distribution of the input image. Early image priors took forms such as the total variation norm, Markov Random Fields (MRF), and wavelets. Recently, image priors obtained from machine learning techniques tend to be more adaptive, aiming to capture natural image models by learning from larger databases. In this thesis, we study adaptive representations of natural images for image restoration. The purpose of image restoration is to remove the artifacts that degrade an image. The degradation comes in many forms, such as blur, noise, and codec artifacts. Take image denoising as an example. There are several classic representation methods that can generate state-of-the-art results. The first is the assumption of image self-similarity. However, this assumption sometimes fails because of high noise levels or unique image content. The second is the wavelet-based nonlocal representation, whose fixed basis functions are not adaptive enough for arbitrary input images. The third is sparse coding using overcomplete dictionaries, which lacks a hierarchical structure similar to that of the human visual system and is therefore prone to denoising artifacts. My research started from image denoising. Through a thorough review and evaluation of state-of-the-art denoising methods, it was found that the representation of images is substantially important for the denoising technique.
At the same time, an improvement to one of the nonlocal denoising methods was proposed, which improves the representation of images by integrating Gaussian blur, clustering, and Rotationally Invariant Block Matching. Inspired by the successful application of sparse coding in compressive sensing, we exploited image self-similarity by using a sparse representation based on wavelet coefficients in a nonlocal and hierarchical way, which generates competitive results compared to state-of-the-art denoising algorithms. Meanwhile, another adaptive local filter learned by Genetic Programming (GP) was proposed for efficient image denoising. In this work, we employed GP to find the optimal representations for local image patches through training on massive datasets, which yields competitive results compared to state-of-the-art local denoising filters. After successfully dealing with the denoising part, we moved to parameter estimation for image degradation models. For instance, image blur identification uses deep learning, which has recently been proposed as a popular image representation approach. This work has also been extended to blur estimation by replacing the second step of the framework with a general regression neural network. In summary, this thesis explores spatial correlations, sparse coding, genetic programming, and deep learning as adaptive image representation models for both image restoration and parameter estimation. We conclude this thesis by considering methods based on machine learning to be the best adaptive representations for natural images. We have shown that they can generate better results than conventional representation models for the tasks of image denoising and deblurring.

    Learning to Enhance RGB and Depth Images with Guidance

    Image enhancement improves the visual quality of the input image to better identify key features and make it more suitable for other vision applications. Structure degradation remains a challenging problem in image enhancement, which refers to blurry edges or discontinuous structures due to unbalanced or inconsistent intensity transitions on structural regions. To overcome this issue, it is popular to make use of a guidance image to provide additional structural cues. In this thesis, we focus on two image enhancement tasks, i.e., RGB image smoothing and depth image completion. Through the two research problems, we aim to have a better understanding of what constitutes suitable guidance and how its proper use can benefit the reduction of structure degradation in image enhancement. Image smoothing retains salient structures and removes insignificant textures in an image. Structure degradation results from the difficulty in distinguishing structures and textures with low-level cues. Structures may be inevitably blurred if the filter tries to remove some strong textures that have high contrast. Moreover, these strong textures may also be mistakenly retained as structures. We address this issue by applying two forms of guidance for structures and textures respectively. We first design a kernel-based double-guided filter (DGF), where we adopt semantic edge detection as structure guidance, and texture decomposition as texture guidance. The DGF is the first kernel filter that simultaneously leverages structure guidance and texture guidance to be both "structure-aware" and "texture-aware". Considering that textures present high randomness and variations in spatial distribution and intensities, it is not robust to localize and identify textures with hand-crafted features. Hence, we take advantage of deep learning for richer feature extraction and better generalization.
Specifically, we generate synthetic data by blending natural textures with clean structure-only images. With the data, we build a texture prediction network (TPN) that estimates the location and magnitude of textures. We then combine the texture prediction results from TPN with a semantic structure prediction network so that the final texture and structure aware filtering network (TSAFN) is able to distinguish structures and textures more effectively. Our model achieves better smoothing results than existing filters. Depth completion recovers dense depth from sparse measurements, e.g., LiDAR. Existing depth-only methods use sparse depth as the only input and suffer from structure degradation, i.e., failing to recover semantically consistent boundaries or small/thin objects due to (1) the sparse nature of depth points and (2) the lack of images to provide structural cues. In this thesis, we deal with the structure degradation issue by using RGB image guidance in both supervised and unsupervised depth-only settings. For the supervised model, the unique design is that it simultaneously outputs a reconstructed image and a dense depth map. Specifically, we treat image reconstruction from sparse depth as an auxiliary task during training that is supervised by the image. For the unsupervised model, we regard dense depth as a reconstructed result of the sparse input, and formulate our model as an auto-encoder. To reduce structure degradation, we employ the image to guide latent features by penalizing their difference in the training process. The image guidance loss in both models enables them to acquire more dense and structural cues that are beneficial for producing more accurate and consistent depth values. For inference, the two models only take sparse depth as input and no image is required.
On the KITTI Depth Completion Benchmark, we validate the effectiveness of the proposed image guidance through extensive experiments and achieve competitive performance against state-of-the-art supervised and unsupervised methods. Our approach is also applicable to indoor scenes.

    Textural Difference Enhancement based on Image Component Analysis

    In this thesis, we propose a novel image enhancement method to magnify the textural differences in images with respect to human visual characteristics. The method is intended to be a preprocessing step to improve the performance of texture-based image segmentation algorithms. We propose to calculate Tamura's six texture features (coarseness, contrast, directionality, line-likeness, regularity and roughness) using novel measurements. Each feature follows its original understanding of a certain texture characteristic, but is measured by local low-level features, e.g., direction of the local edges, dynamic range of the local pixel intensities, and kurtosis and skewness of the local image histogram. A discriminant texture feature selection method based on principal component analysis (PCA) is then proposed to find the most representative characteristics for describing textural differences in the image. We decompose the image into pairwise components representing the texture characteristics strongly and weakly, respectively. A set of wavelet-based soft thresholding methods is proposed as the dictionaries of morphological component analysis (MCA) to sparsely highlight the characteristics strongly and weakly from the image. The wavelet-based thresholding methods are proposed in pairs, so that each of the resulting pairwise components exhibits one certain characteristic either strongly or weakly. We propose various wavelet-based manipulation methods to enhance the components separately. For each component representing a certain texture characteristic, a non-linear function is proposed to manipulate the wavelet coefficients of the component so that the component is enhanced with the corresponding characteristic accentuated independently while having little effect on other characteristics. Furthermore, the above three methods are combined into a uniform framework of image enhancement.
Firstly, the texture characteristics differentiating different textures in the image are found. Secondly, the image is decomposed into components exhibiting these texture characteristics respectively. Thirdly, each component is manipulated to accentuate the corresponding texture characteristics exhibited there. After re-combining these manipulated components, the image is enhanced with the textural differences magnified with respect to the selected texture characteristics. The proposed textural differences enhancement method is used prior to both grayscale and colour image segmentation algorithms. The convincing results of improving the performance of different segmentation algorithms prove the potential of the proposed textural difference enhancement method
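
The abstract above refers to wavelet-based soft thresholding. As a rough illustration only, the standard soft-thresholding operator can be sketched as follows. The function name `soft_threshold`, the threshold value, and the sample coefficients are assumptions made for this sketch; it shows the generic operator on a plain list of numbers, not the paired thresholding dictionaries proposed in the thesis.

```python
def soft_threshold(coeffs, t):
    """Generic soft thresholding: shrink each coefficient toward zero by t.

    Coefficients whose magnitude is at most t are set to 0.0.
    Illustrative sketch only; not the thesis's paired thresholding scheme.
    """
    out = []
    for c in coeffs:
        mag = abs(c) - t          # magnitude left after shrinking
        if mag <= 0:
            out.append(0.0)       # small coefficients are zeroed
        elif c > 0:
            out.append(mag)       # keep the original sign
        else:
            out.append(-mag)
    return out


# Hypothetical coefficients, e.g. from one wavelet subband.
shrunk = soft_threshold([3.0, -0.5, 1.5, -2.0], 1.0)
print(shrunk)
```

In a real pipeline the input would be the detail coefficients of a wavelet decomposition, and the threshold would typically be chosen per subband.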

    Stochastic Optimization For Multi-Agent Statistical Learning And Control

    The goal of this thesis is to develop a mathematical framework for optimal, accurate, and affordable-complexity statistical learning among networks of autonomous agents. We begin by noting the connection between statistical inference and stochastic programming, and consider extensions of this setup to settings in which a network of agents each observes a local data stream and would like to make decisions that are good with respect to information aggregated across the entire network. There is an open-ended degree of freedom in this problem formulation, however: the selection of the estimator function class which defines the feasible set of the stochastic program. Our central contribution is the design of stochastic optimization tools in reproducing kernel Hilbert spaces that yield optimal, accurate, and affordable-complexity statistical learning for a multi-agent network. To obtain this result, we first explore the relative merits and drawbacks of different function class selections. In Part I, we consider multi-agent expected risk minimization for the case in which each agent seeks to learn a common globally optimal generalized linear model (GLM), by developing a stochastic variant of the Arrow-Hurwicz primal-dual method. We establish convergence to the primal-dual optimal pair when either consensus or "proximity" constraints encode the fact that we want all agents to agree, or nearby agents to make decisions that are close to one another. Empirically, we observe that these convergence results are substantiated but that convergence may not translate into statistical accuracy. More broadly, optimality within a given estimator function class is not the same as making minimal inference errors. The optimality-accuracy tradeoff of GLMs motivates subsequent efforts to learn more sophisticated estimators based upon learned feature encodings of the data that is fed into the statistical model.
The specific tool we turn to in Part II is dictionary learning, where we optimize both over regression weights and an encoding of the data, which yields a non-convex problem. We investigate the use of stochastic methods for online task-driven dictionary learning, and obtain promising performance for the task of a ground robot learning to anticipate control uncertainty based on its past experience. Heartened by this implementation, we then consider extensions of this framework for a multi-agent network to each learn globally optimal task-driven dictionaries based on stochastic primal-dual methods. However, it is here that the non-convexity of the optimization problem causes problems: stringent conditions on stochastic errors and the duality gap limit the applicability of the convergence guarantees, and impractically small learning rates are required for convergence in practice. Thus, we seek to learn nonlinear statistical models while preserving convexity, which is possible through kernel methods (Part III). However, the increased descriptive power of nonparametric estimation comes at the cost of infinite complexity. Thus, we develop a stochastic approximation algorithm in reproducing kernel Hilbert spaces (RKHS) that ameliorates this complexity issue while preserving optimality: we combine the functional generalization of the stochastic gradient method (FSGD) with greedily constructed low-dimensional subspace projections based on matching pursuit. We establish that the proposed method yields a controllable trade-off between optimality and memory, and yields highly accurate parsimonious statistical models in practice. Then, we develop a multi-agent extension of this method by proposing a new node-separable penalty function and applying FSGD together with low-dimensional subspace projections.
This extension allows a network of autonomous agents to learn a memory-efficient approximation to the globally optimal regression function based only on their local data stream and message passing with neighbors. In practice, we observe that agents are able to stably learn highly accurate and memory-efficient nonlinear statistical models from streaming data. From here, we shift focus to a more challenging class of problems, motivated by the fact that true learning is not just revising predictions based upon data but augmenting behavior over time based on temporal incentives. This goal may be described by Markov Decision Processes (MDPs): at each point, an agent is in some state of the world, takes an action and then receives a reward while randomly transitioning to a new state. The goal of the agent is to select the action sequence to maximize its long-term sum of rewards, but determining how to select this action sequence when both the state and action spaces are infinite has eluded researchers for decades. As a precursor to this feat, we consider the problem of policy evaluation in infinite MDPs, in which we seek to determine the long-term sum of rewards when starting in a given state when actions are chosen according to a fixed distribution called a policy. We reformulate this problem as an RKHS-valued compositional stochastic program and we develop a functional extension of the stochastic quasi-gradient algorithm operating in tandem with the greedy subspace projections mentioned above. We prove convergence with probability 1 to the Bellman fixed point restricted to this function class, and we observe a state-of-the-art trade-off in memory versus Bellman error for the proposed method on the Mountain Car driving task, which bodes well for incorporating policy evaluation into more sophisticated, provably stable reinforcement learning techniques, and in time, developing optimal collaborative multi-agent learning-based control systems.
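
To make the notion of policy evaluation and the Bellman fixed point concrete, here is a minimal sketch for a small finite MDP under a fixed policy. This is a deliberate simplification: the thesis addresses infinite state and action spaces with RKHS-valued stochastic methods, whereas the sketch below uses a known transition matrix and plain fixed-point iteration. The function name `evaluate_policy`, the two-state chain, and the discount factor are all hypothetical.

```python
def evaluate_policy(P, r, gamma, iters=500):
    """Iterate v <- r + gamma * P v for a fixed policy on a finite MDP.

    P[s][t] is the probability of moving from state s to state t under the
    policy, r[s] is the expected reward in state s, and gamma < 1 is the
    discount factor. Tabular illustration only; not the thesis's algorithm.
    """
    n = len(r)
    v = [0.0] * n
    for _ in range(iters):
        v = [r[s] + gamma * sum(P[s][t] * v[t] for t in range(n))
             for s in range(n)]
    return v


# Hypothetical two-state chain: state 1 is absorbing with zero reward.
P = [[0.5, 0.5],
     [0.0, 1.0]]
r = [1.0, 0.0]
values = evaluate_policy(P, r, gamma=0.9)
print(values)
```

For this chain the fixed point satisfies v0 = 1 + 0.9 * 0.5 * v0 and v1 = 0, so v0 = 1 / 0.55. Because the update is a contraction for gamma < 1, the iteration converges to that value from any starting point.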

    Structure-aware image denoising, super-resolution, and enhancement methods

    Denoising, super-resolution and structure enhancement are classical image processing applications. Their purpose is to aid our visual analysis of raw digital images. Despite tremendous progress in these fields, certain difficult problems are still open to research. For example, denoising and super-resolution techniques that possess all of the following properties are very scarce: they must preserve critical structures like corners, be robust to the type of noise distribution, avoid undesirable artefacts, and also be fast. The area of structure enhancement also has an unresolved issue: very little effort has been put into designing models that can tackle anisotropic deformations in the image acquisition process. In this thesis, we design novel methods in the form of partial differential equations, patch-based approaches and variational models to overcome the aforementioned obstacles. In most cases, our methods outperform the existing approaches in both quality and speed, despite being applicable to a broader range of practical situations.

    Multimedia Forensics

    This book is open access. Media forensics has never been more relevant to societal life. Not only does media content represent an ever-increasing share of the data traveling on the net and the preferred means of communication for most users, it has also become an integral part of the most innovative applications in the digital information ecosystem that serves various sectors of society, from entertainment to journalism to politics. Undoubtedly, advances in deep learning and computational imaging have contributed significantly to this outcome. The underlying technologies that drive this trend, however, also pose a profound challenge to establishing trust in what we see, hear, and read, and make media content the preferred target of malicious attacks. In this new threat landscape, powered by innovative imaging technologies and sophisticated tools based on autoencoders and generative adversarial networks, this book fills an important gap. It presents a comprehensive review of state-of-the-art forensics capabilities that relate to media attribution, integrity and authenticity verification, and counter forensics. Its content is developed to provide practitioners, researchers, photo and video enthusiasts, and students with a holistic view of the field.

    Gaze-Based Human-Robot Interaction by the Brunswick Model

    We present a new paradigm for human-robot interaction based on social signal processing, and in particular on the Brunswick model. Originally, the Brunswick model addresses face-to-face dyadic interaction, assuming that the interactants communicate through a continuous exchange of nonverbal social signals in addition to the spoken messages. Social signals have to be interpreted through a proper recognition phase that considers visual and audio information. The Brunswick model allows the quality of the interaction to be evaluated quantitatively, using statistical tools that measure how effective the recognition phase is. In this paper we apply this theory to the case in which one of the interactants is a robot; in this case, the recognition phases performed by the robot and the human have to be revised with respect to the original model. The model is applied to Berrick, a recent open-source low-cost robotic head platform, where gaze is the social signal to be considered.