13 research outputs found

    Foveated Non-Local Means Denoising of Color Images, with Cross-Channel Paradigm.

    Foveation, a peculiarity of the human visual system (HVS), is characterized by a sharp image having maximal acuity at the central part of the retina, the fovea. The acuity rapidly decreases towards the periphery of the visual field. Foveated imaging was recently investigated for the purpose of image denoising in the Foveated Non-local Means (FNLM) algorithm, and it was shown that for natural images the foveated self-similarity is a far more effective regularization prior than the conventional windowed self-similarity. Color images exhibit spectral redundancy across the R, G and B channels, which can be exploited to reduce the effects of noise. We extend the FNLM algorithm to the removal of additive white Gaussian noise from color images. The proposed Color-mixed Foveated NL-means algorithm, denoted C-FNLM, implements the concept of foveated self-similarity, along with a cross-channel paradigm to exploit the correlation between color channels. The patch similarity is measured through an updated foveated distance for color images. In C-FNLM, we derive the explicit construction of a unified operator which explores the spatially variant nature of color perception in the HVS. We develop a framework for designing the linear operator that simultaneously performs foveation and color mixing. Within this framework, we construct several parametrized families of the color-mixing operation. Our analysis shows that the color-mixed foveation is a far more effective regularity assumption than the windowing conventionally used in NL-means, especially for color image denoising, where substantial improvement was observed in terms of contrast and sharpness. Moreover, the unified operator is introduced at a negligible cost in terms of computational complexity.
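For orientation, the conventional windowed NL-means weight that the abstract uses as its baseline can be sketched as follows. This is a minimal grayscale illustration under stated assumptions, not the C-FNLM method: the function names, the Gaussian window, and the kernel exp(-d^2 / h^2) follow the standard NL-means formulation and are not taken from the paper. A foveated variant would replace the windowing step with a foveation operator, which is not shown here.

```python
import math


def gaussian_window(radius, sigma):
    """Normalized Gaussian window over a (2*radius+1) x (2*radius+1) patch."""
    offsets = range(-radius, radius + 1)
    raw = [[math.exp(-(dy * dy + dx * dx) / (2.0 * sigma * sigma))
            for dx in offsets] for dy in offsets]
    total = sum(sum(row) for row in raw)
    return [[v / total for v in row] for row in raw]


def windowed_distance_sq(img, p, q, window):
    """Windowed squared distance between the patches centered at p and q.

    img is a list of rows; p and q are (row, col) centers. The caller must
    keep both patches inside the image (no boundary handling in this sketch).
    """
    radius = len(window) // 2
    d = 0.0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            diff = img[p[0] + dy][p[1] + dx] - img[q[0] + dy][q[1] + dx]
            d += window[dy + radius][dx + radius] * diff * diff
    return d


def nlm_weight(img, p, q, window, h):
    """Unnormalized NL-means weight; h is the (assumed) filtering parameter."""
    return math.exp(-windowed_distance_sq(img, p, q, window) / (h * h))


if __name__ == "__main__":
    # Hypothetical test image: columns repeat with period 3.
    img = [[float(x % 3) for x in range(8)] for _ in range(5)]
    win = gaussian_window(radius=1, sigma=1.0)
    print(nlm_weight(img, (2, 2), (2, 5), win, h=0.5))  # identical patches
    print(nlm_weight(img, (2, 2), (2, 3), win, h=0.5))  # differing patches
```

Identical patches receive the maximal weight, and the weight decays as the windowed distance grows; in a full denoiser these weights would be normalized over the search neighborhood before averaging.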

    A Non-Local Structure Tensor Based Approach for Multicomponent Image Recovery Problems

    Non-Local Total Variation (NLTV) has emerged as a useful tool in variational methods for image recovery problems. In this paper, we extend the NLTV-based regularization to multicomponent images by taking advantage of the Structure Tensor (ST) resulting from the gradient of a multicomponent image. The proposed approach allows us to penalize the non-local variations, jointly for the different components, through various $\ell_{1,p}$ matrix norms with $p \ge 1$. To facilitate the choice of the hyper-parameters, we adopt a constrained convex optimization approach in which we minimize the data fidelity term subject to a constraint involving the ST-NLTV regularization. The resulting convex optimization problem is solved with a novel epigraphical projection method. This formulation can be efficiently implemented thanks to the flexibility offered by recent primal-dual proximal algorithms. Experiments are carried out for multispectral and hyperspectral images. The results demonstrate the interest of introducing a non-local structure tensor regularization and show that the proposed approach leads to significant improvements in terms of convergence speed over current state-of-the-art methods.

    A Comparison of Image Denoising Methods

    The advancement of imaging devices and the countless images generated every day place an increasingly high demand on image denoising, which remains a challenging task in terms of both effectiveness and efficiency. To improve denoising quality, numerous denoising techniques and approaches have been proposed in the past decades, including different transforms, regularization terms, algebraic representations and especially advanced deep neural network (DNN) architectures. Despite their sophistication, many methods may fail to achieve desirable results for simultaneous noise removal and fine detail preservation. In this paper, to investigate the applicability of existing denoising techniques, we compare a variety of denoising methods on both synthetic and real-world datasets for different applications. We also introduce a new dataset for benchmarking, and the evaluations are performed from four different perspectives including quantitative metrics, visual effects, human ratings and computational cost. Our experiments demonstrate: (i) the effectiveness and efficiency of representative traditional denoisers for various denoising tasks, (ii) that a simple matrix-based algorithm may be able to produce results similar to those of its tensor counterparts, and (iii) the notable achievements of DNN models, which exhibit impressive generalization ability and show state-of-the-art performance on various datasets. In spite of the progress in recent years, we discuss shortcomings and possible extensions of existing techniques. Datasets, code and results are made publicly available and will be continuously updated at https://github.com/ZhaomingKong/Denoising-Comparison. Comment: In this paper, we intend to collect and compare various denoising methods to investigate their effectiveness, efficiency, applicability and generalization ability with both synthetic and real-world experiment

    Efficient and accurate stereo matching for cloth manipulation

    Due to the recent development of robotic techniques, research on robots that can assist in everyday household tasks, especially robotic cloth manipulation, has become popular in recent years. Stereo matching forms a crucial part of robotic vision and aims to derive depth information from image pairs captured by stereo cameras. Although stereo robotic vision is widely adopted for cloth manipulation robots in the research community, this remains a challenging research task. Robotic vision requires very accurate depth output in a relatively short timespan in order to successfully perform cloth manipulation in real time. In this thesis, we mainly aim to develop a robotic stereo matching based vision system that is both efficient and effective for the task of robotic cloth manipulation. Effectiveness refers to the accuracy of the depth map generated by the stereo matching algorithms, which the robot needs in order to grasp the details required to achieve the given task on cloth materials, while efficiency emphasizes the time required for the stereo matching to process the images. With respect to efficiency, firstly, by exploring a variety of different hardware architectures such as multi-core CPUs and graphics processors (GPUs) to accelerate stereo matching, we demonstrate that the parallelised stereo-matching algorithm can be significantly accelerated, achieving 12X and 176X speed-ups respectively for multi-core CPU and GPU, compared with SISD (Single Instruction, Single Data) single-thread CPU. In terms of effectiveness, because there are no cloth-based testbeds with depth map ground truths for evaluating the accuracy of stereo matching in this context, we created five different testbeds to facilitate evaluation of stereo matching in the context of cloth manipulation.
In addition, we adapted a guided filtering algorithm into a pyramidal stereo matching framework that works directly on unrectified images, and evaluated its accuracy using the created cloth testbeds. We demonstrate that our proposed approach is not only efficient, but also accurate and well suited to the characteristics of cloth manipulation tasks. This also shows that, rather than relying on image rectification, directly applying stereo matching to unrectified images is effective and efficient. Finally, we further explore whether we can improve efficiency while maintaining reasonable accuracy for robotic cloth manipulation (i.e., trading off accuracy for efficiency). We use a foveated matching algorithm, inspired by biological vision systems, and find that it is effective in trading off accuracy for efficiency, achieving almost the same level of accuracy for both cloth grasping and flattening tasks with a two- to three-fold acceleration. We also demonstrate that with the robot we can use machine learning techniques to predict the optimal foveation level in order to accomplish the robotic cloth manipulation tasks successfully and much more efficiently. To summarize, in this thesis, we extensively study stereo matching, contributing to the long-term goal of developing effective ways for efficient whilst accurate robotic stereo matching for cloth manipulation.

    Models and Methods for Estimation and Filtering of Signal-Dependent Noise in Imaging

    The work presented in this thesis focuses on Image Processing, that is, the branch of Signal Processing that centers its interest on images, sequences of images, and videos. It has various applications: imaging for traditional cameras, medical imaging, e.g., X-ray and magnetic resonance imaging (MRI), infrared imaging (thermography), e.g., for security purposes, astronomical imaging for space exploration, three-dimensional (video+depth) signal processing, and many more. This thesis covers a small but relevant slice that is transversal to this vast pool of applications: noise estimation and denoising. To appreciate the relevance of this thesis it is essential to understand why noise is such an important part of Image Processing. Every acquisition device and every measurement is subject to interference that causes random fluctuations in the acquired signals. If not taken into consideration with a suitable mathematical approach, these fluctuations might invalidate any use of the acquired signal. Consider, for example, an MRI used to detect a possible condition; if not suitably processed and filtered, the image could lead to a wrong diagnosis. Therefore, before any acquired image is sent to an end-user (machine or human), it undergoes several processing steps. Noise estimation and denoising are usually parts of these fundamental steps. Some sources of noise can be removed by suitably modeling the acquisition process of the camera, and developing hardware based on that model. Other sources of noise are instead inevitable: high/low light conditions of the acquired scene, hardware imperfections, temperature of the device, etc. To remove noise from an image, the noise characteristics have to be first estimated. The branch of image processing that fulfills this role is called noise estimation. Then, it is possible to remove the noise artifacts from the acquired image.
This process is referred to as denoising. For practical reasons, it is convenient to model noise as random variables. In this way, we assume that the noise fluctuations take values whose probabilities follow specific distributions characterized by only a few parameters. These are the parameters that we estimate. We focus our attention on noise modeled by Gaussian distributions, Poisson distributions, or a combination of these. These distributions are adopted for modeling noise affecting images from digital cameras, microscopes, telescopes, radiography systems, thermal cameras, depth-sensing cameras, etc. The parameters that define a Gaussian distribution are its mean and its variance, while a Poisson distribution depends only on its mean, since its variance is equal to the mean (signal-dependent variance). Consequently, the parameters of a Poisson-Gaussian distribution describe the relation between the intensity of the noise-free signal and the variance of the noise affecting it. Degradation models of this kind are referred to as signal-dependent noise. Estimation of signal-dependent noise is commonly performed by processing, individually, groups of pixels with equal intensity in order to sample the aforementioned relation between signal mean and noise variance. Such sampling is often subject to outliers; we propose a robust estimation model where the noise parameters are estimated by optimizing a likelihood function that models the local variance estimates from each group of pixels as mixtures of Gaussian and Cauchy distributions. The proposed model is general and applicable to a variety of signal-dependent noise models, including also possible clipping of the data.
We also show that, under certain hypotheses, the relation between signal mean and noise variance can also be effectively sampled from groups of pixels of possibly different intensities. Then, we propose a spatially adaptive transform to improve the denoising performance of a specific class of filters, namely nonlocal transform-domain collaborative filters. In particular, the proposed transform exploits the spatial coordinates of nonlocal similar features from an image to better decorrelate the data, and consequently to improve the filtering. Unlike non-adaptive transforms, the proposed spatially adaptive transform is capable of representing spatially smooth coarse-scale variations in the similar features of the image. Further, based on the same paradigm, we propose a method that adaptively enhances the local image features depending on their orientation with respect to the relative coordinates of other similar features at other locations in the image. An established approach for removing Poisson noise utilizes so-called variance-stabilizing transformations (VST) to make the noise variance independent of the mean of the signal, hence enabling denoising by a standard denoiser for additive Gaussian noise. Within this framework, we propose an iterative method where at each iteration the previous estimate is summed back to the noisy image in order to improve the stabilizing performance of the transformation, and consequently to improve the denoising results. The proposed iterative procedure makes it possible to circumvent the typical drawbacks that VSTs experience at very low intensities, and thus allows us to apply the standard denoiser effectively even at extremely low counts. The developed methods achieve state-of-the-art results in their respective fields of application.
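The variance-stabilization idea described above can be illustrated with a small numerical sketch. It assumes the classical Anscombe transform, f(x) = 2 sqrt(x + 3/8), as the VST; the abstract does not name the transformation it uses, so this choice and all function names below are illustrative only. The variance of the transformed Poisson variable is computed exactly from the probability mass function rather than by simulation.

```python
import math


def anscombe(x):
    """Classical Anscombe transform (assumed here as the example VST)."""
    return 2.0 * math.sqrt(x + 3.0 / 8.0)


def poisson_pmf(k, lam):
    """Poisson probability mass at k for mean lam > 0, computed in log space."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def stabilized_variance(lam):
    """Variance of anscombe(X) for X ~ Poisson(lam), via a truncated pmf sum."""
    # Truncate far into the tail so the neglected mass is negligible.
    k_max = int(lam + 12.0 * math.sqrt(lam) + 30)
    ks = range(k_max + 1)
    mean = sum(poisson_pmf(k, lam) * anscombe(k) for k in ks)
    second = sum(poisson_pmf(k, lam) * anscombe(k) ** 2 for k in ks)
    return second - mean ** 2


if __name__ == "__main__":
    # The raw Poisson variance equals lam; after the transform it is
    # roughly constant for moderate means but drops at very low counts.
    for lam in (0.5, 2.0, 20.0, 200.0):
        print(lam, stabilized_variance(lam))
```

For moderate and large means the stabilized variance stays close to 1 regardless of the mean, which is what lets a denoiser designed for additive Gaussian noise be applied. At very low means it falls well below 1, which is the low-intensity drawback the abstract refers to.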

    Adaptive Nonlocal Signal Restoration and Enhancement Techniques for High-Dimensional Data

    The large number of practical applications involving digital images has motivated a significant interest towards restoration solutions that improve the visual quality of the data in the presence of various acquisition and compression artifacts. Digital images are the results of an acquisition process based on the measurement of a physical quantity of interest incident upon an imaging sensor over a specified period of time. The quantity of interest depends on the targeted imaging application. Common imaging sensors measure the number of photons impinging over a dense grid of photodetectors in order to produce an image similar to what is perceived by the human visual system. Different applications focus on the part of the electromagnetic spectrum not visible to the human visual system, and thus require different sensing technologies to form the image. In all cases, even with the advance of technology, raw data is invariably affected by a variety of inherent and external disturbing factors, such as the stochastic nature of the measurement processes or challenging sensing conditions, which may cause, e.g., noise, blur, geometrical distortion and color aberration. In this thesis we introduce two filtering frameworks for video and volumetric data restoration based on the BM3D grouping and collaborative filtering paradigm. In its general form, the BM3D paradigm leverages the correlation present within a nonlocal group composed of mutually similar basic filtering elements, e.g., patches, to attain an enhanced sparse representation of the group in a suitable transform domain, where the energy of the meaningful part of the signal can thus be separated from that of the noise through coefficient shrinkage. We argue that the success of this approach largely depends on the form of the basic filtering elements used, which in turn define the subsequent spectral representation of the nonlocal group.
Thus, the main contribution of this thesis consists in tailoring specific basic filtering elements to the inherent characteristics of the processed data at hand. Specifically, we embed the local spatial correlation present in volumetric data through 3-D cubes, and the local spatial and temporal correlation present in videos through 3-D spatiotemporal volumes, i.e., sequences of 2-D blocks following a motion trajectory. The foundational aspect of this work is the analysis of the particular spectral representation of these elements. Specifically, our frameworks stack mutually similar 3-D patches along an additional fourth dimension, thus forming a 4-D data structure. By doing so, an effective group spectral description can be formed, as the phenomena acting along different dimensions in the data can be precisely localized along different spectral hyperplanes, and thus different filtering shrinkage strategies can be applied to different spectral coefficients to achieve the desired filtering results. This constitutes a decisive difference with the shrinkage traditionally employed in BM3D algorithms, where different hyperplanes of the group spectrum are shrunk subject to the same degradation model. Different image processing problems rely on different observation models and typically require specific algorithms to filter the corrupted data. As a consequent contribution of this thesis, we show that our high-dimensional filtering model makes it possible to target heterogeneous noise models, e.g., characterized by spatial and temporal correlation, signal-dependent distributions, spatially varying statistics, and non-white power spectral densities, without essential modifications to the algorithm structure. As a result, we develop state-of-the-art methods for a variety of fundamental image processing problems, such as denoising, deblocking, enhancement, deflickering, and reconstruction, which also find practical applications in consumer, medical, and thermal imaging.

    Interactive Evolutionary Algorithms for Image Enhancement and Creation

    Image enhancement and creation, particularly for aesthetic purposes, are tasks for which the use of interactive evolutionary algorithms would seem to be well suited. Previous work has concentrated on the development of various aspects of the interactive evolutionary algorithms and their application to various image enhancement and creation problems. Robust evaluation of algorithmic design options in interactive evolutionary algorithms and the comparison of interactive evolutionary algorithms to alternative approaches to achieving the same goals are generally less well addressed. The work presented in this thesis is primarily concerned with different interactive evolutionary algorithms, search spaces, and operators for setting the input values required by image processing and image creation tasks. A secondary concern is determining when the use of the interactive evolutionary algorithm approach to image enhancement problems is warranted and how it compares with alternative approaches. Various interactive evolutionary algorithms were implemented and compared in a number of specifically devised experiments using tasks of varying complexity. A novel aspect of this thesis, with regard to other work in the study of interactive evolutionary algorithms, was that statistical analysis of the data gathered from the experiments was performed. This analysis demonstrated, contrary to popular assumption, that the choice of algorithm parameters, operators, search spaces, and even the underlying evolutionary algorithm has little effect on the quality of the resulting images or the time it takes to develop them. It was found that the interaction methods chosen when implementing the user interface of the interactive evolutionary algorithms had a greater influence on the performance of the algorithms.

    Model-Based Environmental Visual Perception for Humanoid Robots

    The visual perception of a robot should answer two fundamental questions: What? and Where? In order to answer these questions properly and efficiently, it is essential to establish a bidirectional coupling between the external stimuli and the internal representations. This coupling links the physical world with the inner abstraction models by sensor transformation, recognition, matching and optimization algorithms. The objective of this PhD is to establish this sensor-model coupling.

    Inverse problem theory in shape and action modeling

    In this thesis we consider shape and action modeling problems under the perspective of inverse problem theory. Inverse problem theory proposes a mathematical framework for solving model parameter estimation problems. Inverse problems are typically ill-posed, which makes their solution challenging. Regularization theory and Bayesian statistical methods, which are proposed in the context of inverse problem theory, provide suitable methods for dealing with ill-posed problems. Regarding the application of inverse problem theory in shape and action modeling, we first discuss the problem of saliency prediction, considering a model proposed by the coherence theory of attention. According to coherence theory, salient regions emerge via proto-objects, which we model using harmonic functions (thin-membranes). We also discuss the modeling of the 3D scene, as it is fundamental for extracting suitable scene features, which guide the generation of proto-objects. The next application we consider is the problem of image fusion. In this context, we propose a variational image fusion framework, based on confidence-driven total variation regularization, and we consider its application to the problem of depth image fusion, which is an important step in the dense 3D scene reconstruction pipeline. The third problem we encounter regards action modeling, and in particular the recognition of human actions based on 3D data. Here, we employ a Bayesian nonparametric model to capture the idiosyncratic motions of the different body parts. Recognition is achieved by comparing the motion behaviors of the subject to a dictionary of behaviors for each action, learned from examples collected from other subjects. Next, we consider the 3D modeling of articulated objects from images taken from the web, with application to the 3D modeling of animals.
By decomposing the full object into rigid components and by considering different aspects of these components, we model the object up this hierarchy, in order to obtain a 3D model of the entire object. Single-view 3D modeling as well as model registration is performed, based on regularization methods. The last problem we consider is the modeling of 3D specular (non-Lambertian) surfaces from a single image. To solve this challenging problem we propose a Bayesian nonparametric model for estimating the normal field of the surface from its appearance, by identifying the material of the surface. After computing an initial model of the surface, we apply regularization of its normal field, considering also a photo-consistency constraint, in order to estimate the final shape of the surface. Finally, we conclude this thesis by summarizing the most significant results and by suggesting future directions regarding the application of inverse problem theory to challenging computer vision problems, such as the ones encountered in this work.

    The role of phonology in visual word recognition: evidence from Chinese

    Posters - Letter/Word Processing V: abstract no. 5024. The hypothesis of bidirectional coupling of orthography and phonology predicts that phonology plays a role in visual word recognition, as observed in the effects of feedforward and feedback spelling-to-sound consistency on lexical decision. However, because orthography and phonology are closely related in alphabetic languages (homophones in alphabetic languages are usually orthographically similar), it is difficult to exclude an influence of orthography on phonological effects in visual word recognition. Chinese languages contain many written homophones that are orthographically dissimilar, allowing a test of the claim that phonological effects can be independent of orthographic similarity. We report a study of visual word recognition in Chinese based on a mega-analysis of lexical decision performance with 500 characters. The results from multiple regression analyses, after controlling for orthographic frequency, stroke number, and radical frequency, showed main effects of feedforward and feedback consistency, as well as interactions between these variables and phonological frequency and number of homophones. Implications of these results for resonance models of visual word recognition are discussed.