571 research outputs found

    Artifact reduction for separable non-local means

    Full text link
    It was recently demonstrated [J. Electron. Imaging, 25(2), 2016] that one can perform fast non-local means (NLM) denoising of one-dimensional signals using a method called lifting. The cost of lifting is independent of the patch length, which dramatically reduces the run-time for large patches. Unfortunately, it is difficult to directly extend lifting for non-local means denoising of images. To bypass this, the authors proposed a separable approximation in which the image rows and columns are filtered using lifting. The overall algorithm is significantly faster than NLM, and the results are comparable in terms of PSNR. However, the separable processing often produces vertical and horizontal stripes in the image. This problem was previously addressed by using a bilateral filter-based post-smoothing, which was effective in removing some of the stripes. In this letter, we demonstrate that stripes can be mitigated in the first place simply by involving the neighboring rows (or columns) in the filtering. In other words, we use a two-dimensional search (similar to NLM), while still using one-dimensional patches (as in the previous proposal). The novelty is in the observation that one can use lifting for performing two-dimensional searches. The proposed approach produces artifact-free images, whose quality and PSNR are comparable to NLM, while being significantly faster.Comment: To appear in Journal of Electronic Imagin

    Optimal prefilters for display enhancement

    Get PDF
    Creating images from a set of discrete samples is arguably the most common operation in computer graphics and image processing, lying, for example, at the heart of rendering and image downscaling techniques. Traditional tools for this task are based on classic sampling theory and are modeled under mathematical conditions which are, in most cases, unrealistic; for example, sinc reconstruction – required by Shannon theorem in order to recover a signal exactly – is impossible to achieve in practice because LCD displays perform a box-like interpolation of the samples. Moreover, when an image is made for a human to look at, it will necessarily undergo some modifications due to the human optical system and all the neural processes involved in vision. Finally, image processing practitioners noticed that sinc prefiltering – also required by Shannon theorem – often leads to visually unpleasant images. From these facts, we can deduce that we cannot guarantee, via classic sampling theory, that the signal we see in a display is the best representation of the original image we had in first place. In this work, we propose a novel family of image prefilters based on modern sampling theory, and on a simple model of how the human visual system perceives an image on a display. The use of modern sampling theory guarantees us that the perceived image, based on this model, is indeed the best representation possible, and at virtually no computational overhead. We analyze the spectral properties of these prefilters, showing that they offer the possibility of trading-off aliasing and ringing, while guaranteeing that images look sharper then those generated with both classic and state-of-the-art filters. Finally, we compare it against other solutions in a selection of applications which include Monte Carlo rendering and image downscaling, also giving directions on how to apply it in different contexts.Exibir imagens a partir de um conjunto discreto de amostras é certamente uma das operações mais comuns em computação gráfica e processamento de imagens. Ferramentas tradicionais para essa tarefa são baseadas no teorema de Shannon e são modeladas em condições matemáticas que são, na maior parte dos casos, irrealistas; por exemplo, reconstrução com sinc – necessária pelo teorema de Shannon para recuperar um sinal exatamente – é impossível na prática, já que displays LCD realizam uma reconstrução mais próxima de uma interpolação com kernel box. Além disso, profissionais em processamento de imagem perceberam que prefiltragem com sinc – também requerida pelo teorema de Shannon – em geral leva a imagens visualmente desagradáveis devido ao fenômeno de ringing: oscilações próximas a regiões de descontinuidade nas imagens. Desses fatos, deduzimos que não é possível garantir, via ferramentas tradicionais de amostragem e reconstrução, que a imagem que observamos em um display digital é a melhor representação para a imagem original. Neste trabalho, propomos uma família de prefiltros baseada em teoria de amostragem generalizada e em um modelo de como o sistema ótico do olho humano modifica uma imagem. Proposta por Unser and Aldroubi (1994), a teoria de amostragem generalizada é mais geral que o teorema proposto por Shannon, e mostra como é possível pré-filtrar e reconstruir sinais usando kernels diferentes do sinc. Modelamos o sistema ótico do olho como uma câmera com abertura finita e uma lente delgada, o que apesar de ser simples é suficiente para os nossos propósitos. Além de garantir aproximação ótima quando reconstruindo as amostras por um display e filtrando a imagem com o modelo do sistema ótico humano, a teoria de amostragem generalizada garante que essas operações são extremamente eficientes, todas lineares no número de pixels de entrada. Também, analisamos as propriedades espectrais desses filtros e de técnicas semelhantes na literatura, mostrando que é possível obter um bom tradeoff entre aliasing e ringing (principais artefatos quando lidamos com amostragem e reconstrução de imagens), enquanto garantimos que as imagens finais são mais nítidas que aquelas geradas por técnicas existentes na literatura. Finalmente, mostramos algumas aplicações da nossa técnica em melhoria de imagens, adaptação à distâncias de visualização diferentes, redução de imagens e renderização de imagens sintéticas por método de Monte Carlo

    Signal processing for improved MPEG-based communication systems

    Get PDF

    Design of a High-Speed Architecture for Stabilization of Video Captured Under Non-Uniform Lighting Conditions

    Get PDF
    Video captured in shaky conditions may lead to vibrations. A robust algorithm to immobilize the video by compensating for the vibrations from physical settings of the camera is presented in this dissertation. A very high performance hardware architecture on Field Programmable Gate Array (FPGA) technology is also developed for the implementation of the stabilization system. Stabilization of video sequences captured under non-uniform lighting conditions begins with a nonlinear enhancement process. This improves the visibility of the scene captured from physical sensing devices which have limited dynamic range. This physical limitation causes the saturated region of the image to shadow out the rest of the scene. It is therefore desirable to bring back a more uniform scene which eliminates the shadows to a certain extent. Stabilization of video requires the estimation of global motion parameters. By obtaining reliable background motion, the video can be spatially transformed to the reference sequence thereby eliminating the unintended motion of the camera. A reflectance-illuminance model for video enhancement is used in this research work to improve the visibility and quality of the scene. With fast color space conversion, the computational complexity is reduced to a minimum. The basic video stabilization model is formulated and configured for hardware implementation. Such a model involves evaluation of reliable features for tracking, motion estimation, and affine transformation to map the display coordinates of a stabilized sequence. The multiplications, divisions and exponentiations are replaced by simple arithmetic and logic operations using improved log-domain computations in the hardware modules. On Xilinx\u27s Virtex II 2V8000-5 FPGA platform, the prototype system consumes 59% logic slices, 30% flip-flops, 34% lookup tables, 35% embedded RAMs and two ZBT frame buffers. The system is capable of rendering 180.9 million pixels per second (mpps) and consumes approximately 30.6 watts of power at 1.5 volts. With a 1024×1024 frame, the throughput is equivalent to 172 frames per second (fps). Future work will optimize the performance-resource trade-off to meet the specific needs of the applications. It further extends the model for extraction and tracking of moving objects as our model inherently encapsulates the attributes of spatial distortion and motion prediction to reduce complexity. With these parameters to narrow down the processing range, it is possible to achieve a minimum of 20 fps on desktop computers with Intel Core 2 Duo or Quad Core CPUs and 2GB DDR2 memory without a dedicated hardware

    Image Processing Using FPGAs

    Get PDF
    This book presents a selection of papers representing current research on using field programmable gate arrays (FPGAs) for realising image processing algorithms. These papers are reprints of papers selected for a Special Issue of the Journal of Imaging on image processing using FPGAs. A diverse range of topics is covered, including parallel soft processors, memory management, image filters, segmentation, clustering, image analysis, and image compression. Applications include traffic sign recognition for autonomous driving, cell detection for histopathology, and video compression. Collectively, they represent the current state-of-the-art on image processing using FPGAs

    A volume filtering and rendering system for an improved visual balance of feature preservation and noise suppression in medical imaging

    Get PDF
    Preserving or enhancing salient features whilst effectively suppressing noise-derived artifacts and extraneous detail have been two consistent yet competing objectives in volumetric medical image processing. Illustrative techniques (and methods inspired by them) can help to enhance and, if desired, isolate the depiction of specific regions of interest whilst retaining overall context. However, highlighting or enhancing specific features can have the undesirable side-effect of highlighting noise. Second-derivative based methods can be employed effectively in both the rendering and volume filtering stages of a visualisation pipeline to enhance the depiction of feature detail whilst minimising noise-based artifacts. We develop a new 3D anisotropic-diffusion PDE for an improved balance of feature-retention and noise reduction; furthermore, we present a feature-enhancing visualisation pipeline that can be applied to multiple modalities and has been shown to be particularly effective in the context of 3D ultrasound

    Convolutional Dictionary Learning: Acceleration and Convergence

    Full text link
    Convolutional dictionary learning (CDL or sparsifying CDL) has many applications in image processing and computer vision. There has been growing interest in developing efficient algorithms for CDL, mostly relying on the augmented Lagrangian (AL) method or the variant alternating direction method of multipliers (ADMM). When their parameters are properly tuned, AL methods have shown fast convergence in CDL. However, the parameter tuning process is not trivial due to its data dependence and, in practice, the convergence of AL methods depends on the AL parameters for nonconvex CDL problems. To moderate these problems, this paper proposes a new practically feasible and convergent Block Proximal Gradient method using a Majorizer (BPG-M) for CDL. The BPG-M-based CDL is investigated with different block updating schemes and majorization matrix designs, and further accelerated by incorporating some momentum coefficient formulas and restarting techniques. All of the methods investigated incorporate a boundary artifacts removal (or, more generally, sampling) operator in the learning model. Numerical experiments show that, without needing any parameter tuning process, the proposed BPG-M approach converges more stably to desirable solutions of lower objective values than the existing state-of-the-art ADMM algorithm and its memory-efficient variant do. Compared to the ADMM approaches, the BPG-M method using a multi-block updating scheme is particularly useful in single-threaded CDL algorithm handling large datasets, due to its lower memory requirement and no polynomial computational complexity. Image denoising experiments show that, for relatively strong additive white Gaussian noise, the filters learned by BPG-M-based CDL outperform those trained by the ADMM approach.Comment: 21 pages, 7 figures, submitted to IEEE Transactions on Image Processin

    Computer vision in target pursuit using a UAV

    Get PDF
    Research in target pursuit using Unmanned Aerial Vehicle (UAV) has gained attention in recent years, this is primarily due to decrease in cost and increase in demand of small UAVs in many sectors. In computer vision, target pursuit is a complex problem as it involves the solving of many sub-problems which are typically concerned with the detection, tracking and following of the object of interest. At present, the majority of related existing methods are developed using computer simulation with the assumption of ideal environmental factors, while the remaining few practical methods are mainly developed to track and follow simple objects that contain monochromatic colours with very little texture variances. Current research in this topic is lacking of practical vision based approaches. Thus the aim of this research is to fill the gap by developing a real-time algorithm capable of following a person continuously given only a photo input. As this research considers the whole procedure as an autonomous system, therefore the drone is activated automatically upon receiving a photo of a person through Wi-Fi. This means that the whole system can be triggered by simply emailing a single photo from any device anywhere. This is done by first implementing image fetching to automatically connect to WIFI, download the image and decode it. Then, human detection is performed to extract the template from the upper body of the person, the intended target is acquired using both human detection and template matching. Finally, target pursuit is achieved by tracking the template continuously while sending the motion commands to the drone. In the target pursuit system, the detection is mainly accomplished using a proposed human detection method that is capable of detecting, extracting and segmenting the human body figure robustly from the background without prior training. This involves detecting face, head and shoulder separately, mainly using gradient maps. While the tracking is mainly accomplished using a proposed generic and non-learning template matching method, this involves combining intensity template matching with colour histogram model and employing a three-tier system for template management. A flight controller is also developed, it supports three types of controls: keyboard, mouse and text messages. Furthermore, the drone is programmed with three different modes: standby, sentry and search. To improve the detection and tracking of colour objects, this research has also proposed several colour related methods. One of them is a colour model for colour detection which consists of three colour components: hue, purity and brightness. Hue represents the colour angle, purity represents the colourfulness and brightness represents intensity. It can be represented in three different geometric shapes: sphere, hemisphere and cylinder, each of these shapes also contains two variations. Experimental results have shown that the target pursuit algorithm is capable of identifying and following the target person robustly given only a photo input. This can be evidenced by the live tracking and mapping of the intended targets with different clothing in both indoor and outdoor environments. Additionally, the various methods developed in this research could enhance the performance of practical vision based applications especially in detecting and tracking of objects
    corecore