4 research outputs found

    High-Performance Architecture for Color Error Diffusion

    Error diffusion is one of the most widely used algorithms for halftoning grayscale and color images. It works by distributing the thresholding error of each pixel to unprocessed neighboring pixels, while maintaining the average value of the image. Error diffusion results in inter-pixel data dependencies that prohibit a simple data-pipelining approach and increase the memory requirements of the system. In this paper, we present a multiprocessing approach to overcome these difficulties, which results in a novel architecture for high-performance hardware implementation of error diffusion algorithms. The proposed architecture is scalable, flexible, and cost-effective, and may be adopted for processing grayscale or color images. The key idea in this approach is to simultaneously process pixels in separate rows and columns in a diagonal arrangement, so that data dependencies across processing elements are avoided. The processor was realized using an FPGA implementation and may be used for real-time image rendering in high-speed scanning or printing. The entire system runs at the input clock rate, allowing the performance to scale linearly with the clock rate. Higher data rates required by future applications will be supported automatically by more advanced high-speed FPGA technologies.
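The thresholding-and-distribution step described in this abstract can be illustrated with a minimal sequential sketch. The abstract does not name a specific diffusion kernel, so the Floyd-Steinberg weights (7/16, 3/16, 5/16, 1/16), the function name `error_diffuse`, and the fixed threshold of 128 are assumptions made here for illustration only, not the paper's design.

```python
def error_diffuse(image, threshold=128):
    """Binarize a grayscale image (list of rows, values 0..255).

    Assumption: Floyd-Steinberg weights; the abstract names no kernel.
    """
    height, width = len(image), len(image[0])
    # Working copy that accumulates the error pushed from earlier pixels.
    work = [[float(v) for v in row] for row in image]
    out = [[0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            old = work[r][c]
            new = 255 if old >= threshold else 0
            out[r][c] = new
            err = old - new  # thresholding error of this pixel
            # Distribute the error to unprocessed neighbors only.
            if c + 1 < width:
                work[r][c + 1] += err * 7 / 16
            if r + 1 < height:
                if c > 0:
                    work[r + 1][c - 1] += err * 3 / 16
                work[r + 1][c] += err * 5 / 16
                if c + 1 < width:
                    work[r + 1][c + 1] += err * 1 / 16
    return out
```

Because each pixel reads error written by its left neighbor and by three pixels in the row above, the inner loop cannot be pipelined naively, which is the dependency problem the abstract refers to. On a flat gray input the output mean stays close to the input mean, apart from error that is dropped at the image borders.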

    Multimedia processor-based implementation of an error-diffusion halftoning algorithm exploiting subword parallelism

    Multimedia processor-based implementation of digital image processing algorithms has become important since several multimedia processors, such as the Intel Pentium MMX, are now available and can replace special-purpose hardware-based systems because of their flexibility. Multimedia processors increase throughput by processing multiple pixels simultaneously using a subword-parallel arithmetic and logic unit architecture. The error-diffusion halftoning algorithm employs feedback of quantized output signals to faithfully convert a multi-level image to a binary image or to one with fewer levels of quantization. This makes it difficult to achieve speedup by utilizing the multimedia extension. In this study, the error-diffusion halftoning algorithm is implemented for a multimedia processor using three methods: single-pixel, single-line, and multiple-line processing. The single-pixel approach is the closest to conventional implementations, but the multimedia extension is used only in the filter kernel. The single-line approach computes multiple pixels in one scan-line simultaneously, but requires a complex algorithm transformation to remove dependencies between pixels. The multiple-line method exploits parallelism by employing a skewed data structure and processing multiple pixels in different scan-lines. The Pentium MMX instruction set is used for quantitative performance evaluation including run-time overheads and misaligned memory accesses. A speedup of more than ten times is achieved compared to the software (integer C) implementation on a conventional processor for the structurally sequential error-diffusion halftoning algorithm.
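The multiple-line idea (skewing scan-lines so that pixels in different rows can be processed together) can be sketched as a wavefront schedule. This is an illustrative assumption rather than the paper's MMX code: it assumes a Floyd-Steinberg kernel in integer "gather" form, for which each row must lag the row above by two columns, and the names `quantize_pixel`, `raster_scan`, and `skewed_scan` are hypothetical.

```python
def quantize_pixel(image, err, out, r, c, threshold=128):
    """Quantize one pixel from the stored errors of its four causal neighbors."""
    width = len(image[0])

    def e(rr, cc):
        # Error of a neighbor, or 0 outside the image.
        return err[rr][cc] if rr >= 0 and 0 <= cc < width else 0

    # Assumed Floyd-Steinberg weights, gathered instead of scattered.
    acc = (7 * e(r, c - 1) + 1 * e(r - 1, c - 1)
           + 5 * e(r - 1, c) + 3 * e(r - 1, c + 1))
    value = image[r][c] + acc // 16  # integer arithmetic, floor division
    out[r][c] = 255 if value >= threshold else 0
    err[r][c] = value - out[r][c]


def raster_scan(image):
    """Reference: conventional row-by-row processing order."""
    h, w = len(image), len(image[0])
    err = [[0] * w for _ in range(h)]
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            quantize_pixel(image, err, out, r, c)
    return out


def skewed_scan(image):
    """Skewed order: pixels with equal c + 2*r form one independent group."""
    h, w = len(image), len(image[0])
    err = [[0] * w for _ in range(h)]
    out = [[0] * w for _ in range(h)]
    for t in range(w + 2 * (h - 1)):
        # One pixel per row at most; every neighbor it reads has a smaller t.
        wavefront = [(r, t - 2 * r) for r in range(h) if 0 <= t - 2 * r < w]
        for r, c in wavefront:  # a SIMD unit would handle these together
            quantize_pixel(image, err, out, r, c)
    return out
```

For any input, `skewed_scan(image)` returns the same halftone as `raster_scan(image)`, since the gather form reads only errors finalized in earlier wavefronts. The inner loop over `wavefront` is where subword-parallel instructions would process several scan-lines at once.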

    Media processor implementations of image rendering algorithms

    Demands for fast execution of image processing are a driving force for today's computing market. Many image processing applications require intense numeric calculations to be done on large sets of data with minimal overhead time. To meet this challenge, several approaches have been used. Custom-designed hardware devices are very fast implementations used in many systems today. However, these devices are very expensive and inflexible. General purpose computers with enhanced multimedia instructions offer much greater flexibility but process data at a much slower rate than the custom-hardware devices. Digital signal processors (DSPs) and media processors, such as the MAP-CA created by Equator Technologies, Inc., may be an efficient alternative that provides a low-cost combination of speed and flexibility. Today, DSPs and media processors are commonly used in image and video encoding and decoding, including JPEG and MPEG processing techniques. Little work has been done to determine how well these processors can perform other image processing techniques, specifically image rendering for printing. This project explores various image rendering algorithms and the performance achieved by running them on a media processor to determine if this type of processor is a viable competitor in the image rendering domain. Performance measurements obtained when implementing rendering algorithms on the MAP-CA show that a speedup of 4.1 can be achieved with neighborhood-type processes, while point-type processes achieve an average speedup of 21.7 as compared to general purpose processor implementations.

    FPGA BASED PARALLEL IMPLEMENTATION OF STACKED ERROR DIFFUSION ALGORITHM

    Digital halftoning is a crucial technique used in digital printers to convert a continuous-tone image into a pattern of black and white dots. Halftoning is used since printers have a limited availability of inks and cannot reproduce all the color intensities in a continuous image. Error Diffusion is an algorithm in halftoning that iteratively quantizes pixels in a neighborhood-dependent fashion. This thesis focuses on the development and design of a parallel scalable hardware architecture for high-performance implementation of a high-quality Stacked Error Diffusion algorithm. The algorithm is described in 'C' and requires a significant processing time when implemented on a conventional CPU. Thus, a new hardware processor architecture is developed to implement the algorithm, and it is implemented and tested on a Xilinx Virtex 5 FPGA chip. There is an extraordinary decrease in the run time of the algorithm when run on the newly proposed parallel architecture implemented in FPGA technology compared to execution on a single CPU. The new parallel architecture is described using the Verilog Hardware Description Language. Post-synthesis and post-implementation, performance-based Hardware Description Language (HDL) simulation validation of the new parallel architecture is achieved using the ModelSim CAD simulation tool.