47,507 research outputs found

    Optimal processor assignment for pipeline computations

    Get PDF
    The availability of large scale multitasked parallel architectures introduces the following processor assignment problem for pipelined computations. Given a set of tasks and their precedence constraints, along with their experimentally determined individual responses times for different processor sizes, find an assignment of processor to tasks. Two objectives are of interest: minimal response given a throughput requirement, and maximal throughput given a response time requirement. These assignment problems differ considerably from the classical mapping problem in which several tasks share a processor; instead, it is assumed that a large number of processors are to be assigned to a relatively small number of tasks. Efficient assignment algorithms were developed for different classes of task structures. For a p processor system and a series parallel precedence graph with n constituent tasks, an O(np2) algorithm is provided that finds the optimal assignment for the response time optimization problem; it was found that the assignment optimizing the constrained throughput in O(np2log p) time. Special cases of linear, independent, and tree graphs are also considered

    Generalized Methodology for Array Processor Design of Real-time Systems

    Get PDF
    Many techniques and design tools have been developed for mapping algorithms to array processors. Linear mapping is usually used for regular algorithms. Large and complex problems are not regular by nature and regularization may cause a computational overhead which prevents the ability to meet real-time deadlines. In this paper, a systematic design methodology for mapping partially-regular as well as regular Dependence Graphs is presented. In this approach the set of all optimal solutions is generated under the given constraints. Due to nature of the problem and the tight timing constraints of real-time systems the set of alternative solutions is limited. An image processing example is discusse

    FPGA Implementations Comparison of Neuro-cortical Inspired Convolution Processors for Spiking Systems

    Get PDF
    Image convolution operations in digital computer systems are usually very expensive operations in terms of resource consumption (processor resources and processing time) for an efficient Real-Time application. In these scenarios the visual information is divided in frames and each one has to be completely processed before the next frame arrives. Recently a new method for computing convolutions based on the neuro-inspired philosophy of spiking systems (Address-Event-Representation systems, AER) is achieving high performances. In this paper we present two FPGA implementations of AERbased convolution processors that are able to work with 64x64 images and programmable kernels of up to 11x11 elements. The main difference is the use of RAM for integrators in one solution and the absence of integrators in the second solution that is based on mapping operations. The maximum equivalent operation rate is 163.51 MOPS for 11x11 kernels, in a Xilinx Spartan 3 400 FPGA with a 50MHz clock. Formulations, hardware architecture, operation examples and performance comparison with frame-based convolution processors are presented and discussed.Ministerio de Ciencia e Innovación TEC2006-11730-C03-02Junta de Andalucía P06-TIC-0141

    Hierarchical stack filtering : a bitplane-based algorithm for massively parallel processors

    Get PDF
    With the development of novel parallel architectures for image processing, the implementation of well-known image operators needs to be reformulated to take advantage of the so-called massive parallelism. In this work, we propose a general algorithm that implements a large class of nonlinear filters, called stack filters, with a 2D-array processor. The proposed method consists of decomposing an image into bitplanes with the bitwise decomposition, and then process every bitplane hierarchically. The filtered image is reconstructed by simply stacking the filtered bitplanes according to their order of significance. Owing to its hierarchical structure, our algorithm allows us to trade-off between image quality and processing time, and to significantly reduce the computation time of low-entropy images. Also, experimental tests show that the processing time of our method is substantially lower than that of classical methods when using large structuring elements. All these features are of interest to a variety of real-time applications based on morphological operations such as video segmentation and video enhancement

    On the AER Convolution Processors for FPGA

    Get PDF
    Image convolution operations in digital computer systems are usually very expensive operations in terms of resource consumption (processor resources and processing time) for an efficient Real-Time application. In these scenarios the visual information is divided into frames and each one has to be completely processed before the next frame arrives in order to warranty the real-time. A spike-based philosophy for computing convolutions based on the neuro-inspired Address-Event- Representation (AER) is achieving high performances. In this paper we present two FPGA implementations of AER-based convolution processors for relatively small Xilinx FPGAs (Spartan-II 200 and Spartan-3 400), which process 64x64 images with 11x11 convolution kernels. The maximum equivalent operation rate that can be reached is 163.51 MOPS for 11x11 kernels, in a Xilinx Spartan 3 400 FPGA with a 50MHz clock. Formulations, hardware architecture, operation examples and performance comparison with frame-based convolution processors are presented and discussed.Ministerio de Ciencia e Innovación TEC2006-11730-C03-02Ministerio de Ciencia e Innovación TEC2009-10639-C04-02Junta de Andalucía P06-TIC-0141

    Formal and Informal Methods for Multi-Core Design Space Exploration

    Full text link
    We propose a tool-supported methodology for design-space exploration for embedded systems. It provides means to define high-level models of applications and multi-processor architectures and evaluate the performance of different deployment (mapping, scheduling) strategies while taking uncertainty into account. We argue that this extension of the scope of formal verification is important for the viability of the domain.Comment: In Proceedings QAPL 2014, arXiv:1406.156

    A survey of parallel algorithms for fractal image compression

    Get PDF
    This paper presents a short survey of the key research work that has been undertaken in the application of parallel algorithms for Fractal image compression. The interest in fractal image compression techniques stems from their ability to achieve high compression ratios whilst maintaining a very high quality in the reconstructed image. The main drawback of this compression method is the very high computational cost that is associated with the encoding phase. Consequently, there has been significant interest in exploiting parallel computing architectures in order to speed up this phase, whilst still maintaining the advantageous features of the approach. This paper presents a brief introduction to fractal image compression, including the iterated function system theory upon which it is based, and then reviews the different techniques that have been, and can be, applied in order to parallelize the compression algorithm

    An SMP Soft Classification Algorithm for Remote Sensing

    Get PDF
    This work introduces a symmetric multiprocessing (SMP) version of the continuous iterative guided spectral class rejection (CIGSCR) algorithm, a semiautomated classification algorithm for remote sensing (multispectral) images. The algorithm uses soft data clusters to produce a soft classification containing inherently more information than a comparable hard classification at an increased computational cost. Previous work suggests that similar algorithms achieve good parallel scalability, motivating the parallel algorithm development work here. Experimental results of applying parallel CIGSCR to an image with approximately 10^8 pixels and six bands demonstrate superlinear speedup. A soft two class classification is generated in just over four minutes using 32 processors
    corecore