    Benchmarking a many-core neuromorphic platform with an MPI-based DNA sequence matching algorithm

    SpiNNaker is a neuromorphic globally asynchronous locally synchronous (GALS) multi-core architecture designed for simulating spiking neural networks (SNNs) in real time. Several studies have shown that neuromorphic platforms allow flexible and efficient simulation of SNNs by exploiting a communication infrastructure optimised for transmitting small packets across the many cores of the platform. However, the effectiveness of neuromorphic platforms in executing massively parallel general-purpose algorithms, while promising, is still largely unexplored. In this paper, we present a parallel DNA sequence matching algorithm implemented using the MPI programming paradigm and ported to the SpiNNaker platform. In our implementation, all cores available on the board execute in parallel an optimised version of the Boyer-Moore (BM) algorithm. Using this application, we benchmarked the SpiNNaker platform in terms of scalability and synchronisation latency. Experimental results indicate that the SpiNNaker parallel architecture delivers a linear performance increase with the number of cores used and shows better scalability than a general-purpose multi-core computing platform.
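    As a minimal illustration of distributing such a search, the sketch below splits a reference string across MPI ranks, extends each chunk by the pattern length minus one so matches straddling a chunk boundary are not lost, and counts occurrences with the Boyer-Moore-Horspool bad-character rule. The function names and toy data are assumptions for illustration; the paper's optimised BM variant and its SpiNNaker-specific port are not reproduced here.

```c
/* Sketch: distributing Boyer-Moore-Horspool matching across MPI ranks.
 * Names and data are illustrative, not taken from the paper. */
#include <mpi.h>
#include <stdio.h>
#include <string.h>

/* Count occurrences of pat (length m) in txt[0..n) with the BMH bad-character rule. */
static long bmh_count(const char *txt, long n, const char *pat, long m)
{
    long shift[256], i, count = 0;
    if (m == 0 || n < m) return 0;
    for (i = 0; i < 256; i++) shift[i] = m;           /* default shift        */
    for (i = 0; i < m - 1; i++)                       /* bad-character table  */
        shift[(unsigned char)pat[i]] = m - 1 - i;
    for (i = 0; i + m <= n; i += shift[(unsigned char)txt[i + m - 1]])
        if (memcmp(txt + i, pat, m) == 0) count++;
    return count;
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const char *genome = "ACGTACGTTAGACGTACGAACGT";   /* stand-in for a DNA sequence */
    const char *pat    = "ACGT";
    long n = (long)strlen(genome), m = (long)strlen(pat);

    /* Each rank scans one chunk, extended by m-1 characters so a match that
     * starts in the chunk but crosses its end is still found; start positions
     * never overlap between ranks, so nothing is counted twice. */
    long chunk = (n + size - 1) / size;
    long begin = rank * chunk;
    long end   = begin + chunk + m - 1;
    if (begin > n) begin = n;
    if (end > n) end = n;

    long local = bmh_count(genome + begin, end - begin, pat, m);
    long total = 0;
    MPI_Reduce(&local, &total, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0) printf("matches: %ld\n", total);

    MPI_Finalize();
    return 0;
}
```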

    Efficient Architecture and Implementation of Vector Median Filter in Co-Design Context

    This work presents an efficient, fast parallel architecture for the Vector Median Filter (VMF) using a combined hardware/software (HW/SW) implementation. The hardware part of the system is implemented in VHDL, whereas the software part is developed in C/C++. The software part of the embedded system runs on the NIOS-II softcore processor under the μClinux operating system. The comparison between the software-only and HW/SW solutions shows that adding a hardware part to the design speeds up the filtering process compared to the software-only solution. This efficient embedded system implementation can perform well in several image processing applications.
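    For context, the vector median of a filter window is the pixel whose summed distance to every other pixel in the window is smallest, which is the core computation the hardware part accelerates. The sketch below is a plain software reference for a single 3x3 RGB window using Euclidean distance; the window size, distance metric and names are illustrative assumptions rather than details from the paper.

```c
/* Sketch: reference (software-only) vector median for one 3x3 window of
 * RGB pixels.  The vector median is the window pixel whose summed
 * distance to all other window pixels is smallest. */
#include <math.h>
#include <stdio.h>

#define WIN 9                               /* 3x3 window */

typedef struct { double r, g, b; } Pixel;

static double dist(Pixel a, Pixel b)        /* Euclidean (L2) distance */
{
    double dr = a.r - b.r, dg = a.g - b.g, db = a.b - b.b;
    return sqrt(dr * dr + dg * dg + db * db);
}

Pixel vector_median(const Pixel w[WIN])
{
    double best = -1.0;
    int best_i = 0;
    for (int i = 0; i < WIN; i++) {
        double s = 0.0;
        for (int j = 0; j < WIN; j++)
            s += dist(w[i], w[j]);          /* aggregate distance of w[i] */
        if (best < 0.0 || s < best) { best = s; best_i = i; }
    }
    return w[best_i];                       /* pixel minimising the sum */
}

int main(void)
{
    Pixel w[WIN] = {
        {10,10,10},{12,11,10},{11,10,12},{255,0,0},{10,12,11},
        {11,11,11},{12,12,12},{10,11,12},{11,12,10}
    };                                      /* one impulsive outlier */
    Pixel m = vector_median(w);
    printf("median = (%.0f, %.0f, %.0f)\n", m.r, m.g, m.b);
    return 0;
}
```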

    A general framework for efficient FPGA implementation of matrix product

    Original article can be found at: http://www.medjcn.com/ Copyright Softmotor Limited. High performance systems are required by developers for fast processing of computationally intensive applications. Reconfigurable hardware devices in the form of Field-Programmable Gate Arrays (FPGAs) have been proposed as viable building blocks for constructing high performance systems at an economical price. Given the importance and the use of matrix algorithms in scientific computing applications, they seem ideal candidates to harness and exploit the advantages offered by FPGAs. In this paper, a system for generating matrix algorithm cores is described. The system provides a catalog of efficient, user-customizable cores designed for FPGA implementation, covering three matrix algorithm categories: (i) matrix operations, (ii) matrix transforms and (iii) matrix decomposition. The generated core can be either a general-purpose core or an application-specific core. The methodology used in the design and implementation of two image processing application cores is presented. The first core is a fully pipelined matrix multiplier for colour space conversion based on distributed arithmetic principles, while the second is a parallel floating-point matrix multiplier designed for 3D affine transformations. Peer reviewed.
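    The colour space conversion handled by the first core is a constant 3x3 matrix-vector product per pixel, which is why constant-coefficient hardware (and, on an FPGA, distributed-arithmetic lookup tables in place of explicit multipliers) suits it so well. The fixed-point sketch below uses the common BT.601 RGB-to-YCbCr coefficients purely as an example; the coefficients, Q8.8 scaling and function names are assumptions, not values taken from the paper.

```c
/* Sketch: colour space conversion as a constant 3x3 matrix-vector product.
 * Coefficients are the usual BT.601 RGB->YCbCr values scaled to Q8.8 fixed
 * point; they are illustrative, not the paper's. */
#include <stdint.h>
#include <stdio.h>

#define Q 8  /* fractional bits */

static const int32_t M[3][3] = {            /* round(coeff * 256) */
    {  77,  150,  29 },                     /*  0.299  0.587  0.114 */
    { -43,  -85, 128 },                     /* -0.169 -0.331  0.500 */
    { 128, -107, -21 }                      /*  0.500 -0.419 -0.081 */
};
static const int32_t OFF[3] = { 0, 128, 128 };

/* ycbcr = clamp(round(M * rgb) + offset) */
void rgb_to_ycbcr(const uint8_t rgb[3], uint8_t ycbcr[3])
{
    for (int i = 0; i < 3; i++) {
        int32_t acc = 0;
        for (int j = 0; j < 3; j++)
            acc += M[i][j] * rgb[j];        /* multiply-accumulate one row */
        int32_t v = ((acc + (1 << (Q - 1))) >> Q) + OFF[i];
        if (v < 0) v = 0;
        if (v > 255) v = 255;               /* clamp to 8-bit range */
        ycbcr[i] = (uint8_t)v;
    }
}

int main(void)
{
    uint8_t rgb[3] = { 200, 120, 40 }, out[3];
    rgb_to_ycbcr(rgb, out);
    printf("Y=%u Cb=%u Cr=%u\n", out[0], out[1], out[2]);
    return 0;
}
```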

    VLSI implementation of a massively parallel wavelet based zerotree coder for the intelligent pixel array

    In the span of a few years, mobile multimedia communication has rapidly become a significant area of research and development, constantly challenging boundaries on a variety of technological fronts. Mobile video communication in particular encompasses a number of technical hurdles that generally steer technological advancements towards devices that are low in complexity and power usage yet perform the given task efficiently. Devices of this nature have been made available through the use of massively parallel processing arrays such as the Intelligent Pixel Processing Array. The Intelligent Pixel Processing Array is a novel concept that integrates a parallel image capture mechanism, a parallel processing component and a parallel display component into a single-chip solution geared towards mobile communications environments, be it a PDA-based system or the video communicator wristwatch portrayed in Dick Tracy episodes. This thesis details work performed to provide an efficient, low-power, low-complexity solution surrounding the massively parallel implementation of a zerotree entropy codec for the Intelligent Pixel Array.
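    The idea a zerotree entropy coder exploits is that when a wavelet coefficient is insignificant at the current threshold, its descendants in the finer subbands usually are too, so the entire tree can be emitted as a single symbol. The sketch below shows that significance test for an in-place wavelet pyramid; the layout, the parent-child rule (LL-band special cases omitted) and the names are generic EZW-style assumptions, not the thesis's massively parallel implementation.

```c
/* Sketch: the zerotree significance test at the heart of an EZW-style coder.
 * The wavelet pyramid is stored in-place in an N x N array; the children of
 * coefficient (i,j) sit at (2i,2j), (2i,2j+1), (2i+1,2j), (2i+1,2j+1) in the
 * next finer subband (LL-band special cases are omitted for brevity). */
#include <math.h>
#include <stdio.h>

#define N 8
static double coeff[N][N];

/* Returns 1 if coeff[i][j] and every descendant are below the threshold,
 * i.e. the whole tree can be coded with a single zerotree-root symbol. */
int is_zerotree_root(int i, int j, double threshold)
{
    if (fabs(coeff[i][j]) >= threshold)
        return 0;                               /* significant itself          */
    if (2 * i >= N || 2 * j >= N)
        return 1;                               /* no children: finest level   */
    for (int di = 0; di < 2; di++)
        for (int dj = 0; dj < 2; dj++)
            if (!is_zerotree_root(2 * i + di, 2 * j + dj, threshold))
                return 0;                       /* a descendant is significant */
    return 1;
}

int main(void)
{
    coeff[1][1] = 3.0;
    coeff[2][2] = 40.0;                               /* toy data                     */
    printf("%d\n", is_zerotree_root(1, 1, 16.0));     /* 0: child (2,2) is significant */
    printf("%d\n", is_zerotree_root(1, 2, 16.0));     /* 1: whole tree insignificant   */
    return 0;
}
```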

    Optical control and switching of excitation transfer in nano-arrays

    The possibility of influencing resonance energy transfer through the input of off-resonant pulses of laser radiation is the subject of recent research. Attention is now focused on systems in which resonance energy transfer is deliberately precluded by geometric configuration. Here, through an optically nonlinear mechanism - optically controlled resonance energy transfer - the throughput of non-resonant pulses can facilitate energy transfer that is, in their absence, completely forbidden. The system thus functions as an optical buffer, with excitation throughput switched on by the secondary beam. For applications, a system based on two parallel nano-arrays is envisaged. This paper will establish and discuss the principles - those that can be exploited to enhance switching characteristics and efficiency, and others (such as off-axis excitation transfer) that may represent cross-talk limitations. Principles to be explored in detail include the interplay between geometric features - the array architecture and repeat distance (lattice constant), the array spacing and translational symmetry, and the orientations of the transition dipoles - and the magnitude of the relevant components of the nonlinear response tensors. The aim is, through a determination of key parameters, to inform a program of optimization that can deliver specific criteria for realizing the most efficient systems for implementation.

    Implementation of multirate time integration methods for air pollution modelling

    Explicit time integration methods are characterised by a small numerical effort per time step. In application to multiscale problems in atmospheric modelling, this benefit is often more than offset by stability problems and step size restrictions resulting from stiff chemical reaction terms and from a locally varying Courant-Friedrichs-Lewy (CFL) condition for the advection terms. Splitting methods may be applied to combine implicit and explicit methods efficiently (IMEX splitting). Complementarily, multirate time integration schemes allow for a local adaptation of the time step size to the grid size. In combination, these approaches lead to schemes which are efficient in terms of evaluations of the right-hand side. Special challenges arise when these methods are implemented. For an efficient implementation, it is crucial to locate and exploit redundancies. Furthermore, the more complex programme flow may lead to computational overhead which, in the worst case, more than offsets the theoretical gain in efficiency. We present a general splitting approach which allows both for IMEX splittings and for local time step adaptation. The main focus is on an efficient implementation of this approach for parallel computation on computer clusters.
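    To make the splitting idea concrete, the sketch below performs first-order IMEX Euler steps on a toy 1D advection-decay equation: the advection term is advanced explicitly with an upwind difference, while a stiff linear "chemistry" term is treated implicitly, so the step size is limited only by the advective CFL condition and not by the large decay rate. A multirate variant would additionally let locally refined cells take several such substeps per global step. The model problem, names and parameters are illustrative assumptions, not the schemes developed in the paper.

```c
/* Sketch: first-order IMEX Euler steps for the toy 1D model
 *   dc/dt = -a dc/dx - k c
 * Advection is explicit (upwind), the stiff linear decay is implicit,
 * so dt is bounded only by the advective CFL condition. */
#include <stdio.h>

#define NX 50

void imex_euler_step(double c[NX], double a, double k, double dx, double dt)
{
    double cnew[NX];
    for (int i = 0; i < NX; i++) {
        int im = (i == 0) ? NX - 1 : i - 1;             /* periodic domain    */
        double adv = -a * (c[i] - c[im]) / dx;          /* explicit upwind    */
        cnew[i] = (c[i] + dt * adv) / (1.0 + dt * k);   /* implicit decay     */
    }
    for (int i = 0; i < NX; i++) c[i] = cnew[i];
}

int main(void)
{
    double c[NX] = { 0.0 };
    c[NX / 2] = 1.0;                                    /* initial pulse      */
    double a = 1.0, k = 1.0e4, dx = 1.0 / NX;
    double dt = 0.5 * dx / a;                           /* advective CFL only */
    for (int n = 0; n < 100; n++)
        imex_euler_step(c, a, k, dx, dt);
    printf("c[%d] after 100 steps: %g\n", NX / 2, c[NX / 2]);
    return 0;
}
```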

    Parallel Sparse Matrix-Matrix Multiplication and Indexing: Implementation and Experiments

    Generalized sparse matrix-matrix multiplication (or SpGEMM) is a key primitive for many high performance graph algorithms as well as for some linear solvers, such as algebraic multigrid. Here we show that SpGEMM also yields efficient algorithms for general sparse-matrix indexing in distributed memory, provided that the underlying SpGEMM implementation is sufficiently flexible and scalable. We demonstrate that our parallel SpGEMM methods, which use two-dimensional block data distributions with serial hypersparse kernels, are indeed highly flexible, scalable, and memory-efficient in the general case. This algorithm is the first to yield increasing speedup on an unbounded number of processors; our experiments show scaling up to thousands of processors in a variety of test scenarios.
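    The indexing observation rests on the fact that extracting a submatrix A(I,J) can be written as R * A * Q, where R is a sparse row-selection matrix built from I and Q a column-selection matrix built from J, so a sufficiently flexible SpGEMM routine immediately yields indexing. The serial sketch below implements the classic Gustavson row-by-row SpGEMM on CSR matrices as a reference for that primitive; the paper's distributed 2D algorithm with hypersparse kernels is not reproduced, and the types and names are illustrative.

```c
/* Sketch: serial Gustavson SpGEMM (C = A * B) on CSR matrices. */
#include <stdio.h>
#include <stdlib.h>

typedef struct {          /* CSR storage: rowptr has n+1 entries */
    int n, m;             /* n rows, m columns                   */
    int *rowptr, *col;
    double *val;
} Csr;

Csr spgemm(const Csr *A, const Csr *B)
{
    Csr C = { A->n, B->m, NULL, NULL, NULL };
    C.rowptr = malloc((A->n + 1) * sizeof(int));
    int cap = 16, nnz = 0;
    C.col = malloc(cap * sizeof(int));
    C.val = malloc(cap * sizeof(double));

    double *acc  = malloc(B->m * sizeof(double));   /* dense accumulator       */
    int    *mark = calloc(B->m, sizeof(int));       /* "generation" per column */
    int    *tmp  = malloc(B->m * sizeof(int));      /* columns hit in this row */

    C.rowptr[0] = 0;
    for (int i = 0; i < A->n; i++) {
        int gen = i + 1, len = 0;
        for (int p = A->rowptr[i]; p < A->rowptr[i + 1]; p++) {
            int k = A->col[p];
            double a = A->val[p];
            for (int q = B->rowptr[k]; q < B->rowptr[k + 1]; q++) {
                int j = B->col[q];
                if (mark[j] != gen) { mark[j] = gen; acc[j] = 0.0; tmp[len++] = j; }
                acc[j] += a * B->val[q];            /* accumulate a * B(k,j)   */
            }
        }
        while (nnz + len > cap) {                   /* grow output arrays      */
            cap *= 2;
            C.col = realloc(C.col, cap * sizeof(int));
            C.val = realloc(C.val, cap * sizeof(double));
        }
        for (int t = 0; t < len; t++) {             /* flush row (unsorted)    */
            C.col[nnz] = tmp[t];
            C.val[nnz] = acc[tmp[t]];
            nnz++;
        }
        C.rowptr[i + 1] = nnz;
    }
    free(acc); free(mark); free(tmp);
    return C;
}

int main(void)
{
    /* A = [1 2; 0 3], B = [0 1; 1 0] in CSR form. */
    int  a_rp[] = {0, 2, 3}, a_c[] = {0, 1, 1};
    double a_v[] = {1, 2, 3};
    int  b_rp[] = {0, 1, 2}, b_c[] = {1, 0};
    double b_v[] = {1, 1};
    Csr A = {2, 2, a_rp, a_c, a_v}, B = {2, 2, b_rp, b_c, b_v};
    Csr C = spgemm(&A, &B);
    for (int i = 0; i < C.n; i++)
        for (int p = C.rowptr[i]; p < C.rowptr[i + 1]; p++)
            printf("C(%d,%d) = %g\n", i, C.col[p], C.val[p]);
    return 0;
}
```

    With this primitive, A(I,J) would be obtained by building R (of size |I| x n with R(p, I[p]) = 1) and Q (of size m x |J| with Q(J[q], q) = 1) in the same CSR form and calling spgemm twice; note the sketch does not sort column indices within a row.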