
    Acceleration of Spiking Neural Networks on Multicore Architectures

    The human cortex is the seat of learning and cognition. Biological-scale implementations of cortical models have the potential to provide significantly more powerful problem-solving capabilities than traditional computing algorithms. The large-scale implementation and design of these models has attracted significant attention recently, and high-performance implementations are needed to enable such designs. This thesis examines the acceleration of the spiking neural network class of cortical models on several modern multicore processors. The models examined include the Izhikevich, Wilson, Morris-Lecar, and Hodgkin-Huxley models; the architectures examined are the STI Cell, Sun UltraSPARC T2+, and Intel Xeon E5345. Results indicate that these modern multicore processors can provide significant speed-ups and are thus useful in developing large-scale cortical models. The models are then implemented on a 50-TeraFLOPS, 336-node PlayStation 3 cluster. Results indicate that the models scale well on this cluster and can emulate 10^8 neurons and 10^10 synapses, numbers comparable to the large-scale cortical model implementation studies performed by IBM using the Blue Gene/L supercomputer. This study indicates that a cluster of PlayStation 3s can provide an economical, yet powerful, platform for simulating large-scale biological models.
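    For context, the Izhikevich model named above has a very compact update rule. The C sketch below simulates a single regular-spiking neuron with a 1 ms Euler step and a constant input current; the parameter values, step size, and input are illustrative assumptions, not the thesis's parallel kernels, which update large neuron populations across cores.

    #include <stdio.h>

    /* Minimal single-neuron Izhikevich sketch (regular-spiking parameters),
       integrated with a 1 ms Euler step. Illustrative only: the thesis
       kernels update large neuron populations in parallel on each core. */
    int main(void) {
        const double a = 0.02, b = 0.2, c = -65.0, d = 8.0; /* standard RS parameters */
        double v = -65.0;          /* membrane potential (mV) */
        double u = b * v;          /* recovery variable */
        const double I = 10.0;     /* constant input current (assumed) */

        for (int t = 0; t < 1000; ++t) {                 /* 1000 ms of simulated time */
            v += 0.04 * v * v + 5.0 * v + 140.0 - u + I; /* membrane update */
            u += a * (b * v - u);                        /* recovery update */
            if (v >= 30.0) {                             /* spike threshold reached */
                printf("spike at t = %d ms\n", t);
                v = c;                                   /* reset potential */
                u += d;                                  /* bump recovery */
            }
        }
        return 0;
    }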

    PERFORMANCE ANALYSIS AND FITNESS OF GPGPU AND MULTICORE ARCHITECTURES FOR SCIENTIFIC APPLICATIONS

    Recent trends in computing architecture development have focused on exploiting task- and data-level parallelism from applications. Major hardware vendors are experimenting with novel parallel architectures, such as the Many Integrated Core (MIC) from Intel, which integrates 50 or more x86 processors on a single chip, and the Accelerated Processing Unit from AMD, which integrates a multicore x86 processor with a graphics processing unit (GPU); many other initiatives from other hardware vendors are underway. Various types of architectures are therefore available to developers for accelerating an application, and a performance model that predicts the suitability of an architecture for a given application would be very helpful prior to implementation. Thus, in this research, a Fitness model that ranks the potential performance of accelerators for an application is proposed. The Fitness model is then extended using statistical multiple regression to model both the runtime performance of accelerators and the impact of programming models on accelerator performance with a high degree of accuracy. We have validated both performance models for all the case studies. The error rate of these models, calculated using the experimental performance data, is tolerable in the high-performance computing field.

    To develop and validate the two performance models, we have also analyzed the performance of several multicore CPU and GPGPU architectures and the corresponding programming models using multiple case studies. The first case study is a matrix-matrix multiplication algorithm; by varying the matrix from a small size to a very large size, the performance of the multicore and GPGPU architectures is studied. The second case study is a biological spiking neural network (SNN), implemented with four neuron models that have varying requirements for communication and computation, making them useful for performance analysis of the hardware platforms. We report and analyze the performance variation of four popular accelerators (Intel Xeon, AMD Opteron, Nvidia Fermi, and IBM PS3) and four advanced CPU architectures (Intel 32-core, AMD 32-core, IBM 16-core, and Sun 32-core) with problem-size (matrix and network size) scaling, the available optimization techniques, and the execution configuration. This thorough analysis provides insight into how the performance of an accelerator is affected by problem size, optimization techniques, and accelerator configuration.

    We have also analyzed the performance impact of four popular multicore parallel programming models (POSIX threading, Open Multi-Processing (OpenMP), Open Computing Language (OpenCL), and the Concurrency Runtime) on an Intel i7 multicore architecture, and of two GPGPU programming models (Compute Unified Device Architecture (CUDA) and OpenCL) on an NVIDIA GPGPU. With this broad study, conducted over a wide range of application complexity, multiple optimizations, and varying problem sizes, it was found that, in terms of achievable performance, the programming models for the x86 processor cannot be ranked across all applications, whereas the programming models for the GPGPU can be ranked conclusively. We have also qualitatively and quantitatively ranked all six programming models in terms of their perceived programming effort. The results and analysis in this research, supported by the proposed performance models, indicate that for a given hardware system the best performance for an application is obtained with a proper match of programming model and architecture.
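    The first case study, dense matrix-matrix multiplication, and one of the studied programming models, OpenMP, can be illustrated together with a short sketch. The naive triple loop and the fixed problem size below are assumptions chosen for brevity, not the tuned or GPGPU (CUDA/OpenCL) kernels evaluated in the study; compile with an OpenMP-enabled compiler (e.g. -fopenmp).

    #include <stdlib.h>

    /* Naive OpenMP matrix-matrix multiplication sketch: C = A * B, n x n,
       row-major. Illustrative of the first case study only; the study also
       evaluates tuned kernels and GPGPU variants. */
    static void matmul(int n, const double *A, const double *B, double *C) {
        #pragma omp parallel for              /* distribute rows across cores */
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j) {
                double sum = 0.0;
                for (int k = 0; k < n; ++k)
                    sum += A[i * n + k] * B[k * n + j];
                C[i * n + j] = sum;
            }
    }

    int main(void) {
        const int n = 1024;                   /* assumed problem size */
        double *A = malloc((size_t)n * n * sizeof *A);
        double *B = malloc((size_t)n * n * sizeof *B);
        double *C = malloc((size_t)n * n * sizeof *C);
        for (int i = 0; i < n * n; ++i) { A[i] = 1.0; B[i] = 2.0; }
        matmul(n, A, B, C);
        free(A); free(B); free(C);
        return 0;
    }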

    Efficient multiprocessing architectures for spiking neural network emulation based on configurable devices

    The exploration of the dynamics of bio-inspired neural networks has allowed neuroscientists to understand some of the processes and structures of the brain. Electronic neural network implementations are useful tools for this exploration; however, appropriate architectures are necessary because of the extremely high complexity of those networks. Reconfigurable computing devices (FPGAs) have developed extraordinarily within a short period of time, especially in their resource availability, speed, and reconfigurability, which makes them suitable for emulating such networks. A reconfigurable parallel hardware architecture is proposed in this thesis to emulate complex and biologically realistic spiking neural networks (SNNs) in real time. Some relevant SNN models and their hardware approaches have been studied and analyzed in order to create an architecture that supports the implementation of these models efficiently. The key factors of flexibility in algorithm programmability, high-performance processing, and low area and power consumption have been taken into account. To boost the performance of the proposed architecture, several techniques have been developed: time-to-space mapping, neural virtualization, flexible synapse-neuron mapping, and specific learning and execution modes, among others. In addition, an interface unit has been developed in order to build a bio-inspired system that can process sensory information from the environment. The spiking-neuron-based system combines analog and digital multi-processor implementations. Several applications have been developed as a proof of concept to show the capabilities of the proposed architecture for processing this type of information.
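    Among the techniques listed above, neural virtualization (one physical processing element time-multiplexing the state updates of many model neurons) can be conveyed with a software analogue. The C sketch below is a hypothetical illustration using a simple leaky integrate-and-fire rule; the PE count, neurons-per-PE ratio, and update rule are assumptions for illustration, not the thesis's FPGA design.

    #include <stdio.h>

    #define NUM_PE          4    /* physical processing elements (assumed) */
    #define NEURONS_PER_PE 64    /* virtual neurons multiplexed per PE (assumed) */

    /* Hypothetical software analogue of neural virtualization: each physical
       PE sequentially updates many virtual neurons per time step, here with
       a simple leaky integrate-and-fire rule. */
    static double v[NUM_PE][NEURONS_PER_PE];        /* membrane potential per neuron */

    static void step(double input) {
        for (int pe = 0; pe < NUM_PE; ++pe)             /* PEs run in parallel in hardware */
            for (int n = 0; n < NEURONS_PER_PE; ++n) {  /* time-multiplexed on one PE */
                v[pe][n] += -0.1 * v[pe][n] + input;    /* leak plus input */
                if (v[pe][n] >= 1.0) {                  /* threshold crossed: spike */
                    printf("PE %d, neuron %d spiked\n", pe, n);
                    v[pe][n] = 0.0;                     /* reset */
                }
            }
    }

    int main(void) {
        for (int t = 0; t < 100; ++t)
            step(0.2);                                  /* constant drive (assumed) */
        return 0;
    }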