
    CUDA Based Performance Evaluation of the Computational Efficiency of the DCT Image Compression Technique on Both the CPU and GPU

    Recent advances in computing such as massively parallel GPUs (Graphics Processing Units), coupled with the need to store and deliver large quantities of digital data, especially images, have brought a number of challenges for computer scientists, the research community and other stakeholders. These challenges, such as the prohibitively large cost of manipulating digital data, have been the focus of the research community in recent years and have led to the investigation of image compression techniques that can achieve excellent results. One such technique is the Discrete Cosine Transform (DCT), which helps separate an image into parts of differing frequencies and has the advantage of excellent energy compaction. This paper investigates the use of the Compute Unified Device Architecture (CUDA) programming model to implement the CORDIC-based Loeffler DCT algorithm for efficient image compression. The computational efficiency is analyzed and evaluated on both the CPU and the GPU. The PSNR (Peak Signal-to-Noise Ratio) is used to evaluate image reconstruction quality. The results are presented and discussed. Comment: 15 Pages, 11 Figures, 4 Tables, Advanced Computing: An International Journal (ACIJ), Three Author Pictures (with little Bio for each) at last pag
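
The transform-and-measure pipeline this abstract describes can be illustrated with a minimal sketch. It assumes a direct matrix form of the orthonormal 8x8 DCT-II running on the CPU in plain Python; it is not the CORDIC-based Loeffler algorithm or a CUDA kernel from the paper. The helper names (`dct2`, `idct2`, `keep_low_frequencies`, `psnr`) and the simple "drop high frequencies" step are assumptions made for illustration.

```python
import math

N = 8  # assumed block size (8x8 blocks are typical for DCT image coding)


def dct_matrix(n=N):
    """Orthonormal DCT-II basis matrix C, so that coeffs = C * block * C^T."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def dct2(block):
    """Forward 2D DCT of a square block."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))


def idct2(coeffs):
    """Inverse 2D DCT (C is orthonormal, so its inverse is its transpose)."""
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)


def keep_low_frequencies(coeffs, keep):
    """Zero every coefficient whose row + column index is >= keep."""
    return [[v if (u + w) < keep else 0.0 for w, v in enumerate(row)]
            for u, row in enumerate(coeffs)]


def psnr(original, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized blocks."""
    count = len(original) * len(original[0])
    mse = sum((o - r) ** 2
              for row_o, row_r in zip(original, reconstructed)
              for o, r in zip(row_o, row_r)) / count
    return float("inf") if mse == 0 else 10.0 * math.log10(peak * peak / mse)


if __name__ == "__main__":
    block = [[float(16 * r + 2 * c) for c in range(N)] for r in range(N)]
    approx = idct2(keep_low_frequencies(dct2(block), keep=4))
    print("PSNR after dropping high frequencies (dB):", psnr(block, approx))
```

Energy compaction shows up as most of the block's energy sitting in the low-index coefficients, which is why discarding the rest costs relatively little PSNR on smooth image content.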

    A mixed signal architecture for convolutional neural networks

    Deep neural network (DNN) accelerators with improved energy and delay are desirable for meeting the requirements of hardware targeted at IoT and edge computing systems. Convolutional neural networks (CoNNs) are one of the most popular types of DNN architectures. This paper presents the design and evaluation of an accelerator for CoNNs. The system-level architecture is based on mixed-signal cellular neural networks (CeNNs). Specifically, we present (i) the implementation of different layers, including convolution, ReLU, and pooling, in a CoNN using CeNNs, (ii) modified CoNN structures with CeNN-friendly layers to reduce the computational overheads typically associated with a CoNN, (iii) a mixed-signal CeNN architecture that performs CoNN computations in the analog and mixed-signal domain, and (iv) a design space exploration that identifies which CeNN-based algorithmic and architectural features fare best compared to existing algorithms and architectures when evaluated over common datasets (MNIST and CIFAR-10). Notably, the proposed approach can lead to 8.7× improvements in energy-delay product (EDP) per digit classification for the MNIST dataset at iso-accuracy when compared with the state-of-the-art DNN engine, while our approach could offer 4.3× improvements in EDP when compared to other network implementations for the CIFAR-10 dataset. Comment: 25 page
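
For readers unfamiliar with the three layer types named in item (i), the following is a minimal digital reference sketch in plain Python. It only shows what convolution, ReLU, and pooling compute; it does not model the CeNN or mixed-signal implementation from the paper, and the function names and the 2x2 pooling window are assumptions made for illustration.

```python
def conv2d(image, kernel):
    """'Valid' 2D cross-correlation, the operation most CNN layers call convolution."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)] for r in range(out_h)]


def relu(feature_map):
    """Rectified linear unit: negative activations are clamped to zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling; incomplete windows at the edges are dropped."""
    rows = range(0, len(feature_map) - size + 1, size)
    cols = range(0, len(feature_map[0]) - size + 1, size)
    return [[max(feature_map[r + i][c + j]
                 for i in range(size) for j in range(size))
             for c in cols] for r in rows]


if __name__ == "__main__":
    image = [[float(5 * r + c) for c in range(5)] for r in range(5)]
    kernel = [[-1.0, 0.0], [0.0, 1.0]]  # hypothetical diagonal-difference filter
    print(max_pool(relu(conv2d(image, kernel))))
```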

    Enhanced computation method of topological smoothing on shared memory parallel machines

    To prepare images for better segmentation, we need preprocessing steps, such as smoothing, to reduce noise. In this paper, we present an enhanced computation method for smoothing 2D objects in the binary case. Unlike existing approaches, the proposed method provides parallel computation and better memory management, while preserving the topology (number of connected components) of the original image by using homotopic transformations defined in the framework of digital topology. We introduce an adapted parallelization strategy called split, distribute and merge (SDM), which allows efficient parallelization of a large class of topological operators. To achieve a good speedup and better memory allocation, we paid particular attention to task scheduling and management. The work distributed during the smoothing process is done by a variable number of threads. Tests on a 2D grayscale image (512×512), using a shared memory parallel machine (SMPM) with 8 CPU cores (two Xeon E5405 processors running at a frequency of 2 GHz), showed an enhancement factor of 5.2 with a cache success rate of 70%

    INsight: A Neuromorphic Computing System for Evaluation of Large Neural Networks

    Deep neural networks have demonstrated impressive results in various cognitive tasks such as object detection and image classification. In order to execute large networks, von Neumann computers store the large number of weight parameters in external memories, and processing elements are time-shared, which leads to power-hungry I/O operations and processing bottlenecks. This paper describes a neuromorphic computing system that is designed from the ground up for the energy-efficient evaluation of large-scale neural networks. The computing system consists of a non-conventional compiler, a neuromorphic architecture, and a space-efficient microarchitecture that leverages existing integrated circuit design methodologies. The compiler factorizes a trained, feedforward network into a sparsely connected network, compresses the weights linearly, and generates a time-delay neural network, reducing the number of connections. The connections and units in the simplified network are mapped to silicon synapses and neurons. We demonstrate an implementation of the neuromorphic computing system based on a field-programmable gate array that performs MNIST hand-written digit classification with 97.64% accuracy

    Proposal For Neuromorphic Hardware Using Spin Devices

    We present a design scheme for ultra-low-power neuromorphic hardware using emerging spin devices. We propose device models for a 'neuron', based on lateral spin valves and domain wall magnets, that can operate at an ultra-low terminal voltage of ~20 mV, resulting in small computation energy. Magnetic tunnel junctions are employed for interfacing the spin neurons with charge-based devices such as CMOS, for large-scale networks. A device-circuit co-simulation framework is used for simulating such hybrid designs in order to evaluate system-level performance. We present the design of different classes of neuromorphic architectures using the proposed scheme that can be suitable for different applications such as analog data sensing, data conversion, cognitive computing, associative memory, programmable logic, and analog and digital signal processing. We show that the spin-based neuromorphic designs can achieve 15X-300X lower computation energy for these applications compared to state-of-the-art CMOS designs

    High Performance Reconfigurable Computing Systems

    The rapid progress in electronic chip technology provides a variety of new implementation options for system engineers. The choice varies between flexible programs running on a general-purpose processor (GPP) and a fixed hardware implementation using an application-specific integrated circuit (ASIC). Many other implementation options exist, for instance, a system with a RISC processor and a DSP core. Other options include graphics processors and microcontrollers. Specialist processors certainly improve performance over general-purpose ones, but this comes at the expense of flexibility. Combining the flexibility of GPPs and the high performance of ASICs leads to the introduction of reconfigurable computing (RC) as a new implementation option with a balance between versatility and speed. The focus of this chapter is on introducing reconfigurable computers as modern supercomputing architectures. The chapter also investigates the main reasons behind the current advancement in the development of RC systems. Furthermore, a technical survey of various RC systems is included, laying common grounds for comparisons. In addition, this chapter mainly presents case studies implemented under the MorphoSys RC system. The selected case studies belong to different areas of application, such as computer graphics and information coding. Parallel versions of the studied algorithms are developed to match the topologies supported by the MorphoSys. Performance evaluation and results analyses are included for implementations with different characteristics. Comment: 53 pages, 14 tables, 15 figure

    Resource Efficient LDPC Decoders for Multimedia Communication

    Achieving high image quality is an important aspect in an increasing number of wireless multimedia applications. These applications require resource efficient error correction hardware to detect and correct errors introduced by the communication channel. This paper presents an innovative flexible architecture for error correction using Low-Density Parity-Check (LDPC) codes. The proposed partially-parallel decoder architecture utilizes a novel code construction technique based on multi-level Hierarchical Quasi-Cyclic (HQC) matrix with innovative layering of random sub-matrices. Simulation of a high-level MATLAB model shows that the proposed HQC matrices have bit error rate (BER) performance close to that of unstructured random matrices. The proposed decoder has been implemented on FPGA. It is very resource efficient and provides very high throughput compared to other decoders reported to date. Performance evaluation of the decoder has been carried out by transmitting JPEG images over an AWGN channel and comparing the quality of the reconstructed images with those from other decoders. Comment: 10 pages, 12 figures, 4 tables, submitted to Journa
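
As background for the quasi-cyclic structure mentioned in this abstract, the sketch below builds a plain single-level quasi-cyclic parity-check matrix from a table of circulant shifts and computes a syndrome. It is not the paper's multi-level HQC construction or its layering of random sub-matrices; the shift table, the block size, and the function names are hypothetical values chosen for illustration.

```python
def circulant(size, shift):
    """size x size identity matrix cyclically shifted by `shift` columns.

    A negative shift is used here (by assumption) to mean an all-zero block.
    """
    if shift < 0:
        return [[0] * size for _ in range(size)]
    return [[1 if (r + shift) % size == c else 0 for c in range(size)]
            for r in range(size)]


def expand(base, size):
    """Expand a table of shifts into a binary quasi-cyclic parity-check matrix."""
    h = []
    for base_row in base:
        blocks = [circulant(size, s) for s in base_row]
        for r in range(size):
            h.append([bit for block in blocks for bit in block[r]])
    return h


def syndrome(h, word):
    """Parity checks of `word` over GF(2); all zeros means every check passes."""
    return [sum(hb & wb for hb, wb in zip(row, word)) % 2 for row in h]


if __name__ == "__main__":
    base = [[0, 1, -1],
            [2, -1, 0]]           # hypothetical shift table
    h = expand(base, size=4)      # 8 checks on 12 bits
    word = [0] * 12
    word[0] = 1                   # a single bit error on the all-zero codeword
    print(syndrome(h, word))
```

Because each block is fully described by one shift value, hardware only needs to store the shift table rather than the whole matrix, which is one reason quasi-cyclic codes suit resource-constrained decoders.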

    Comparative Performance Analysis of Intel Xeon Phi, GPU, and CPU

    We investigate and characterize the performance of an important class of operations on GPUs and Many Integrated Core (MIC) architectures. Our work is motivated by applications that analyze low-dimensional spatial datasets captured by high resolution sensors, such as image datasets obtained from whole slide tissue specimens using microscopy image scanners. We identify the data access and computation patterns of operations in the object segmentation and feature computation categories. We systematically implement and evaluate the performance of these core operations on modern CPUs, GPUs, and MIC systems for a microscopy image analysis application. Our results show that (1) the data access pattern and parallelization strategy employed by the operations strongly affect their performance, and the performance on a MIC of operations that perform regular data access is comparable to, or sometimes better than, that on a GPU; (2) GPUs are significantly more efficient than MICs for operations and algorithms that access data irregularly, a result of the low performance of MICs when it comes to random data access; (3) adequately coordinated execution on MICs and CPUs using a performance-aware task scheduling strategy improves performance by about 1.29x over a first-come-first-served strategy. The example application attained an efficiency of 84% in an execution with 192 nodes (3072 CPU cores and 192 MICs). Comment: 11 pages, 2 figure

    Memristor nanodevice for unconventional computing: review and applications

    A memristor is a two-terminal nanodevice whose properties attract a wide community of researchers from various domains such as physics, chemistry, electronics, computer science and neuroscience. Its simple structure for manufacturing, small scalability, nonvolatility and potential for use in low-power platforms are outstanding characteristics of this emerging nanodevice. In this report, we briefly review the memristor literature, from the mathematical model to the physical realization. We discuss different classes of memristors based on the material used for their manufacturing. The potential applications of the memristor are presented, and a wide domain of applications is explained and classified

    A Survey of Neuromorphic Computing and Neural Networks in Hardware

    Neuromorphic computing has come to refer to a variety of brain-inspired computers, devices, and models that contrast with the pervasive von Neumann computer architecture. This biologically inspired approach has created highly connected synthetic neurons and synapses that can be used to model neuroscience theories as well as solve challenging machine learning problems. The promise of the technology is to create a brain-like ability to learn and adapt, but the technical challenges are significant, ranging from an accurate neuroscience model of how the brain works, to finding materials and engineering breakthroughs to build devices that support these models, to creating a programming framework so the systems can learn, to creating applications with brain-like capabilities. In this work, we provide a comprehensive survey of the research and motivations for neuromorphic computing over its history. We begin with a 35-year review of the motivations and drivers of neuromorphic computing, then look at the major research areas of the field, which we define as neuro-inspired models, algorithms and learning approaches, hardware and devices, supporting systems, and finally applications. We conclude with a broad discussion of the major research topics that need to be addressed in the coming years to see the promise of neuromorphic computing fulfilled. The goals of this work are to provide an exhaustive review of the research conducted in neuromorphic computing since the inception of the term, and to motivate further work by illuminating gaps in the field where new research is needed