
    Accelerating Stochastic Random Projection Neural Networks

    Artificial Neural Networks (ANNs), computational models based on biological neural networks, have seen a recent resurgence in machine intelligence, with breakthrough results in pattern recognition, speech recognition, and mapping. This has led to growing interest in designing dedicated hardware substrates for ANNs, with the goal of achieving energy efficiency, high network connectivity, and computational capabilities that are typically not optimized in the software ANN stack. Stochastic computing is a natural choice for reducing total system energy: a signal is expressed through the statistical distribution of logical values in a random bit stream. The accuracy of such systems is generally correlated with the stochastic bit stream length, which leads to long compute times. In this work, a framework is proposed to accelerate the long compute times of stochastic ANNs. A GPU acceleration framework has been developed to validate two random projection networks and test their efficacy prior to custom hardware design. The networks are a stochastic extreme learning machine, a supervised feed-forward neural network, and a stochastic echo state network, a recurrent neural network with online learning. The framework also provides for identifying optimal values of network parameters such as the learning rate, the number of hidden layers, and the stochastic number length. The proposed stochastic extreme learning machine design is validated on two standardized datasets, the MNIST dataset and an orthopedic dataset. The proposed stochastic echo state network is validated on a time-series EEG dataset. CPU models were developed for each of these networks to calculate the relative performance boost. The design knobs for the performance boost include stochastic bit stream generation, the activation function, the reservoir layer, and the training unit of the networks.
The proposed stochastic extreme learning machine and stochastic echo state network achieved performance boosts of 60.61x on the orthopedic dataset and 42.03x on the EEG dataset, respectively, with a 2^12-bit stream length when tested on an NVIDIA GeForce GTX 1050 Ti.
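    The core idea behind the stochastic representation described above can be sketched in a few lines. The following Python fragment is an illustrative example, not code from the paper: it encodes a value in [0, 1] as a unipolar random bit stream and multiplies two such values with a bitwise AND, showing why accuracy grows with stream length (here 2^12 bits, matching the abstract).

```python
import random

def to_stream(p, length, rng):
    # Encode a probability p in [0, 1] as a random bit stream:
    # each bit is 1 with probability p.
    return [1 if rng.random() < p else 0 for _ in range(length)]

def from_stream(bits):
    # Decode: the represented value is the fraction of 1s in the stream.
    return sum(bits) / len(bits)

def sc_multiply(a_bits, b_bits):
    # For independent unipolar streams, bitwise AND multiplies the values,
    # since P(a AND b) = P(a) * P(b).
    return [a & b for a, b in zip(a_bits, b_bits)]

rng = random.Random(42)          # fixed seed for reproducibility
n = 2 ** 12                      # stream length, as in the abstract
a = to_stream(0.5, n, rng)
b = to_stream(0.5, n, rng)
approx = from_stream(sc_multiply(a, b))
print(approx)                    # close to 0.5 * 0.5 = 0.25
```

    Each output bit is computed independently, which is what makes the bit-stream generation and arithmetic units highly parallelizable on a GPU.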

    Comparative study of tool-flows for rapid prototyping of software-defined radio digital signal processing

    This dissertation is a comparative study of tool-flows for rapid prototyping of SDR DSP operations on programmable hardware platforms. The study is divided into two parts, focusing on high-level tool-flows for implementing SDR DSP operations on FPGA and GPU platforms respectively. In this dissertation, the term ‘tool-flow’ refers to a tool, or a chain of tools, that facilitates the mapping of an application description specified in a programming language onto one or more programmable hardware platforms. High-level tool-flows use techniques such as high-level synthesis to let the designer specify the application at a high level of abstraction and achieve improved productivity without significant degradation in the design’s performance. SDR is an emerging communications technology driven by, among other factors, increasing demand for high-speed, interoperable, and versatile communications systems. The key idea in SDR is to implement as many as possible of the radio functions that were traditionally defined in fixed hardware in software on programmable hardware processors instead. The most commonly used processors are based on complex parallel computing architectures in order to support the high-speed processing demands of SDR applications; they include FPGAs, GPUs, multicore general-purpose processors (GPPs), and DSPs. The architectural complexity of these processors brings a corresponding increase in the complexity of their programming methodologies, which impedes their wider adoption in suitable application domains, including SDR DSP. In an effort to address this, a plethora of high-level tool-flows have been developed. Several comparative studies of these tool-flows have been conducted to help designers, among other benefits, choose which high-level tools to use. However, few such studies focus on SDR DSP operations, and most existing comparative studies are not based on well-defined comparison criteria.
The approach taken in this dissertation is to use a systems engineering design process: firstly, to define the qualitative comparison criteria in the form of a specification for an ideal high-level SDR DSP tool-flow and, secondly, to implement a FIR filter case study in each of the tool-flows to enable a quantitative comparison in terms of programming effort and performance. The study considers Migen- and MyHDL-based open-source tool-flows for FPGA targets, and CUDA and the Open Computing Language (OpenCL) for GPU targets. The ideal high-level SDR DSP tool-flow specification was defined and used to conduct a comparative study of the tools across three main design categories: high-level modelling, verification, and implementation. For tool-flows targeting GPU platforms, the FIR case study was implemented using each of the tools; it was compiled, executed on a GPU server consisting of two GTX Titan X GPUs and an Intel Core i7 GPP, and finally profiled. The tools were then compared in terms of programming effort, memory transfer cost, and overall operation time. For tool-flows with FPGA targets, the FIR case study was developed using each tool, then implemented on a Xilinx 7-series FPGA and compared in terms of programming effort, logic utilization, and timing performance.
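    For readers unfamiliar with the case study, a FIR filter is a simple, highly parallel DSP operation, which is what makes it a natural benchmark across FPGA and GPU tool-flows. The reference below is an illustrative Python sketch of the direct-form computation, not the dissertation's implementation; the tap values are hypothetical.

```python
def fir_filter(x, h):
    # Direct-form FIR filter: y[n] = sum_k h[k] * x[n - k].
    # Each output sample depends only on the input, so all outputs
    # can be computed in parallel on a GPU or unrolled on an FPGA.
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, coeff in enumerate(h):
            if n - k >= 0:
                acc += coeff * x[n - k]
        y.append(acc)
    return y

# 4-tap moving-average filter (hypothetical coefficients for illustration)
taps = [0.25, 0.25, 0.25, 0.25]
signal = [1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0]
result = fir_filter(signal, taps)
print(result)  # [0.25, 0.5, 0.75, 1.0, 0.75, 0.5, 0.25, 0.0]
```

    A high-level tool-flow maps the inner multiply-accumulate loop to hardware multipliers (FPGA) or to thousands of concurrent threads (GPU), which is where the programming-effort and performance trade-offs measured in the study arise.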

    Understanding NVIDIA GPGPU Hardware

    This chapter presents the NVIDIA general-purpose graphics processing unit (GPGPU) architecture, detailing both hardware and software concepts. The evolution of GPGPUs from the earliest devices to the most modern ones is presented in order to illustrate the trends that motivated the changes along the way. This allows us to anticipate future changes as well as to identify the stable features on which programmers can rely. The chapter starts with a brief history of these chips, then details architectural elements such as the GPGPU core structure, the memory hierarchy, and hardware scheduling. Software concepts are also presented, such as thread organization and the correct use of scheduling.
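    The thread organization mentioned above can be made concrete with a small sketch. The following Python fragment is not from the chapter; it emulates a one-dimensional CUDA launch on the CPU to show the standard global-index computation (blockIdx * blockDim + threadIdx) and the bounds guard that handles the last, partially filled block.

```python
def launch(grid_dim, block_dim, kernel, *args):
    # Emulate a 1-D GPU kernel launch on the CPU: visit every
    # (blockIdx, threadIdx) pair and invoke the kernel body.
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)

def add_kernel(block_idx, block_dim, thread_idx, a, b, out):
    # The canonical global-index computation from the CUDA thread hierarchy.
    i = block_idx * block_dim + thread_idx
    if i < len(out):  # guard: the last block may have idle threads
        out[i] = a[i] + b[i]

n = 10
a = list(range(n))
b = list(range(n))
out = [0] * n
# 4 threads per block, enough blocks to cover all n elements
launch((n + 3) // 4, 4, add_kernel, a, b, out)
print(out)  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```

    On real hardware the two loops run concurrently across cores, and the scheduler described in the chapter decides which groups of threads execute when.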
