4 research outputs found

    Optimization of Deep Neural Networks Using SoCs with OpenCL

    In the optimization of deep neural networks (DNNs) via evolutionary algorithms (EAs) and the implementation of the training necessary for the creation of the objective function, there is often a trade-off between efficiency and flexibility. Pure software solutions implemented on general-purpose processors tend to be slow because they do not take advantage of the inherent parallelism of these devices, whereas hardware realizations based on heterogeneous platforms (combining central processing units (CPUs), graphics processing units (GPUs) and/or field-programmable gate arrays (FPGAs)) are designed based on different solutions using methodologies supported by different languages and using very different implementation criteria. This paper first presents a study that demonstrates the need for a heterogeneous (CPU-GPU-FPGA) platform to accelerate the optimization of artificial neural networks (ANNs) using genetic algorithms. Second, the paper presents implementations of the calculations related to the individuals evaluated in such an algorithm on different (CPU- and FPGA-based) platforms, but with the same source files written in OpenCL. The implementation of individuals on remote, low-cost FPGA systems on a chip (SoCs) is found to enable the achievement of good efficiency in terms of performance per watt. This research was funded by Spanish Agency of Research grant number FPA2016-78595-C3-3-R. Gadea Gironés, R.; Colom Palero, RJ.; Herrero Bosch, V. (2018). Optimization of Deep Neural Networks Using SoCs with OpenCL. Sensors. 18(5). https://doi.org/10.3390/s18051384

    An Architecture for Configuring an Efficient Scan Path for a Subset of Elements

    Field Programmable Gate Arrays (FPGAs) have many modern applications. A feature of FPGAs is that they can be reconfigured to suit the computation. One such form of reconfiguration, called partial reconfiguration (PR), allows part of the chip to be altered. The smallest part that can be reconfigured is called a frame. To reconfigure a frame, a fixed number of configuration bits are input (typically from outside) to the frame. Thus PR involves (a) selecting a subset C ⊆ S of k out of n frames to configure and (b) inputting the configuration bits for these k frames. The recently proposed MU-Decoder has made it possible to select the subset C quickly. This thesis involves mechanisms to input the configuration bits to the selected frames. Specifically, we propose a class of architectures that, for any subset C ⊆ S (where S is the set of frames), constructs a path connecting only the k frames of C through which the configuration bits can be scanned in. We introduce a Basic Network that runs in Θ(k log n) time, where k is the number of frames selected out of the total number n of available frames; we assume the number of configuration bits per frame is constant. The Basic Network does not exploit any locality or other structure in the subset of frames selected. We show that for certain structures (such as frames that are relatively close to each other) the speed of reconfiguration can be improved. We introduce an addition to the Basic Network that suggests the fastest clock speed that can be employed for a given set of frames. This enhancement decreases configuration time to O(k log k) for certain cases. We then introduce a second enhancement, called shortcuts, that for certain cases reduces the time to an optimal O(k). All the proposed architectures require an optimal Θ(n) number of gates. We implement our networks on the CAD tools and show that the theoretical predictions are a good reflection of the network's performance.
Our work, although directed to FPGAs, may also apply to other applications, for example hardware testing and novel memory accesses.
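To make the gap between the three bounds quoted in this abstract concrete, the sketch below evaluates k log n, k log k and k for one sample configuration. It is only an illustration of growth rates: it ignores all constant factors and the per-frame bit count, assumes base-2 logarithms, and does not model the thesis's actual networks. The function name `scan_time_estimates` and its dictionary keys are hypothetical.

```python
import math


def scan_time_estimates(n: int, k: int) -> dict:
    """Illustrative step counts for the three asymptotic bounds.

    n: total number of frames; k: number of frames selected (1 <= k <= n).
    Constants are ignored and log base 2 is an assumption of this sketch.
    """
    if not (1 <= k <= n):
        raise ValueError("require 1 <= k <= n")
    return {
        "basic_k_log_n": k * math.log2(n),    # Basic Network bound
        "clocked_k_log_k": k * math.log2(k),  # clock-speed enhancement, certain cases
        "shortcuts_k": float(k),              # shortcuts enhancement, certain cases
    }


if __name__ == "__main__":
    for name, steps in scan_time_estimates(n=1024, k=16).items():
        print(f"{name}: {steps:.0f}")
```

For n = 1024 and k = 16 the three counts are 160, 64 and 16, which shows why exploiting locality in the selected frames matters most when k is much smaller than n.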

    Power efficient and power attacks resistant system design and analysis using aggressive scaling with timing speculation

    The growing use of smart and portable electronic devices requires embedded system designers to provide solutions with better performance and reduced power consumption. With the recent growth of IoT and embedded systems, not only the power and performance of these devices but also their security is becoming an important design constraint. In this work, a novel aggressive scaling technique based on timing speculation is proposed to overcome the drawbacks of traditional DVFS and, at the same time, provide security against power analysis attacks. Dynamic voltage and frequency scaling (DVFS) has proven to be the most suitable technique for power efficiency in processor designs. Because of its promising benefits, the technique still attracts researchers' attention as a way to trade off power and performance in modern processor designs. The issues with traditional DVFS are: 1) Because its operating points are pre-calculated, the system cannot adapt to modern process variations. 2) Since Process, Voltage and Temperature (PVT) variations are not considered, large timing margins are added to guarantee safe operation in the presence of variations. The research work presented here addresses these issues by employing aggressive scaling mechanisms to achieve more power savings with increased performance. This approach uses in-situ timing error monitoring and recovery mechanisms to reduce extra timing margins and to account for process variations. A novel timing error detection and correction mechanism, designed to achieve more power savings or higher performance, is presented. This novel technique has also been shown to improve the security of processors against differential power analysis attacks. Differential power analysis attacks can extract secret information from embedded systems without requiring detailed knowledge of the internal architecture of the device.
Simulated and experimental data show that the novel technique can provide a performance improvement of 24% or power savings of 44% while incurring less area and power overhead. Overall, the proposed aggressive scaling technique provides an improvement in power consumption and performance while increasing the security of processors against power analysis attacks.
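As background for why voltage scaling is attractive, the standard first-order model of CMOS dynamic power can be written as follows. This is a textbook approximation and is not taken from the thesis itself; the symbol names are the conventional ones.

```latex
P_{\text{dyn}} \approx \alpha \, C \, V_{dd}^{2} \, f
```

Here $\alpha$ is the switching activity factor, $C$ the switched capacitance, $V_{dd}$ the supply voltage and $f$ the clock frequency. Because power depends quadratically on $V_{dd}$, removing excess timing margin so that the voltage can be lowered at a given frequency saves power. The 24% and 44% figures quoted in the abstract are the thesis's own measured results and do not follow from this formula alone.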