5,243 research outputs found

    An IoT Endpoint System-on-Chip for Secure and Energy-Efficient Near-Sensor Analytics

    Full text link
    Near-sensor data analytics is a promising direction for IoT endpoints, as it minimizes energy spent on communication and reduces network load - but it also poses security concerns, as valuable data is stored or sent over the network at various stages of the analytics pipeline. Using encryption to protect sensitive data at the boundary of the on-chip analytics engine is a way to address data security issues. To cope with the combined workload of analytics and encryption in a tight power envelope, we propose Fulmine, a System-on-Chip based on a tightly-coupled multi-core cluster augmented with specialized blocks for compute-intensive data processing and encryption functions, supporting software programmability for regular computing tasks. The Fulmine SoC, fabricated in 65nm technology, consumes less than 20mW on average at 0.8V achieving an efficiency of up to 70pJ/B in encryption, 50pJ/px in convolution, or up to 25MIPS/mW in software. As a strong argument for real-life flexible application of our platform, we show experimental results for three secure analytics use cases: secure autonomous aerial surveillance with a state-of-the-art deep CNN consuming 3.16pJ per equivalent RISC op; local CNN-based face detection with secured remote recognition in 5.74pJ/op; and seizure detection with encrypted data collection from EEG within 12.7pJ/op.Comment: 15 pages, 12 figures, accepted for publication to the IEEE Transactions on Circuits and Systems - I: Regular Paper

    First Evaluation of the CPU, GPGPU and MIC Architectures for Real Time Particle Tracking based on Hough Transform at the LHC

    Full text link
    Recent innovations focused around {\em parallel} processing, either through systems containing multiple processors or processors containing multiple cores, hold great promise for enhancing the performance of the trigger at the LHC and extending its physics program. The flexibility of the CMS/ATLAS trigger system allows for easy integration of computational accelerators, such as NVIDIA's Tesla Graphics Processing Unit (GPU) or Intel's \xphi, in the High Level Trigger. These accelerators have the potential to provide faster or more energy efficient event selection, thus opening up possibilities for new complex triggers that were not previously feasible. At the same time, it is crucial to explore the performance limits achievable on the latest generation multicore CPUs with the use of the best software optimization methods. In this article, a new tracking algorithm based on the Hough transform will be evaluated for the first time on a multi-core Intel Xeon E5-2697v2 CPU, an NVIDIA Tesla K20c GPU, and an Intel \xphi\ 7120 coprocessor. Preliminary time performance will be presented.Comment: 13 pages, 4 figures, Accepted to JINS

    Transparent control flow transfer between CPU and Intel FPGAs

    Get PDF
    The possibility of accelerating software using dedicated hardware is one of the advantages that heterogeneous computing platforms provide, and can greatly increase computing efficiency. Despite being more efficient, applications need to be rewritten to effectively exploit the dedicated accelerators, using a complex and tedious process.The goal of this work is to research transparent mechanisms that would allow the transfer of the control flow between a CPU and an FPGA, which houses an accelerator. Such a mechanism could be coupled with transparent software profiling and translation to hardware, which would allow for regular software to take advantage of hardware acceleration in a transparent manner. Implementation is done over Intel's new Xeon+FPGA hybrid platform which combines a Xeon processor and an Arria 10 FPGA in the same package while sharing the main system memory.A prototype was achieved and tested on real hardware, in which the Linux ptrace system call is used to control the process to be accelerated and transfer its execution between the CPU and the FPGA. An AES encryption kernel was accelerated and speedups of 8x were recorded when large data sizes were used
    corecore