An IoT Endpoint System-on-Chip for Secure and Energy-Efficient Near-Sensor Analytics
Near-sensor data analytics is a promising direction for IoT endpoints, as it
minimizes energy spent on communication and reduces network load - but it also
poses security concerns, as valuable data is stored or sent over the network at
various stages of the analytics pipeline. Using encryption to protect sensitive
data at the boundary of the on-chip analytics engine is a way to address data
security issues. To cope with the combined workload of analytics and encryption
in a tight power envelope, we propose Fulmine, a System-on-Chip based on a
tightly-coupled multi-core cluster augmented with specialized blocks for
compute-intensive data processing and encryption functions, supporting software
programmability for regular computing tasks. The Fulmine SoC, fabricated in
65nm technology, consumes less than 20mW on average at 0.8V, achieving an
efficiency of up to 70pJ/B in encryption, 50pJ/px in convolution, or up to
25MIPS/mW in software. As a strong argument for real-life flexible application
of our platform, we show experimental results for three secure analytics use
cases: secure autonomous aerial surveillance with a state-of-the-art deep CNN
consuming 3.16pJ per equivalent RISC op; local CNN-based face detection with
secured remote recognition in 5.74pJ/op; and seizure detection with encrypted
data collection from EEG within 12.7pJ/op.
Comment: 15 pages, 12 figures, accepted for publication in the IEEE
Transactions on Circuits and Systems - I: Regular Papers
HSTREAM: A directive-based language extension for heterogeneous stream computing
Big data streaming applications require utilization of heterogeneous parallel
computing systems, which may comprise multiple multi-core CPUs and many-core
accelerating devices such as NVIDIA GPUs and Intel Xeon Phis. Programming such
systems requires advanced knowledge of several hardware architectures and
device-specific programming models, including OpenMP and CUDA. In this paper,
we present HSTREAM, a compiler directive-based language extension to support
programming stream computing applications for heterogeneous parallel computing
systems. The HSTREAM source-to-source compiler aims to increase programming
productivity by enabling programmers to annotate the parallel regions for
heterogeneous execution and by generating target-specific code. The HSTREAM runtime
automatically distributes the workload across CPUs and accelerating devices. We
demonstrate the usefulness of the HSTREAM language extension with various
applications from the STREAM benchmark. Experimental evaluation results show
that HSTREAM can keep the same programming simplicity as OpenMP, and the
generated code can deliver performance beyond what CPUs-only and GPUs-only
executions can deliver.
Comment: Preprint, 21st IEEE International Conference on Computational Science
and Engineering (CSE 2018)
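The workload distribution described in the HSTREAM abstract can be illustrated with a minimal sketch. This is not HSTREAM code and does not use its directives: the function names, the `cpu_fraction` parameter, and the use of two Python threads standing in for a CPU and an accelerator are all assumptions made for illustration. It splits a STREAM-style triad kernel (a[i] = b[i] + scalar * c[i]) into two contiguous slices and computes them concurrently.

```python
from concurrent.futures import ThreadPoolExecutor


def triad_chunk(b, c, scalar, start, stop):
    """STREAM-style triad on one slice: a[i] = b[i] + scalar * c[i]."""
    return [b[i] + scalar * c[i] for i in range(start, stop)]


def split_triad(b, c, scalar, cpu_fraction=0.5):
    """Split the triad between two workers (hypothetical 'CPU' and 'accelerator').

    cpu_fraction is an assumed tuning knob: the share of elements given
    to the first worker. A real runtime would pick this automatically.
    """
    n = len(b)
    cut = int(n * cpu_fraction)
    with ThreadPoolExecutor(max_workers=2) as pool:
        cpu_part = pool.submit(triad_chunk, b, c, scalar, 0, cut)
        acc_part = pool.submit(triad_chunk, b, c, scalar, cut, n)
        # Concatenate in index order so the result matches a sequential run.
        return cpu_part.result() + acc_part.result()


if __name__ == "__main__":
    b = [1.0, 2.0, 3.0, 4.0]
    c = [1.0, 1.0, 1.0, 1.0]
    print(split_triad(b, c, 2.0, cpu_fraction=0.25))
```

The split ratio is the only design choice exposed here; the abstract states that the HSTREAM runtime makes this decision automatically, which the sketch does not attempt.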
GPU-based Real-time Triggering in the NA62 Experiment
Over the last few years, the GPGPU (General-Purpose computing on Graphics
Processing Units) paradigm has represented a remarkable development in the world of
computing. Computing for High-Energy Physics is no exception: several works
have demonstrated the effectiveness of the integration of GPU-based systems in
the high-level trigger of different experiments. On the other hand, the use of GPUs
in low-level trigger systems, characterized by stringent real-time
constraints such as a tight time budget and high throughput, poses several
challenges. In this paper we focus on the low-level trigger in the CERN NA62
experiment, investigating the use of real-time computing on GPUs in this
synchronous system. Our approach aims at harnessing the GPU computing power to
build refined physics-related trigger primitives in real time for the RICH
detector, since knowledge of the Cerenkov ring parameters makes it possible to build
stringent conditions for data selection at trigger level. Latencies of all
components of the trigger chain have been analyzed, pointing out that
networking is the most critical one. To keep the latency of the data transfer task
under control, we devised NaNet, an FPGA-based PCIe Network Interface Card
(NIC) with GPUDirect capabilities. For the processing task, we developed
specific multiple ring trigger algorithms to leverage the parallel architecture
of GPUs and increase the processing throughput to keep up with the high event
rate. Results obtained during the first months of the 2016 NA62 run are presented
and discussed.