16 research outputs found

    A Hardware Perspective on the ChaCha Ciphers: Scalable Chacha8/12/20 Implementations Ranging from 476 Slices to Bitrates of 175 Gbit/s

    AES (Advanced Encryption Standard) accelerators are commonly used in high-throughput applications, but they have notable resource requirements. We investigate replacing the AES cipher with ChaCha ciphers and propose the first ChaCha FPGA implementations optimized for data throughput. Consequently, we compare implementations of three different system architectures and analyze which aspects dominate their performance. Our experimental results indicate that a bandwidth of 175 Gbit/s can be reached with as few as 2982 slices, whereas comparable state-of-the-art AES accelerators require ten times as many slices. Taking advantage of the flexibility inherent in the ChaCha cipher, we also demonstrate how our implementation scales to even higher throughputs or lower resource usage (down to 476 slices), benefiting applications that previously could not employ cryptography because of resource limitations.
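    The throughput advantage of the ChaCha family comes from its simple add-rotate-xor (ARX) round function, which maps well onto deeply pipelined FPGA datapaths. As a minimal software reference sketch (not the FPGA architectures evaluated in the paper), the quarter-round and the column/diagonal double-round structure can be written as follows; the state layout is the usual 16-word ChaCha state of constants, key, block counter, and nonce.

```python
# Minimal software sketch of the ChaCha round structure (ARX: add, rotate, XOR).
# Reference model for clarity only, not the pipelined FPGA datapath from the paper.

MASK32 = 0xFFFFFFFF

def rotl32(x, n):
    return ((x << n) | (x >> (32 - n))) & MASK32

def quarter_round(s, a, b, c, d):
    # One ChaCha quarter-round on the state words at indices a, b, c, d.
    s[a] = (s[a] + s[b]) & MASK32; s[d] = rotl32(s[d] ^ s[a], 16)
    s[c] = (s[c] + s[d]) & MASK32; s[b] = rotl32(s[b] ^ s[c], 12)
    s[a] = (s[a] + s[b]) & MASK32; s[d] = rotl32(s[d] ^ s[a], 8)
    s[c] = (s[c] + s[d]) & MASK32; s[b] = rotl32(s[b] ^ s[c], 7)

def chacha_block(state, rounds=20):
    # state: list of 16 32-bit words (constants, key, block counter, nonce).
    # rounds = 8, 12 or 20 selects ChaCha8/12/20; returns 16 keystream words.
    s = list(state)
    for _ in range(rounds // 2):
        # column round
        quarter_round(s, 0, 4, 8, 12); quarter_round(s, 1, 5, 9, 13)
        quarter_round(s, 2, 6, 10, 14); quarter_round(s, 3, 7, 11, 15)
        # diagonal round
        quarter_round(s, 0, 5, 10, 15); quarter_round(s, 1, 6, 11, 12)
        quarter_round(s, 2, 7, 8, 13); quarter_round(s, 3, 4, 9, 14)
    return [(x + y) & MASK32 for x, y in zip(s, state)]
```

    Choosing rounds = 8, 12, or 20 yields ChaCha8/12/20; this round count is exactly the security/throughput knob that the scalable hardware implementations exploit.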

    EFFECT: An End-to-End Framework for Evaluating Strategies for Parallel AI Anomaly Detection

    Neural networks achieve high accuracy in tasks like image recognition or segmentation. However, their application in safety-critical domains is limited due to their black-box nature and vulnerability to specific types of attacks. To mitigate this, methods that detect out-of-distribution inputs or adversarial attacks in parallel to the network inference were introduced. These methods are hard to compare because they were developed for different use cases, datasets, and networks. To fill this gap, we introduce EFFECT, an end-to-end framework to evaluate and compare new methods for anomaly detection, without the need for retraining and by using traces of intermediate inference results. The presented workflow works with every preexisting neural network architecture and evaluates the considered anomaly detection methods in terms of accuracy and computational complexity. We demonstrate EFFECT's capabilities by creating new detectors for ShuffleNet and MobileNetV2 for anomaly detection as well as fault origin detection. EFFECT allows us to design anomaly detectors based on the Mahalanobis distance as well as CNN-based detectors. For both use cases, we achieve accuracies of over 85 %, classifying inferences as normal or abnormal, and thus beating existing methods.
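    As an illustration of the Mahalanobis-distance detector mentioned above, the following sketch fits a single Gaussian to intermediate feature traces collected from normal inferences and flags a new inference as abnormal when its distance exceeds a threshold. The function names, the single-Gaussian fit, and the thresholding are simplifying assumptions, not necessarily EFFECT's exact detector design.

```python
import numpy as np

def fit_gaussian(features):
    # features: (n_samples, n_dims) intermediate activations from normal inferences.
    mu = features.mean(axis=0)
    cov = np.cov(features, rowvar=False) + 1e-6 * np.eye(features.shape[1])  # regularized
    return mu, np.linalg.inv(cov)

def mahalanobis_score(x, mu, cov_inv):
    # Squared Mahalanobis distance of one feature vector to the "normal" distribution.
    d = x - mu
    return float(d @ cov_inv @ d)

def is_abnormal(x, mu, cov_inv, threshold):
    # Flag the inference as abnormal if its score exceeds a calibrated threshold.
    return mahalanobis_score(x, mu, cov_inv) > threshold
```

    A threshold would typically be calibrated on held-out normal traces, for example as a high percentile of their scores.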

    CNNParted: An open source framework for efficient Convolutional Neural Network inference partitioning in embedded systems

    Applications such as autonomous driving or assistive robotics heavily rely on the use of Deep Neural Networks. In particular, Convolutional Neural Networks (CNNs) provide precise and reliable results in image processing tasks like camera-based object detection or semantic segmentation. However, to achieve even better results, CNNs are becoming more and more complex. Deploying these networks in distributed embedded systems thereby imposes new challenges, due to additional constraints regarding performance and energy consumption in the near-sensor compute platforms, i.e. the sensor nodes. Processing all data in the central node, however, is disadvantageous, since raw camera data consumes large bandwidth and running CNN inference for multiple tasks requires a certain level of compute performance. Moreover, sending raw data over the interconnect is not advisable for privacy reasons. Hence, offloading CNN workload to the sensor nodes in the system can lead to reduced traffic on the link and a higher level of data security. However, due to the limited hardware resources on the sensor nodes, partitioning CNNs has to be done carefully to meet overall latency requirements and energy constraints. Therefore, we present CNNParted, an open-source framework for efficient, hardware-aware CNN inference partitioning targeting embedded AI applications. It automatically searches for potential partitioning points in the CNN to find a beneficial workload distribution between sensor nodes and a central edge node. Thereby, CNNParted not only analyzes the CNN architecture but also takes hardware components, such as dedicated hardware accelerators and memories, into consideration to evaluate inference partitioning regarding latency and energy consumption. As an example, we apply CNNParted to three feed-forward CNNs commonly used in embedded systems. The framework first searches for potential partitioning points and then evaluates them regarding inference latency and energy consumption. Based on the results, beneficial partitioning points can be identified depending on the system constraints. Using the framework, we are able to find and evaluate 10 potential partitioning points for FCN ResNet-50, 13 for GoogLeNet, and 8 for SqueezeNet V1.1 within 520 s, 330 s, and 140 s, respectively, on an AMD EPYC 7702P running 8 concurrent threads. For GoogLeNet, we determine two partitioning points that provide a good trade-off between required bandwidth, latency, and energy consumption. We also provide insights into further interesting findings that can be derived from the evaluation results.
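    The partitioning evaluation can be pictured as a small cost-model loop: for each candidate cut after a layer, the cost of running the earlier layers on the sensor node, the later layers on the edge node, and transmitting the intermediate feature map over the link are combined. The sketch below uses a deliberately simplified additive latency model with illustrative field names; CNNParted's actual evaluation relies on detailed hardware models of the accelerators and memories.

```python
from dataclasses import dataclass

@dataclass
class Layer:
    name: str
    sensor_latency_ms: float   # latency of this layer on the sensor-node accelerator
    edge_latency_ms: float     # latency of this layer on the central edge node
    out_feature_kb: float      # size of the layer's output feature map

def evaluate_partitions(layers, link_kb_per_ms, input_kb):
    # Candidate cut i means: layers[0:i] run on the sensor node, layers[i:] on the edge node.
    # i == 0 corresponds to sending the raw input; i == len(layers) to full near-sensor processing.
    results = []
    for i in range(len(layers) + 1):
        sensor = sum(l.sensor_latency_ms for l in layers[:i])
        edge = sum(l.edge_latency_ms for l in layers[i:])
        transferred_kb = input_kb if i == 0 else layers[i - 1].out_feature_kb
        link = transferred_kb / link_kb_per_ms
        results.append((i, sensor + link + edge, transferred_kb))
    return sorted(results, key=lambda r: r[1])  # best end-to-end latency first
```

    Energy can be treated analogously with per-layer and per-transfer energy costs, so that partitioning points can be ranked along both axes.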

    Message from IEEE SOCC Technical Chairs

    No full text

    Hardware-aware Partitioning of Convolutional Neural Network Inference for Embedded AI Applications

    No full text
    Embedded image processing applications like multi-camera-based object detection or semantic segmentation are often based on Convolutional Neural Networks (CNNs) to provide precise and reliable results. The deployment of CNNs in embedded systems, however, imposes additional constraints such as latency restrictions and limited energy consumption in the sensor platform. These requirements have to be considered during hardware/software co-design of embedded Artificial Intelligence (AI) applications. In addition, the transmission of uncompressed image data from the sensor to a central edge node requires large bandwidth on the link, which must also be taken into account during the design phase. Therefore, we present a simulation toolchain for fast evaluation of hardware-aware CNN partitioning for embedded AI applications. This approach explores an efficient workload distribution between sensor nodes and a central edge node. Neither processing all layers close to the sensor nor transmitting all uncompressed raw data to the edge node is an optimal solution for every use case. Hence, our proposed simulation toolchain evaluates power and performance metrics for each reasonable partitioning point in a CNN. In contrast to the state of the art, our approach does not only consider the neural network architecture: in the evaluation, our simulation toolchain additionally takes into account hardware components, such as dedicated accelerators and memories, that are implemented in the sensor node. As an example, we show the simulation results for three CNNs commonly used in embedded systems. Thereby, we identify advantageous partitioning points regarding inference latency and energy consumption. With the support of the toolchain, we are able to identify three beneficial partitioning points for FCN ResNet-50 and two for GoogLeNet as well as for SqueezeNet V1.1.
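    One reason neither extreme is optimal is the size of the tensor that has to cross the link: an early cut can produce a feature map that is even larger than the raw image, while a late cut pushes almost all compute onto the constrained sensor node. A small helper like the following (hypothetical shapes and values, assuming 8-bit channel-height-width feature maps) makes that bandwidth trade-off explicit.

```python
def tensor_kb(shape, bits=8):
    # Size of a feature map in kilobytes, e.g. shape = (channels, height, width).
    n = 1
    for dim in shape:
        n *= dim
    return n * bits / 8 / 1024

# Example: an uncompressed 3x480x640 RGB input vs. candidate cut points.
raw_kb       = tensor_kb((3, 480, 640))    # ~900 KB per frame over the link
early_cut_kb = tensor_kb((64, 240, 320))   # ~4800 KB: larger than the raw image
late_cut_kb  = tensor_kb((256, 30, 40))    # ~300 KB: cheaper to transmit
```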

    Towards the on-device Handwriting Trajectory Reconstruction of the Sensor Enhanced Pen

    No full text
    Performing handwriting trajectory regression from inertial data using a Deep Neural Network (DNN) on an embedded device is a very challenging task, since the network accuracy is prone to imperfections in the weights and a significant number of parameters is needed to perform the regression. In this work, we apply and compare different quantization techniques and the Mitchell logarithmic multiplication approximation in order to enable on-device inference. We show that it is possible to perform inference of the TCN-based regression model using only 8-bit fixed-point quantization without significant reconstruction precision loss, and that the accuracy degradation caused by the approximate multiplication can be partially compensated with Quantization-aware Training (QAT). Finally, we demonstrate that the compressed models can be integrated into an off-the-shelf commercial System-on-Chip with minimal use of the FPU, requiring only 460 KB of ROM for the TCN-49 configuration.
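    Mitchell's logarithmic multiplication replaces the multiplier with an addition in the log domain: each operand is approximated by its leading-one position plus a linear mantissa, the two approximate logarithms are added, and the sum is converted back. The following integer sketch illustrates the idea (it is not the fixed-point datapath used on the SoC); the classic scheme always underestimates, with a worst-case error of roughly 11 %.

```python
def mitchell_log2(a: int) -> float:
    # Approximate log2(a) for a > 0: leading-one position plus a linear mantissa.
    k = a.bit_length() - 1
    return k + (a - (1 << k)) / (1 << k)

def mitchell_mul(a: int, b: int) -> int:
    # Approximate a * b by adding the approximate logarithms and converting back.
    s = mitchell_log2(a) + mitchell_log2(b)
    k, f = int(s), s - int(s)
    return round((1 << k) * (1 + f))

# Example: mitchell_mul(6, 5) -> 28, while the exact product is 30 (~6.7 % underestimation).
```

    In hardware, both the logarithm approximation and the antilogarithm reduce to leading-one detection and shifts, which is why the scheme pairs well with fixed-point quantized weights, and QAT can absorb part of the systematic underestimation during training.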
