1,053 research outputs found
CNNParted: An open source framework for efficient Convolutional Neural Network inference partitioning in embedded systems
Applications such as autonomous driving or assistive robotics heavily rely on the usage of Deep Neural Networks. In particular, Convolutional Neural Networks (CNNs) provide precise and reliable results in image processing tasks like camera-based object detection or semantic segmentation. However, to achieve even better results, CNNs are becoming more and more complex. Deploying these networks in distributed embedded systems thereby imposes new challenges, due to additional constraints regarding performance and energy consumption in the near-sensor compute platforms, i.e. the sensor nodes. Processing all data in the central node, however, is disadvantageous since raw data of camera consumes large bandwidth and running CNN inference of multiple tasks requires certain performance. Moreover, sending raw data over the interconnect is not advisable for privacy reasons. Hence, offloading CNN workload to the sensor nodes in the system can lead to reduced traffic on the link and a higher level of data security. However, due to the limited hardware-resources on the sensor nodes, partitioning CNNs has to be done carefully to meet overall latency requirements and energy constraints. Therefore, we present CNNParted, an open-source framework for efficient, hardware-aware CNN inference partitioning targeting embedded AI applications. It automatically searches for potential partitioning points in the CNN to find a beneficial workload distribution between sensor nodes and a central edge node. Thereby, CNNParted not only analyzes the CNN architecture but also takes hardware components, such as dedicated hardware accelerators and memories, into consideration to evaluate inference partitioning regarding latency and energy consumption. Exemplary, we apply CNNParted to three commonly used feed forward CNNs in embedded systems. Thereby, the framework first searches for several potential partitioning points and then evaluates the latter regarding inference latency and energy consumption. Based on the results, beneficial partitioning points can be identified depending on the system constraints. Using the framework, we are able to find and evaluate 10 potential partitioning points for FCN ResNet-50, 13 partitioning points for GoogLeNet, and 8 partitioning points for SqueezeNet V1.1 within 520 s, 330 s, and 140 s, respectively, on an AMD EPYC 7702P running 8 concurrent threads. For GoogLeNet, we determine two partitioning points that provide a good trade-off between required bandwidth, latency and energy consumption. We also provide insights into further interesting findings that can be derived from the evaluation results
Horizontally distributed inference of deep neural networks for AI-enabled IoT
Motivated by the pervasiveness of artificial intelligence (AI) and the Internet of Things (IoT) in the current “smart everything” scenario, this article provides a comprehensive overview of the most recent research at the intersection of both domains, focusing on the design and development of specific mechanisms for enabling a collaborative inference across edge devices towards the in situ execution of highly complex state-of-the-art deep neural networks (DNNs), despite the resource-constrained nature of such infrastructures. In particular, the review discusses the most salient approaches conceived along those lines, elaborating on the specificities of the partitioning schemes and the parallelism paradigms explored, providing an organized and schematic discussion of the underlying workflows and associated communication patterns, as well as the architectural aspects of the DNNs that have driven the design of such techniques, while also highlighting both the primary challenges encountered at the design and operational levels and the specific adjustments or enhancements explored in response to them.Agencia Estatal de Investigación | Ref. DPI2017-87494-RMinisterio de Ciencia e Innovación | Ref. PDC2021-121644-I00Xunta de Galicia | Ref. ED431C 2022/03-GR
Edge Video Analytics: A Survey on Applications, Systems and Enabling Techniques
Video, as a key driver in the global explosion of digital information, can
create tremendous benefits for human society. Governments and enterprises are
deploying innumerable cameras for a variety of applications, e.g., law
enforcement, emergency management, traffic control, and security surveillance,
all facilitated by video analytics (VA). This trend is spurred by the rapid
advancement of deep learning (DL), which enables more precise models for object
classification, detection, and tracking. Meanwhile, with the proliferation of
Internet-connected devices, massive amounts of data are generated daily,
overwhelming the cloud. Edge computing, an emerging paradigm that moves
workloads and services from the network core to the network edge, has been
widely recognized as a promising solution. The resulting new intersection, edge
video analytics (EVA), begins to attract widespread attention. Nevertheless,
only a few loosely-related surveys exist on this topic. The basic concepts of
EVA (e.g., definition, architectures) were not fully elucidated due to the
rapid development of this domain. To fill these gaps, we provide a
comprehensive survey of the recent efforts on EVA. In this paper, we first
review the fundamentals of edge computing, followed by an overview of VA. The
EVA system and its enabling techniques are discussed next. In addition, we
introduce prevalent frameworks and datasets to aid future researchers in the
development of EVA systems. Finally, we discuss existing challenges and foresee
future research directions. We believe this survey will help readers comprehend
the relationship between VA and edge computing, and spark new ideas on EVA.Comment: 31 pages, 13 figure
Can Differential Privacy Practically Protect Collaborative Deep Learning Inference for the Internet of Things?
Collaborative inference has recently emerged as an attractive framework for
applying deep learning to Internet of Things (IoT) applications by splitting a
DNN model into several subpart models among resource-constrained IoT devices
and the cloud. However, the reconstruction attack was proposed recently to
recover the original input image from intermediate outputs that can be
collected from local models in collaborative inference. For addressing such
privacy issues, a promising technique is to adopt differential privacy so that
the intermediate outputs are protected with a small accuracy loss. In this
paper, we provide the first systematic study to reveal insights regarding the
effectiveness of differential privacy for collaborative inference against the
reconstruction attack. We specifically explore the privacy-accuracy trade-offs
for three collaborative inference models with four datasets (SVHN, GTSRB,
STL-10, and CIFAR-10). Our experimental analysis demonstrates that differential
privacy can practically be applied to collaborative inference when a dataset
has small intra-class variations in appearance. With the (empirically)
optimized privacy budget parameter in our study, the differential privacy
technique incurs accuracy loss of 0.476%, 2.066%, 5.021%, and 12.454% on SVHN,
GTSRB, STL-10, and CIFAR-10 datasets, respectively, while thwarting the
reconstruction attack.Comment: Accepted in Wireless Network
- …