263 research outputs found
Distributed deep learning inference in fog networks
Today's smart devices are equipped with powerful integrated chips and built-in heterogeneous sensors, which they can leverage to execute heavy computation and to produce large amounts of sensor data. For instance, modern smart cameras integrate artificial intelligence to detect objects in the scene and to adjust parameters, such as contrast and color, to environmental conditions. The accuracy of object recognition and classification achieved by intelligent applications has improved due to recent advancements in artificial intelligence (AI) and machine learning (ML), particularly deep neural networks (DNNs).
Despite the capability to carry out some AI/ML computation, smart devices have limited battery power and computing resources. Therefore, DNN computation is generally offloaded to powerful computing nodes such as cloud servers. However, it is challenging to satisfy latency, reliability, and bandwidth constraints in cloud-based AI. Thus, in recent years, AI services and tasks have been pushed closer to the end-users by taking advantage of the fog computing paradigm to meet these requirements. Generally, the trained DNN models are offloaded to the fog devices for DNN inference. This is accomplished by partitioning the DNN and distributing the computation in fog networks.
This thesis addresses offloading DNN inference by dividing a pre-trained network and distributing it onto heterogeneous embedded devices. Specifically, it implements the adaptive partitioning and offloading algorithm based on matching theory proposed in the article "Distributed inference acceleration with adaptive DNN partitioning and offloading". The implementation was evaluated in a fog testbed comprising NVIDIA Jetson Nano devices. The obtained results show that the adaptive solution outperforms the other schemes (Random and Greedy) with respect to computation time and communication latency.
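The matching-theory approach behind such an adaptive algorithm can be illustrated as a generic one-to-many deferred-acceptance matching of DNN partitions to fog nodes. The function and data below are illustrative assumptions for this sketch, not the paper's exact formulation:

```python
def match_tasks_to_nodes(task_prefs, node_capacity, node_rank):
    """One-to-many deferred acceptance: DNN partition tasks propose to
    fog nodes in order of preference (e.g., by expected latency); each
    node keeps its highest-ranked tasks up to capacity and rejects the
    rest, which then propose to their next choice.

    A generic matching-theory sketch, not the paper's algorithm.
    """
    free = list(task_prefs)                  # tasks still proposing
    next_choice = {t: 0 for t in task_prefs}
    accepted = {n: [] for n in node_capacity}
    while free:
        t = free.pop()
        prefs = task_prefs[t]
        if next_choice[t] >= len(prefs):
            continue                         # preference list exhausted: unmatched
        n = prefs[next_choice[t]]
        next_choice[t] += 1
        accepted[n].append(t)
        # node keeps only its best tasks within capacity
        accepted[n].sort(key=lambda x: node_rank[n].index(x))
        while len(accepted[n]) > node_capacity[n]:
            free.append(accepted[n].pop())   # lowest-ranked task overflows
    return accepted
```

Here tasks could be blocks of DNN layers and nodes the Jetson devices; in the actual algorithm, preferences would be derived from computation time and communication latency estimates.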
AdaMEC: Towards a Context-Adaptive and Dynamically-Combinable DNN Deployment Framework for Mobile Edge Computing
With the rapid development of deep learning, recent research on intelligent
and interactive mobile applications (e.g., health monitoring, speech
recognition) has attracted extensive attention. These applications
necessitate the mobile edge computing scheme, i.e., offloading part of the
computation from mobile devices to edge devices for inference acceleration and
transmission load reduction. Current practice relies on collaborative DNN
partitioning and offloading to satisfy predefined latency requirements, but
this is intractable to adapt to the dynamic deployment context at runtime.
AdaMEC, a context-adaptive and dynamically-combinable DNN deployment
framework, is proposed to meet these requirements for mobile edge computing;
it consists of three novel techniques. First, once-for-all DNN pre-partition
divides the DNN at the primitive operator level and stores the partitioned
modules in executable files, defined as pre-partitioned DNN atoms. Second,
context-adaptive DNN atom combination and offloading introduces a graph-based
decision algorithm to quickly search for a suitable combination of atoms and
adaptively form an offloading plan under dynamic deployment contexts. Third,
runtime latency predictor provides timely latency feedback for DNN deployment
considering both DNN configurations and dynamic contexts. Extensive experiments
demonstrate that AdaMEC outperforms state-of-the-art baselines, reducing
latency by up to 62.14% and memory usage by 55.21% on average.
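For the simplest case of a linear chain of atoms, the combination-and-offloading decision can be sketched as a search over split points, trading device computation against transmission cost and edge computation. This is a hypothetical simplification (the names and cost model below are assumptions); AdaMEC's actual algorithm operates on general graphs under dynamic contexts:

```python
def best_split(device_ms, edge_ms, transfer_ms):
    """Pick the atom boundary k that minimizes end-to-end latency for a
    linear chain of n atoms: atoms [0, k) run on the mobile device,
    atoms [k, n) are offloaded to the edge, and transfer_ms[k] is the
    cost of shipping the intermediate tensor at boundary k.

    transfer_ms has n + 1 entries: boundary 0 means offloading
    everything (ship the raw input), boundary n means running
    everything locally (no transfer).
    """
    n = len(device_ms)
    best_k, best_cost = 0, float("inf")
    for k in range(n + 1):
        cost = sum(device_ms[:k]) + transfer_ms[k] + sum(edge_ms[k:])
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k, best_cost
```

A runtime latency predictor would supply `device_ms`, `edge_ms`, and `transfer_ms` from the current deployment context, so the chosen split adapts as bandwidth and device load change.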
Privacy-preserving Security Inference Towards Cloud-Edge Collaborative Using Differential Privacy
Cloud-edge collaborative inference approach splits deep neural networks
(DNNs) into two parts that run collaboratively on resource-constrained edge
devices and cloud servers, aiming at minimizing inference latency and
protecting data privacy. However, even if the raw input data from edge devices
is not directly exposed to the cloud, state-of-the-art attacks targeting
collaborative inference are still able to reconstruct the raw private data from
the intermediate outputs of the exposed local models, introducing serious
privacy risks. In this paper, a privacy-preserving inference framework for
cloud-edge collaboration, termed CIS, is proposed; it adaptively partitions
the network according to the dynamically changing network bandwidth and fully
exploits the computational power of edge devices. To
mitigate the influence of the privacy perturbation, CIS provides a way
to achieve differential privacy protection by adding refined noise to the
intermediate layer feature maps offloaded to the cloud. Meanwhile, given a
total privacy budget, CIS allocates the budget according to the rank of the
feature maps generated by different convolution filters, which makes inference
in the cloud robust to the perturbed data and effectively trades off privacy
against availability. Finally, we construct
a real cloud-edge collaborative inference scenario to verify the effectiveness
of CIS in reducing inference latency and partitioning the model on
resource-constrained edge devices. Furthermore, the state-of-the-art
cloud-edge collaborative reconstruction attack is used to evaluate the
practical effectiveness of the end-to-end privacy protection mechanism
provided by CIS.