Edge enhanced deep learning system for large-scale video stream analytics.
Applying deep learning models to large-scale IoT data is a compute-intensive task that needs significant computational resources. Existing approaches transfer this big data from IoT devices to a central cloud, where inference is performed using a machine learning model. However, the network connecting the data capture source and the cloud platform can become a bottleneck. We address this problem by distributing the deep learning pipeline across edge and cloudlet/fog resources. The basic processing stages and trained models are distributed towards the edge of the network and onto in-transit and cloud resources. The proposed approach performs initial processing of the data close to the data source at edge and fog nodes, resulting in a significant reduction in the data that is transferred to and stored in the cloud. Results on an object recognition scenario show a 71% efficiency gain in system throughput when a combination of edge, in-transit and cloud resources is employed, compared with a cloud-only approach.
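As a minimal sketch of why initial processing at the edge shrinks what reaches the cloud, the Python below drops near-duplicate video frames at a hypothetical edge stage and downsamples the survivors at a hypothetical in-transit stage, then counts the bytes that would be sent onward. The `Frame` type, the mean-absolute-difference threshold and the downsampling factor are illustrative assumptions; the paper's actual pipeline stages and trained models are not reproduced here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Frame:
    index: int
    pixels: List[int]  # flattened grayscale values, one byte each (0-255)


def edge_filter(frames: List[Frame], threshold: float = 10.0) -> List[Frame]:
    """Hypothetical edge stage: forward a frame only if its mean absolute
    pixel difference from the last forwarded frame reaches `threshold`."""
    kept: List[Frame] = []
    last = None
    for frame in frames:
        if last is not None:
            diff = sum(abs(a - b) for a, b in zip(frame.pixels, last.pixels))
            if diff / max(len(frame.pixels), 1) < threshold:
                continue  # near-duplicate of a frame already forwarded; drop it
        kept.append(frame)
        last = frame
    return kept


def in_transit_downsample(frame: Frame, factor: int = 2) -> Frame:
    """Hypothetical in-transit (fog) stage: keep every `factor`-th pixel."""
    return Frame(frame.index, frame.pixels[::factor])


def bytes_sent(frames: List[Frame]) -> int:
    """Bytes that would cross the network, assuming one byte per pixel."""
    return sum(len(f.pixels) for f in frames)


if __name__ == "__main__":
    # Five identical dark frames followed by five identical bright frames.
    frames = [Frame(i, [0] * 100) for i in range(5)]
    frames += [Frame(i, [200] * 100) for i in range(5, 10)]
    forwarded = [in_transit_downsample(f) for f in edge_filter(frames)]
    print("cloud-only bytes:", bytes_sent(frames))  # 1000
    print("edge + in-transit bytes:", bytes_sent(forwarded))  # 100
```

Only two of the ten frames survive the edge filter, and each is halved in transit, so a tenth of the raw bytes leave the fog layer in this toy input. Real savings depend entirely on the scene and on what the deployed stages actually compute.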
Orchestrating Service Migration for Low Power MEC-Enabled IoT Devices
Multi-Access Edge Computing (MEC) is a key enabling technology for Fifth
Generation (5G) mobile networks. MEC provides distributed cloud computing
capabilities and an information technology service environment for applications
and services at the edge of mobile networks. This architectural change
reduces congestion and latency and improves the performance of edge-colocated
applications and devices. In this paper, we demonstrate how reactive
service migration can be orchestrated for low-power MEC-enabled Internet of
Things (IoT) devices, using open-source Kubernetes as the container
orchestration system. Our demo is based on a traditional client-server system
running from the user equipment (UE) over Long Term Evolution (LTE) to the MEC
server. As the use case, we post-process live video received over Web Real-Time
Communication (WebRTC). We then integrate Kubernetes orchestration with S1
handovers, demonstrating a MEC-based software-defined network (SDN). Edge
applications can thus reactively follow the UE within the radio access network
(RAN), keeping latency low. The collected data is used to analyze the
benefits of the low-power MEC-enabled IoT device scheme, in which the end-to-end
(E2E) latency and the power requirements of the UE are improved. We further
discuss the challenges of implementing such schemes and future research
directions therein.
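The reactive migration step can be pictured as a small controller that listens for S1 handover events and re-places the UE's edge application on the MEC node serving the target cell. The Python sketch below is hypothetical: the eNodeB-to-node map, the class name and the log format are assumptions, and it only records placement decisions instead of calling the Kubernetes API, which in a real deployment would reschedule the pod (for instance by changing its node affinity).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MigrationOrchestrator:
    """Hypothetical controller that keeps each UE's edge application on the
    MEC node serving the UE's current cell (eNodeB)."""
    enb_to_node: Dict[str, str]                              # eNodeB id -> MEC node
    placement: Dict[str, str] = field(default_factory=dict)  # UE id -> MEC node
    log: List[str] = field(default_factory=list)

    def attach(self, ue_id: str, enb: str) -> None:
        """Initial placement when the UE first attaches to a cell."""
        self.placement[ue_id] = self.enb_to_node[enb]

    def on_handover(self, ue_id: str, target_enb: str) -> bool:
        """React to an S1 handover; return True if the application must move."""
        target_node = self.enb_to_node[target_enb]
        current_node = self.placement.get(ue_id)
        if current_node == target_node:
            return False  # the target cell is served by the same MEC node
        # A real orchestrator would now ask Kubernetes to reschedule the pod
        # onto target_node; this sketch only records the decision.
        self.placement[ue_id] = target_node
        self.log.append(f"{ue_id}: {current_node} -> {target_node}")
        return True


if __name__ == "__main__":
    orch = MigrationOrchestrator({"enb-1": "mec-a", "enb-2": "mec-a", "enb-3": "mec-b"})
    orch.attach("ue-1", "enb-1")
    print(orch.on_handover("ue-1", "enb-2"))  # False: same MEC node
    print(orch.on_handover("ue-1", "enb-3"))  # True: migrate to mec-b
    print(orch.log)  # ['ue-1: mec-a -> mec-b']
```

Keeping the decision logic separate from the cluster call makes it easy to test the "follow the UE" policy on recorded handover traces before wiring it to a live orchestrator.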
Demo: An experimental environment based on mini-PCs for federated learning research
There is a growing research interest in Federated Learning (FL), a promising approach for preserving data privacy and for bringing training close to the network edge, where data is generated. Resource consumption for Machine Learning (ML) training and inference is important for edge nodes, but most of the proposed protocols and algorithms for FL are evaluated through simulation. In this demo paper, we present an environment based on distributed mini-PCs that enables experimental study of FL protocols and algorithms. We have installed low-capacity mini-PCs within a wireless city-level mesh network and deployed container-based FL components on these nodes. We show the deployed FL clients and server at different nodes in the city and demonstrate how an FL experiment can be set up and run in a real environment.
This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 871582 (NGIatlantic.eu) and was partially supported by the Spanish Government under contracts PID2019-106774RB-C21, PCI2019-111851-2 (LeadingEdge CHIST-ERA) and PCI2019-111850-2 (DiPET CHIST-ERA). The work of C.-H. Liu was supported in part by the U.S. National Science Foundation (NSF) under Award CNS-2006453 and in part by Mississippi State University under Grant ORED 253551-060702. The work of L. Wei is supported in part by the U.S. National Science Foundation (#2150486 and #2006612). I. Koutsopoulos acknowledges support from the CHIST-ERA grant CHIST-ERA-18-SDCDN-004 (GSRI grant number T11EPA4-00056).
Peer Reviewed. Postprint (author's final draft).
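Once FL clients and a server are deployed on separate mini-PCs, each training round ends with the server combining the clients' model updates. The sketch below assumes FedAvg-style aggregation, a weighted mean of parameter vectors weighted by each client's local sample count; the abstract does not say which aggregation rule the environment uses, and representing a model as a flat list of floats is a simplification.

```python
from typing import List, Sequence


def federated_average(client_weights: Sequence[Sequence[float]],
                      client_sizes: Sequence[int]) -> List[float]:
    """FedAvg-style aggregation: average client parameter vectors, weighting
    each client by the number of local training samples it used."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need at least one client and one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(client_weights[0])
    aggregate = [0.0] * dim
    for weights, size in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all clients must send parameter vectors of equal length")
        for i, value in enumerate(weights):
            aggregate[i] += value * size / total
    return aggregate


if __name__ == "__main__":
    # Two hypothetical clients: one trained on 1 sample, the other on 3.
    print(federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3]))  # [2.5, 3.5]
```

In a container-based deployment this function would run inside the server component after it has received one parameter vector per participating client for the round.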
BePOCH: Improving federated learning performance in resource-constrained computing devices
Inference with trained machine learning models is now possible on small computing devices, whereas only a few years ago it ran mostly in the cloud. The recent technique of Federated Learning now also offers a way to train machine learning models on small devices by distributing the computing effort needed for training over many machines. However, training on these low-capacity devices takes a long time and often consumes all of the device's available CPU resources. Therefore, for Federated Learning on low-capacity devices to be practical, the training process must target not only the highest accuracy but also reduced training time and resource consumption. In this paper, we present an approach that uses a dynamic epoch parameter in model training. We propose the BePOCH (Best Epoch) algorithm to identify the best number of epochs per training round in Federated Learning. We show in experiments with medical datasets that, with the number of epochs suggested by BePOCH, training time and resource consumption decrease while the level of accuracy is maintained. Thus, BePOCH makes machine learning model training on low-capacity devices more feasible and, furthermore, decreases the overall resource consumption of the training process, which is an important aspect of greener machine learning techniques.
This work was partially funded by the Spanish Government under contracts PID2019-106774RB-C21, PCI2019-111850-2 (DiPET CHIST-ERA) and PCI2019-111851-2 (LeadingEdge CHIST-ERA), and by the Generalitat de Catalunya as Consolidated Research Group 2017-SGR-990. Support was also given by the Agency for Electronic Communications (AEK) of North Macedonia.
Peer Reviewed. Postprint (author's final draft).
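The abstract does not spell out how BePOCH chooses the epoch count, so the Python below is not the BePOCH algorithm. It is a hypothetical selection rule that captures the trade-off described: given measured accuracy for several epochs-per-round settings, pick the smallest setting whose accuracy is within a tolerance of the best one, since fewer epochs mean less training time and CPU use on the device. The tolerance value and the accuracy figures in the example are assumptions.

```python
from typing import Dict


def pick_epochs(accuracy_by_epochs: Dict[int, float], tolerance: float = 0.01) -> int:
    """Hypothetical rule (not BePOCH itself): return the smallest number of
    epochs per round whose accuracy is within `tolerance` of the best seen."""
    if not accuracy_by_epochs:
        raise ValueError("need at least one (epochs, accuracy) measurement")
    best = max(accuracy_by_epochs.values())
    good_enough = [epochs for epochs, acc in accuracy_by_epochs.items()
                   if acc >= best - tolerance]
    return min(good_enough)  # fewer epochs -> less training time and CPU use


if __name__ == "__main__":
    # Illustrative accuracies only; these are not results from the paper.
    measured = {1: 0.80, 2: 0.86, 5: 0.875, 10: 0.88}
    print(pick_epochs(measured))  # 5
```

Here 5 epochs per round is chosen over 10 because it gives up only half a percentage point of accuracy while halving the local training work per round.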
Building Near-Real-Time Processing Pipelines with the Spark-MPI Platform
Advances in detectors and computational technologies provide new
opportunities for applied research and the fundamental sciences. Concurrently,
dramatic increases in the three Vs (Volume, Velocity, and Variety) of
experimental data and in the scale of computational tasks have created demand
for new real-time processing systems at experimental facilities. Recently, this
demand was addressed by the Spark-MPI approach, which connects the Spark
data-intensive platform with the MPI high-performance framework. In contrast
with existing data management and analytics systems, Spark introduced a new
middleware based on resilient distributed datasets (RDDs), which decoupled
various data sources from high-level processing algorithms. The RDD middleware
significantly broadened the scope of data-intensive applications, ranging from
SQL queries to machine learning to graph processing. Spark-MPI further extended
the Spark ecosystem with MPI applications using the Process Management
Interface (PMI). The paper explores this integrated platform in the context of
online ptychographic and tomographic reconstruction pipelines.
Comment: New York Scientific Data Summit, August 6-9, 201
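To make the RDD idea concrete, the toy Python class below holds a dataset as partitions plus a chain of lazy transformations that run only when an action such as `collect` is called, so the processing code does not depend on where the items came from. It is an illustrative stand-in written for this note, not Spark's API, and it models neither Spark's distributed execution and fault recovery nor the MPI side of Spark-MPI.

```python
from functools import reduce as _reduce
from typing import Any, Callable, Iterable, List, Optional

Partition = List[Any]
Op = Callable[[Partition], Partition]


class ToyDataset:
    """Toy stand-in for an RDD: partitions plus a lineage of lazy per-partition
    transformations that are applied only when an action is called."""

    def __init__(self, partitions: List[Partition], ops: Optional[List[Op]] = None) -> None:
        self.partitions = partitions
        self.ops: List[Op] = list(ops or [])

    @classmethod
    def from_iterable(cls, items: Iterable[Any], num_partitions: int = 2) -> "ToyDataset":
        """Any data source works: the items are just split into chunks."""
        if num_partitions < 1:
            raise ValueError("num_partitions must be at least 1")
        data = list(items)
        size = max(1, -(-len(data) // num_partitions))  # ceiling division
        return cls([data[i:i + size] for i in range(0, len(data), size)])

    def map(self, f: Callable[[Any], Any]) -> "ToyDataset":
        # Returns a new dataset; nothing is computed yet.
        return ToyDataset(self.partitions, self.ops + [lambda part: [f(x) for x in part]])

    def filter(self, pred: Callable[[Any], bool]) -> "ToyDataset":
        return ToyDataset(self.partitions, self.ops + [lambda part: [x for x in part if pred(x)]])

    def _compute(self, part: Partition) -> Partition:
        for op in self.ops:  # replay the lineage on one partition
            part = op(part)
        return part

    def collect(self) -> List[Any]:
        """Action: run the lineage on every partition and gather the results."""
        return [x for part in self.partitions for x in self._compute(part)]

    def reduce(self, f: Callable[[Any, Any], Any]) -> Any:
        return _reduce(f, self.collect())


if __name__ == "__main__":
    squares = ToyDataset.from_iterable(range(10), num_partitions=3).map(lambda x: x * x)
    evens = squares.filter(lambda x: x % 2 == 0)  # still nothing computed
    print(evens.collect())  # [0, 4, 16, 36, 64]
    print(evens.reduce(lambda a, b: a + b))  # 120
```

Because each transformation is recorded rather than executed, a lost partition could in principle be recomputed by replaying the lineage, which is the property that lets RDDs sit between heterogeneous sources and higher-level algorithms.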
Hyper: Distributed Cloud Processing for Large-Scale Deep Learning Tasks
Training and deploying deep learning models in real-world applications
requires processing large amounts of data. This is challenging when the
amount of data grows to a hundred terabytes or even reaches petabyte scale. We
introduce a hybrid distributed cloud framework that provides a unified view of
multiple clouds and on-premise infrastructure for processing tasks at scale
using both CPU and GPU compute instances. The system implements a distributed
file system and a fault-tolerant task-processing scheduler, independent of the
language and deep learning framework used. It can use cheap but unstable cloud
resources to significantly reduce costs. We demonstrate the scalability of the
framework by running pre-processing, distributed training, hyperparameter
search and large-scale inference tasks on 10,000 CPU cores and 300 GPU
instances, with an overall processing power of 30 petaflops.
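A scheduler that runs on cheap but unstable instances has to survive workers disappearing mid-task. The Python sketch below shows one simple way to do that: re-queue a task when its execution reports failure, and give up after a fixed number of attempts. The `execute` callback, the retry limit and the in-memory queue are assumptions for illustration; the abstract does not describe Hyper's actual scheduling policy.

```python
from collections import deque
from typing import Callable, Dict, List


def run_with_retries(tasks: List[str], execute: Callable[[str, int], bool],
                     max_attempts: int = 3) -> Dict[str, int]:
    """Run tasks from a queue, re-queueing any task whose worker failed
    (e.g. a preempted cheap instance). Returns the attempt count per task.
    Raises RuntimeError if a task fails `max_attempts` times."""
    queue = deque(tasks)
    attempts: Dict[str, int] = {task: 0 for task in tasks}
    done: Dict[str, int] = {}
    while queue:
        task = queue.popleft()
        attempts[task] += 1
        if execute(task, attempts[task]):
            done[task] = attempts[task]
        elif attempts[task] >= max_attempts:
            raise RuntimeError(f"task {task!r} failed {max_attempts} times")
        else:
            queue.append(task)  # retry later, possibly on a different worker
    return done


if __name__ == "__main__":
    # Hypothetical worker: task "b" lands on an instance that is preempted once.
    def execute(task: str, attempt: int) -> bool:
        return not (task == "b" and attempt == 1)

    print(run_with_retries(["a", "b", "c"], execute))  # {'a': 1, 'c': 1, 'b': 2}
```

Pushing a failed task to the back of the queue, rather than retrying it immediately, lets other work proceed while replacement capacity comes online, at the cost of that task finishing later.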