A Random Greedy based Design Time Tool for AI Applications Component Placement and Resource Selection in Computing Continua
Artificial Intelligence (AI) and Deep Learning (DL) are pervasive today, with applications spanning from personal assistants to healthcare. The accelerated migration towards mobile computing and the Internet of Things, where a huge amount of data is generated by widespread end devices, is driving the rise of the edge computing paradigm, in which computing resources are distributed among devices with highly heterogeneous capacities. In this fragmented scenario, efficient component placement and resource allocation algorithms are crucial to orchestrate the computing continuum resources effectively. In this paper, we propose a tool that addresses the component placement problem for AI applications at design time. Through a randomized greedy algorithm, our approach identifies the minimum-cost placement that provides performance guarantees across heterogeneous resources, including edge devices, cloud GPU-based Virtual Machines, and Function-as-a-Service solutions. Finally, we compare the randomized greedy method with the HyperOpt framework and demonstrate that our approach converges to a near-optimal solution much faster, especially in large-scale systems.
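To make the idea concrete, here is a minimal sketch of how a randomized greedy placement heuristic of this kind can be structured; the cost and feasibility callables, the top-k randomization, and all names are illustrative assumptions, not the paper's actual formulation.

```python
import random

def randomized_greedy_placement(components, resources, cost, feasible,
                                n_iter=1000, top_k=3, seed=0):
    """Randomized greedy search for a minimum-cost component placement.

    components: application components to place (assumed interface)
    resources:  candidate resources (edge device, cloud VM, FaaS) (assumed)
    cost(placement): total cost of a (partial) placement dict (assumed)
    feasible(component, resource, placement): performance check (assumed)
    """
    rng = random.Random(seed)
    best, best_cost = None, float("inf")
    for _ in range(n_iter):
        placement, ok = {}, True
        for comp in components:
            # Rank feasible resources by the marginal cost of adding this component.
            candidates = sorted(
                (r for r in resources if feasible(comp, r, placement)),
                key=lambda r: cost({**placement, comp: r}),
            )
            if not candidates:
                ok = False
                break
            # Randomization: pick among the top-k cheapest instead of the single best,
            # so repeated runs explore different near-greedy placements.
            placement[comp] = rng.choice(candidates[:top_k])
        if ok and cost(placement) < best_cost:
            best, best_cost = placement, cost(placement)
    return best, best_cost
```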
NetGPT: A Native-AI Network Architecture Beyond Provisioning Personalized Generative Services
Large language models (LLMs) have achieved tremendous success in empowering daily life with generative content, and personalizing LLMs could further broaden their applications through better alignment with human intents. Towards personalized generative services, a collaborative cloud-edge methodology is promising, as it facilitates the effective orchestration of heterogeneous distributed communication and computing resources. In this article, after discussing the pros and cons of several candidate cloud-edge collaboration techniques, we put forward NetGPT to deploy appropriate LLMs at the edge and in the cloud in accordance with their computing capacity. In addition, edge LLMs can efficiently leverage location-based information for personalized prompt completion, thus benefiting the interaction with cloud LLMs. After deploying representative open-source LLMs (e.g., GPT-2-base and LLaMA) at the edge and in the cloud, we demonstrate the feasibility of NetGPT on the basis of low-rank adaptation (LoRA)-based lightweight fine-tuning. Subsequently, we highlight the substantial changes required in a native artificial intelligence (AI) network architecture towards NetGPT, with special emphasis on deeper integration of communication and computing resources and careful calibration of the logical AI workflow. Furthermore, we demonstrate several by-product benefits of NetGPT, given the edge LLM's astonishing capability to predict trends and infer intents, which possibly leads to a unified solution for intelligent network management & orchestration. In a nutshell, we argue that NetGPT is a promising native-AI network architecture that goes beyond provisioning personalized generative services.
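For readers unfamiliar with the fine-tuning recipe mentioned above, the following is a minimal sketch of low-rank adaptation applied to GPT-2; the choice of the Hugging Face transformers and peft libraries and all hyperparameter values are assumptions on my part, not the paper's actual setup.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

# Load a small open-source LLM of the kind suited to an edge deployment.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Low-rank adaptation: freeze base weights, train small rank-r adapters only.
lora_cfg = LoraConfig(
    r=8,                        # adapter rank (assumed; tune per device budget)
    lora_alpha=16,
    target_modules=["c_attn"],  # GPT-2's fused attention projection
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # only the adapters are trainable
```

Because only the adapter weights are updated, the trainable footprint is a small fraction of the full model, which is what makes this style of fine-tuning plausible on edge hardware.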
In-situ Model Downloading to Realize Versatile Edge AI in 6G Mobile Networks
The sixth-generation (6G) mobile networks are expected to feature the ubiquitous deployment of machine learning and AI algorithms at the network edge. With rapid advancements in edge AI, the time has come to realize intelligence downloading onto edge devices (e.g., smartphones and sensors). To materialize this vision, we propose in this article a novel technology, called in-situ model downloading, that aims to achieve transparent and real-time replacement of on-device AI models by downloading from an AI library in the network. Its distinctive feature is the adaptation of downloading to time-varying situations (e.g., application, location, and time), devices' heterogeneous storage-and-computing capacities, and channel states. A key component of the presented framework is a set of techniques that dynamically compress a downloaded model at the depth level, parameter level, or bit level to support adaptive model downloading. We further propose a virtualized 6G network architecture customized for deploying in-situ model downloading, with the key feature of a three-tier (edge, local, and central) AI library. Furthermore, experiments are conducted to quantify the 6G connectivity requirements, and research opportunities pertaining to the proposed technology are discussed.
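As an illustration of the adaptation idea, here is a minimal sketch of selecting a compressed model variant under a channel-rate and storage budget; the variant names, sizes, and the selection rule are hypothetical stand-ins, not the paper's algorithm.

```python
def select_model_variant(model_sizes_bits, channel_rate_bps, deadline_s,
                         device_storage_bits):
    """Pick the richest model variant downloadable before the deadline.

    model_sizes_bits: dict mapping variant name -> size in bits
    (variant names and sizes are assumed for illustration).
    """
    budget = channel_rate_bps * deadline_s  # bits deliverable in time
    best = None
    for name, size in model_sizes_bits.items():
        if size <= budget and size <= device_storage_bits:
            # Prefer the largest variant that still fits both constraints.
            if best is None or size > model_sizes_bits[best]:
                best = name
    return best

# Example: bit-level variants of one model (hypothetical numbers).
variants = {"int4": 25e6, "int8": 50e6, "fp16": 100e6}
print(select_model_variant(variants, channel_rate_bps=20e6, deadline_s=3,
                           device_storage_bits=80e6))  # -> "int8"
```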
Distributed Machine Learning through Heterogeneous Edge Systems
Many emerging AI applications call for distributed machine learning (ML) among edge systems (e.g., IoT devices and PCs at the edge of the Internet), where data cannot be uploaded to a central venue for model training due to their large volumes and/or security/privacy concerns. Edge devices are intrinsically heterogeneous in computing capacity, posing significant challenges to parameter synchronization for parallel training with the parameter server (PS) architecture. This paper proposes ADSP, a parameter synchronization scheme for distributed ML with heterogeneous edge systems. To eliminate the significant waiting time incurred by existing parameter synchronization models, the core idea of ADSP is to let faster edge devices continue training while committing their model updates at strategically decided intervals. We design algorithms that decide the time points for each worker to commit its model update, ensuring not only global model convergence but also faster convergence. Our testbed implementation and experiments show that ADSP significantly outperforms existing parameter synchronization models in terms of ML model convergence time, scalability, and adaptability to large heterogeneity.
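A minimal sketch of the commit-at-intervals idea follows; the actual interval-selection algorithm (with its convergence guarantees) is the paper's contribution, so the interval here is simply a caller-supplied stand-in, and all hook names are assumptions.

```python
import time

def worker_loop(train_step, push_update, pull_model, commit_interval_s,
                total_steps):
    """Fast workers keep training locally, committing at decided intervals.

    commit_interval_s: per-worker commit interval; in ADSP this would be
    chosen by the scheme's own algorithm, which is not reproduced here.
    train_step / push_update / pull_model: assumed hooks into the local
    trainer and the parameter server (PS).
    """
    last_commit = time.monotonic()
    for step in range(total_steps):
        train_step()  # local mini-batch update; never blocks on stragglers
        now = time.monotonic()
        if now - last_commit >= commit_interval_s:
            push_update()  # send the accumulated model delta to the PS
            pull_model()   # refresh the local copy of the global model
            last_commit = now
```

The key contrast with bulk-synchronous training is that no worker waits at a barrier; coordination happens only through the timing of commits.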
High performance platform to detect faults in the Smart Grid by Artificial Intelligence inference
Inferring faults throughout the power grid requires fast computation, large-scale data handling, and low latency. Our heterogeneous architecture at the edge offers such high computing performance and throughput, using an Artificial Intelligence (AI) core deployed on the Alveo accelerator. In addition, we describe the process of porting standard AI models to Vitis AI and discuss its limitations and possible implications. During validation, we designed and trained several AI models for fast fault detection in Smart Grids. However, since the AI framework is standard, adapting the models to Field Programmable Gate Arrays (FPGAs) demanded a series of transformation processes. Compared with a Graphics Processing Unit platform, our implementation on the FPGA accelerator consumes less energy and achieves lower latency. Finally, our system balances inference accuracy, on-chip resource consumption, computing performance, and throughput. Even with grid data sampling rates as high as 800,000 samples per second, our hardware architecture can simultaneously process up to 7 data streams.
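As a back-of-envelope check of the quoted figures, the aggregate data rate implied by 7 concurrent streams at 800,000 samples per second can be estimated as follows; the bytes-per-sample and inference-window values are illustrative assumptions, not taken from the paper.

```python
# Throughput estimate for the figures quoted above.
SAMPLE_RATE_HZ = 800_000   # per-stream grid sampling rate (from the abstract)
N_STREAMS = 7              # concurrent data streams (from the abstract)
BYTES_PER_SAMPLE = 4       # e.g., one float32 measurement (assumed)
WINDOW = 1024              # samples per inference window (assumed)

ingest_mb_s = SAMPLE_RATE_HZ * N_STREAMS * BYTES_PER_SAMPLE / 1e6
inferences_s = SAMPLE_RATE_HZ * N_STREAMS / WINDOW
print(f"aggregate ingest: {ingest_mb_s:.1f} MB/s")       # 22.4 MB/s
print(f"required inference rate: {inferences_s:.0f}/s")  # ~5469 windows/s
```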
SMOTEC: An Edge Computing Testbed for Adaptive Smart Mobility Experimentation
Smart mobility is becoming paramount for meeting net-zero targets. However, autonomous, self-driving, and electric vehicles require more than ever an efficient, resilient, and trustworthy computational offloading backbone that spans the edge-to-cloud continuum. Utilizing on-demand heterogeneous computational resources for smart mobility is challenging and often cost-ineffective. This paper introduces SMOTEC, a novel open-source testbed for adaptive smart mobility experimentation with edge computing. SMOTEC provides, for the first time, modular end-to-end instrumentation for prototyping and optimizing the placement of intelligence services, such as augmented reality and real-time traffic monitoring, on edge devices. SMOTEC supports a plug-and-play Docker container integration of the SUMO simulator for urban mobility, Raspberry Pi edge devices communicating via ZeroMQ, and EPOS for AI-based decentralized load balancing across the edge-to-cloud continuum. All components are orchestrated by the K3s lightweight Kubernetes. A proof-of-concept of self-optimized service placements for traffic monitoring in Munich demonstrates in practice the applicability and cost-effectiveness of SMOTEC.
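The ZeroMQ link between edge devices can be pictured with a minimal pyzmq pub/sub sketch; the port, hostname, and topic names are assumptions, and the two functions would run as separate processes on different nodes.

```python
import zmq

def publisher():
    """Edge device publishing traffic observations (runs on one node)."""
    ctx = zmq.Context()
    pub = ctx.socket(zmq.PUB)
    pub.bind("tcp://*:5556")                 # port is an assumption
    pub.send_string("traffic/munich 42")     # topic-prefixed payload

def subscriber():
    """Consumer on another node, e.g., feeding the EPOS load balancer."""
    ctx = zmq.Context()
    sub = ctx.socket(zmq.SUB)
    sub.connect("tcp://edge-node-1:5556")    # hostname is an assumption
    sub.setsockopt_string(zmq.SUBSCRIBE, "traffic/")  # topic filter
    print(sub.recv_string())
```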
Towards Deterministic End-to-end Latency for Medical AI Systems in NVIDIA Holoscan
The introduction of AI and ML technologies into medical devices has revolutionized healthcare diagnostics and treatments. Medical device manufacturers are keen to maximize the advantages afforded by AI and ML by consolidating multiple applications onto a single platform. However, concurrent execution of several AI applications, each with its own visualization components, leads to unpredictable end-to-end latency, primarily due to GPU resource contention. To mitigate this, manufacturers typically deploy separate workstations for distinct AI applications, thereby increasing financial, energy, and maintenance costs. This paper addresses these challenges within the context of NVIDIA's Holoscan platform, a real-time AI system for streaming sensor data and images. We propose a system design optimized for heterogeneous GPU workloads, encompassing both compute and graphics tasks. Our design leverages CUDA MPS for spatial partitioning of compute workloads and isolates compute and graphics processing onto separate GPUs. We demonstrate significant performance improvements across various end-to-end latency determinism metrics through empirical evaluation with real-world Holoscan medical device applications. For instance, the proposed design reduces maximum latency by 21-30% and improves latency distribution flatness by 17-25% for up to five concurrent endoscopy tool tracking AI applications, compared to a single-GPU baseline. Against a default multi-GPU setup, our optimizations decrease maximum latency by 35% for up to six concurrent applications by improving GPU utilization by 42%. This paper provides clear design insights for AI applications in the edge-computing domain, including medical systems, where performance predictability of concurrent and heterogeneous GPU workloads is a critical requirement.
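The two isolation mechanisms named above can be sketched as follows; pinning a process to a GPU with CUDA_VISIBLE_DEVICES and capping its SM share with CUDA_MPS_ACTIVE_THREAD_PERCENTAGE (effective under a running MPS daemon) are standard CUDA facilities, while the binaries and percentages here are placeholders rather than Holoscan's actual configuration.

```python
import os
import subprocess

def launch_pipeline(cmd, gpu_index, sm_percent):
    """Launch one AI application on an isolated GPU with an MPS compute cap.

    CUDA_VISIBLE_DEVICES pins the process to a single GPU; under the CUDA
    MPS daemon, CUDA_MPS_ACTIVE_THREAD_PERCENTAGE limits the fraction of
    SMs this client may occupy (spatial partitioning of compute work).
    """
    env = dict(os.environ,
               CUDA_VISIBLE_DEVICES=str(gpu_index),
               CUDA_MPS_ACTIVE_THREAD_PERCENTAGE=str(sm_percent))
    return subprocess.Popen(cmd, env=env)

# Compute-heavy inference pipelines share GPU 0 under MPS with capped SM
# shares; graphics/visualization goes to GPU 1 (binaries are placeholders).
procs = [launch_pipeline(["./inference_app"], gpu_index=0, sm_percent=40),
         launch_pipeline(["./inference_app"], gpu_index=0, sm_percent=40),
         launch_pipeline(["./viz_app"], gpu_index=1, sm_percent=100)]
```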
Optimization and Prediction Techniques for Self-Healing and Self-Learning Applications in a Trustworthy Cloud Continuum
The current IT market is increasingly dominated by the “cloud continuum”. In the “traditional” cloud, computing resources are typically homogeneous in order to facilitate economies of scale. In contrast, in edge computing, computational resources are widely diverse, commonly with scarce capacities, and must be managed very efficiently due to battery constraints or other limitations. A combination of resources and services at the edge (edge computing), in the core (cloud computing), and along the data path (fog computing) is needed through a trusted cloud continuum. This requires novel solutions for the creation, optimization, management, and automatic operation of such infrastructure through new approaches such as infrastructure as code (IaC). In this paper, we analyze how artificial intelligence (AI)-based techniques and tools can enhance the operation of complex applications to support the broad and multi-stage heterogeneity of the infrastructural layer in the “computing continuum” through the enhancement of IaC optimization, IaC self-learning, and IaC self-healing. To this end, the presented work proposes a set of tools, methods, and techniques for application operators to seamlessly select, combine, configure, and adapt computation resources all along the data path and support the complete service lifecycle, covering: (1) optimized distributed application deployment over heterogeneous computing resources; (2) real-time monitoring of execution platforms, including continuous control and trust of the infrastructural services; (3) application deployment and adaptation while optimizing the execution; and (4) application self-recovery to avoid compromising situations that may lead to unexpected failures.
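As a minimal sketch of the self-healing behavior in item (4), a generic monitor-and-recover loop could look like the following; the health-check and redeploy hooks are hypothetical stand-ins for the monitoring and IaC deployment layers, not PIACERE's actual tooling.

```python
import time

def self_healing_loop(check_health, redeploy, poll_interval_s=30,
                      max_retries=3):
    """Generic monitor-and-recover loop for a deployed application.

    check_health() -> bool and redeploy() are assumed hooks into the
    monitoring layer and the IaC deployment pipeline, respectively.
    """
    failures = 0
    while True:
        if check_health():
            failures = 0
        else:
            failures += 1
            if failures >= max_retries:
                redeploy()    # re-apply the IaC definition to recover
                failures = 0
        time.sleep(poll_interval_s)
```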