Search CORE

12 research outputs found

DLBricks: Composable Benchmark Generation to Reduce Deep Learning Benchmarking Effort on CPUs (Extended)

Author: Elsken Thomas
Kim
Li Cheng
Szegedy Christian
Zhu Hongyu
Publication date: 11/03/2020

The past few years have seen a surge of applying Deep Learning (DL) models for a wide array of tasks such as image classification, object detection, machine translation, etc. While DL models provide an opportunity to solve otherwise intractable tasks, their adoption relies on them being optimized to meet latency and resource requirements. Benchmarking is a key step in this process but has been hampered in part due to the lack of representative and up-to-date benchmarking suites. This is exacerbated by the fast-evolving pace of DL models. This paper proposes DLBricks, a composable benchmark generation design that reduces the effort of developing, maintaining, and running DL benchmarks on CPUs. DLBricks decomposes DL models into a set of unique runnable networks and constructs the original model's performance using the performance of the generated benchmarks. DLBricks leverages two key observations: DL layers are the performance building blocks of DL models and layers are extensively repeated within and across DL models. Since benchmarks are generated automatically and the benchmarking time is minimized, DLBricks can keep up-to-date with the latest proposed models, relieving the pressure of selecting representative DL models. Moreover, DLBricks allows users to represent proprietary models within benchmark suites. We evaluate DLBricks using 50 MXNet models spanning 5 DL tasks on 4 representative CPU systems. We show that DLBricks provides an accurate performance estimate for the DL models and reduces the benchmarking time across systems (e.g., within 95% accuracy and up to 4.4× benchmarking time speedup on Amazon EC2 c5.xlarge).
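
As a rough illustration of the idea in this abstract (benchmark each unique layer once, then reconstruct whole-model performance from those measurements), here is a minimal Python sketch. It is not DLBricks itself: the layer signatures, the timings in `measured_ms`, the helper names, and the rule that a model's latency is the plain sum of its layers' latencies are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical layer signatures: (layer type, configuration string).
models = {
    "model_a": [("conv", "3x3,64"), ("relu", ""), ("conv", "3x3,64"),
                ("relu", ""), ("fc", "1000")],
    "model_b": [("conv", "3x3,64"), ("relu", ""), ("fc", "10")],
}


def unique_layers(models):
    """Collect each distinct layer signature once across all models."""
    return {layer for layers in models.values() for layer in layers}


def estimate_latency(layers, measured_ms):
    """Assumed composition rule: sum per-layer latency times repeat count."""
    counts = Counter(layers)
    return sum(measured_ms[layer] * n for layer, n in counts.items())


# Made-up per-layer timings standing in for real benchmark runs.
measured_ms = {
    ("conv", "3x3,64"): 2.0,
    ("relu", ""): 0.1,
    ("fc", "1000"): 1.5,
    ("fc", "10"): 0.5,
}

to_benchmark = unique_layers(models)
total_layers = sum(len(layers) for layers in models.values())
estimates = {name: estimate_latency(layers, measured_ms)
             for name, layers in models.items()}
```

In this toy setup only the distinct layers in `to_benchmark` need to be timed, which is fewer than the `total_layers` layer instances across both models; that gap is where the saving in benchmarking time would come from.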

arXiv.org e-Print Archive

Crossref

Exploring the Impact of Serverless Computing on Peer To Peer Training Machine Learning

Author: Barrak Amine
Jaafar Fehmi
Petrillo Fabio
Trabelsi Ranim
Publication date: 25/09/2023

The increasing demand for computational power in big data and machine learning has driven the development of distributed training methodologies. Among these, peer-to-peer (P2P) networks provide advantages such as enhanced scalability and fault tolerance. However, they also encounter challenges related to resource consumption, costs, and communication overhead as the number of participating peers grows. In this paper, we introduce a novel architecture that combines serverless computing with P2P networks for distributed training and present a method for efficient parallel gradient computation under resource constraints. Our findings show a significant enhancement in gradient computation time, with up to a 97.34% improvement compared to conventional P2P distributed training methods. As for costs, our examination confirmed that the serverless architecture could incur higher expenses, reaching up to 5.4 times more than instance-based architectures. It is essential to consider that these higher costs are associated with marked improvements in computation time, particularly under resource-constrained scenarios. Despite the cost-time trade-off, the serverless approach still holds promise due to its pay-as-you-go model. Utilizing dynamic resource allocation, it enables faster training times and optimized resource utilization, making it a promising candidate for a wide range of machine learning applications.

arXiv.org e-Print Archive

Function-as-a-Service for the Cloud-to-Thing Continuum: A Systematic Mapping Study

Author: Barišić Ankica
da Rocha Atslands Rego
da Silva Oliveira Bárbara
Dautov Rustem
Ferry Nicolas
Song Hui
Publication venue: SciTePress
Publication date: 01/01/2023

Until recently, Internet of Things applications were mainly seen as a means to gather sensor data for further processing in the Cloud. Nowadays, with the advent of Edge and Fog Computing, digital services are dragged closer to the physical world, with data processing and storage tasks distributed across the whole Cloud-to-Thing continuum. Function-as-a-Service (FaaS) is gaining momentum as one of the promising programming models for such digital services. This work investigates the current research landscape of applying FaaS over the Cloud-to-Thing continuum. In particular, we investigate the support offered by existing FaaS platforms for the deployment, placement, orchestration, and execution of functions across the whole continuum using the Systematic Mapping Study methodology. We selected 33 primary studies and analyzed their data, bringing a broad view on the current research landscape in the area.

SINTEF Open

INRIA a CCSD electronic archive server

Sharing and caring of data at the edge

Author: Bal Henri
Iosup Alexandru
Trivedi Animesh
Wang Lin
Publication date: 01/01/2020

VU Research Portal

Energy-Efficient Service Placement for Latency-Sensitive Applications in Edge Computing

Author: Ghaddar Bissan
Premsankar Gopika
Publication date: 15/09/2022

Edge computing is a promising solution to host artificial intelligence (AI) applications that enable real-time insights on user-generated and device-generated data. This requires edge computing resources (storage and compute) to be widely deployed close to end devices. Such edge deployments require a large amount of energy to run as edge resources are typically overprovisioned to flexibly meet the needs of time-varying user demand with a low latency. Moreover, AI applications rely on deep neural network (DNN) models that are increasingly larger in size to support high accuracy. These DNN models must be efficiently stored and transferred, so as to minimize their energy consumption. In this article, we model the problem of energy-efficient placement of services (namely, DNN models) for AI applications as a multiperiod optimization problem. The formulation jointly places services and schedules requests such that the overall energy consumption is minimized and latency is low. We propose a heuristic that efficiently solves the problem while taking into account the impact of placing services across time periods. We assess the quality of the proposed heuristic by comparing its solution to a lower bound of the problem, obtained by formulating and solving a Lagrangian relaxation of the original problem. Extensive simulations show that our proposed heuristic outperforms baseline approaches in achieving a low energy consumption by packing services on a minimal number of edge nodes, while at the same time keeping the average latency of served requests below a configured threshold in nearly all time periods. Peer reviewed.

Helsingin yliopiston digitaalinen arkisto

A Data-parallel Approach for Efficient Resource Utilization in Distributed Serverless Deep Learning

Author: Assogba Kevin Tunder Elom
Publication venue: RIT Scholar Works
Publication date: 01/08/2020

Serverless computing is an integral part of the recent success of cloud computing, offering cost and performance efficiency for small and large scale distributed systems. Owing to the increasing interest of developers in integrating distributed computing techniques into deep learning frameworks for better performance, serverless infrastructures have been the choice of many to host their applications. However, this computing architecture bears resource limitations which challenge the successful completion of many deep learning jobs. In our research, an approach is presented to address timeout and memory resource limitations, which are two key issues for deep learning on serverless infrastructures. Focusing on Apache OpenWhisk as the serverless platform and TensorFlow as the deep learning framework, our solution follows an in-depth assessment of the former and failed attempts at tackling resource constraints through system-level modifications. The proposed approach employs data parallelism and ensures the concurrent execution of separate cloud functions. A weighted averaging of intermediate models is afterwards applied to build an ensemble model ready for evaluation. Through a fine-grained system design, our solution executed and completed deep learning workflows on OpenWhisk with a 0% failure rate. Moreover, the comparison with a traditional deployment on OpenWhisk shows that our approach uses 45% less memory and reduces the execution time by 58%.
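
The "weighted averaging of intermediate models" step mentioned in this abstract can be pictured with a short Python sketch. This is only an illustration under stated assumptions, not the thesis's implementation: each intermediate model is reduced to a flat list of parameters, the weights are assumed to be the number of training samples each cloud function processed, and `weighted_average` is a hypothetical helper name.

```python
def weighted_average(param_sets, weights):
    """Combine parameter vectors into one, weighting each by its share.

    param_sets: list of equal-length lists of floats (one per worker).
    weights: one non-negative number per worker (assumed: sample counts).
    """
    if len(param_sets) != len(weights):
        raise ValueError("need exactly one weight per parameter set")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    size = len(param_sets[0])
    return [
        sum(w * params[i] for params, w in zip(param_sets, weights)) / total
        for i in range(size)
    ]


# Two hypothetical workers; the second one saw three times as much data.
worker_params = [[1.0, 2.0], [3.0, 4.0]]
samples_seen = [1, 3]
merged = weighted_average(worker_params, samples_seen)
```

With these made-up numbers the merged parameters sit closer to the second worker's values, because that worker carries three quarters of the total weight.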

RIT Scholar Works

Federated Learning for Medical Applications: A Taxonomy, Current Trends, Challenges, and Future Research Directions

Author: Bagci Ulas
Hagos Desta Haileselassie
Håkegård Jan Erik
Jha Debesh
Rauniyar Ashish
Rawat Danda B.
Vlassov Vladimir
Publication date: 29/10/2023

With the advent of the IoT, AI, ML, and DL algorithms, the landscape of data-driven medical applications has emerged as a promising avenue for designing robust and scalable diagnostic and prognostic models from medical data. This has gained a lot of attention from both academia and industry, leading to significant improvements in healthcare quality. However, the adoption of AI-driven medical applications still faces tough challenges, including meeting security, privacy, and quality of service (QoS) standards. Recent developments in federated learning (FL) have made it possible to train complex machine-learned models in a distributed manner and have become an active research domain, particularly processing the medical data at the edge of the network in a decentralized way to preserve privacy and address security concerns. To this end, in this paper, we explore the present and future of FL technology in medical applications where data sharing is a significant challenge. We delve into the current research trends and their outcomes, unravelling the complexities of designing reliable and scalable FL models. Our paper outlines the fundamental statistical issues in FL, tackles device-related problems, addresses security challenges, and navigates the complexity of privacy concerns, all while highlighting its transformative potential in the medical field. Our study primarily focuses on medical applications of FL, particularly in the context of global cancer diagnosis. We highlight the potential of FL to enable computer-aided diagnosis tools that address this challenge with greater effectiveness than traditional data-driven methods. We hope that this comprehensive review will serve as a checkpoint for the field, summarizing the current state-of-the-art and identifying open problems and future research directions. Comment: Accepted at IEEE Internet of Things Journal.

arXiv.org e-Print Archive

The Pipeline for the Continuous Development of Artificial Intelligence Models -- Current State of Research and Practice

Author: Felderer Michael
Ramler Rudolf
Steidl Monika
Publication venue: Elsevier BV
Publication date: 01/01/2023

Companies struggle to continuously develop and deploy AI models to complex production systems due to AI characteristics while assuring quality. To ease the development process, continuous pipelines for AI have become an active research area where consolidated and in-depth analysis regarding the terminology, triggers, tasks, and challenges is required. This paper includes a Multivocal Literature Review where we consolidated 151 relevant formal and informal sources. In addition, nine semi-structured interviews with participants from academia and industry verified and extended the obtained information. Based on these sources, this paper provides and compares terminologies for DevOps and CI/CD for AI, MLOps, (end-to-end) lifecycle management, and CD4ML. Furthermore, the paper provides an aggregated list of potential triggers for reiterating the pipeline, such as alert systems or schedules. In addition, this work uses a taxonomy creation strategy to present a consolidated pipeline comprising tasks regarding the continuous development of AI. This pipeline consists of four stages: Data Handling, Model Learning, Software Development and System Operations. Moreover, we map challenges regarding pipeline implementation, adaption, and usage for the continuous development of AI to these four stages. Comment: Accepted in the Journal Systems and Software.

arXiv.org e-Print Archive

Blekinge Institute of Technology

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Edge AI for Internet of Energy: Challenges and Perspectives

Author: Alsalemi Abdullah
Amira Abbes
Bensaali Faycal
Himeur Yassine
Sayed Aya Nabil
Publication date: 28/11/2023

The digital landscape of the Internet of Energy (IoE) is on the brink of a revolutionary transformation with the integration of edge Artificial Intelligence (AI). This comprehensive review elucidates the promise and potential that edge AI holds for reshaping the IoE ecosystem. Commencing with a meticulously curated research methodology, the article delves into the myriad of edge AI techniques specifically tailored for IoE. The myriad benefits, spanning from reduced latency and real-time analytics to the pivotal aspects of information security, scalability, and cost-efficiency, underscore the indispensability of edge AI in modern IoE frameworks. As the narrative progresses, readers are acquainted with pragmatic applications and techniques, highlighting on-device computation, secure private inference methods, and the avant-garde paradigms of AI training on the edge. A critical analysis follows, offering a deep dive into the present challenges including security concerns, computational hurdles, and standardization issues. However, as the horizon of technology ever expands, the review culminates in a forward-looking perspective, envisaging the future symbiosis of 5G networks, federated edge AI, deep reinforcement learning, and more, painting a vibrant panorama of what the future beholds. For anyone vested in the domains of IoE and AI, this review offers both a foundation and a visionary lens, bridging the present realities with future possibilities

arXiv.org e-Print Archive