DLBricks: Composable Benchmark Generation to Reduce Deep Learning Benchmarking Effort on CPUs (Extended)
The past few years have seen a surge of applying Deep Learning (DL) models
for a wide array of tasks such as image classification, object detection,
machine translation, etc. While DL models provide an opportunity to solve
otherwise intractable tasks, their adoption relies on them being optimized to
meet latency and resource requirements. Benchmarking is a key step in this
process but has been hampered in part due to the lack of representative and
up-to-date benchmarking suites. This is exacerbated by the fast-evolving pace
of DL models.
This paper proposes DLBricks, a composable benchmark generation design that
reduces the effort of developing, maintaining, and running DL benchmarks on
CPUs. DLBricks decomposes DL models into a set of unique runnable networks and
constructs the original model's performance using the performance of the
generated benchmarks. DLBricks leverages two key observations: DL layers are
the performance building blocks of DL models and layers are extensively
repeated within and across DL models. Since benchmarks are generated
automatically and the benchmarking time is minimized, DLBricks can keep
up-to-date with the latest proposed models, relieving the pressure of selecting
representative DL models. Moreover, DLBricks allows users to represent
proprietary models within benchmark suites. We evaluate DLBricks using
MXNet models spanning DL tasks on representative CPU systems. We show
that DLBricks provides an accurate performance estimate for the DL models and
reduces the benchmarking time across systems (e.g. within accuracy and
up to benchmarking time speedup on Amazon EC2 c5.xlarge).
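The composition idea in this abstract can be illustrated with a minimal sketch. It assumes a model is a flat sequence of layers, that identical layer signatures share one benchmark, and that a model's latency is roughly the sum of its layers' latencies. The model names, layer signatures, and latency values are all made up for illustration; this is not the DLBricks implementation.

```python
def unique_layers(models):
    """Collect the distinct layer signatures found across all models."""
    return {layer for layers in models.values() for layer in layers}


def estimate_latency(layers, layer_latency_ms):
    """Estimate model latency as the sum of benchmarked per-layer latencies."""
    return sum(layer_latency_ms[layer] for layer in layers)


# Hypothetical models; each layer is a (type, width) signature.
models = {
    "model_a": [("conv3x3", 64), ("relu", 64), ("conv3x3", 64), ("relu", 64), ("fc", 10)],
    "model_b": [("conv3x3", 64), ("relu", 64), ("fc", 10)],
}

# Only the unique layers need to be benchmarked.
to_benchmark = unique_layers(models)

# Made-up measurements standing in for real benchmark runs.
layer_latency_ms = {("conv3x3", 64): 2.0, ("relu", 64): 0.5, ("fc", 10): 1.0}

total_layers = sum(len(layers) for layers in models.values())
estimates = {
    name: estimate_latency(layers, layer_latency_ms)
    for name, layers in models.items()
}
print(f"benchmarked {len(to_benchmark)} unique layers instead of {total_layers}")
print(estimates)
```

Because layers repeat within and across models, the number of benchmarks grows with the number of unique layers rather than with the total layer count.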
Exploring the Impact of Serverless Computing on Peer To Peer Training Machine Learning
The increasing demand for computational power in big data and machine
learning has driven the development of distributed training methodologies.
Among these, peer-to-peer (P2P) networks provide advantages such as enhanced
scalability and fault tolerance. However, they also encounter challenges
related to resource consumption, costs, and communication overhead as the
number of participating peers grows. In this paper, we introduce a novel
architecture that combines serverless computing with P2P networks for
distributed training and present a method for efficient parallel gradient
computation under resource constraints.
Our findings show a significant enhancement in gradient computation time,
with up to a 97.34% improvement compared to conventional P2P distributed
training methods. As for costs, our examination confirmed that the serverless
architecture could incur higher expenses, reaching up to 5.4 times more than
instance-based architectures. It is essential to consider that these higher
costs are associated with marked improvements in computation time, particularly
under resource-constrained scenarios. Despite the cost-time trade-off, the
serverless approach still holds promise due to its pay-as-you-go model.
Utilizing dynamic resource allocation, it enables faster training times and
optimized resource utilization, making it a promising candidate for a wide
range of machine learning applications.
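A minimal sketch of parallel gradient computation follows. It assumes a peer splits its data into equal-sized chunks, computes one gradient per chunk concurrently, and averages the results. Local threads stand in for serverless function invocations, and the one-parameter model and all names are hypothetical; the paper's actual architecture is not reproduced here.

```python
from concurrent.futures import ThreadPoolExecutor


def gradient(w, batch):
    """Gradient of the mean squared error for the model y = w * x on one batch."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def parallel_gradient(w, data, n_workers):
    """Compute per-chunk gradients concurrently and average them.

    Assumes len(data) is divisible by n_workers, so every chunk has the
    same size and the average equals the full-batch gradient.
    """
    size = len(data) // n_workers
    chunks = [data[i * size:(i + 1) * size] for i in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        grads = list(pool.map(lambda chunk: gradient(w, chunk), chunks))
    return sum(grads) / len(grads)


# Toy data generated from y = 3x.
data = [(float(x), 3.0 * x) for x in range(1, 9)]
g_parallel = parallel_gradient(0.0, data, n_workers=4)
g_serial = gradient(0.0, data)
print(g_parallel, g_serial)
```

With equal chunk sizes the averaged result matches the serial gradient, so the parallel path changes where the work runs, not what is computed.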
Function-as-a-Service for the Cloud-to-Thing Continuum: A Systematic Mapping Study
Until recently, Internet of Things applications were mainly seen as a means to gather sensor data for further processing in the Cloud. Nowadays, with the advent of Edge and Fog Computing, digital services are dragged closer to the physical world, with data processing and storage tasks distributed across the whole Cloud-to-Thing continuum. Function-as-a-Service (FaaS) is gaining momentum as one of the promising programming models for such digital services. This work investigates the current research landscape of applying FaaS over the Cloud-to-Thing continuum. In particular, we investigate the support offered by existing FaaS platforms for the deployment, placement, orchestration, and execution of functions across the whole continuum using the Systematic Mapping Study methodology. We selected 33 primary studies and analyzed their data, bringing a broad view on the current research landscape in the area.
Energy-Efficient Service Placement for Latency-Sensitive Applications in Edge Computing
Edge computing is a promising solution to host artificial intelligence (AI) applications that enable real-time insights on user-generated and device-generated data. This requires edge computing resources (storage and compute) to be widely deployed close to end devices. Such edge deployments require a large amount of energy to run as edge resources are typically overprovisioned to flexibly meet the needs of time-varying user demand with a low latency. Moreover, AI applications rely on deep neural network (DNN) models that are increasingly larger in size to support high accuracy. These DNN models must be efficiently stored and transferred, so as to minimize their energy consumption. In this article, we model the problem of energy-efficient placement of services (namely, DNN models) for AI applications as a multiperiod optimization problem. The formulation jointly places services and schedules requests such that the overall energy consumption is minimized and latency is low. We propose a heuristic that efficiently solves the problem while taking into account the impact of placing services across time periods. We assess the quality of the proposed heuristic by comparing its solution to a lower bound of the problem, obtained by formulating and solving a Lagrangian relaxation of the original problem. Extensive simulations show that our proposed heuristic outperforms baseline approaches in achieving a low energy consumption by packing services on a minimal number of edge nodes, while at the same time keeping the average latency of served requests below a configured threshold in nearly all time periods.
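The abstract does not specify the heuristic, so the sketch below swaps in a standard technique, first-fit decreasing bin packing, only to illustrate what "packing services on a minimal number of edge nodes" can look like. It ignores latency constraints and time periods, and the service names, sizes, and node capacity are hypothetical.

```python
def first_fit_decreasing(sizes, capacity):
    """Pack services onto as few equal-capacity nodes as this greedy rule finds.

    sizes maps a service name to its storage demand. Services are placed
    largest first, each on the first node that still has room.
    """
    nodes = []  # nodes[i] is the list of service names placed on node i
    free = []   # free[i] is the remaining capacity of node i
    for name, size in sorted(sizes.items(), key=lambda kv: -kv[1]):
        if size > capacity:
            raise ValueError(f"service {name!r} does not fit on any node")
        for i, room in enumerate(free):
            if size <= room:
                nodes[i].append(name)
                free[i] -= size
                break
        else:
            # No existing node has room, so power on a new one.
            nodes.append([name])
            free.append(capacity - size)
    return nodes


# Hypothetical DNN model sizes (arbitrary units) and node capacity.
placement = first_fit_decreasing({"a": 6, "b": 5, "c": 4, "d": 3, "e": 2}, capacity=10)
print(placement)
```

Fewer active nodes is used here as a rough proxy for lower energy consumption; a real formulation would also account for request scheduling and latency, as the abstract describes.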
A Data-parallel Approach for Efficient Resource Utilization in Distributed Serverless Deep Learning
Serverless computing is an integral part of the recent success of cloud computing, offering cost and performance efficiency for small and large scale distributed systems. Owing to the increasing interest of developers in integrating distributed computing techniques into deep learning frameworks for better performance, serverless infrastructures have been the choice of many to host their applications. However, this computing architecture bears resource limitations which challenge the successful completion of many deep learning jobs.
In our research, an approach is presented to address timeout and memory resource limitations, which are two key issues for deep learning on serverless infrastructures. Focusing on Apache OpenWhisk as the serverless platform and TensorFlow as the deep learning framework, our solution follows an in-depth assessment of the former and failed attempts at tackling resource constraints through system-level modifications. The proposed approach employs data parallelism and ensures the concurrent execution of separate cloud functions. A weighted averaging of intermediate models is afterwards applied to build an ensemble model ready for evaluation. Through a fine-grained system design, our solution executed and completed deep learning workflows on OpenWhisk with a 0% failure rate. Moreover, the comparison with a traditional deployment on OpenWhisk shows that our approach uses 45% less memory and reduces the execution time by 58%.
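The weighted averaging step can be sketched as follows. The sketch assumes each cloud function returns its intermediate model as a flat list of parameters and that the weights are the data shard sizes; the abstract does not state how the weights are chosen, so that choice and all names are assumptions.

```python
def weighted_average(models, weights):
    """Average parameter vectors element-wise, weighting each model."""
    total = sum(weights)
    n_params = len(models[0])
    return [
        sum(w * m[i] for m, w in zip(models, weights)) / total
        for i in range(n_params)
    ]


# Two hypothetical intermediate models trained on shards of 1 and 3 samples.
intermediate_models = [[1.0, 2.0], [3.0, 6.0]]
shard_sizes = [1, 3]
merged = weighted_average(intermediate_models, shard_sizes)
print(merged)
```

Weighting by shard size lets a model trained on more data contribute proportionally more to the merged parameters.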
Federated Learning for Medical Applications: A Taxonomy, Current Trends, Challenges, and Future Research Directions
With the advent of the IoT, AI, ML, and DL algorithms, the landscape of
data-driven medical applications has emerged as a promising avenue for
designing robust and scalable diagnostic and prognostic models from medical
data. This has gained a lot of attention from both academia and industry,
leading to significant improvements in healthcare quality. However, the
adoption of AI-driven medical applications still faces tough challenges,
including meeting security, privacy, and quality of service (QoS) standards.
Recent developments in federated learning (FL) have made it possible to train complex
machine-learned models in a distributed manner and have become an active
research domain, particularly processing the medical data at the edge of the
network in a decentralized way to preserve privacy and address security
concerns. To this end, in this paper, we explore the present and future of FL
technology in medical applications where data sharing is a significant
challenge. We delve into the current research trends and their outcomes,
unravelling the complexities of designing reliable and scalable FL models.
Our paper outlines the fundamental statistical issues in FL, tackles
device-related problems, addresses security challenges, and navigates the
complexity of privacy concerns, all while highlighting its transformative
potential in the medical field. Our study primarily focuses on medical
applications of FL, particularly in the context of global cancer
diagnosis. We highlight the potential of FL to enable computer-aided diagnosis
tools that address this challenge with greater effectiveness than traditional
data-driven methods. We hope that this comprehensive review will serve as a
checkpoint for the field, summarizing the current state-of-the-art and
identifying open problems and future research directions.
Comment: Accepted at IEEE Internet of Things Journal
The Pipeline for the Continuous Development of Artificial Intelligence Models -- Current State of Research and Practice
Companies struggle to continuously develop and deploy AI models to complex
production systems while assuring quality, owing to characteristics specific to AI. To ease
the development process, continuous pipelines for AI have become an active
research area where consolidated and in-depth analysis regarding the
terminology, triggers, tasks, and challenges is required. This paper includes a
Multivocal Literature Review where we consolidated 151 relevant formal and
informal sources. In addition, nine semi-structured interviews with
participants from academia and industry verified and extended the obtained
information. Based on these sources, this paper provides and compares
terminologies for DevOps and CI/CD for AI, MLOps, (end-to-end) lifecycle
management, and CD4ML. Furthermore, the paper provides an aggregated list of
potential triggers for reiterating the pipeline, such as alert systems or
schedules. In addition, this work uses a taxonomy creation strategy to present
a consolidated pipeline comprising tasks regarding the continuous development
of AI. This pipeline consists of four stages: Data Handling, Model Learning,
Software Development and System Operations. Moreover, we map challenges
regarding pipeline implementation, adaptation, and usage for the continuous
development of AI to these four stages.
Comment: accepted in the Journal of Systems and Software
Edge AI for Internet of Energy: Challenges and Perspectives
The digital landscape of the Internet of Energy (IoE) is on the brink of a
revolutionary transformation with the integration of edge Artificial
Intelligence (AI). This comprehensive review elucidates the promise and
potential that edge AI holds for reshaping the IoE ecosystem. Commencing with a
meticulously curated research methodology, the article delves into the myriad
of edge AI techniques specifically tailored for IoE. The myriad benefits,
spanning from reduced latency and real-time analytics to the pivotal aspects of
information security, scalability, and cost-efficiency, underscore the
indispensability of edge AI in modern IoE frameworks. As the narrative
progresses, readers are acquainted with pragmatic applications and techniques,
highlighting on-device computation, secure private inference methods, and the
avant-garde paradigms of AI training on the edge. A critical analysis follows,
offering a deep dive into the present challenges including security concerns,
computational hurdles, and standardization issues. However, as the horizon of
technology ever expands, the review culminates in a forward-looking
perspective, envisaging the future symbiosis of 5G networks, federated edge AI,
deep reinforcement learning, and more, painting a vibrant panorama of what the
future holds. For anyone vested in the domains of IoE and AI, this review
offers both a foundation and a visionary lens, bridging the present realities
with future possibilities.