821 research outputs found
Saturn: An Optimized Data System for Large Model Deep Learning Workloads
Large language models such as GPT-3 & ChatGPT have transformed deep learning
(DL), powering applications that have captured the public's imagination. These
models are rapidly being adopted across domains for analytics on various
modalities, often by finetuning pre-trained base models. Such models need
multiple GPUs due to both their size and computational load, driving the
development of a bevy of "model parallelism" techniques & tools. Navigating
such parallelism choices, however, is a new burden for end users of DL such as
data scientists, domain scientists, etc. who may lack the necessary systems
knowhow. The need for model selection, which leads to many models to train due
to hyper-parameter tuning or layer-wise finetuning, compounds the situation
with two more burdens: resource apportioning and scheduling. In this work, we
tackle these three burdens for DL users in a unified manner by formalizing them
as a joint problem that we call SPASE: Select a Parallelism, Allocate
resources, and SchedulE. We propose a new information system architecture to
tackle the SPASE problem holistically, representing a key step toward enabling
wider adoption of large DL models. We devise an extensible template for
existing parallelism schemes and combine it with an automated empirical
profiler for runtime estimation. We then formulate SPASE as an MILP.
We find that direct use of an MILP-solver is significantly more effective
than several baseline heuristics. We optimize the system runtime further with
an introspective scheduling approach. We implement all these techniques into a
new data system we call Saturn. Experiments with benchmark DL workloads show
that Saturn achieves 39-49% lower model selection runtimes than typical current
DL practice.Comment: Under submission at VLDB. Code available:
https://github.com/knagrecha/saturn. 12 pages + 3 pages references + 2 pages
appendi
Analytical validation of innovative magneto-inertial outcomes: a controlled environment study.
peer reviewe
Serverless Cloud Computing: A Comparative Analysis of Performance, Cost, and Developer Experiences in Container-Level Services
Serverless cloud computing is a subset of cloud computing considerably adopted to build modern web applications, while the underlying server and infrastructure management duties are abstracted from customers to the cloud vendors. In serverless computing, customers must pay for the runtime consumed by their services, but they are exempt from paying for the idle time. Prior to serverless containers, customers needed to provision, scale, and manage servers, which was a bottleneck for rapidly growing customer-facing applications where latency and scaling were a concern.
The viability of adopting a serverless platform for a web application regarding performance, cost, and developer experiences is studied in this thesis. Three serverless container-level services are employed in this study from AWS and GCP. The services include GCP Cloud Run, GKE AutoPilot, and AWS EKS with AWS Fargate. Platform as a Service (PaaS) underpins the former, and Container as a Service (CaaS) the remainder. A single-page web application was created to perform incremental and spike load tests on those services to assess the performance differences. Furthermore, the cost differences are compared and analyzed. Lastly, the final element considered while evaluating the developer experiences is the complexity of using the services during the project implementation.
Based on the results of this research, it was determined that PaaS-based solutions are a high-performing, affordable alternative for CaaS-based solutions in circumstances where high levels of traffic are periodically anticipated, but sporadic latency is never a concern. Given that this study has limitations, the author recommends additional research to strengthen it
Split Federated Learning for 6G Enabled-Networks: Requirements, Challenges and Future Directions
Sixth-generation (6G) networks anticipate intelligently supporting a wide
range of smart services and innovative applications. Such a context urges a
heavy usage of Machine Learning (ML) techniques, particularly Deep Learning
(DL), to foster innovation and ease the deployment of intelligent network
functions/operations, which are able to fulfill the various requirements of the
envisioned 6G services. Specifically, collaborative ML/DL consists of deploying
a set of distributed agents that collaboratively train learning models without
sharing their data, thus improving data privacy and reducing the
time/communication overhead. This work provides a comprehensive study on how
collaborative learning can be effectively deployed over 6G wireless networks.
In particular, our study focuses on Split Federated Learning (SFL), a technique
recently emerged promising better performance compared with existing
collaborative learning approaches. We first provide an overview of three
emerging collaborative learning paradigms, including federated learning, split
learning, and split federated learning, as well as of 6G networks along with
their main vision and timeline of key developments. We then highlight the need
for split federated learning towards the upcoming 6G networks in every aspect,
including 6G technologies (e.g., intelligent physical layer, intelligent edge
computing, zero-touch network management, intelligent resource management) and
6G use cases (e.g., smart grid 2.0, Industry 5.0, connected and autonomous
systems). Furthermore, we review existing datasets along with frameworks that
can help in implementing SFL for 6G networks. We finally identify key technical
challenges, open issues, and future research directions related to SFL-enabled
6G networks
The OpenMolcas Web: A Community-Driven Approach to Advancing Computational Chemistry
The developments of the open-source OpenMolcas chemistry software environment since spring 2020 are described, with a focus on novel functionalities accessible in the stable branch of the package or via interfaces with other packages. These developments span a wide range of topics in computational chemistry and are presented in thematic sections: electronic structure theory, electronic spectroscopy simulations, analytic gradients and molecular structure optimizations, ab initio molecular dynamics, and other new features. This report offers an overview of the chemical phenomena and processes OpenMolcas can address, while showing that OpenMolcas is an attractive platform for state-of-the-art atomistic computer simulations
Horizontally distributed inference of deep neural networks for AI-enabled IoT
Motivated by the pervasiveness of artificial intelligence (AI) and the Internet of Things (IoT) in the current “smart everything” scenario, this article provides a comprehensive overview of the most recent research at the intersection of both domains, focusing on the design and development of specific mechanisms for enabling a collaborative inference across edge devices towards the in situ execution of highly complex state-of-the-art deep neural networks (DNNs), despite the resource-constrained nature of such infrastructures. In particular, the review discusses the most salient approaches conceived along those lines, elaborating on the specificities of the partitioning schemes and the parallelism paradigms explored, providing an organized and schematic discussion of the underlying workflows and associated communication patterns, as well as the architectural aspects of the DNNs that have driven the design of such techniques, while also highlighting both the primary challenges encountered at the design and operational levels and the specific adjustments or enhancements explored in response to them.Agencia Estatal de Investigación | Ref. DPI2017-87494-RMinisterio de Ciencia e Innovación | Ref. PDC2021-121644-I00Xunta de Galicia | Ref. ED431C 2022/03-GR
Blockchain and Internet of Things in smart cities and drug supply management: Open issues, opportunities, and future directions
Blockchain-based drug supply management (DSM) requires powerful security and privacy procedures for high-level authentication, interoperability, and medical record sharing. Researchers have shown a surprising interest in Internet of Things (IoT)-based smart cities in recent years. By providing a variety of intelligent applications, such as intelligent transportation, industry 4.0, and smart financing, smart cities (SC) can improve the quality of life for their residents. Blockchain technology (BCT) can allow SC to offer a higher standard of security by keeping track of transactions in an immutable, secure, decentralized, and transparent distributed ledger. The goal of this study is to systematically explore the current state of research surrounding cutting-edge technologies, particularly the deployment of BCT and the IoT in DSM and SC. In this study, the defined keywords “blockchain”, “IoT”, drug supply management”, “healthcare”, and “smart cities” as well as their variations were used to conduct a systematic search of all relevant research articles that were collected from several databases such as Science Direct, JStor, Taylor & Francis, Sage, Emerald insight, IEEE, INFORMS, MDPI, ACM, Web of Science, and Google Scholar. The final collection of papers on the use of BCT and IoT in DSM and SC is organized into three categories. The first category contains articles about the development and design of DSM and SC applications that incorporate BCT and IoT, such as new architecture, system designs, frameworks, models, and algorithms. Studies that investigated the use of BCT and IoT in the DSM and SC make up the second category of research. The third category is comprised of review articles regarding the incorporation of BCT and IoT into DSM and SC-based applications. Furthermore, this paper identifies various motives for using BCT and IoT in DSM and SC, as well as open problems and makes recommendations. The current study contributes to the existing body of knowledge by offering a complete review of potential alternatives and finding areas where further research is needed. As a consequence of this, researchers are presented with intriguing potential to further create decentralized DSM and SC apps as a result of a comprehensive discussion of the relevance of BCT and its implementation.© 2023 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).fi=vertaisarvioitu|en=peerReviewed
Incremental Multilayer Resource Partitioning for Application Placement in Dynamic Fog
Fog computing platforms became essential for deploying low-latency applications at the network's edge. However, placing and managing time-critical applications over a Fog infrastructure with many heterogeneous and resource-constrained devices over a dynamic network is challenging. This paper proposes an incremental multilayer resource-aware partitioning (M-RAP) method that minimizes resource wastage and maximizes service placement and deadline satisfaction in a dynamic Fog with many application requests. M-RAP represents the heterogeneous Fog resources as a multilayer graph, partitions it based on the network structure and resource types, and constantly updates it upon dynamic changes in the underlying Fog infrastructure. Finally, it identifies the device partitions for placing the application services according to their resource requirements, which must overlap in the same low-latency network partition. We evaluated M-RAP through extensive simulation and two applications executed on a real testbed. The results show that M-RAP can place 1.6 times as many services, satisfy deadlines for 43% more applications, lower their response time by up to 58%, and reduce resource wastage by up to 54% compared to three state-of-the-art methods
FORTE: an extensible framework for robustness and efficiency in data transfer pipelines
In the age of big data and growing product complexity, it is common to monitor many aspects of a product or system, in order to extract well-founded intelligence and draw conclusions, to continue driving innovation. Automating and scaling processes in data-pipelines becomes essential to keep pace with increasing rates of data generated by such practices, while meeting security, governance, scalability and resource-efficiency demands.We present FORTE, an extensible framework for robustness and transfer-efficiency in data pipelines. We identify sources of potential bottlenecks and explore the design space of approaches to deal with the challenges they pose. We study and evaluate synergetic effects of data compression and in-memory processing as well as task scheduling, in association with pipeline performance.A prototype implementation of FORTE is implemented and studied in a use-case at Volvo Trucks for high-volume production-level data sets, in the order of magnitude of hundreds of gigabytes to terabytes per burst. Various general-purpose lossless data compression algorithms are evaluated, in order to balance compression effectiveness and time in the pipeline.All in all, FORTE enables to deal with trade-offs and achieve benefits in latency and sustainable rate (up to 1.8 times better), effectiveness in resource utilisation, all while also enabling additional features such as integrity verification, logging, monitoring and traceability, as well as cataloguing of transferred data. We also note that the resource efficiency improvements achievable with FORTE, and its extensibility, can imply further benefits regarding scheduling, orchestration and energy-efficiency in such pipelines
- …