4,214 research outputs found
TrIMS: Transparent and Isolated Model Sharing for Low Latency Deep LearningInference in Function as a Service Environments
Deep neural networks (DNNs) have become core computation components within
low latency Function as a Service (FaaS) prediction pipelines: including image
recognition, object detection, natural language processing, speech synthesis,
and personalized recommendation pipelines. Cloud computing, as the de-facto
backbone of modern computing infrastructure for both enterprise and consumer
applications, has to be able to handle user-defined pipelines of diverse DNN
inference workloads while maintaining isolation and latency guarantees, and
minimizing resource waste. The current solution for guaranteeing isolation
within FaaS is suboptimal -- suffering from "cold start" latency. A major cause
of such inefficiency is the need to move large amount of model data within and
across servers. We propose TrIMS as a novel solution to address these issues.
Our proposed solution consists of a persistent model store across the GPU, CPU,
local storage, and cloud storage hierarchy, an efficient resource management
layer that provides isolation, and a succinct set of application APIs and
container technologies for easy and transparent integration with FaaS, Deep
Learning (DL) frameworks, and user code. We demonstrate our solution by
interfacing TrIMS with the Apache MXNet framework and demonstrate up to 24x
speedup in latency for image classification models and up to 210x speedup for
large models. We achieve up to 8x system throughput improvement.Comment: In Proceedings CLOUD 201
Recommended from our members
MobileTrust: Secure Knowledge Integration in VANETs
Vehicular Ad hoc NETworks (VANET) are becoming popular due to the emergence of the Internet of Things and ambient intelligence applications. In such networks, secure resource sharing functionality is accomplished by incorporating trust schemes. Current solutions adopt peer-to-peer technologies that can cover the large operational area. However, these systems fail to capture some inherent properties of VANETs, such as fast and ephemeral interaction, making robust trust evaluation of crowdsourcing challenging. In this article, we propose MobileTrust—a hybrid trust-based system for secure resource sharing in VANETs. The proposal is a breakthrough in centralized trust computing that utilizes cloud and upcoming 5G technologies to provide robust trust establishment with global scalability. The ad hoc communication is energy-efficient and protects the system against threats that are not countered by the current settings. To evaluate its performance and effectiveness, MobileTrust is modelled in the SUMO simulator and tested on the traffic features of the small-size German city of Eichstatt. Similar schemes are implemented in the same platform to provide a fair comparison. Moreover, MobileTrust is deployed on a typical embedded system platform and applied on a real smart car installation for monitoring traffic and road-state parameters of an urban application. The proposed system is developed under the EU-founded THREAT-ARREST project, to provide security, privacy, and trust in an intelligent and energy-aware transportation scenario, bringing closer the vision of sustainable circular economy
Acceleration-as-a-Service: Exploiting Virtualised GPUs for a Financial Application
'How can GPU acceleration be obtained as a service in a cluster?' This
question has become increasingly significant due to the inefficiency of
installing GPUs on all nodes of a cluster. The research reported in this paper
is motivated to address the above question by employing rCUDA (remote CUDA), a
framework that facilitates Acceleration-as-a-Service (AaaS), such that the
nodes of a cluster can request the acceleration of a set of remote GPUs on
demand. The rCUDA framework exploits virtualisation and ensures that multiple
nodes can share the same GPU. In this paper we test the feasibility of the
rCUDA framework on a real-world application employed in the financial risk
industry that can benefit from AaaS in the production setting. The results
confirm the feasibility of rCUDA and highlight that rCUDA achieves similar
performance compared to CUDA, provides consistent results, and more
importantly, allows for a single application to benefit from all the GPUs
available in the cluster without loosing efficiency.Comment: 11th IEEE International Conference on eScience (IEEE eScience) -
Munich, Germany, 201
- …