
    AccaSim: a Customizable Workload Management Simulator for Job Dispatching Research in HPC Systems

    We present AccaSim, a simulator for workload management in HPC systems. Thanks to AccaSim's scalability to large workload datasets, support for easy customization, and practical automated tools to aid experimentation, users can easily represent various real HPC systems, develop novel advanced dispatchers, and evaluate them conveniently across different workload sources. AccaSim is thus an attractive tool for conducting job dispatching research in HPC systems.
    Comment: 27 pages
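
    AccaSim's abstract emphasizes developing and evaluating custom dispatchers. As a flavor of what a pluggable dispatcher can look like, here is a minimal Python sketch; every name in it (Job, FifoDispatcher, dispatch) is a hypothetical illustration, not AccaSim's actual API.

    ```python
    # Minimal sketch of a pluggable job dispatcher of the kind a workload
    # simulator like AccaSim lets users swap in. All names here are
    # hypothetical illustrations, not AccaSim's real interface.
    from dataclasses import dataclass
    from typing import List

    @dataclass
    class Job:
        job_id: int
        requested_nodes: int
        submit_time: float

    class FifoDispatcher:
        """Dispatch queued jobs in arrival order while nodes are free."""

        def __init__(self, total_nodes: int):
            self.free_nodes = total_nodes

        def dispatch(self, queue: List[Job]) -> List[Job]:
            started = []
            for job in sorted(queue, key=lambda j: j.submit_time):
                if job.requested_nodes <= self.free_nodes:
                    self.free_nodes -= job.requested_nodes
                    started.append(job)
                else:
                    break  # strict FIFO: never skip ahead of the head job
            return started
    ```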

    A Survey of Techniques for Improving Security of GPUs

    The graphics processing unit (GPU), although a powerful performance booster, also has many security vulnerabilities. Due to these, the GPU can act as a safe haven for stealthy malware and the weakest "link" in the security "chain". In this paper, we present a survey of techniques for analyzing and improving GPU security. We classify the works on key attributes to highlight their similarities and differences. Beyond informing users and researchers about GPU security techniques, this survey aims to increase their awareness of GPU security vulnerabilities and potential countermeasures.

    NSML: Meet the MLaaS platform with a real-world case study

    The deep learning boom has driven many industries and academic groups to adopt machine learning based approaches, competitively. However, existing machine learning frameworks do not sufficiently support collaboration and management for both data and models. We propose NSML, a machine learning as a service (MLaaS) platform, to meet these demands. NSML lets machine learning work be launched easily on an NSML cluster and provides a collaborative environment that can support development at enterprise scale. Finally, NSML users can deploy their own commercial services on an NSML cluster. In addition, NSML furnishes convenient visualization tools that assist users in analyzing their work. To verify the usefulness and accessibility of NSML, we performed experiments with common examples. Furthermore, we examined the collaborative advantages of NSML through three competitions with real-world use cases.

    TrIMS: Transparent and Isolated Model Sharing for Low Latency Deep Learning Inference in Function as a Service Environments

    Deep neural networks (DNNs) have become core computation components within low latency Function as a Service (FaaS) prediction pipelines, including image recognition, object detection, natural language processing, speech synthesis, and personalized recommendation pipelines. Cloud computing, as the de facto backbone of modern computing infrastructure for both enterprise and consumer applications, has to handle user-defined pipelines of diverse DNN inference workloads while maintaining isolation and latency guarantees and minimizing resource waste. The current solution for guaranteeing isolation within FaaS is suboptimal, suffering from "cold start" latency. A major cause of this inefficiency is the need to move large amounts of model data within and across servers. We propose TrIMS as a novel solution to address these issues. Our proposed solution consists of a persistent model store across the GPU, CPU, local storage, and cloud storage hierarchy; an efficient resource management layer that provides isolation; and a succinct set of application APIs and container technologies for easy and transparent integration with FaaS, Deep Learning (DL) frameworks, and user code. We demonstrate our solution by interfacing TrIMS with the Apache MXNet framework, showing up to 24x speedup in latency for image classification models and up to 210x speedup for large models. We achieve up to 8x system throughput improvement.
    Comment: In Proceedings CLOUD 201
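
    The abstract describes TrIMS's core mechanism: a persistent model store layered across GPU, CPU, local storage, and cloud storage. The sketch below illustrates that tiered-cache idea in Python; the class and method names are hypothetical and do not reproduce TrIMS's real API.

    ```python
    # Hypothetical sketch of a tiered model store in the spirit of TrIMS:
    # look a model up in the fastest tier first and promote it on a hit
    # from a slower tier. Names (ModelStore, Tier, fetch) are illustrative.
    from collections import OrderedDict
    from typing import List, Optional, Tuple

    class Tier:
        def __init__(self, name: str):
            self.name = name
            self._models = OrderedDict()  # model_id -> weights (bytes)

        def get(self, model_id: str) -> Optional[bytes]:
            return self._models.get(model_id)

        def put(self, model_id: str, weights: bytes) -> None:
            self._models[model_id] = weights

    class ModelStore:
        """Tiers ordered fastest to slowest: GPU, CPU, local disk, cloud."""

        def __init__(self, tiers: List[Tier]):
            self.tiers = tiers

        def fetch(self, model_id: str) -> Optional[bytes]:
            for i, tier in enumerate(self.tiers):
                weights = tier.get(model_id)
                if weights is not None:
                    # Promote into every faster tier so later requests
                    # avoid the "cold start" copy entirely.
                    for faster in self.tiers[:i]:
                        faster.put(model_id, weights)
                    return weights
            return None  # caller loads from the origin and calls put()
    ```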

    Criteria and Approaches for Virtualization on Modern FPGAs

    Modern field programmable gate arrays (FPGAs) deliver high performance in a wide range of applications, and their computational capacity is becoming abundant in personal computers. Despite this, FPGA virtualization is still an emerging research field. Today, the challenges of the area come not only from technical difficulties but also from the ambiguous standards of virtualization. In this paper, we introduce novel criteria for FPGA virtualization and discuss several approaches to satisfying those criteria. In addition, we present and describe in detail the specific FPGA virtualization architecture that we developed on an Intel Arria 10 FPGA. We evaluate our solution with a combination of applications and microbenchmarks. The results show that our virtualization solution can provide a full abstraction of the FPGA device from both the user and developer perspectives while maintaining reasonable performance compared to native FPGA execution.
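
    One common way to realize the abstraction such work evaluates is a static shell that multiplexes tenants onto partial-reconfiguration slots, handing each tenant a virtual device handle. The Python sketch below illustrates only that general pattern; it is not the paper's Arria 10 architecture, and every name in it is hypothetical.

    ```python
    # Hypothetical sketch of slot-based FPGA virtualization: a hypervisor
    # hands out free partial-reconfiguration slots as virtual devices, so
    # tenants never see the physical FPGA directly.
    class VirtualFpga:
        def __init__(self, slot_id: int, release):
            self.slot_id = slot_id
            self._release = release

        def program(self, bitstream: bytes) -> None:
            # Stand-in for a partial-reconfiguration call on one slot.
            print(f"reconfigure slot {self.slot_id} ({len(bitstream)} bytes)")

        def close(self) -> None:
            self._release(self.slot_id)

    class FpgaHypervisor:
        """Owns the physical device; tenants only get slot handles."""

        def __init__(self, num_slots: int):
            self.free = list(range(num_slots))

        def acquire(self) -> VirtualFpga:
            if not self.free:
                raise RuntimeError("no free FPGA slots")
            return VirtualFpga(self.free.pop(), self.free.append)
    ```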

    Vulnerable GPU Memory Management: Towards Recovering Raw Data from GPU

    In this paper, we show that security threats arising from existing GPU memory management strategies are overlooked, opening a back door for adversaries to freely break memory isolation: an adversary without any privilege on a computer can directly recover the raw memory data left behind by previous processes. More importantly, such attacks work not only on ordinary multi-user operating systems but also on cloud computing platforms. To demonstrate the seriousness of these attacks, we recovered original data directly from GPU memory residues left by exited commodity applications, including Google Chrome, Adobe Reader, GIMP, and Matlab. The results show that, because of the vulnerable memory management strategy, all commodity applications in our experiments are affected.
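
    The attack rests on a simple observation: GPU allocators typically hand out device memory without zeroing it, so a fresh allocation can still contain data from an earlier process. Below is a minimal PyCUDA sketch of that probe, assuming a CUDA-capable machine with pycuda installed; it only counts leftover non-zero words and should be run only on machines you own.

    ```python
    # Minimal probe for uninitialized GPU memory residue. It inspects a
    # fresh allocation without writing to it first; non-zero words are
    # data the allocator did not clear. Requires pycuda and a CUDA GPU.
    import numpy as np
    import pycuda.autoinit          # creates a CUDA context on import
    import pycuda.driver as cuda

    NBYTES = 64 * 1024 * 1024       # 64 MiB probe allocation

    buf = cuda.mem_alloc(NBYTES)    # note: no initialization performed
    host = np.empty(NBYTES // 4, dtype=np.uint32)
    cuda.memcpy_dtoh(host, buf)     # copy whatever was already there

    nonzero = np.count_nonzero(host)
    print(f"{nonzero} of {host.size} words contain non-zero residue")
    ```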

    Towards Predictable Real-Time Performance on Multi-Core Platforms

    Cyber-physical systems (CPS) integrate sensing, computing, communication, and actuation capabilities to monitor and control operations in the physical environment. A key requirement of such systems is the need to provide predictable real-time performance: the timing correctness of the system should be analyzable at design time with a quantitative metric and guaranteed at runtime with high assurance. This requirement of predictability is particularly important for safety-critical domains such as automobiles, aerospace, defense, manufacturing, and medical devices. The work in this dissertation focuses on the challenges arising from the use of modern multi-core platforms in CPS. Even today, multi-core platforms are rarely used in safety-critical applications, primarily due to the temporal interference caused by contention for various resources shared among processor cores, such as caches, memory buses, and I/O devices. Such interference is hard to predict and can significantly increase task execution time, e.g., by up to 12x on commodity quad-core platforms. To address the problem of ensuring timing predictability on multi-core platforms, we develop novel analytical and systems techniques in this dissertation. Our proposed techniques theoretically bound the temporal interference that tasks may suffer when accessing shared resources. Our techniques also involve software primitives and algorithms for real-time operating systems and hypervisors, which significantly reduce the degree of temporal interference. Specifically, we tackle the issues of cache and memory contention, locking and synchronization, interrupt handling, and access control for computational accelerators such as GPGPUs, all of which are crucial to achieving predictable real-time performance on a modern multi-core platform.
    Comment: This is the Ph.D. dissertation of the author
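
    The dissertation's central promise is an analyzable bound on temporal interference. For background, the classical fixed-priority response-time recurrence, extended here with an explicit interference term, shows the general shape such analyses take; this is the textbook formulation, not the dissertation's specific analysis.

    ```latex
    % Response-time recurrence for task \tau_i with worst-case execution
    % time C_i, higher-priority tasks hp(i) with periods T_j, and a bound
    % I_i(R) on shared-resource (e.g., memory bus) interference.
    % Iterate from R_i^{(0)} = C_i until a fixed point; \tau_i is
    % schedulable if the fixed point satisfies R_i \le D_i (its deadline).
    R_i^{(k+1)} = C_i
                + \sum_{j \in hp(i)} \left\lceil \frac{R_i^{(k)}}{T_j} \right\rceil C_j
                + I_i\!\bigl(R_i^{(k)}\bigr)
    ```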

    Virtualization Technologies and Cloud Security: advantages, issues, and perspectives

    Virtualization technologies allow multiple tenants to share physical resources with a degree of security and isolation that cannot be guaranteed by mere containerization. Further, virtualization allows protected, transparent introspection of Virtual Machine activity and content, thus supporting additional control and monitoring. These features partially explain why virtualization has been an enabler for the flourishing of cloud services. Nevertheless, security and privacy issues are still present in virtualization technology and hence in Cloud platforms. For example, even hardware virtualization protection/isolation is far from perfect and uncircumventable, as recently discovered vulnerabilities show. The objective of this paper is to shed light on current virtualization technology and its evolution from the point of view of security, with a focus on its application to the Cloud setting.
    Comment: arXiv admin note: text overlap with arXiv:1702.07521 by other authors

    A Distributed Multi-agent Market Place for HPC Compute Cycle Resource Trading

    Computer simulation is finding a role in an increasing number of scientific disciplines, concomitant with the rise in available computing power. Realizing this inevitably requires access to computational power beyond the desktop, making use of clusters, supercomputers, data repositories, networks, and distributed aggregations of these resources. Accessing one such resource entails a number of usability and security problems; when multiple geographically distributed resources are involved, the difficulty is compounded. This presents the user with the problem of how to gain access to suitable resources to run their workloads as they need them. In this paper we present our solution to this problem: a resource trading platform that allows users to purchase access to resources within a distributed e-infrastructure. We present the implementation of this Resource Allocation Market Place as a distributed multi-agent system, and show how it provides a highly flexible, efficient tool to schedule workflows across high performance computing resources.
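
    The abstract outlines a market place in which user agents purchase compute access from provider agents. A toy Python sketch of one clearing step follows; the matching rule and all names are illustrative assumptions, not the paper's multi-agent protocol.

    ```python
    # Hypothetical sketch of a compute-cycle market: providers post asks
    # (price per node-hour), users post bids, and a matcher greedily
    # clears compatible pairs. Purely illustrative of the idea.
    from dataclasses import dataclass
    from typing import List, Tuple

    @dataclass
    class Ask:
        provider: str
        nodes: int          # capacity still on offer
        price: float        # per node-hour

    @dataclass
    class Bid:
        user: str
        nodes: int
        max_price: float

    def clear_market(asks: List[Ask], bids: List[Bid]) -> List[Tuple[Bid, Ask]]:
        """Greedy clearing: cheapest asks meet highest-paying bids first."""
        asks = sorted(asks, key=lambda a: a.price)
        bids = sorted(bids, key=lambda b: b.max_price, reverse=True)
        trades = []
        for bid in bids:
            for ask in asks:
                if ask.nodes >= bid.nodes and ask.price <= bid.max_price:
                    trades.append((bid, ask))
                    ask.nodes -= bid.nodes  # leftover capacity stays listed
                    break
        return trades
    ```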