Search CORE

532 research outputs found

TrIMS: Transparent and Isolated Model Sharing for Low Latency Deep LearningInference in Function as a Service Environments

Author: Dakkak Abdul
de Gonzalo Simon Garcia
Hwu Wen-mei
Li Cheng
Xiong Jinjun
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 23/11/2018
Field of study

Deep neural networks (DNNs) have become core computation components within low latency Function as a Service (FaaS) prediction pipelines: including image recognition, object detection, natural language processing, speech synthesis, and personalized recommendation pipelines. Cloud computing, as the de-facto backbone of modern computing infrastructure for both enterprise and consumer applications, has to be able to handle user-defined pipelines of diverse DNN inference workloads while maintaining isolation and latency guarantees, and minimizing resource waste. The current solution for guaranteeing isolation within FaaS is suboptimal -- suffering from "cold start" latency. A major cause of such inefficiency is the need to move large amount of model data within and across servers. We propose TrIMS as a novel solution to address these issues. Our proposed solution consists of a persistent model store across the GPU, CPU, local storage, and cloud storage hierarchy, an efficient resource management layer that provides isolation, and a succinct set of application APIs and container technologies for easy and transparent integration with FaaS, Deep Learning (DL) frameworks, and user code. We demonstrate our solution by interfacing TrIMS with the Apache MXNet framework and demonstrate up to 24x speedup in latency for image classification models and up to 210x speedup for large models. We achieve up to 8x system throughput improvement.Comment: In Proceedings CLOUD 201

arXiv.org e-Print Archive

Crossref

Detecting Hardware-assisted Hypervisor Rootkits within Nested Virtualized Environments

Author: Morabito Daniel B.
Publication venue: AFIT Scholar
Publication date: 14/06/2012
Field of study

Virtual machine introspection (VMI) is intended to provide a secure and trusted platform from which forensic information can be gathered about the true behavior of malware within a guest. However, it is possible for malware to escape a guest into the host and for hypervisor rootkits, such as BluePill, to stealthily transition a native OS into a virtualized environment. This research examines the effectiveness of selected detection mechanisms against hardware-assisted virtualization rootkits (HAV-R) within a nested virtualized environment. It presents the design, implementation, analysis, and evaluation of a hypervisor rootkit detection system which exploits both processor and translation lookaside buffer-based mechanisms to detect hypervisor rootkits within a variety of nested virtualized systems. It evaluates the effects of different types of virtualization on hypervisor rootkit detection and explores the effectiveness in-guest HAV-R obfuscation efforts. The results provide convincing evidence that the HAV-Rs are detectable in all SVMI scenarios examined, regardless of HAV-R or virtualization type. Also, that the selected detection techniques are effective at detection of HAV-R within nested virtualized environments, and that the type of virtualization implemented in a VMI system has minimal to no effect on HAV-R detection. Finally, it is determined that in-guest obfuscation does not successfully obfuscate the existence of HAV-R

AFTI Scholar (Air Force Institute of Technology)

SoC-Cluster as an Edge Server: an Application-driven Measurement Study

Author: Chen Chenyang
Fu Zhe
Lai Rujin
Li Xiang
Ma Xiao
Shi Boqing
Wang Shangguang
Xu Mengwei
Zhang Li
Zhou Ao
Publication venue
Publication date: 24/12/2022
Field of study

Huge electricity consumption is a severe issue for edge data centers. To this end, we propose a new form of edge server, namely SoC-Cluster, that orchestrates many low-power mobile system-on-chips (SoCs) through an on-chip network. For the first time, we have developed a concrete SoC-Cluster server that consists of 60 Qualcomm Snapdragon 865 SoCs in a 2U rack. Such a server has been commercialized successfully and deployed in large scale on edge clouds. The current dominant workload on those deployed SoC-Clusters is cloud gaming, as mobile SoCs can seamlessly run native mobile games. The primary goal of this work is to demystify whether SoC-Cluster can efficiently serve more general-purpose, edge-typical workloads. Therefore, we built a benchmark suite that leverages state-of-the-art libraries for two killer edge workloads, i.e., video transcoding and deep learning inference. The benchmark comprehensively reports the performance, power consumption, and other application-specific metrics. We then performed a thorough measurement study and directly compared SoC-Cluster with traditional edge servers (with Intel CPU and NVIDIA GPU) with respect to physical size, electricity, and billing. The results reveal the advantages of SoC-Cluster, especially its high energy efficiency and the ability to proportionally scale energy consumption with various incoming loads, as well as its limitations. The results also provide insightful implications and valuable guidance to further improve SoC-Cluster and land it in broader edge scenarios

arXiv.org e-Print Archive

Real-Time Localization Using Software Defined Radio

Author: Alyafawi Islam Fayez Abd
Publication venue: Universität Bern
Publication date: 25/08/2015
Field of study

Service providers make use of cost-effective wireless solutions to identify, localize, and possibly track users using their carried MDs to support added services, such as geo-advertisement, security, and management. Indoor and outdoor hotspot areas play a significant role for such services. However, GPS does not work in many of these areas. To solve this problem, service providers leverage available indoor radio technologies, such as WiFi, GSM, and LTE, to identify and localize users. We focus our research on passive services provided by third parties, which are responsible for (i) data acquisition and (ii) processing, and network-based services, where (i) and (ii) are done inside the serving network. For better understanding of parameters that affect indoor localization, we investigate several factors that affect indoor signal propagation for both Bluetooth and WiFi technologies. For GSM-based passive services, we developed first a data acquisition module: a GSM receiver that can overhear GSM uplink messages transmitted by MDs while being invisible. A set of optimizations were made for the receiver components to support wideband capturing of the GSM spectrum while operating in real-time. Processing the wide-spectrum of the GSM is possible using a proposed distributed processing approach over an IP network. Then, to overcome the lack of information about tracked devices’ radio settings, we developed two novel localization algorithms that rely on proximity-based solutions to estimate in real environments devices’ locations. Given the challenging indoor environment on radio signals, such as NLOS reception and multipath propagation, we developed an original algorithm to detect and remove contaminated radio signals before being fed to the localization algorithm. To improve the localization algorithm, we extended our work with a hybrid based approach that uses both WiFi and GSM interfaces to localize users. For network-based services, we used a software implementation of a LTE base station to develop our algorithms, which characterize the indoor environment before applying the localization algorithm. Experiments were conducted without any special hardware, any prior knowledge of the indoor layout or any offline calibration of the system

BORIS Theses

Bern Open Repository and Information System (BORIS)