1,097 research outputs found
Dynamical Modeling of Cloud Applications for Runtime Performance Management
Cloud computing has quickly grown to become an essential component in many modern-day software applications. It allows consumers, such as a provider of some web service, to quickly and on demand obtain the necessary computational resources to run their applications. It is desirable for these service providers to keep the running cost of their cloud application low while adhering to various performance constraints. This is made difficult due to the dynamics imposed by, e.g., resource contentions or changing arrival rate of users, and the fact that there exist multiple ways of influencing the performance of a running cloud application. To facilitate decision making in this environment, performance models can be introduced that relate the workload and different actions to important performance metrics.In this thesis, such performance models of cloud applications are studied. In particular, we focus on modeling using queueing theory and on the fluid model for approximating the often intractable dynamics of the queue lengths. First, existing results on how the fluid model can be obtained from the mean-field approximation of a closed queueing network are simplified and extended to allow for mixed networks. The queues are allowed to follow the processor sharing or delay disciplines, and can have multiple classes with phase-type service times. An improvement to this fluid model is then presented to increase accuracy when the \emph{system size}, i.e., number of servers, initial population, and arrival rate, is small. Furthermore, a closed-form approximation of the response time CDF is presented. The methods are tested in a series of simulation experiments and shown to be accurate. This mean-field fluid model is then used to derive a general fluid model for microservices with interservice delays. The model is shown to be completely extractable at runtime in a distributed fashion. It is further evaluated on a simple microservice application and found to accurately predict important performance metrics in most cases. Furthermore, a method is devised to reduce the cost of a running application by tuning load balancing parameters between replicas. The method is built on gradient stepping by applying automatic differentiation to the fluid model. This allows for arbitrarily defined cost functions and constraints, most notably including different response time percentiles. The method is tested on a simple application distributed over multiple computing clusters and is shown to reduce costs while adhering to percentile constraints. Finally, modeling of request cloning is studied using the novel concept of synchronized service. This allows certain forms of cloning over servers, each modeled with a single queue, to be equivalently expressed as one single queue. The concept is very general regarding the involved queueing discipline and distributions, but instead introduces new, less realistic assumptions. How the equivalent queue model is affected by relaxing these assumptions is studied considering the processor sharing discipline, and an extension to enable modeling of speculative execution is made. In a simulation campaign, it is shown that these relaxations only has a minor effect in certain cases
Recommended from our members
Accurate modeling of core and memory locality for proxy generation targeting emerging applications and architectures
Designing optimal computer systems for improved performance and energy efficiency requires architects and designers to have a deep understanding of the end-user workloads. However, many end-users (e.g., large corporations, banks, defense organizations, etc.) are apprehensive to share their applications with designers due to the confidential nature of software code and data. In addition, emerging applications pose significant challenges to early design space exploration due to their long-running nature and the highly complex nature of their software stack that cannot be supported on many early performance models.
The above challenges can be overcome by using a proxy benchmark. A miniaturized proxy benchmark can be used as a substitute of the original workload to perform early computer performance evaluation. The process of generating a proxy benchmark consists of extracting a set of key statistics to summarize the behavior of end-user applications through profiling and using the collected statistics to synthesize a representative proxy benchmark. Using such proxy benchmarks can help designers to understand the behavior of end-user’s workloads in a reasonable time without the users having to disclose sensitive information about their workloads.
Prior proxy benchmarking schemes leverage micro-architecture independent metrics, derived from detailed simulation tools, to generate proxy benchmarks. However, many emerging workloads do not work reliably with many profiling or simulation tools, in which case it becomes impossible to apply prior proxy generation techniques to generate proxy benchmarks for such complex applications. Furthermore, these techniques model instruction pipeline-level locality in great detail, but abstract out memory locality modeling using simple stride-based models. This results in poor cloning accuracy especially for emerging applications, which have larger memory footprints and complex access patterns. A few detailed cache and memory locality modeling techniques have also been proposed in literature. However, these techniques either model limited locality metrics and suffer from poor cloning accuracy or are fairly accurate, but at the expense of significant metadata overhead. Finally, none of the prior proxy benchmarking techniques model both core and memory locality with high accuracy. As a result, they are not useful for studying system-level performance behavior. Keeping the above key limitations and shortcomings of prior work in mind, this dissertation presents several techniques that expand the frontiers of workload proxy benchmarking, thereby enabling computer designers to gain a better and faster understanding of end-user application behavior without compromising the privileged nature of software or data.
This dissertation first presents a core-level proxy benchmark generation methodology that leverages performance metrics derived from hardware performance counter measurements to create miniature proxy benchmarks targeting emerging big-data applications. The presented performance counter based characterization and associated extrapolation into generic parameters for proxy generation enables faster analysis (runs almost at native hardware speeds, unlike prior workload cloning proposals) and proxy generation for emerging applications that do not work with simulators or profiling tools. The generated proxy benchmarks are representative of the performance of the real-world big-data applications, including operating system and run-time effects, and yet converge to results quickly without needing any complex software stack support.
Next, to improve upon the accuracy and efficiency of prior memory proxy benchmarking techniques, this dissertation presents a novel memory locality modeling technique that leverages localized pattern detection to create miniature memory proxy benchmarks. The presented technique models memory reference locality by decomposing an application’s memory accesses into a set of independent streams (localized by using address region based localization property), tracking fine-grained patterns within the localized streams and, finally, chaining or interleaving accesses from different localized memory streams to create an ordered proxy memory access sequence. This dissertation further extends the workload cloning approach to Graphics Processing Units (GPUs) and presents a novel proxy generation methodology to model the inherent memory access locality of GPU applications, while also accounting for the GPU’s parallel execution model. The generated memory proxy benchmarks help to enable fast and efficient design space exploration of futuristic memory hierarchies.
Finally, this dissertation presents a novel technique to integrate accurate core and memory locality models to create system-level proxy benchmarks targeting emerging applications. This is a new capability that can facilitate efficient overall system (core, cache and memory subsystem) design-space exploration. This dissertation further presents a novel methodology that exploits the synthetic benchmark generation framework to create hypothetical workloads with performance behavior that does not currently exist. Such proxies can be generated to cover anticipated code trends and can represent futuristic workloads before the workloads even exist.Electrical and Computer Engineerin
A cross-stack, network-centric architectural design for next-generation datacenters
This thesis proposes a full-stack, cross-layer datacenter architecture based on in-network computing and near-memory processing paradigms. The proposed datacenter architecture is built atop two principles: (1) utilizing commodity, off-the-shelf hardware (i.e., processor, DRAM, and network devices) with minimal changes to their architecture, and (2) providing a standard interface to the programmers for using the novel hardware. More specifically, the proposed datacenter architecture enables a smart network adapter to collectively compress/decompress data exchange between distributed DNN training nodes and assist the operating system in performing aggressive processor power management. It also deploys specialized memory modules in the servers, capable of performing general-purpose computation and network connectivity.
This thesis unlocks the potentials of hardware and operating system co-design in architecting application-transparent, near-data processing hardware for improving datacenter's performance, energy efficiency, and scalability. We evaluate the proposed datacenter architecture using a combination of full-system simulation, FPGA prototyping, and real-system experiments
Understanding and Leveraging Virtualization Technology in Commodity Computing Systems
Commodity computing platforms are imperfect, requiring various enhancements for performance and security purposes. In the past decade, virtualization technology has emerged as a promising trend for commodity computing platforms, ushering many opportunities to optimize the allocation of hardware resources. However, many abstractions offered by virtualization not only make enhancements more challenging, but also complicate the proper understanding of virtualized systems. The current understanding and analysis of these abstractions are far from being satisfactory. This dissertation aims to tackle this problem from a holistic view, by systematically studying the system behaviors. The focus of our work lies in performance implication and security vulnerabilities of a virtualized system.;We start with the first abstraction---an intensive memory multiplexing for I/O of Virtual Machines (VMs)---and present a new technique, called Batmem, to effectively reduce the memory multiplexing overhead of VMs and emulated devices by optimizing the operations of the conventional emulated Memory Mapped I/O in hypervisors. Then we analyze another particular abstraction---a nested file system---and attempt to both quantify and understand the crucial aspects of performance in a variety of settings. Our investigation demonstrates that the choice of a file system at both the guest and hypervisor levels has significant impact upon I/O performance.;Finally, leveraging utilities to manage VM disk images, we present a new patch management framework, called Shadow Patching, to achieve effective software updates. This framework allows system administrators to still take the offline patching approach but retain most of the benefits of live patching by using commonly available virtualization techniques. to demonstrate the effectiveness of the approach, we conduct a series of experiments applying a wide variety of software patches. Our results show that our framework incurs only small overhead in running systems, but can significantly reduce maintenance window
Building Computing-As-A-Service Mobile Cloud System
The last five years have witnessed the proliferation of smart mobile devices, the explosion of various mobile applications and the rapid adoption of cloud computing in business, governmental and educational IT deployment. There is also a growing trends of combining mobile computing and cloud computing as a new popular computing paradigm nowadays. This thesis envisions the future of mobile computing which is primarily affected by following three trends: First, servers in cloud equipped with high speed multi-core technology have been the main stream today. Meanwhile, ARM processor powered servers is growingly became popular recently and the virtualization on ARM systems is also gaining wide ranges of attentions recently. Second, high-speed internet has been pervasive and highly available. Mobile devices are able to connect to cloud anytime and anywhere. Third, cloud computing is reshaping the way of using computing resources. The classic pay/scale-as-you-go model allows hardware resources to be optimally allocated and well-managed. These three trends lend credence to a new mobile computing model with the combination of resource-rich cloud and less powerful mobile devices. In this model, mobile devices run the core virtualization hypervisor with virtualized phone instances, allowing for pervasive access to more powerful, highly-available virtual phone clones in the cloud. The centralized cloud, powered by rich computing and memory recourses, hosts virtual phone clones and repeatedly synchronize the data changes with virtual phone instances running on mobile devices. Users can flexibly isolate different computing environments.
In this dissertation, we explored the opportunity of leveraging cloud resources for mobile computing for the purpose of energy saving, performance augmentation as well as secure computing enviroment isolation. We proposed a framework that allows mo- bile users to seamlessly leverage cloud to augment the computing capability of mobile devices and also makes it simpler for application developers to run their smartphone applications in the cloud without tedious application partitioning. This framework was built with virtualization on both server side and mobile devices. It has three building blocks including agile virtual machine deployment, efficient virtual resource management, and seamless mobile augmentation. We presented the design, imple- mentation and evaluation of these three components and demonstrated the feasibility of the proposed mobile cloud model
End-to-End Application Cloning for Distributed Cloud Microservices with Ditto
We present Ditto, an automated framework for cloning end-to-end cloud
applications, both monolithic and microservices, which captures I/O and network
activity, as well as kernel operations, in addition to application logic. Ditto
takes a hierarchical approach to application cloning, starting with capturing
the dependency graph across distributed services, to recreating each tier's
control/data flow, and finally generating system calls and assembly that mimics
the individual applications. Ditto does not reveal the logic of the original
application, facilitating publicly sharing clones of production services with
hardware vendors, cloud providers, and the research community.
We show that across a diverse set of single- and multi-tier applications,
Ditto accurately captures their CPU and memory characteristics as well as their
high-level performance metrics, is portable across platforms, and facilitates a
wide range of system studies
- …