A Holistic Approach to Lowering Latency in Geo-distributed Web Applications
The user-perceived end-to-end latency of web applications has a large impact on revenue for many businesses. This end-to-end latency has two components: (i) user-to-application-server (front-end) latency, which includes downloading and parsing web pages and retrieving further objects requested by JavaScript execution; and (ii) application-to-storage-server (back-end) latency, which includes retrieving the metadata required for an initial rendering and subsequent content based on user actions.
Improving the user-perceived performance of web applications is challenging, given their complex operating environments involving user-facing web servers, content distribution network (CDN) servers, multi-tiered application servers, and storage servers. Further, the application and storage servers are often deployed on multi-tenant cloud platforms that show high performance variability. While many novel approaches like SPDY and geo-replicated datastores have been developed to improve performance, many of these solutions are specific to certain layers and may have differing impacts on user-perceived performance.
The primary goal of this thesis is to address the above challenges in a holistic manner, focusing specifically on improving the end-to-end latency of geo-distributed multi-tiered web applications. This thesis makes the following contributions: (i) First, it reduces user-facing latency by helping CDNs identify the objects that are most critical for page-load latency and map them to the faster CDN cache layers. Through controlled experiments on real-world web pages, we show that our approach can shave hundreds of milliseconds off latency without affecting overall CDN miss rates. (ii) Next, it reduces back-end latency by optimally adapting the datastore replication policies (including the number and location of replicas) to the heterogeneity in workloads. We show the benefits of our replication models using real-world traces of Twitter, Wikipedia, and Gowalla on an 8-datacenter Cassandra cluster deployed on EC2. (iii) Finally, it makes multi-tier applications resilient to the inherent performance variability of the cloud through fine-grained request redirection. We highlight the benefits of our approach by deploying three real-world applications on commercial cloud platforms.
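As a rough illustration of the workload-aware replica placement decision in contribution (ii), the sketch below greedily chooses replica locations that minimize request-weighted read latency under a replica budget; the greedy heuristic, datacenter names, and latency figures are assumptions for illustration, not the thesis's actual optimization model.

# Hypothetical sketch (not the thesis's model): greedy, workload-aware replica placement.
# Given per-datacenter request rates for an object and an inter-datacenter latency matrix,
# pick replica locations that minimize request-weighted read latency under a replica budget.

def place_replicas(request_rate, latency, budget):
    """request_rate: {dc: requests/sec}, latency: {(src, dst): ms}, budget: max replicas."""
    dcs = list(request_rate)
    replicas = []

    def weighted_latency(chosen):
        # Each datacenter reads from its nearest chosen replica.
        return sum(rate * min(latency[(dc, r)] for r in chosen)
                   for dc, rate in request_rate.items())

    while len(replicas) < budget:
        best = min((dc for dc in dcs if dc not in replicas),
                   key=lambda dc: weighted_latency(replicas + [dc]))
        replicas.append(best)
    return replicas, weighted_latency(replicas)

# Example with made-up rates (requests/sec) and latencies (ms):
rates = {"us-east": 50.0, "eu-west": 30.0, "ap-south": 20.0}
lat = {(a, b): (0 if a == b else 80) for a in rates for b in rates}
lat[("us-east", "eu-west")] = lat[("eu-west", "us-east")] = 70
lat[("eu-west", "ap-south")] = lat[("ap-south", "eu-west")] = 120
lat[("us-east", "ap-south")] = lat[("ap-south", "us-east")] = 180
print(place_replicas(rates, lat, budget=2))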
A Stochastic Model of Plausibility in Live-Virtual-Constructive Environments
Distributed live-virtual-constructive simulation promises a number of benefits for the test and evaluation community, including reduced costs, access to simulations of limited availability assets, the ability to conduct large-scale multi-service test events, and recapitalization of existing simulation investments. However, geographically distributed systems are subject to fundamental state consistency limitations that make assessing the data quality of live-virtual-constructive experiments difficult. This research presents a data quality model based on the notion of plausible interaction outcomes. This model explicitly accounts for the lack of absolute state consistency in distributed real-time systems and offers system designers a means of estimating data quality and fitness for purpose. Experiments with World of Warcraft player trace data validate the plausibility model and exceedance probability estimates. Additional experiments with synthetic data illustrate the model's use in ensuring fitness for purpose of live-virtual-constructive simulations and estimating the quality of data obtained from live-virtual-constructive experiments.
INFORMATION-UPDATE SYSTEMS: MODELS, ALGORITHMS, AND ANALYSIS
Age of information (AoI) has been proposed as a new metric to measure the staleness of data. For time-sensitive information, it is critical to keep the AoI at a low level. A great deal of work has been done on the analysis and optimization of AoI in information-update systems. Prior studies on AoI optimization often consider a push model, which is concerned with when and how to "push" (i.e., generate and transmit) the updated information to the user. In stark contrast, we introduce a new pull model, which is more relevant for certain applications (such as real-time stock quote services), where a user sends requests to the servers to proactively "pull" the information of interest. Moreover, we propose to employ request replication to reduce the AoI. Interestingly, we find that under this new pull model, replication schemes capture a novel tradeoff between different levels of information freshness and different response times across the servers, which can be exploited to minimize the expected AoI at the user's side. Specifically, assuming a Poisson updating process at the servers and exponentially distributed response times with known expectation, we derive a closed-form formula for computing the expected AoI and obtain the optimal number of responses to wait for to minimize the expected AoI. Then, we extend our analysis to the setting where the user aims to maximize the utility, which is an exponential function of the negative AoI and represents the user's satisfaction level with the timeliness of the received information. We similarly derive a closed-form formula for the expected utility and find the optimal number of responses to wait for. Further, we consider a more realistic scenario where the updating rate and the mean response time at the servers are unknown to the user. In this case, we formulate the utility maximization problem as a stochastic multi-armed bandit (MAB) problem. The formulated MAB problem has a special linear feedback graph, which can be leveraged to design policies with an improved regret upper bound. We also notice that one factor has been missing from most previous work on AoI minimization: the cost of performing updates. Therefore, we focus on the tradeoff between the AoI and the update cost, which is of significant importance in time-sensitive data-driven applications. We consider applications where the information provider is directly connected to the data source, and the clients need to obtain the data from the information provider in a real-time manner (such as real-time environmental monitoring systems). The provider needs to update the data so that it can reply to the clients' requests with fresh information. However, the update cost limits how frequently the provider can refresh the data, which makes it important to design an efficient policy with an optimal tradeoff between data freshness and update cost. We define a staleness cost, which reflects the AoI of the data, and formulate the problem as minimizing the sum of the update cost and the staleness cost. We first propose guidelines for designing update policies in such information-update systems that apply to arbitrary request arrival processes. Then, we design an update policy with a simple threshold-based structure, which is easy to implement.
Under the assumption of a Poisson request arrival process, we derive a closed-form expression for the average cost of the threshold-based policy and prove its optimality among all online update policies. In almost all prior work, the analysis and optimization are based on traditional queueing models and probabilistic approaches. However, in the traditional probabilistic study of general queueing models, the analysis depends heavily on the properties of specific distributions, such as the memoryless property of the Poisson distribution, and it is usually hard to handle distributions with heavy-tailed behavior. To that end, in this work we take an alternative approach and focus on the Peak Age of Information (PAoI), which is the largest age of each update shown to the end users. Specifically, we employ a recently developed analysis framework based on robust optimization and model the uncertainty in the stochastic arrival and service processes with uncertainty sets. This robust queueing framework enables us to approximate the steady-state PAoI performance of information-update systems with very general arrival and service processes, including those exhibiting heavy-tailed behavior. We first propose a new bound on the PAoI for the single-source system that performs much better than previous results, especially under light traffic. Then, we generalize it to multi-source systems with symmetric arrivals, which involves new technical challenges.
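The freshness/response-time tradeoff of the pull model with request replication can be illustrated with a small Monte Carlo sketch. This is an assumption-based illustration, not the closed-form derivation above: each server's information age at request time is drawn as exponential (by the memoryless property of the Poisson updating process), response times are exponential, and the user keeps the freshest of the first k responses.

import random

def expected_aoi(n, k, update_rate, mean_response, trials=100_000):
    """Estimate the expected AoI when waiting for the first k of n replicated responses."""
    total = 0.0
    for _ in range(trials):
        ages = [random.expovariate(update_rate) for _ in range(n)]          # age at each server
        resp = [random.expovariate(1.0 / mean_response) for _ in range(n)]  # response delays
        order = sorted(range(n), key=lambda i: resp[i])[:k]  # first k responders
        wait = resp[order[-1]]                               # time of the k-th response
        freshest = min(ages[i] for i in order)               # freshest info among them
        total += freshest + wait                             # AoI at the user
    return total / trials

# Sweeping k exposes the tradeoff: waiting longer raises the response time
# but improves the chance of receiving fresh information.
for k in range(1, 6):
    print(k, round(expected_aoi(n=5, k=k, update_rate=1.0, mean_response=1.0), 3))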
Latency Based Approach for Characterization of Cloud Application Performance.
Public cloud infrastructures provide flexible hosting for web application providers, but the rented virtual machines (VMs) often offer unpredictable performance to the deployed applications. Understanding cloud performance is challenging for application providers, as clouds provide limited information that would help them form expectations about their application performance. In this thesis I present a technique to measure the performance of cloud applications based on observations of the application latency. I treat the cloud application as a black box, making no assumptions about the underlying platform. From my measurements, I can observe the varying performance provided by the different VM profiles across well-known commercial cloud platforms. I also identify a trade-off between the responsiveness and the load of the measured servers, which can help application providers in their deployment and provisioning.
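The black-box measurement idea lends itself to a simple probing sketch. The endpoints, probe count, and percentile summary below are illustrative assumptions, not the thesis's tool: only application-level latency is observed, with no access to the underlying VM.

import time
import statistics
import urllib.request

def probe(url, n=50, pause=0.2):
    """Issue n HTTP requests and summarize application-level latency in milliseconds."""
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        with urllib.request.urlopen(url, timeout=5) as resp:
            resp.read()
        samples.append((time.perf_counter() - start) * 1000.0)
        time.sleep(pause)
    samples.sort()
    return {
        "median_ms": statistics.median(samples),
        "p95_ms": samples[int(0.95 * (len(samples) - 1))],
    }

# Comparing two deployments of the same application on different VM profiles
# (hypothetical endpoints) exposes the performance variability discussed above.
print(probe("http://small-vm.example.com/health"))
print(probe("http://large-vm.example.com/health"))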
Exploiting cost-performance tradeoffs for modern cloud systems
The trade-off between cost and performance is a fundamental challenge for modern cloud systems. This thesis explores cost-performance tradeoffs for three types of systems that permeate today's clouds, namely (1) storage, (2) virtualization, and (3) computation. A distributed key-value storage system must choose between the cost of keeping replicas synchronized (consistency) and the performance (latency) of read/write operations. A cloud-based disaster recovery system can reduce the cost of managing a group of VMs as a single unit for recovery by implementing this abstraction in software (instead of hardware), at the risk of impacting application availability. As another example, the run-time performance of graph analytics jobs sharing a multi-tenant cluster can be improved by trading off the cost of replicating the input graph data-set stored in the associated distributed file system.
Today, cloud system providers have to manually tune their systems to meet desired trade-offs. This can be challenging since the optimal trade-off between cost and performance may vary with network and workload conditions. Thus, our hypothesis is that it is feasible to imbue a wide variety of cloud systems with adaptive and opportunistic mechanisms that efficiently navigate the cost-performance tradeoff space to meet desired tradeoffs. The types of cloud systems considered in this thesis include key-value stores, cloud-based disaster recovery systems, and multi-tenant graph computation engines.
Our first contribution, PCAP, is an adaptive distributed storage system. The foundation of the PCAP system is a probabilistic variation of the classical CAP theorem, which quantifies the (un-)achievable envelope of probabilistic consistency and latency under different network conditions characterized by a probabilistic partition model. Our PCAP system proposes adaptive mechanisms for tuning control knobs to meet desired consistency-latency tradeoffs expressed in terms of service-level agreements (SLAs).
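A schematic example of such knob tuning is sketched below. This is an assumption-laden illustration, not PCAP's actual controller: a single hypothetical read-delay knob trades freshness for latency, and the loop nudges it based on which SLA is currently violated.

# Sketch of an adaptive control step for a consistency-latency knob (illustrative only).
# Raising the read delay gives replicas more time to converge (fresher reads) at the
# cost of higher read latency; lowering it does the opposite.

def adjust_read_delay(read_delay_ms, observed_latency_ms, latency_sla_ms,
                      observed_staleness, staleness_sla, step_ms=1.0):
    if observed_latency_ms > latency_sla_ms:
        # Latency SLA violated: back off the artificial delay.
        return max(0.0, read_delay_ms - step_ms)
    if observed_staleness > staleness_sla:
        # Consistency SLA violated and latency has headroom: add delay.
        return read_delay_ms + step_ms
    return read_delay_ms  # both SLAs met; hold the knob steady

# Example: latency is within its SLA but too many reads are stale, so the knob moves up.
print(adjust_read_delay(read_delay_ms=2.0, observed_latency_ms=8.0,
                        latency_sla_ms=10.0, observed_staleness=0.15,
                        staleness_sla=0.05))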
Our second system, GeoPCAP, is a geo-distributed extension of PCAP. In GeoPCAP, we propose generalized probabilistic composition rules for composing consistency-latency tradeoffs across geo-distributed instances of distributed key-value stores, each running in a separate data-center. GeoPCAP also includes a geo-distributed adaptive control system that adapts new control knobs to meet SLAs across geo-distributed data-centers.
Our third system, GCVM, proposes a light-weight, hypervisor-managed mechanism for taking crash-consistent snapshots across VMs distributed over servers. This mechanism enables us to move the consistency-group abstraction from hardware to software, and thus lowers reconfiguration cost while incurring only modest VM pause times, which affect application availability.
Finally, our fourth contribution is a new opportunistic graph processing system called OPTiC for efficiently scheduling multiple graph analytics jobs sharing a multi-tenant cluster. By opportunistically creating at most one additional replica in the distributed file system (thus incurring storage cost), we show up to a 50% reduction in median job completion time for graph processing jobs under realistic network and workload conditions. Thus, with a modest increase in disk storage and bandwidth cost, we can reduce job completion time (improve performance).
For the first two systems (PCAP and GeoPCAP), we exploit the cost-performance tradeoff space through efficient navigation of the tradeoff space to meet SLAs and perform close to the optimal tradeoff. For the third (GCVM) and fourth (OPTiC) systems, we move from one solution point to another in the tradeoff space. For these last two systems, explicitly mapping out the tradeoff space allows us to consider new design tradeoffs.
Staged Grid NewSQL Database System for OLTP and Big Data Applications
Big data applications demand, and consequently lead to the development of, diverse scalable data management systems, ranging from NoSQL systems to the emerging NewSQL systems. In order to serve thousands of applications and their huge amounts of data, data management systems must be capable of scaling out to clusters of commodity servers. The overarching goal of this dissertation is to propose principles, paradigms and protocols to architect efficient, scalable and practical NewSQL database systems that address the unique set of challenges posed by the big data trend. This dissertation shows that with careful choice of design and features, it is possible to implement scalable NewSQL database systems that efficiently support transactional semantics to ease application design. In this dissertation, we first investigate, analyze and characterize current scalable data management systems in depth and develop comprehensive taxonomies for various critical aspects covering the data model, the system architecture and the consistency model. On the basis of analyzing the scalability limitations of current systems, we then highlight the key principles for designing and implementing scalable NewSQL database systems. This dissertation advances the state of the art by improving and providing satisfactory solutions to critical facets of NewSQL database systems. In particular, first we specify a staged grid architecture to support scalable and efficient transaction processing using clusters of commodity servers. The key insight is to disintegrate and reassemble system components into encapsulated staged modules. Effective behavior rules for communication are then defined to orchestrate independent staged modules deployed on networked computing nodes into one integrated system. Second, we propose a new formula-based protocol for distributed concurrency control to support thousands of concurrent users accessing data distributed over commodity servers. The formula protocol for concurrency is a variation of the multi-version timestamp concurrency control protocol, which guarantees serializability. We reduce the overhead of conventional implementations through techniques including logical formula caching and dynamic timestamp ordering. Third, we identify a new consistency model, BASIC (Basic Availability, Scalability, Instant Consistency), that matches the requirements of settings where extra effort is not needed to manipulate the inconsistent soft states of weak consistency models. BASIC extends the current understanding of the CAP theorem by precisely characterizing the degrees of each dimension that can be achieved, rather than simply what cannot be done. We introduce all these novel ideas and features based on the implementation of Rubato DB, a highly scalable NewSQL database system. We have conducted extensive experiments that clearly show that Rubato DB is highly scalable with efficient performance under both the TPC-C and YCSB benchmarks. These results verify that the staged grid architecture and the formula protocol provide a satisfactory solution to one of the important challenges in NewSQL database systems: to develop a highly scalable database management system that supports various consistency levels from ACID to BASE.
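Since the formula protocol is described as a variation of multi-version timestamp ordering, a textbook MVTO sketch is included below purely for orientation; the class and its rules are the standard protocol, not Rubato DB's implementation.

# Textbook multi-version timestamp-ordering (MVTO) for a single data item (illustrative).

class MVTOItem:
    def __init__(self, initial):
        # Each version is [write_ts, max_read_ts, value].
        self.versions = [[0, 0, initial]]

    def read(self, ts):
        # Read the version with the largest write_ts <= ts and record the read timestamp.
        version = max((v for v in self.versions if v[0] <= ts), key=lambda v: v[0])
        version[1] = max(version[1], ts)
        return version[2]

    def write(self, ts, value):
        # Abort if a younger transaction already read the version this write
        # would have been ordered before; otherwise install a new version.
        version = max((v for v in self.versions if v[0] <= ts), key=lambda v: v[0])
        if version[1] > ts:
            raise RuntimeError("abort: write would violate serializability")
        self.versions.append([ts, ts, value])

item = MVTOItem(initial=10)
print(item.read(ts=5))    # reads the initial version
item.write(ts=7, value=20)
print(item.read(ts=8))    # sees the version written at timestamp 7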
Improving Caches in Consolidated Environments
Memory (cache, DRAM, and disk) is in charge of providing data and instructions to a computer’s processor. In order to maximize performance, the speeds of the memory and the processor should be equal. However, using memory that always matches the speed of the processor is prohibitively expensive. Computer hardware designers have managed to drastically lower the cost of the system through the use of memory caches, at the price of some performance. A cache is a small piece of fast memory that stores popular data so it can be accessed faster. Modern computers have evolved into a hierarchy of caches, where each memory level is the cache for a larger and slower memory level immediately below it. Thus, by using caches, manufacturers are able to store terabytes of data at the cost of the cheapest memory while achieving speeds close to that of the fastest one.
The most important decision about managing a cache is what data to store in it. Failing to make good decisions can lead to performance overheads and over-provisioning. Surprisingly, caches choose data to store based on policies that have not changed in principle for decades. However, computing paradigms have changed radically, leading to two noticeably different trends. First, caches are now consolidated across hundreds to even thousands of processes. And second, caching is being employed at new levels of the storage hierarchy due to the availability of high-performance flash-based persistent media. This brings four problems. First, as the workloads sharing a cache increase, it is more likely that they contain duplicated data. Second, consolidation creates contention for caches, and if not managed carefully, it translates to wasted space and sub-optimal performance. Third, as contended caches are shared by more workloads, administrators need to carefully estimate specific per-workload requirements across the entire memory hierarchy in order to meet per-workload performance goals. And finally, current cache write policies are unable to simultaneously provide performance and consistency guarantees for the new levels of the storage hierarchy.
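For concreteness, here is a minimal least-recently-used (LRU) cache, the kind of decades-old replacement policy referred to above; it is a generic illustration, not code from the dissertation.

# Minimal LRU cache: keep recently used entries, evict the least recently used one.
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, key):
        if key not in self.entries:
            return None                       # cache miss
        self.entries.move_to_end(key)         # mark as most recently used
        return self.entries[key]

    def put(self, key, value):
        self.entries[key] = value
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry

cache = LRUCache(capacity=2)
cache.put("a", 1); cache.put("b", 2); cache.get("a"); cache.put("c", 3)
print(list(cache.entries))  # ['a', 'c'] ('b' was evicted)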
We addressed these problems by modeling their impact and by proposing solutions for each of them. First, we measured and modeled the amount of duplication at the buffer cache level and contention in real production systems. Second, we created a unified model of workload cache usage under contention to be used by administrators for provisioning, or by process schedulers to decide which processes to run together. Third, we proposed methods for removing cache duplication and for eliminating the space wasted due to contention. And finally, we proposed a technique to improve the consistency guarantees of write-back caches while preserving their performance benefits.
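As a rough sketch of the cache-deduplication idea (the names and structure below are assumptions, not the dissertation's implementation), logical blocks whose contents hash to the same value can share a single cached copy:

import hashlib

class DedupCache:
    def __init__(self):
        self.by_digest = {}    # content digest -> cached block data (stored once)
        self.index = {}        # (workload, block_id) -> content digest

    def insert(self, workload, block_id, data):
        digest = hashlib.sha256(data).hexdigest()
        self.by_digest.setdefault(digest, data)   # store each unique block only once
        self.index[(workload, block_id)] = digest

    def lookup(self, workload, block_id):
        digest = self.index.get((workload, block_id))
        return None if digest is None else self.by_digest[digest]

cache = DedupCache()
cache.insert("vm1", 7, b"shared library page")
cache.insert("vm2", 42, b"shared library page")   # duplicate content across workloads
print(len(cache.by_digest), cache.lookup("vm2", 42))  # one stored copy serves both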
