
    Workload Interleaving with Performance Guarantees in Data Centers

    In the era of global, large-scale data centers residing in clouds, many applications and users share the same pool of resources to reduce energy and operating costs and to improve availability and reliability. Along with these benefits, resource sharing also introduces performance challenges: when multiple workloads access the same resources concurrently, contention may occur and introduce delays in the performance of individual workloads. Providing performance isolation to individual workloads requires effective management methodologies. The challenge in deriving such methodologies lies in finding accurate, robust, and compact metrics and models to drive algorithms that can meet different performance objectives while achieving efficient utilization of resources. This dissertation proposes a set of methodologies aimed at solving the challenging performance isolation problem of workload interleaving in data centers, focusing on both storage components and computing components. At the storage node level, we focus on methodologies for better interleaving user traffic with background workloads, such as tasks for improving reliability, availability, and power savings. More specifically, we develop a scheduling policy for background workloads based on the statistical characteristics of the system's busy periods, and a methodology that quantitatively estimates the performance impact of power savings. At the storage cluster level, we consider methodologies for efficiently conducting work consolidation and scheduling asynchronous updates without violating user performance targets. More specifically, we develop a framework that estimates beforehand the benefits and overheads of each option in order to automate the process of reaching intelligent consolidation decisions while achieving faster eventual consistency. At the computing node level, we focus on improving workload interleaving on off-the-shelf servers, as they are the basic building blocks of large-scale data centers. We develop priority scheduling middleware that employs different policies to schedule background tasks based on the instantaneous resource requirements of the high-priority applications running on the server node. Finally, at the computing cluster level, we investigate popular computing frameworks for large-scale data-intensive distributed processing, such as MapReduce and its Hadoop implementation. We develop a new Hadoop scheduler called DyScale that exploits the capabilities offered by heterogeneous cores in order to achieve a variety of performance objectives.
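    The dissertation defines its own middleware policies; purely as a rough illustration of the idea of scheduling background tasks around the instantaneous needs of a high-priority application, the sketch below suspends and resumes a background process based on sampled foreground CPU usage. The threshold, sampling interval, and use of psutil are assumptions for illustration, not the author's design.

        # Illustrative sketch (not the dissertation's middleware): pause or resume
        # a background process depending on how busy the high-priority process is.
        import psutil  # assumed third-party helper for process/CPU statistics

        CPU_THRESHOLD = 50.0   # foreground CPU % above which background work is paused
        SAMPLE_INTERVAL = 0.5  # seconds of CPU sampling per control decision

        def throttle_background(foreground_pid, background_pid):
            fg = psutil.Process(foreground_pid)
            bg = psutil.Process(background_pid)
            while bg.is_running():
                # cpu_percent(interval=...) blocks for the interval, then returns usage.
                fg_busy = fg.cpu_percent(interval=SAMPLE_INTERVAL) > CPU_THRESHOLD
                if fg_busy and bg.status() != psutil.STATUS_STOPPED:
                    bg.suspend()   # high-priority app needs the CPU: park the background task
                elif not fg_busy and bg.status() == psutil.STATUS_STOPPED:
                    bg.resume()    # foreground load dropped: let the background task continue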

    D-SPACE4Cloud: A Design Tool for Big Data Applications

    The last years have seen a steep rise in data generation worldwide, with the development and widespread adoption of several software projects targeting the Big Data paradigm. Many companies currently engage in Big Data analytics as part of their core business activities; nonetheless, there are no tools and techniques to support the design of the underlying hardware configuration backing such systems. In particular, the focus of this report is on Cloud-deployed clusters, which represent a cost-effective alternative to on-premises installations. We propose a novel tool implementing a battery of optimization and prediction techniques, integrated so as to efficiently assess several alternative resource configurations, in order to determine the minimum-cost cluster deployment satisfying QoS constraints. Further, the experimental campaign conducted on real systems shows the validity and relevance of the proposed method.
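    The report describes the tool's actual optimization and prediction machinery; purely as an illustration of the underlying idea, the sketch below enumerates hypothetical cloud configurations, predicts job runtime with a placeholder linear-speedup model, and keeps the cheapest deployment that meets a QoS deadline. All names, prices, and the performance model are assumptions, not part of D-SPACE4Cloud.

        # Illustrative sketch of minimum-cost capacity planning under a QoS constraint.
        CANDIDATES = [
            {"vm_type": "small",  "cores": 4,  "hourly_cost": 0.20},
            {"vm_type": "medium", "cores": 8,  "hourly_cost": 0.40},
            {"vm_type": "large",  "cores": 16, "hourly_cost": 0.80},
        ]

        def predicted_runtime(base_runtime_1core, total_cores):
            # Placeholder performance model; the real tool uses far richer predictors.
            return base_runtime_1core / total_cores

        def cheapest_deployment(base_runtime_1core, deadline, max_nodes=32):
            best = None
            for vm in CANDIDATES:
                for nodes in range(1, max_nodes + 1):
                    runtime = predicted_runtime(base_runtime_1core, vm["cores"] * nodes)
                    if runtime <= deadline:                      # QoS constraint satisfied
                        cost = nodes * vm["hourly_cost"]
                        if best is None or cost < best[0]:
                            best = (cost, vm["vm_type"], nodes)
                        break  # adding more nodes of this type only raises the cost
            return best

        print(cheapest_deployment(base_runtime_1core=3600.0, deadline=120.0))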

    Estimating data divergence in cloud computing storage systems

    Dissertation submitted for the Master's degree in Computer Engineering. Many internet services are provided through cloud computing infrastructures that are composed of multiple data centers. To provide high availability and low latency, data is replicated on machines in different data centers, which introduces the complexity of guaranteeing that clients view data consistently. Data stores often opt for a relaxed approach to replication, guaranteeing only eventual consistency, since it improves the latency of operations. However, this may lead to replicas having different values for the same data. One solution to control the divergence of data in eventually consistent systems is the use of metrics that measure how stale a replica's data is. In the past, several algorithms have been proposed to estimate the value of these metrics in a deterministic way. An alternative solution is to rely on probabilistic metrics that estimate divergence with a certain degree of certainty. This relaxes the need to contact all replicas while still providing a relatively accurate measurement. In this work we designed and implemented a solution to estimate the divergence of data in eventually consistent data stores that scales to many replicas by allowing client-side caching. Measuring the divergence when there is a large number of clients calls for the development of new algorithms that provide probabilistic guarantees. Additionally, unlike previous works, we intend to focus on measuring the divergence relative to a state that can lead to the violation of application invariants. Partially funded by project PTDC/EIA EIA/108963/2008 and by an ERC Starting Grant, Agreement Number 30773
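    The dissertation defines its own probabilistic divergence metrics; only to illustrate the general idea, the sketch below estimates staleness by polling a random sample of replicas instead of contacting all of them, reporting the stale fraction together with the sampled coverage as a crude proxy for confidence. This is a hypothetical simplification, not the algorithm proposed in the work.

        # Illustrative sketch: probabilistic staleness estimate from a replica sample.
        import random

        def estimate_divergence(replica_versions, latest_version, sample_size):
            """replica_versions: dict replica_id -> last applied version number."""
            sample = random.sample(list(replica_versions),
                                   min(sample_size, len(replica_versions)))
            stale = sum(1 for r in sample if replica_versions[r] < latest_version)
            stale_fraction = stale / len(sample)
            coverage = len(sample) / len(replica_versions)  # fraction of replicas polled
            return stale_fraction, coverage

        versions = {f"replica-{i}": random.choice([41, 42]) for i in range(100)}
        print(estimate_divergence(versions, latest_version=42, sample_size=10))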

    Introducing memory versatility to enhance memory system performance, energy efficiency and reliability

    The main memory system is facing increasingly high pressure from the continued scaling of computation power. Memory systems are now expected to have much higher capacity than before, yet DRAM devices have limited scalability, and higher capacity usually translates into proportional hardware and power cost. Memory compression is a promising technique to prevent this. Previous memory compression designs are generally based on rigid data layouts, which limit their performance. We therefore propose Flexible Memory, which supports an out-of-order memory block layout to lower compression-related overhead and improve performance. In addition, the cost of memory reliability also increases with capacity growth. Conventional error protection schemes use a Hamming-based SECDED code that imposes a 12.5% capacity and power overhead on the entire memory system. However, it is not necessary to protect the whole memory, because some data is not critical or not sensitive to memory errors, and the capacity and power spent protecting it is largely wasted. Selective Error Protection (SEP) is therefore needed to lower the cost of large-scale memory protection. A method for separating critical from non-critical data has been proposed before, but a memory system design that supports the resulting partitioned memory is challenging and did not exist. We therefore propose a memory system design that can maintain two or more partitions with different layouts in main memory at the same time, making SEP schemes practical. Even with selective error protection, supporting memory reliability still hurts the scaling of memory capacity. Fortunately, memory data has been shown to be highly compressible, and most common applications can free up enough space to store their own ECC code; for these applications, memory reliability incurs very low space and power overhead. However, combining ECC and memory compression is not trivial: it is difficult to achieve a high percentage of coverage over the entire memory when the compressibility of different memory blocks varies widely. We thus introduce Flexible ECC, which builds on Flexible Memory to allow easier ECC code placement. When a block has more choices of where to store its ECC code, it is more likely to be covered by ECC. With Flexible ECC, a larger portion of memory can be covered by ECC codes whose storage overhead is lowered by memory compression.
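    The thesis presents the actual Flexible Memory and Flexible ECC designs; the sketch below only illustrates the arithmetic behind the idea that a sufficiently compressible block can host its own SECDED code in the space it frees. The 64-byte block size and 8-bytes-per-block ECC cost mirror the 12.5% overhead mentioned above, but the check itself is a hypothetical simplification.

        # Illustrative sketch: can a compressed 64-byte block also hold its own ECC?
        BLOCK_BYTES = 64   # cache-line-sized memory block
        ECC_BYTES = 8      # SECDED overhead per 64-byte block (12.5%)

        def self_hosts_ecc(compressed_size_bytes):
            """True if compression frees enough room to embed the block's ECC code."""
            return compressed_size_bytes + ECC_BYTES <= BLOCK_BYTES

        def ecc_coverage(compressed_sizes):
            """Fraction of blocks that can be protected at no extra capacity cost."""
            covered = sum(1 for s in compressed_sizes if self_hosts_ecc(s))
            return covered / len(compressed_sizes)

        # Blocks compressing to 40, 32, or 50 bytes cover themselves; 60 and 64 do not.
        print(ecc_coverage([40, 60, 32, 64, 50]))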

    Managing contamination delay to improve Timing Speculation architectures

    Timing Speculation (TS) is a widely known method for realizing better-than-worst-case systems. Aggressive clocking, made possible by TS, enables systems to operate beyond specified safe frequency limits and effectively exploit data-dependent circuit delay. However, the range of aggressive clocking available for performance enhancement under TS is restricted by short paths. In this paper, we show that increasing the lengths of the short paths of a circuit increases the effectiveness of TS, leading to performance improvement. We also propose an algorithm that efficiently adds delay buffers to selected short paths while keeping the area penalty down. We present results for the ISCAS-85 suite and show that it is possible to increase the circuit contamination delay by up to 30% without affecting the propagation delay. We also explore the possibility of increasing short-path delays further by relaxing the constraint on propagation delay, and analyze the performance impact.
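    The paper's buffer-insertion algorithm works on gate-level netlists; the sketch below only conveys the underlying selection rule on a toy list of paths: pad short paths up toward a target contamination delay, but never let a padded path exceed the circuit's propagation delay. The path representation, delays, and greedy rule are assumptions for illustration.

        # Illustrative sketch: pad short paths with buffer delay without lengthening
        # the critical (propagation) delay. Paths are just (name, delay) pairs here.
        def pad_short_paths(paths, target_contamination_delay, buffer_delay):
            propagation_delay = max(d for _, d in paths)   # critical-path delay
            buffers_added = 0
            padded = []
            for name, delay in paths:
                while (delay < target_contamination_delay and
                       delay + buffer_delay <= propagation_delay):
                    delay += buffer_delay                  # insert one more buffer
                    buffers_added += 1
                padded.append((name, delay))
            new_contamination_delay = min(d for _, d in padded)
            return padded, new_contamination_delay, buffers_added

        paths = [("p0", 2.0), ("p1", 3.5), ("p2", 9.0), ("p3", 10.0)]
        print(pad_short_paths(paths, target_contamination_delay=6.0, buffer_delay=0.5))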

    FloWatcher-DPDK: lightweight line-rate flow-level monitoring in software

    In the last few years, several software-based solutions have proven to be very efficient for high-speed packet processing, traffic generation, and monitoring, and can be considered valid alternatives to expensive and inflexible hardware-based solutions. In our work, we first benchmark heterogeneous design choices for software-based packet monitoring systems in terms of achievable performance and required resources (i.e., the number of CPU cores). Building on this extensive analysis, we design FloWatcher-DPDK, a DPDK-based high-speed software traffic monitor that we provide to the community as an open source project. In a nutshell, FloWatcher-DPDK provides tunable fine-grained statistics at the packet and flow levels. Experimental results demonstrate that FloWatcher-DPDK sustains per-flow statistics with five-nines precision at high speed (e.g., 14.88 Mpps) using a limited amount of resources. Finally, we showcase the usage of FloWatcher-DPDK by configuring it to analyze the performance of two open source prototypes for stateful flow-level end-host and in-network packet processing.
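    FloWatcher-DPDK itself is a DPDK application; purely to illustrate what per-flow statistics means here, the sketch below keeps packet and byte counters in a hash table keyed by the 5-tuple. The packet representation and counter set are assumptions, not the tool's actual data structures.

        # Illustrative sketch: per-flow packet/byte counters keyed by the 5-tuple.
        from collections import defaultdict

        flow_stats = defaultdict(lambda: {"packets": 0, "bytes": 0})

        def update(pkt):
            """pkt is a dict with source/destination IP, ports, protocol and length."""
            key = (pkt["src_ip"], pkt["dst_ip"],
                   pkt["src_port"], pkt["dst_port"], pkt["proto"])
            stats = flow_stats[key]
            stats["packets"] += 1
            stats["bytes"] += pkt["length"]

        update({"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2",
                "src_port": 12345, "dst_port": 80, "proto": "TCP", "length": 1500})
        print(dict(flow_stats))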