16 research outputs found

    CPHASH: A cache-partitioned hash table

    CPHash is a concurrent hash table for multicore processors. CPHash partitions its table across the caches of cores and uses message passing to transfer lookups/inserts to a partition. CPHash's message passing avoids the need for locks, pipelines batches of asynchronous messages, and packs multiple messages into a single cache line transfer. Experiments on an 80-core machine with 2 hardware threads per core show that CPHash has ~1.6x higher throughput than a hash table implemented using fine-grained locks. An analysis shows that CPHash wins because it experiences fewer cache misses and its cache misses are less expensive, owing to less contention for the on-chip interconnect and DRAM. CPServer, a key/value cache server using CPHash, achieves ~5% higher throughput than a key/value cache server that uses a hash table with fine-grained locks, but both achieve better throughput and scalability than memcached. The throughput of CPHash and CPServer also scales near-linearly with the number of cores.
    Sponsors: Quanta Computer (Firm); National Science Foundation (U.S.) (Award 915164)
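
    The partition-per-core design can be made concrete with a short sketch: hash each key to a partition, hand the operation to the core that owns that partition's cache over a per-core message ring, and let the owner answer from its local cache. This is a minimal illustration under assumed names and sizes (partition_of, RING_SLOTS, and the stub table_lookup are ours), not the actual CPHash implementation.

```c
/* Minimal sketch of cache-partitioned lookups over message passing,
 * in the spirit of CPHash. Not the actual CPHash code: the ring
 * layout, sizes, and names are illustrative assumptions. Each server
 * core owns one partition and drains a single-producer ring, so the
 * buckets themselves need no locks. */
#include <stdint.h>
#include <stddef.h>

#define NPARTITIONS 64
#define RING_SLOTS  128

struct msg {                     /* one lookup request/response pair */
    uint64_t key;
    void *result;                /* filled in by the owning core */
    volatile int done;
};

struct ring {                    /* single-producer/single-consumer ring */
    struct msg *slots[RING_SLOTS];
    volatile unsigned head, tail;
};

static struct ring rings[NPARTITIONS];   /* one per server core */

static unsigned partition_of(uint64_t key)
{
    return (unsigned)((key * 0x9e3779b97f4a7c15ULL) >> 58);  /* top 6 bits */
}

/* Stub for the partition-local table; the real store indexes buckets
 * that live in the owning core's cache. */
static void *table_lookup(unsigned p, uint64_t key)
{
    (void)p; (void)key;
    return NULL;
}

/* Client side: route the lookup to the core that owns the key's
 * partition, then continue; batching many such messages amortizes the
 * cache-line transfers. */
static void lookup_async(uint64_t key, struct msg *m)
{
    struct ring *r = &rings[partition_of(key)];
    m->key = key;
    m->done = 0;
    r->slots[r->head % RING_SLOTS] = m;   /* publish the message ... */
    __sync_synchronize();
    r->head = r->head + 1;                /* ... then advance the head */
}

/* Server side: the owning core drains its ring; every bucket it
 * touches is partition-local, so lookups hit its own cache. */
static void serve(unsigned p)
{
    struct ring *r = &rings[p];
    while (r->tail != r->head) {
        struct msg *m = r->slots[r->tail % RING_SLOTS];
        m->result = table_lookup(p, m->key);
        __sync_synchronize();
        m->done = 1;                      /* client spins on m->done */
        r->tail = r->tail + 1;
    }
}
```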

    Performance characterization and analysis for Hadoop K-means iteration


    Proliferating Cloud Density through Big Data Ecosystem, Novel XCLOUDX Classification and Emergence of as-a-Service Era

    Big Data is permeating ever-larger aspects of human life through scientific and commercial applications, especially massive-scale data analytics beyond the exabyte magnitude. As the footprint of Big Data applications continuously expands, reliance on cloud environments also increases, in order to obtain appropriate, robust and affordable services for dealing with Big Data challenges. Cloud computing avoids any need to locally maintain overly scaled computing infrastructure, which includes not only dedicated space but also expensive hardware and software. Several data models to process Big Data have already been developed and a number of such models are still emerging, potentially relying on heterogeneous underlying storage technologies, including cloud computing. In this paper, we investigate the growing role of cloud computing in the Big Data ecosystem. We also propose a novel XCLOUDX {XCloudX, X…X} classification to gauge the intuitiveness of the scientific names of cloud-assisted NoSQL Big Data models and analyze whether XCloudX always uses cloud computing underneath, or vice versa. XCloudX symbolizes those NoSQL Big Data models that embody the term “cloud” in their name, where X is any alphanumeric variable. The discussion is strengthened by a set of important case studies. Furthermore, we study the emergence of the as-a-Service era, motivated by the cloud computing drive, and explore the new members beyond the traditional cloud computing stack developed over the last few years.

    Energy Efficient Big Data Networks: Impact of Volume and Variety

    In this article, we study the impact of big data’s volume and variety dimensions on Energy Efficient Big Data Networks (EEBDN) by developing a Mixed Integer Linear Programming (MILP) model to encapsulate the distinctive features of these two dimensions. Firstly, a progressive energy efficient edge, intermediate, and central processing technique is proposed to process big data’s raw traffic by building processing nodes (PNs) in the network along the way from the sources to datacenters. Secondly, we validate the MILP operation by developing a heuristic that mimics, in real time, the behaviour of the MILP for the volume dimension. Thirdly, we test the energy efficiency limits of our green approach under several conditions where PNs are less energy efficient in terms of processing and communication compared to data centers. Fourthly, we test the performance limits of our energy efficient approach by studying a “software matching” problem where different software packages are required to process big data. The results are then compared to the Classical Big Data Networks (CBDN) approach where big data is only processed inside centralized data centers. Our results revealed that up to 52% and 47% power savings can be achieved by the EEBDN approach compared to the CBDN approach, under the impact of the volume and variety scenarios, respectively. Moreover, our results identify the limits of the progressive processing approach, and in particular the conditions under which the CBDN centralized approach is more appropriate given certain PN energy efficiency and software availability levels.
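
    The flavor of such a MILP can be conveyed with a toy objective that trades transport power against processing power at candidate PN sites. The notation below is ours, not the paper's, and it omits the flow-conservation and software-matching constraints.

```latex
% Toy objective in the spirit of the paper's MILP (our notation, not
% the authors'). y_n \in \{0,1\} decides whether a PN is built at node
% n, w_n is the workload processed there, and \lambda_{ij} is the
% traffic carried on link (i,j).
\begin{align*}
\min\;\; & \sum_{(i,j)\in E} P^{\mathrm{net}}_{ij}\,\lambda_{ij}
          \;+\; \sum_{n\in N}\bigl(P^{\mathrm{idle}}_{n}\, y_n + P^{\mathrm{proc}}_{n}\, w_n\bigr)
  && \text{(transport power + processing power)} \\
\text{s.t.}\;\; & w_n \le C_n\, y_n \quad \forall n\in N
  && \text{(process only at built PNs, up to capacity } C_n\text{)}
\end{align*}
```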

    Customized Interfaces for Modern Storage Devices

    In the past decade, we have seen two major evolutions in storage technologies: flash storage and non-volatile memory. These storage technologies are vastly different, in both their properties and their implementations, from the disk-based storage devices that current software stacks and applications have been built for and optimized over several decades. The second major trend the industry has been witnessing is new classes of applications that are moving away from conventional ACID (SQL) database access to storage. The resulting new class of NoSQL and in-memory storage applications consumes storage through entirely new application programmer interfaces than its predecessors. The most significant outcome of these trends is a great mismatch, in terms of both application access interfaces and storage stack implementations, when consuming these new technologies. In this work, we study the unique, intrinsic properties of current and next-generation storage technologies and propose new interfaces that allow application developers to get the most out of these storage technologies without having to become storage experts themselves. We first build a new type of NoSQL key-value (KV) store that is FTL-aware rather than flash optimized. Our novel FTL-cooperative design for a KV store proved to simplify development and to outperform state-of-the-art KV stores, while reducing write amplification. Next, to address the growing relevance of byte-addressable persistent memory, we build a new type of KV store that is customized and optimized for persistent memory. The resulting KV store illustrates how to program persistent memory effectively while exposing a simpler interface and performing better than more general solutions. As the final component of the thesis, we build a generic, native storage solution for byte-addressable persistent memory. This new solution provides the most generic interface to applications, allowing applications to store and manipulate arbitrarily structured data with strong durability and consistency properties. With this new solution, existing applications as well as new “green field” applications will get to experience native performance and interfaces that are customized for the next storage technology evolution.
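
    As one concrete illustration of what programming persistent memory effectively involves, the sketch below shows the durability discipline a PM key-value store typically follows on x86: write the value, flush its cache lines, fence, and only then publish the record. This is a generic sketch with invented names (pm_record, pm_put), not the KV store built in this thesis.

```c
/* Generic sketch of a durable insert on byte-addressable persistent
 * memory (x86). Not this thesis's KV store; the record layout and
 * names are illustrative. The invariant: the value is flushed and
 * fenced to PM *before* the record becomes reachable, so a crash can
 * never expose a half-written value. Compile with -mclflushopt. */
#include <stdint.h>
#include <string.h>
#include <immintrin.h>   /* _mm_clflushopt, _mm_sfence */

struct pm_record {
    uint64_t key;
    uint32_t len;
    volatile uint32_t valid;       /* publish flag, written last */
    char     value[];
};

/* Flush every cache line covering [addr, addr + len). */
static void pm_flush(const void *addr, size_t len)
{
    const char *p = (const char *)((uintptr_t)addr & ~63UL);
    for (; p < (const char *)addr + len; p += 64)
        _mm_clflushopt((void *)p);
}

/* rec points into a persistent heap (e.g., a DAX-mapped file). */
void pm_put(struct pm_record *rec, uint64_t key,
            const void *val, uint32_t len)
{
    rec->key = key;
    rec->len = len;
    memcpy(rec->value, val, len);
    pm_flush(rec, sizeof *rec + len);     /* value durable ...        */
    _mm_sfence();                         /* ... before the flag      */
    rec->valid = 1;                       /* publish the record       */
    pm_flush(&rec->valid, sizeof rec->valid);
    _mm_sfence();                         /* now crash-consistent     */
}
```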

    RMem: An OS Service for Transparent Remote Memory Access in Lightweight Manycores

    Lightweight manycores deliver high performance and scalability at low power consumption. However, architectural intricacies of these processors impose programmability challenges that keep them away from mass adoption. While several efforts aim at introducing parallel programming environments to lightweight manycores, few initiatives are concerned with how to design rich Operating Systems (OSs) for them. In this work, we focus on the open challenges that arise from the constrained memory subsystems of lightweight manycores, such as the presence of multiple address spaces and limited on-chip memory. To cope with transparent data access in this scenario, we introduce an OS service named RMem. This service provides a shared memory abstraction over multiple address spaces and exposes system calls that enable one-sided communication on top of this abstraction. We implemented a prototype of our service in the Nanvix research OS, and we deployed the system on the Kalray MPPA-256 lightweight manycore. Our experimental results with a microbenchmark unveiled that, while exposing an easier-to-program interface, the RMem service may deliver about 91% of the write performance and up to 2.4× better read performance than the primitives in the libraries of the experimental platform.
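
    A hypothetical client-side view of such a one-sided interface might look like the sketch below. The call names and signatures are invented for illustration and are not the Nanvix system-call API; a toy local arena stands in for the remote memory server so the example runs.

```c
/* Hypothetical sketch of a one-sided remote-memory interface in the
 * spirit of RMem. Names and signatures are invented; they are NOT the
 * Nanvix system calls. A local arena simulates the remote server. */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t rmem_addr_t;          /* address in the shared remote space */

static char arena[1 << 20];            /* stand-in for remote memory */
static size_t arena_top;

static rmem_addr_t rmem_alloc(size_t size)   /* reserve remote memory */
{
    rmem_addr_t a = arena_top;
    arena_top += size;
    return a;
}

static int rmem_write(rmem_addr_t dst, const void *src, size_t n)  /* one-sided put */
{
    memcpy(arena + dst, src, n);
    return 0;
}

static int rmem_read(void *dst, rmem_addr_t src, size_t n)         /* one-sided get */
{
    memcpy(dst, arena + src, n);
    return 0;
}

/* Example: a core with scarce on-chip memory stages a large buffer
 * remotely and later pulls back only the slice it needs. */
int example(void)
{
    char page[4096] = { 0 };
    rmem_addr_t r = rmem_alloc(sizeof page);
    if (rmem_write(r, page, sizeof page) < 0)
        return -1;
    return rmem_read(page, r, 256);    /* fetch only the first 256 B */
}
```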

    Physically Dense Server Architectures.

    Distributed, in-memory key-value stores have emerged as one of today's most important data center workloads. Being critical for the scalability of modern web services, vast resources are dedicated to key-value stores in order to ensure that quality of service guarantees are met. These resources include: many server racks to store terabytes of key-value data, the power necessary to run all of the machines, networking equipment and bandwidth, and the data center warehouses used to house the racks. There is, however, a mismatch between the key-value store software and the commodity servers on which it is run, leading to inefficient use of resources. The primary cause of inefficiency is the overhead incurred from processing individual network packets, which typically carry small payloads and require minimal compute resources. Thus, one of the key challenges as we enter the exascale era is how best to adjust to the paradigm shift from compute-centric to storage-centric data centers. This dissertation presents a hardware/software solution that addresses the inefficiency issues present in the modern data centers on which key-value stores are currently deployed. First, it proposes two physical server designs, both of which use 3D-stacking technology and low-power CPUs to improve density and efficiency. The first 3D architecture, Mercury, consists of stacks of low-power CPUs with 3D-stacked DRAM. The second architecture, Iridium, replaces DRAM with 3D NAND Flash to improve density. The second portion of this dissertation proposes an enhanced version of the Mercury server design, called KeyVault, that incorporates integrated, zero-copy network interfaces along with an integrated switching fabric. In order to utilize the integrated networking hardware, as well as reduce the response time of requests, a custom networking protocol is proposed. Unlike prior works on accelerating key-value stores, e.g., by completely bypassing the CPU and OS when processing requests, this work only bypasses the CPU and OS when placing network payloads into a process' memory. The insight behind this is that because most of the overhead comes from processing packets in the OS kernel, and not the request processing itself, direct placement of a packet's payload is sufficient to provide higher throughput and lower latency than prior approaches.
    PhD, Computer Science and Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
    http://deepblue.lib.umich.edu/bitstream/2027.42/111414/1/atgutier_1.pd
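
    The direct-placement insight can be sketched from the application's side: the NIC deposits each request payload straight into a ring mapped in the process's memory, and the server polls that ring, so the receive path involves no kernel entry and no copy. The layout and names below are schematic assumptions, not the KeyVault hardware interface.

```c
/* Schematic of zero-copy payload placement as the application sees
 * it: a producer (the NIC in KeyVault; any thread in this sketch)
 * writes request payloads into a ring in the process's address space,
 * and the server polls it without ever entering the kernel. Layout
 * and names are illustrative, not the actual hardware interface. */
#include <stdint.h>

#define SLOTS       256
#define PAYLOAD_MAX 64

struct rx_slot {
    volatile uint32_t ready;         /* set by the NIC after the payload */
    uint32_t len;
    char     payload[PAYLOAD_MAX];   /* e.g., a GET request */
};

static struct rx_slot ring[SLOTS];   /* would be NIC-mapped memory */

void serve_loop(void (*handle)(const char *, uint32_t))
{
    for (uint32_t i = 0; ; i = (i + 1) % SLOTS) {
        while (!ring[i].ready)
            ;                        /* spin: no syscall, no copy */
        handle(ring[i].payload, ring[i].len);
        ring[i].ready = 0;           /* return the slot to the NIC */
    }
}
```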

    Architecting Efficient Data Centers.

    Data center power consumption has become a key constraint in continuing to scale Internet services. As our society’s reliance on “the Cloud” continues to grow, companies require an ever-increasing amount of computational capacity to support their customers. Massive warehouse-scale data centers have emerged, requiring 30MW or more of total power capacity. Over the lifetime of a typical high-scale data center, power-related costs make up 50% of the total cost of ownership (TCO). Furthermore, the aggregate effect of data center power consumption across the country cannot be ignored. In total, data center energy usage has reached approximately 2% of aggregate consumption in the United States and continues to grow. This thesis addresses the need to increase computational efficiency to address this growing problem. It proposes a new class of power management techniques: coordinated full-system idle low-power modes to increase the energy proportionality of modern servers. First, we introduce the PowerNap server architecture, a coordinated full-system idle low-power mode which transitions in and out of an ultra-low-power nap state to save power during brief idle periods. While effective for uniprocessor systems, PowerNap relies on full-system idleness, and we show that such idleness disappears as the number of cores per processor continues to increase. We expose this problem in a case study of Google Web search, in which we demonstrate that coordinated full-system active power modes are necessary to reach energy proportionality and that PowerNap is ineffective because of a lack of idleness. To recover full-system idleness, we introduce DreamWeaver, architectural support for deep sleep. DreamWeaver allows a server to exchange latency for full-system idleness, allowing PowerNap-enabled servers to be effective and providing a better latency-power savings tradeoff than existing approaches. Finally, this thesis investigates workloads which achieve efficiency through methodical cluster provisioning techniques. Using the popular memcached workload, this thesis provides examples of provisioning clusters for cost-efficiency given latency, throughput, and data set size targets.
    Ph.D., Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
    http://deepblue.lib.umich.edu/bitstream/2027.42/91499/1/meisner_1.pd
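
    DreamWeaver's latency-for-idleness trade can be illustrated with a simple wake-up policy: keep napping while requests accumulate, and wake only when the oldest queued request is about to exhaust an agreed latency slack. The sketch below is schematic (the real mechanism is architectural support, not a user-space loop) and its constants are made up.

```c
/* Schematic of a DreamWeaver-style policy: nap while requests queue
 * up, and wake only when the oldest pending request would otherwise
 * miss its latency slack. Illustrates the latency-for-idleness trade
 * only; the numbers and stand-ins are ours. */
#include <stdint.h>
#include <time.h>
#include <unistd.h>

#define SLACK_US 100                  /* extra latency we tolerate */

struct queue { uint64_t oldest_arrival_us; int depth; };

static uint64_t now_us(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000u + (uint64_t)ts.tv_nsec / 1000u;
}

/* One scheduling decision: keep napping while the queue is empty or
 * the slack is unspent, so brief idle periods coalesce into naps long
 * enough for a deep full-system sleep state to pay off. */
static void power_step(struct queue *q)
{
    if (q->depth == 0 || now_us() - q->oldest_arrival_us < SLACK_US)
        usleep(10);                   /* stand-in for the nap state */
    else
        q->depth = 0;                 /* stand-in for wake + batch-drain */
}
```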

    Building A Scalable And High-Performance Key-Value Store System

    Contemporary web sites can store and process very large amounts of data. To provide timely service to their users, they have adopted key-value (KV) stores, a simple but effective caching infrastructure atop the conventional databases that store these data, to boost performance. Examples are Facebook, Twitter and Amazon. Because as yet little is known about realistic workloads outside of the companies that operate them, this dissertation provides a detailed workload study of Facebook's Memcached, which is one of the world's largest KV deployments. We analyze the Memcached workload from the perspective of server-side performance, request composition, caching efficacy, and key locality. The observations presented in this dissertation lead to several design insights and a new research direction for KV stores: Hippos, a high-throughput, low-latency, and energy-efficient KV-store implementation. Long considered an application that is memory-bound and network-bound, recent KV-store implementations on multicore servers grow increasingly CPU-bound instead. This limitation often leads to under-utilization of available bandwidth and poor energy efficiency, as well as long response times under heavy load. To address these issues, Hippos moves the KV store into the operating system's kernel and thus removes most of the overhead associated with the network stack and system calls. It uses the Netfilter framework to quickly handle UDP packets, removing the overhead of UDP-based GET requests almost entirely. Combined with lock-free multithreaded data access, Hippos removes several performance bottlenecks both internal and external to the KV-store application. Hippos is prototyped as a Linux loadable kernel module and evaluated against the ubiquitous Memcached using various micro-benchmarks and workloads from Facebook's production systems. The experiments show that Hippos provides some 20-200% throughput improvement on a 1Gbps network (up to 590% improvement on a 10Gbps network) and 5-20% savings in power compared with Memcached.
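
    The abstract's key mechanism, intercepting UDP requests with Netfilter before they climb the rest of the stack, looks roughly like the hook below. This is a minimal sketch of the approach rather than the Hippos source: the port and the parse/lookup/reply path are placeholders, and a real module must free or retransmit every skb it steals.

```c
/* Minimal sketch of the Hippos approach: a Netfilter hook steals UDP
 * key-value requests before they traverse the rest of the network
 * stack and the syscall boundary. Not the Hippos source; hook API as
 * in recent Linux kernels, parse/lookup/reply path elided. */
#include <linux/module.h>
#include <linux/netfilter.h>
#include <linux/netfilter_ipv4.h>
#include <linux/ip.h>
#include <linux/udp.h>
#include <net/net_namespace.h>

#define KV_PORT 11211

static unsigned int kv_hook(void *priv, struct sk_buff *skb,
                            const struct nf_hook_state *state)
{
    struct iphdr *iph = ip_hdr(skb);
    struct udphdr *udph;

    if (iph->protocol != IPPROTO_UDP)
        return NF_ACCEPT;             /* not ours: normal stack path */
    udph = (struct udphdr *)((__u32 *)iph + iph->ihl);
    if (ntohs(udph->dest) != KV_PORT)
        return NF_ACCEPT;

    /* Placeholder: parse the GET, look up the value in the in-kernel
     * table, build and transmit the reply skb directly (a real module
     * must free or retransmit the skb it takes ownership of here). */
    return NF_STOLEN;                 /* consumed here; no socket, no syscall */
}

static struct nf_hook_ops kv_ops = {
    .hook     = kv_hook,
    .pf       = NFPROTO_IPV4,
    .hooknum  = NF_INET_PRE_ROUTING,
    .priority = NF_IP_PRI_FIRST,
};

static int __init kv_init(void) { return nf_register_net_hook(&init_net, &kv_ops); }
static void __exit kv_exit(void) { nf_unregister_net_hook(&init_net, &kv_ops); }

module_init(kv_init);
module_exit(kv_exit);
MODULE_LICENSE("GPL");
```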

    On the Energy Efficiency of Networked Systems

    Energy is a first-class resource for datacenter operators, since its cost is the biggest limiting factor in scaling a large computing facility. The solution embraced by major operators is to build their facilities in strategic geographical locations and to abandon expensive specialized hardware for cheap commodity systems. However, such systems are not efficient when it comes to energy, and a considerable amount of research effort has been put into finding a solution to this problem. Furthermore, the need for more programmable and flexible networking devices is pushing hardware commoditization also within the datacenter network. In this thesis we propose two solutions aimed at improving the overall energy efficiency of a datacenter facility. The first addresses efficiency in computing by proposing a different hardware architecture for server systems: a hybrid architecture that blends traditional server processors with very-low-power processors from the mobile devices world. The second solution envisions the use of current server platforms as network switches or routers and provides guidelines for implementing power saving algorithms that do not affect peak performance while saving up to 50% power. This work is based on both theoretical modeling and simulation, and on experimentation with real-world prototypes.
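
    One common shape for such a power saving algorithm in a software router is adaptive polling: poll at full speed while packets arrive, and back off to progressively longer sleeps only after sustained emptiness, so peak performance is untouched. The sketch below shows that generic pattern, not the thesis's specific algorithm; poll_rx_burst is an assumed driver call.

```c
/* Generic adaptive-polling sketch for a software router: poll flat
 * out while packets arrive, and only after a sustained run of empty
 * polls start dozing, backing off exponentially with a cap so a
 * traffic burst still sees near-peak service. A common power-aware
 * pattern, not the thesis's specific algorithm. */
#include <unistd.h>

#define IDLE_THRESHOLD 1000    /* empty polls before we start sleeping */
#define MAX_SLEEP_US   500     /* cap keeps wake-up latency bounded */

/* poll_rx_burst: caller-supplied driver call returning packets processed. */
void rx_loop(int (*poll_rx_burst)(void))
{
    int idle = 0;
    useconds_t sleep_us = 1;

    for (;;) {
        if (poll_rx_burst() > 0) {
            idle = 0;
            sleep_us = 1;              /* load present: full-speed polling */
        } else if (++idle > IDLE_THRESHOLD) {
            usleep(sleep_us);          /* idle: doze, then back off */
            if (sleep_us < MAX_SLEEP_US)
                sleep_us *= 2;
        }
    }
}
```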