20 research outputs found

    CS 172.02: Introduction to Computer Modeling

    A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services

    Datacenter workloads demand high computational capabilities, flexibility, power efficiency, and low cost. It is challenging to improve all of these factors simultaneously. To advance datacenter capabilities beyond what commodity server designs can provide, we designed and built a composable, reconfigurable hardware fabric based on field-programmable gate arrays (FPGAs). Each server in the fabric contains one FPGA, and all FPGAs within a 48-server rack are interconnected over a low-latency, high-bandwidth network. We describe a medium-scale deployment of this fabric on a bed of 1632 servers, and measure its effectiveness in accelerating the ranking component of the Bing web search engine. We describe the requirements and architecture of the system, detail the critical engineering challenges and solutions needed to make the system robust in the presence of failures, and measure the performance, power, and resilience of the system. Under high load, the large-scale reconfigurable fabric improves the ranking throughput of each server by 95% at a desirable latency distribution or reduces tail latency by 29% at a fixed throughput. In other words, the reconfigurable fabric enables the same throughput using only half the number of servers.
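The "half the number of servers" figure follows from the 95% per-server throughput gain. A minimal arithmetic sketch, assuming aggregate throughput scales linearly with server count (an assumption made here, not a claim from the abstract); the function name is hypothetical:

```python
def servers_needed_fraction(throughput_gain):
    """Fraction of the original server count needed to match the original
    aggregate throughput, given a per-server gain (0.95 means +95%).

    Assumes aggregate throughput scales linearly with the number of servers.
    """
    return 1.0 / (1.0 + throughput_gain)


fraction = servers_needed_fraction(0.95)
print(f"{fraction:.3f}")  # roughly 0.51, i.e. about half the servers
```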

    Rapid identification of architectural bottlenecks via precise event counting

    ELite: Cost-effective approximation of exploration-based graph analysis

    Vertex-centric block synchronous processing systems, exemplified by Pregel and Giraph, have received extensive attention for graph processing. These systems allow programmers to think only about operations that take place at one vertex, and they provide the underlying computation framework, which involves multiple iterations (supersteps) with communication between neighboring vertices between supersteps. As graphs grow in size to billions of vertices and trillions of edges, processing them in this model faces two challenges: (1) the poor latency of supersteps, which is dominated by the tasks performed on high-degree vertices or densely connected components; and (2) the overwhelming network communication among vertices, which can be shown to be highly redundant. For many applications, approximate results are acceptable, and if these can be computed rapidly, they may be preferable. Many of the existing approximate solutions suffer from algorithm-specific designs that are not generic or that lack theoretical guarantees on the quality of the results. In this paper we tackle this problem using a generic approach that can be incorporated into the graph processing platform. The approach we advocate involves communicating vertex states to a subset of the neighbors at each superstep; this is called selective edge lookup. We show how this approach can be incorporated into two primitive graph operators, BFS and DFS, which can be the basis of many graph analysis workloads. Extensive experiments over real-world and synthetic graphs validate the effectiveness and efficiency of the selective edge lookup approach.
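The idea of sending vertex state to only a subset of neighbors per superstep can be illustrated with a small single-machine sketch. This is not the paper's algorithm: the uniform random edge sampling, the `keep_fraction` parameter, the adjacency-dict input format, and the function name are all assumptions chosen for illustration, and the abstract does not say how ELite selects edges.

```python
import math
import random


def selective_bfs(graph, source, keep_fraction=0.5, seed=0):
    """Approximate BFS levels, where each frontier vertex notifies only a
    sample of its neighbors in each superstep.

    graph: dict mapping vertex -> list of neighbors (assumed input format).
    keep_fraction: share of each vertex's edges looked up (assumed knob).
    """
    rng = random.Random(seed)
    level = {source: 0}
    frontier = [source]
    superstep = 0
    while frontier:
        superstep += 1
        next_frontier = []
        for v in frontier:
            neighbors = graph.get(v, [])
            if not neighbors:
                continue
            # Look up at least one edge, and never more than the vertex has.
            k = min(len(neighbors), max(1, math.ceil(keep_fraction * len(neighbors))))
            for u in rng.sample(neighbors, k):
                if u not in level:
                    level[u] = superstep
                    next_frontier.append(u)
        frontier = next_frontier
    return level


g = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"], "e": []}
exact = selective_bfs(g, "a", keep_fraction=1.0)    # every edge is used
approx = selective_bfs(g, "a", keep_fraction=0.5)   # half of each edge list
```

With `keep_fraction=1.0` the sketch reduces to ordinary level-synchronous BFS. With smaller values it sends fewer messages, at the cost that some vertices may be reached late or not at all; any level it does report is an upper bound on the true BFS distance, since it corresponds to a real path in the graph.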