Overcoming the Intuition Wall: Measurement and Analysis in Computer Architecture
These are exciting times for computer architecture research. Today there is significant demand to improve the performance and energy efficiency of emerging, transformative applications, which are being hammered out by the hundreds for new computing platforms and usage models. This booming growth of applications, and the variety of programming languages used to create them, is challenging our ability as architects to rapidly and rigorously characterize these applications. Concurrently, hardware has become more complex with the emergence of accelerators, multicore systems, and heterogeneity caused by further divergence between processor market segments. No single architect can now understand all the complexities of these systems and reason about the full impact of changes or new applications.
To that end, this dissertation presents four case studies in quantitative methods. Each case study attacks a different application and proposes a new measurement or analytical technique. In each case study we find at least one surprising or unintuitive result which would likely not have been found without the application of our method.
Anti-Virus in Silicon
Anti-virus (AV) software is fundamentally broken. AV systems today rely on the correct functioning of not only the AV software but also the underlying OS and VMM. Thus proper functioning of software AV requires millions of lines of complex code, which house thousands of bugs, to work correctly. Needless to say, and as evidenced by numerous attacks on software AV, effective software AV systems have been difficult to build. At the same time, malware incidents are increasing and there is strong demand for good anti-virus solutions; the software anti-virus market is estimated at close to $8 billion annually.
In this work we present a new class of robust AV systems called silicon anti-virus systems. Unlike software AV systems, these systems are lean and mostly implemented in hardware to avoid reliance on complex software, but, like software AV systems, they are updatable in the field when new malware is encountered. We describe the first generation of silicon AV, which applies simple machine learning techniques to existing performance counter infrastructure. Our published and unpublished work shows that common malware such as viruses and adware, and even zero-day exploits, can be detected accurately. These systems form a very effective first-line, energy-efficient defense against malware.
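To make the detection idea concrete, here is a minimal, hypothetical sketch of classifying a process from hardware performance-counter samples. The feature choices, synthetic numbers, and the nearest-centroid rule are illustrative assumptions, not the authors' actual model, which the abstract only describes as "simple machine learning techniques".

```python
# Hypothetical sketch: label a process as benign or malware from
# performance-counter feature vectors using a nearest-centroid rule.
# All feature values below are synthetic, for illustration only.
from statistics import mean

def centroid(samples):
    """Component-wise mean of a list of equal-length counter vectors."""
    return [mean(col) for col in zip(*samples)]

def classify(sample, benign_c, malware_c):
    """Assign the label of the nearer centroid (squared Euclidean distance)."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return "malware" if dist2(sample, malware_c) < dist2(sample, benign_c) else "benign"

# Synthetic training vectors: (branch misses, cache misses, instructions retired)
benign = [[10, 50, 1000], [12, 55, 990], [9, 48, 1010]]
malware = [[80, 300, 400], [75, 310, 420], [90, 290, 380]]

b_c, m_c = centroid(benign), centroid(malware)
print(classify([85, 305, 410], b_c, m_c))  # → malware
print(classify([11, 52, 1005], b_c, m_c))  # → benign
```

A hardware implementation would sample these counters continuously and keep the trained model updatable, which is what allows new signatures to be pushed in the field.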
A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services
Datacenter workloads demand high computational capabilities, flexibility, power efficiency, and low cost. It is challenging to improve all of these factors simultaneously. To advance datacenter capabilities beyond what commodity server designs can provide, we designed and built a composable, reconfigurable hardware fabric based on field-programmable gate arrays (FPGAs). Each server in the fabric contains one FPGA, and all FPGAs within a 48-server rack are interconnected over a low-latency, high-bandwidth network. We describe a medium-scale deployment of this fabric on a bed of 1,632 servers, and measure its effectiveness in accelerating the ranking component of the Bing web search engine. We describe the requirements and architecture of the system, detail the critical engineering challenges and solutions needed to make the system robust in the presence of failures, and measure the performance, power, and resilience of the system. Under high load, the large-scale reconfigurable fabric improves the ranking throughput of each server by 95% at a desirable latency distribution, or reduces tail latency by 29% at a fixed throughput. In other words, the reconfigurable fabric enables the same throughput using only half the number of servers.
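The closing claim follows from simple arithmetic: if each server serves 1.95x its baseline throughput, a fixed aggregate load needs only 1/1.95 of the original fleet. A quick check:

```python
# Sanity check of the abstract's claim: a 95% per-server throughput gain
# means a fixed aggregate throughput needs about half as many servers.
baseline_throughput = 1.0
accelerated_throughput = baseline_throughput * 1.95  # +95% per server

fleet_fraction = baseline_throughput / accelerated_throughput
print(round(fleet_fraction, 2))  # → 0.51, i.e. roughly half the servers
```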
ELite: Cost-effective approximation of exploration-based graph analysis
Vertex-centric bulk synchronous processing systems, exemplified by Pregel and Giraph, have received extensive attention for graph processing. These systems allow programmers to reason only about the operations that take place at a single vertex, while the underlying framework runs multiple iterations (supersteps) with communication between neighboring vertices between supersteps. As graphs grow to billions of vertices and trillions of edges, processing them in this model faces two challenges: (1) the poor latency of supersteps, which is dominated by the work performed at high-degree vertices or densely connected components; and (2) the overwhelming network communication among vertices, much of which can be shown to be highly redundant. For many applications, approximate results are acceptable, and if these can be computed rapidly, they may be preferable. Many of the existing approximate solutions suffer from algorithm-specific designs that are not generic or lack theoretical guarantees on result quality. In this paper we tackle this problem using a generic approach that can be incorporated into the graph processing platform. The approach we advocate communicates vertex states to only a subset of the neighbors at each superstep; we call this selective edge lookup. We show how this approach can be incorporated into two primitive graph operators, BFS and DFS, which form the basis of many graph analysis workloads. Extensive experiments over real-world and synthetic graphs validate the effectiveness and efficiency of the selective edge lookup approach.
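The selective edge lookup idea can be sketched with a superstep-style BFS in which each frontier vertex notifies only a sampled fraction of its neighbors, trading completeness for reduced communication. The function names and the fixed sampling rate below are assumptions for illustration; the paper's operator is more general than this toy version.

```python
# Illustrative sketch of selective edge lookup: BFS that, in each superstep,
# forwards along only a random `keep` fraction of each vertex's edges.
import random

def approx_bfs(adj, source, keep=0.5, seed=0):
    """Superstep-style BFS over adjacency dict `adj`; returns approximate
    hop distances. keep=1.0 degenerates to exact BFS."""
    rng = random.Random(seed)
    dist = {source: 0}
    frontier = [source]
    while frontier:
        nxt = []
        for v in frontier:  # each frontier vertex does a selective edge lookup
            nbrs = adj.get(v, [])
            k = max(1, int(len(nbrs) * keep))  # always look up at least one edge
            for u in rng.sample(nbrs, min(k, len(nbrs))):
                if u not in dist:
                    dist[u] = dist[v] + 1
                    nxt.append(u)
        frontier = nxt  # superstep barrier
    return dist

adj = {0: [1, 2, 3], 1: [0, 4], 2: [0, 4], 3: [0], 4: [1, 2]}
print(sorted(approx_bfs(adj, 0, keep=1.0).items()))
# → [(0, 0), (1, 1), (2, 1), (3, 1), (4, 2)]
```

Because sampled BFS can only skip or delay discoveries, any distance it reports is an upper bound on the true hop distance, which is the kind of one-sided guarantee that makes such approximations analyzable.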