8 research outputs found

    A Scalable High-Bandwidth Architecture for Lossless Compression on FPGAs

    Data compression techniques have been the subject of intense study over the past several decades due to exponential increases in the quantity of data stored and transmitted by computer systems. Compression algorithms are traditionally forced to make tradeoffs between throughput and compression quality (the ratio of original file size to compressed file size). FPGAs represent a compelling substrate for streaming applications such as data compression thanks to their capacity for deep pipelines and custom caching solutions. Unfortunately, data hazards in compression algorithms such as LZ77 inhibit the creation of deep pipelines without sacrificing some amount of compression quality. In this work, we detail a scalable, fully pipelined FPGA accelerator that performs LZ77 compression and static Huffman encoding at rates up to 5.6 GB/s. Furthermore, we explore tradeoffs between compression quality and FPGA area that allow the same throughput at a fraction of the logic utilization in exchange for moderate reductions in compression quality. Compared to recent FPGA compression studies, our emphasis on scalability gives our accelerator a 3.0x advantage in resource utilization at equivalent throughput and compression ratio.
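    The serial dependence the abstract calls a data hazard is easiest to see in software: each LZ77 match decision depends on the history produced by every prior decision. Below is a minimal greedy sketch; the window size, minimum match length, and output format are illustrative choices, not taken from the paper.

```c
#include <stdio.h>
#include <string.h>

/* Illustrative parameters; the paper's hardware uses its own window sizing. */
#define WINDOW  4096  /* bytes of history searched for matches */
#define MIN_LEN 3     /* shortest match worth emitting as (offset, length) */

/* Greedy LZ77. The hazard is visible in the loop structure: the decision
 * at position i depends on every byte already consumed, so decisions
 * cannot be naively overlapped in a deep pipeline. */
static void lz77_compress(const unsigned char *in, size_t n) {
    size_t i = 0;
    while (i < n) {
        size_t best_len = 0, best_off = 0;
        size_t start = (i > WINDOW) ? i - WINDOW : 0;
        for (size_t j = start; j < i; j++) {            /* search history */
            size_t len = 0;
            while (i + len < n && in[j + len] == in[i + len])
                len++;
            if (len > best_len) { best_len = len; best_off = i - j; }
        }
        if (best_len >= MIN_LEN) {
            printf("(%zu,%zu) ", best_off, best_len);   /* back-reference */
            i += best_len;      /* the next decision depends on this one */
        } else {
            printf("'%c' ", in[i]);                     /* literal byte  */
            i++;
        }
    }
}

int main(void) {
    const char *s = "abcabcabcabcxyz";
    lz77_compress((const unsigned char *)s, strlen(s));
    putchar('\n');  /* prints: 'a' 'b' 'c' (3,9) 'x' 'y' 'z' */
    return 0;
}
```

    A deep hardware pipeline must either resolve this dependence at every stage or relax it, for example by matching against a bounded window in parallel and accepting slightly worse matches, which is one way to trade compression quality for area and throughput in the spirit the abstract describes.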

    A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services

    Datacenter workloads demand high computational capabilities, flexibility, power efficiency, and low cost. It is challenging to improve all of these factors simultaneously. To advance datacenter capabilities beyond what commodity server designs can provide, we designed and built a composable, reconfigurable hardware fabric based on field programmable gate arrays (FPGA). Each server in the fabric contains one FPGA, and all FPGAs within a 48-server rack are interconnected over a low-latency, high-bandwidth network. We describe a medium-scale deployment of this fabric on a bed of 1632 servers, and measure its effectiveness in accelerating the ranking component of the Bing web search engine. We describe the requirements and architecture of the system, detail the critical engineering challenges and solutions needed to make the system robust in the presence of failures, and measure the performance, power, and resilience of the system. Under high load, the large-scale reconfigurable fabric improves the ranking throughput of each server by 95% at a desirable latency distribution or reduces tail latency by 29% at a fixed throughput. In other words, the reconfigurable fabric enables the same throughput using only half the number of servers.
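    The closing claim is simple arithmetic: if each server delivers 1.95x the baseline throughput, a fixed aggregate load needs 1/1.95, or roughly 51%, of the original server count. A sketch of that calculation follows; the load and per-server figures are invented for illustration, and only the 95% gain comes from the abstract.

```c
#include <stdio.h>

int main(void) {
    const double speedup  = 1.95;     /* 95% per-server gain (from the abstract) */
    const double load     = 100000.0; /* aggregate queries/s; invented for illustration */
    const double base_qps = 500.0;    /* per-server baseline; invented for illustration */

    double before = load / base_qps;             /* servers without FPGAs */
    double after  = load / (base_qps * speedup); /* servers with FPGAs    */

    printf("servers before: %.0f\n", before);    /* 200 */
    printf("servers after:  %.0f (%.0f%% of before)\n",
           after, 100.0 * after / before);       /* 103 (51% of before) */
    return 0;
}
```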

    A Performance and Energy Comparison of FPGAs, GPUs, and Multicores for Sliding-Window Applications

    With the emergence of accelerator devices such as multicores, graphics-processing units (GPUs), and field-programmable gate arrays (FPGAs), application designers are confronted with the problem of searching a huge design space, one that has been shown to have widely varying performance and energy metrics for different accelerators, different application domains, and different use cases. To address this problem, numerous studies have evaluated specific applications across different accelerators. In this paper, we analyze an important domain of applications, referred to as sliding-window applications, when executing on FPGAs, GPUs, and multicores. For each device, we present optimization strategies and analyze use cases where each device is most effective. The results show that FPGAs can achieve speedups of up to 11x and 57x compared to GPUs and multicores, respectively, while also using orders of magnitude less energy.
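    For readers outside the domain: a sliding-window application computes each output from a small neighborhood of the input, so adjacent outputs reuse most of their inputs. A minimal 2D example is below (a KxK mean filter; the sizes are illustrative and the paper's benchmarks differ). The overlap between adjacent windows is the data reuse that FPGA line buffers and custom caches can capture particularly well.

```c
#include <stdio.h>

#define W 8   /* input width;  illustrative */
#define H 6   /* input height; illustrative */
#define K 3   /* window side;  illustrative */

/* KxK mean filter: each output is computed from a KxK neighborhood of the
 * input. Adjacent windows share K*(K-1) of their K*K pixels; that reuse is
 * what FPGA line buffers, GPU shared memory, and CPU caches each try to
 * capture in their own way. */
static void mean_filter(const float in[H][W], float out[H][W]) {
    for (int y = 0; y + K <= H; y++)
        for (int x = 0; x + K <= W; x++) {
            float sum = 0.0f;
            for (int dy = 0; dy < K; dy++)
                for (int dx = 0; dx < K; dx++)
                    sum += in[y + dy][x + dx];
            out[y][x] = sum / (float)(K * K);
        }
}

int main(void) {
    float in[H][W], out[H][W] = {{0}};
    for (int y = 0; y < H; y++)
        for (int x = 0; x < W; x++)
            in[y][x] = (float)(x + y);
    mean_filter(in, out);
    printf("out[0][0] = %.2f\n", out[0][0]); /* mean of the top-left 3x3: 2.00 */
    return 0;
}
```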

    Accelerating Deep Convolutional Neural Networks Using Specialized Hardware

    Recent breakthroughs in the development of multi-layer convolutional neural networks have led to state-of-the-art improvements in the accuracy of non-trivial recognition tasks such as large-category image classification and automatic speech recognition. Hardware specialization in the form of GPGPUs, FPGAs, and ASICs offers a promising path towards major leaps in processing capability while achieving high energy efficiency. To harness specialization, an effort is underway at Microsoft to accelerate deep convolutional neural networks (CNNs) using servers augmented with FPGAs, similar to the hardware that is being integrated into some of Microsoft's datacenters.
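    For context, the core computation being accelerated is the convolutional layer: a regular loop nest with abundant parallelism and data reuse. Below is a direct, unoptimized sketch; the dimensions and names are illustrative, not Microsoft's design.

```c
#include <stdio.h>

/* Illustrative layer shape; real networks and accelerator layers differ. */
#define CIN  4                 /* input feature maps  */
#define COUT 8                 /* output feature maps */
#define DIM  16                /* input width/height  */
#define K    3                 /* kernel size         */
#define ODIM (DIM - K + 1)     /* output width/height (no padding, stride 1) */

/* Direct convolution: the loop nest a hardware accelerator unrolls and
 * pipelines. Each output element (co, y, x) is an independent dot product,
 * which is the parallelism FPGAs and ASICs exploit. */
static void conv_layer(const float in[CIN][DIM][DIM],
                       const float w[COUT][CIN][K][K],
                       float out[COUT][ODIM][ODIM]) {
    for (int co = 0; co < COUT; co++)
        for (int y = 0; y < ODIM; y++)
            for (int x = 0; x < ODIM; x++) {
                float acc = 0.0f;
                for (int ci = 0; ci < CIN; ci++)
                    for (int ky = 0; ky < K; ky++)
                        for (int kx = 0; kx < K; kx++)
                            acc += w[co][ci][ky][kx] * in[ci][y + ky][x + kx];
                out[co][y][x] = acc;
            }
}

int main(void) {
    static float in[CIN][DIM][DIM], w[COUT][CIN][K][K], out[COUT][ODIM][ODIM];
    in[0][0][0] = 1.0f;      /* trivial test: one nonzero input ... */
    w[0][0][0][0] = 2.0f;    /* ... and one nonzero weight          */
    conv_layer(in, w, out);
    printf("out[0][0][0] = %.1f\n", out[0][0][0]); /* 2.0 */
    return 0;
}
```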

    A reconfigurable fabric for accelerating large-scale datacenter services

    No full text
    Datacenter workloads demand high computational capabilities, flexibility, power efficiency, and low cost. It is challenging to improve all of these factors simultaneously. To advance datacenter capabilities beyond what commodity server designs can provide, we have designed and built a composable, reconfigurable fabric to accelerate portions of large-scale software services. Each instantiation of the fabric consists of a 6x8 2-D torus of high-end Stratix V FPGAs embedded into a half-rack of 48 machines. One FPGA is placed into each server, accessible through PCIe, and wired directly to other FPGAs with pairs of 10 Gb SAS cables. In this paper, we describe a medium-scale deployment of this fabric on a bed of 1,632 servers, and measure its efficacy in accelerating the Bing web search engine. We describe the requirements and architecture of the system, detail the critical engineering challenges and solutions needed to make the system robust in the presence of failures, and measure the performance, power, and resilience of the system when ranking candidate documents. Under high load, the large-scale reconfigurable fabric improves the ranking throughput of each server by a factor of 95% for a fixed latency distribution, or, while maintaining equivalent throughput, reduces the tail latency by 29%.
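    The 6x8 torus is concrete enough to sketch. Assuming a row-major numbering of the 48 FPGAs (the coordinate scheme below is an assumption for illustration, not published in the paper), each node's four cabled neighbors follow from modular arithmetic:

```c
#include <stdio.h>

#define TX 8   /* torus width  (the abstract's 6x8 2-D torus) */
#define TY 6   /* torus height */

/* Row-major node numbering; an assumption for illustration. */
static int node_id(int x, int y) { return y * TX + x; }

/* The modular arithmetic is what distinguishes a torus from a mesh:
 * edge nodes wrap around, so every FPGA has exactly four cabled peers. */
static void print_neighbors(int x, int y) {
    printf("node %2d: E=%2d W=%2d S=%2d N=%2d\n",
           node_id(x, y),
           node_id((x + 1) % TX, y),
           node_id((x + TX - 1) % TX, y),
           node_id(x, (y + 1) % TY),
           node_id(x, (y + TY - 1) % TY));
}

int main(void) {
    print_neighbors(0, 0);  /* corner node: wraparound gives W=7, N=40 */
    print_neighbors(3, 2);  /* interior node */
    return 0;
}
```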