
    LEAP Scratchpads: Automatic Memory and Cache Management for Reconfigurable Logic [Extended Version]

    CORRECTION: The authors for entry [4] in the references should have been "E. S. Chung, J. C. Hoe, and K. Mai".

    Developers accelerating applications on FPGAs or other reconfigurable logic have nothing but raw memory devices in their standard toolkits. Each project typically includes tedious development of single-use memory management. Software developers, by contrast, expect a programming environment to include automatic memory management: virtual memory provides the illusion of very large arrays, and processor caches reduce access latency without explicit programmer instructions. LEAP scratchpads for reconfigurable logic dynamically allocate and manage multiple, independent memory arrays in a large backing store. Scratchpad accesses are cached automatically in multiple levels, ranging from shared on-board RAM-based set-associative caches to private caches stored in FPGA RAM blocks. In the LEAP framework, scratchpads share the same interface as on-die RAM blocks and are plug-in replacements for them. Additional libraries support heap management within a storage set. Like software developers, accelerator authors using scratchpads may focus more on core algorithms and less on memory management. Two uses of FPGA scratchpads are analyzed: buffer management in an H.264 decoder and memory management within a processor microarchitecture timing model.
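    Since LEAP targets FPGA hardware, any real implementation is a hardware description rather than software; the C++ sketch below only illustrates the plug-in-replacement idea from the abstract. The interface name MemIfc and both implementations are hypothetical: an accelerator kernel written against the shared interface runs unchanged on an on-die RAM-block model or on a scratchpad model that adds a small private cache over a large backing store.

    // Minimal sketch (hypothetical names, not the LEAP API): one request
    // interface shared by an on-chip RAM model and a scratchpad model.
    #include <cstddef>
    #include <cstdint>
    #include <iostream>
    #include <unordered_map>
    #include <vector>

    struct MemIfc {                       // interface shared by both memories
        virtual void write(uint32_t addr, uint32_t data) = 0;
        virtual uint32_t read(uint32_t addr) = 0;
        virtual ~MemIfc() = default;
    };

    struct BlockRam : MemIfc {            // models a fixed-size on-die RAM block
        std::vector<uint32_t> cells;
        explicit BlockRam(std::size_t words) : cells(words, 0) {}
        void write(uint32_t a, uint32_t d) override { cells.at(a) = d; }
        uint32_t read(uint32_t a) override { return cells.at(a); }
    };

    struct Scratchpad : MemIfc {          // same interface, backed by a large store
        std::unordered_map<uint32_t, uint32_t> backing;  // stands in for board RAM
        std::unordered_map<uint32_t, uint32_t> cache;    // stands in for a private FPGA-RAM cache
        void write(uint32_t a, uint32_t d) override { cache[a] = d; backing[a] = d; }
        uint32_t read(uint32_t a) override {
            auto hit = cache.find(a);
            if (hit != cache.end()) return hit->second;  // private-cache hit
            uint32_t d = backing.count(a) ? backing[a] : 0;
            cache[a] = d;                                // fill on miss
            return d;
        }
    };

    void kernel(MemIfc& mem) {            // accelerator code sees only MemIfc
        mem.write(7, 42);
        std::cout << mem.read(7) << "\n";
    }

    int main() {
        BlockRam bram(1024);
        Scratchpad sp;
        kernel(bram);                     // small working set: plain RAM block
        kernel(sp);                       // large working set: plug-in scratchpad
    }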

    FPGA-accelerated group-by aggregation using synchronizing caches

    Recent trends in hardware have dramatically dropped the price of RAM and shifted focus from systems operating on disk-resident data to in-memory solutions. In this environment, high memory access latency, also known as the memory wall, becomes the biggest data-processing bottleneck. Traditional CPU-based architectures address this problem with large cache hierarchies, but algorithms that exhibit poor locality limit the benefits of caching. Hardware multithreading, in contrast, provides a generic solution that does not rely on algorithm-specific locality properties. In this paper we present an FPGA-accelerated implementation of in-memory group-by hash aggregation. Our design relies on hardware multithreading to efficiently mask long memory access latency by implementing a custom operation datapath on the FPGA. We propose using CAMs (Content Addressable Memories) as a mechanism for synchronization and local pre-aggregation; to the best of our knowledge, this is the first work that uses CAMs as a synchronizing cache. We evaluate aggregation throughput against state-of-the-art multithreaded software implementations and demonstrate that the FPGA-accelerated approach significantly outperforms them at large grouping-key cardinalities, yielding speedups of up to 10x.
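    The synchronizing-cache idea can be illustrated with a small software model (hypothetical names, not the paper's RTL): a fixed number of CAM rows merge repeated grouping keys locally (pre-aggregation) and retire one row into the memory-resident hash table when the CAM fills, so at most one in-flight partial aggregate exists per key.

    // Software sketch of a CAM used as a synchronizing, pre-aggregating cache.
    #include <cstddef>
    #include <cstdint>
    #include <iostream>
    #include <unordered_map>
    #include <vector>

    struct CamEntry { uint64_t key; uint64_t partialSum; bool valid = false; };

    struct GroupByCam {
        std::vector<CamEntry> entries;                     // models CAM rows
        std::unordered_map<uint64_t, uint64_t>& table;     // memory-resident hash table
        std::size_t victim = 0;

        GroupByCam(std::size_t rows, std::unordered_map<uint64_t, uint64_t>& t)
            : entries(rows), table(t) {}

        void insert(uint64_t key, uint64_t value) {
            for (auto& e : entries)                        // parallel key match in hardware
                if (e.valid && e.key == key) { e.partialSum += value; return; }
            for (auto& e : entries)                        // free row: allocate
                if (!e.valid) { e = {key, value, true}; return; }
            flush(entries[victim]);                        // CAM full: retire one row
            entries[victim] = {key, value, true};
            victim = (victim + 1) % entries.size();
        }

        void flush(CamEntry& e) { if (e.valid) { table[e.key] += e.partialSum; e.valid = false; } }
        void drain() { for (auto& e : entries) flush(e); }
    };

    int main() {
        std::unordered_map<uint64_t, uint64_t> groups;
        GroupByCam cam(4, groups);
        for (uint64_t k : {1, 2, 1, 3, 2, 1, 4, 5}) cam.insert(k, 1);  // COUNT(*)-style aggregation
        cam.drain();
        for (auto& [k, v] : groups) std::cout << k << ": " << v << "\n";
    }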

    Area-efficient near-associative memories on FPGAs


    A parameterized automatic cache generator for FPGAs

    Soft processors, which are processors implemented in the programmable fabric of FPGAs, are finding a multitude of applications in modern systems. An important part of processor design is the cache, which is used to alleviate the performance degradation caused by accessing slow memory. In this paper we present a cache generator that can produce caches with a variety of associativities, latencies, and dimensions. This tool allows processor system designers to effortlessly create and investigate different caches in order to best meet the needs of their target system. We also examine the effect of these three parameters on the area and speed of the generated caches, and we show that the designs can meet a wide range of specifications while remaining, in general, fast and compact.
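    The kind of parameterization described above can be sketched in software as a set-associative cache model whose sets, ways, and line size are compile-time parameters. The class name and interface below are illustrative assumptions; the actual generator emits hardware designs, not C++.

    // Minimal parameterized set-associative cache model with an age-based LRU victim.
    #include <array>
    #include <cstdint>
    #include <iostream>

    template <unsigned SETS, unsigned WAYS, unsigned LINE_BYTES>
    class CacheModel {
        struct Line { uint64_t tag = 0; bool valid = false; unsigned age = 0; };
        std::array<std::array<Line, WAYS>, SETS> sets{};
    public:
        // Returns true on a hit; on a miss the line is filled (victim = oldest way).
        bool access(uint64_t addr) {
            uint64_t block = addr / LINE_BYTES;
            unsigned set   = block % SETS;
            uint64_t tag   = block / SETS;
            auto& ways = sets[set];
            for (auto& w : ways) ++w.age;               // age every way in the set
            for (auto& w : ways)
                if (w.valid && w.tag == tag) { w.age = 0; return true; }
            Line* victim = &ways[0];                    // prefer an invalid way, else oldest
            for (auto& w : ways)
                if (!w.valid) { victim = &w; break; }
                else if (w.age > victim->age) victim = &w;
            *victim = {tag, true, 0};
            return false;
        }
    };

    int main() {
        CacheModel<64, 2, 32> dcache;   // 64 sets x 2 ways x 32-byte lines = 4 KiB
        unsigned hits = 0;
        for (uint64_t a = 0; a < 8192; a += 4) hits += dcache.access(a % 2048);
        std::cout << "hits: " << hits << "\n";
    }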