43,254 research outputs found
A conflict-free memory banking architecture for fast VOQ packet buffers
In order to support the enormous growth of the Internet, innovative research in every router subsystem is needed. We focus our attention on packet buffer design for routers supporting high-speed line rates. More specifically, we address the design of packet buffers using virtual output queuing (VOQ), which are used in most modern router architectures. The design is based on a previously proposed scheme that uses a combination of SRAM and DRAM modules. We propose a storage scheme that achieves a conflict-free memory bank organization. This leads to a reduction of the granularity of DRAM accesses, resulting in a decrease of storage capacity needed by the SRAM. In the DRAM/SRAM scheme, SRAM memory bandwidth needs to fit the line rate. Since memory bandwidth is limited by its size, searching for memory schemes having a small SRAM size arises as an essential issue for high speed line rates (e.g. OC768, 40 Gbps and OC3072, 160 Gbps).Peer ReviewedPostprint (published version
BAG : Managing GPU as buffer cache in operating systems
This paper presents the design, implementation and evaluation of BAG, a system that manages GPU as the buffer cache in operating systems. Unlike previous uses of GPUs, which have focused on the computational capabilities of GPUs, BAG is designed to explore a new dimension in managing GPUs in heterogeneous systems where the GPU memory is an exploitable but always ignored resource. With the carefully designed data structures and algorithms, such as concurrent hashtable, log-structured data store for the management of GPU memory, and highly-parallel GPU kernels for garbage collection, BAG achieves good performance under various workloads. In addition, leveraging the existing abstraction of the operating system not only makes the implementation of BAG non-intrusive, but also facilitates the system deployment
Hierarchical Bin Buffering: Online Local Moments for Dynamic External Memory Arrays
Local moments are used for local regression, to compute statistical measures
such as sums, averages, and standard deviations, and to approximate probability
distributions. We consider the case where the data source is a very large I/O
array of size n and we want to compute the first N local moments, for some
constant N. Without precomputation, this requires O(n) time. We develop a
sequence of algorithms of increasing sophistication that use precomputation and
additional buffer space to speed up queries. The simpler algorithms partition
the I/O array into consecutive ranges called bins, and they are applicable not
only to local-moment queries, but also to algebraic queries (MAX, AVERAGE, SUM,
etc.). With N buffers of size sqrt{n}, time complexity drops to O(sqrt n). A
more sophisticated approach uses hierarchical buffering and has a logarithmic
time complexity (O(b log_b n)), when using N hierarchical buffers of size n/b.
Using Overlapped Bin Buffering, we show that only a single buffer is needed, as
with wavelet-based algorithms, but using much less storage. Applications exist
in multidimensional and statistical databases over massive data sets,
interactive image processing, and visualization
Recommended from our members
Semantics and synthesis of signals in behavioral VHDL
Signals are a fundamental part of VHDL behavioral descriptions. There are many kinds of VHDL signals, each possesing complex and hence often misunderstood semantics. The result is that synthesis tools often inadequately address synthesis of signals. In this report, we first make clear the semantics of the various signal kinds shared by multiple processes through the use of conceptual hardware, rather than just text. Second, with the semantics firmly understood, we discuss techniques and issues in synthesizing actual hardware for shared signals. This information can be used to take a step towards synthesizing correct hardware from VHDL descriptions while greatly reducing current restrictions imposed by synthesis tools on allowable VHDL behavior
LEAP Scratchpads: Automatic Memory and Cache Management for Reconfigurable Logic [Extended Version]
CORRECTION: The authors for entry [4] in the references should have been "E. S. Chung,
J. C. Hoe, and K. Mai".Developers accelerating applications on FPGAs or other reconfigurable logic have nothing but raw memory devices in their standard toolkits. Each project typically includes tedious development of single-use memory management. Software developers expect a programming environment to include automatic memory management. Virtual memory provides the illusion of very large arrays and processor caches reduce access latency without explicit programmer instructions. LEAP scratchpads for reconfigurable logic dynamically allocate and manage multiple, independent, memory arrays in a large backing store. Scratchpad accesses are cached automatically in multiple levels, ranging from shared on-board, RAM-based, set-associative caches to private caches stored in FPGA RAM blocks. In the LEAP framework, scratchpads share the same interface as on-die RAM blocks and are plug-in replacements. Additional libraries support heap management within a storage set. Like software developers, accelerator authors using scratchpads may focus more on core algorithms and less on memory management. Two uses of FPGA scratchpads are analyzed: buffer management in an H.264 decoder and memory management within a processor microarchitecture timing model
- …