A conflict-free memory banking architecture for fast VOQ packet buffers

Author: Cerdà Alabern Llorenç
Corbal San Adrián Jesús
García Vidal Jorge
Valero Cortés Mateo
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2003

In order to support the enormous growth of the Internet, innovative research in every router subsystem is needed. We focus our attention on packet buffer design for routers supporting high-speed line rates. More specifically, we address the design of packet buffers using virtual output queuing (VOQ), which are used in most modern router architectures. The design is based on a previously proposed scheme that uses a combination of SRAM and DRAM modules. We propose a storage scheme that achieves a conflict-free memory bank organization. This leads to a reduction of the granularity of DRAM accesses, resulting in a decrease of storage capacity needed by the SRAM. In the DRAM/SRAM scheme, SRAM memory bandwidth needs to fit the line rate. Since memory bandwidth is limited by its size, searching for memory schemes having a small SRAM size arises as an essential issue for high speed line rates (e.g. OC768, 40 Gbps and OC3072, 160 Gbps). Peer Reviewed. Postprint (published version).

UPCommons. Portal del coneixement obert de la UPC

BAG : Managing GPU as buffer cache in operating systems

Author: Chen Hao
He Ligang
Li Kenli
Sun Jianhua
Tan Huailiang
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/06/2014

This paper presents the design, implementation and evaluation of BAG, a system that manages the GPU as the buffer cache in operating systems. Unlike previous uses of GPUs, which have focused on the computational capabilities of GPUs, BAG is designed to explore a new dimension in managing GPUs in heterogeneous systems where the GPU memory is an exploitable but always ignored resource. With carefully designed data structures and algorithms, such as a concurrent hashtable, a log-structured data store for the management of GPU memory, and highly parallel GPU kernels for garbage collection, BAG achieves good performance under various workloads. In addition, leveraging the existing abstraction of the operating system not only makes the implementation of BAG non-intrusive, but also facilitates system deployment.

CiteSeerX

Warwick Research Archives Portal Repository

SSP: Eliminating Redundant Writes in Failure-Atomic NVRAMs via Shadow Sub-Paging

Author: Bittman Daniel
Coburn Joel
Hitz Dave
Kolli Aasheesh
Kwon Youngjin
Lee Changman
Lee Se Kwon
Minh Chi Cao
Ni Yuanjiang
Pelley Steven
Talluri Madhusudhan
Venkataraman Shivaram
Volos Haris
Xu Jian
Yang Jun
Zhao Jishen
Publication venue: eScholarship, University of California
Publication date: 01/01/2019

Crossref

eScholarship - University of California

Hierarchical Bin Buffering: Online Local Moments for Dynamic External Memory Arrays

Author: Chakrabarti K.
Daniel Lemire
Geffner S.
Gray J.
Li B.-C.
Moerkotte G.
Owen Kaser
Schmidt R. R.
Scott D.
Vitter J. S.
Zhou F.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/05/2008

Local moments are used for local regression, to compute statistical measures such as sums, averages, and standard deviations, and to approximate probability distributions. We consider the case where the data source is a very large I/O array of size n and we want to compute the first N local moments, for some constant N. Without precomputation, this requires O(n) time. We develop a sequence of algorithms of increasing sophistication that use precomputation and additional buffer space to speed up queries. The simpler algorithms partition the I/O array into consecutive ranges called bins, and they are applicable not only to local-moment queries, but also to algebraic queries (MAX, AVERAGE, SUM, etc.). With N buffers of size sqrt(n), time complexity drops to O(sqrt(n)). A more sophisticated approach uses hierarchical buffering and has a logarithmic time complexity (O(b log_b n)), when using N hierarchical buffers of size n/b. Using Overlapped Bin Buffering, we show that only a single buffer is needed, as with wavelet-based algorithms, but using much less storage. Applications exist in multidimensional and statistical databases over massive data sets, interactive image processing, and visualization.
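
The simplest idea in the abstract above, partitioning the array into bins and precomputing one aggregate per bin, can be illustrated with a small sketch. This is not the authors' code: the class name `BinBufferedSum`, its methods, and the restriction to SUM queries (the zeroth-order case) are assumptions made for illustration, and the sketch keeps everything in memory rather than in external storage. With bins of size about sqrt(n), a range query touches at most about sqrt(n) bin totals plus fewer than 2*sqrt(n) raw elements at the edges.

```python
import math


class BinBufferedSum:
    """Illustrative flat bin buffer for range-SUM queries (hypothetical names)."""

    def __init__(self, data):
        self.data = list(data)
        n = len(self.data)
        # Bin size of roughly sqrt(n) balances bin count against edge scans.
        self.bin_size = max(1, math.isqrt(n))
        # One precomputed total per bin; the last bin may be partial.
        self.bins = [
            sum(self.data[start:start + self.bin_size])
            for start in range(0, n, self.bin_size)
        ]

    def update(self, index, value):
        """Set data[index] = value and fix the single affected bin total."""
        self.bins[index // self.bin_size] += value - self.data[index]
        self.data[index] = value

    def range_sum(self, lo, hi):
        """Return sum(data[lo:hi]) using whole-bin totals where possible."""
        if not 0 <= lo <= hi <= len(self.data):
            raise IndexError("range out of bounds")
        total = 0
        i = lo
        while i < hi:
            at_bin_start = i % self.bin_size == 0
            if at_bin_start and i + self.bin_size <= hi:
                # The whole bin lies inside the query: use its buffered total.
                total += self.bins[i // self.bin_size]
                i += self.bin_size
            else:
                # Edge element outside any fully covered bin: read it directly.
                total += self.data[i]
                i += 1
        return total
```

For example, `BinBufferedSum(range(100)).range_sum(3, 97)` returns the same value as `sum(range(3, 97))`, while reading only whole-bin totals for the interior of the range. The hierarchical and overlapped variants mentioned in the abstract refine this idea and are not shown here.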

arXiv.org e-Print Archive

R-libre

Crossref

Semantics and synthesis of signals in behavioral VHDL

Author: Gajski Daniel D.
Narayan Sanjiv
Ramachandran Loganath
Vahid Frank
Publication venue: eScholarship, University of California
Publication date: 23/03/1992

Signals are a fundamental part of VHDL behavioral descriptions. There are many kinds of VHDL signals, each possessing complex and hence often misunderstood semantics. The result is that synthesis tools often inadequately address synthesis of signals. In this report, we first make clear the semantics of the various signal kinds shared by multiple processes through the use of conceptual hardware, rather than just text. Second, with the semantics firmly understood, we discuss techniques and issues in synthesizing actual hardware for shared signals. This information can be used to take a step towards synthesizing correct hardware from VHDL descriptions while greatly reducing current restrictions imposed by synthesis tools on allowable VHDL behavior.

eScholarship - University of California

LEAP Scratchpads: Automatic Memory and Cache Management for Reconfigurable Logic [Extended Version]

Author: Adler Michael
Emer Joel
Fleming Kermin E.
Parashar Angshuman
Pellauer Michael
Publication date: 23/11/2010

CORRECTION: The authors for entry [4] in the references should have been "E. S. Chung, J. C. Hoe, and K. Mai".

Developers accelerating applications on FPGAs or other reconfigurable logic have nothing but raw memory devices in their standard toolkits. Each project typically includes tedious development of single-use memory management. Software developers expect a programming environment to include automatic memory management. Virtual memory provides the illusion of very large arrays and processor caches reduce access latency without explicit programmer instructions. LEAP scratchpads for reconfigurable logic dynamically allocate and manage multiple, independent, memory arrays in a large backing store. Scratchpad accesses are cached automatically in multiple levels, ranging from shared on-board, RAM-based, set-associative caches to private caches stored in FPGA RAM blocks. In the LEAP framework, scratchpads share the same interface as on-die RAM blocks and are plug-in replacements. Additional libraries support heap management within a storage set. Like software developers, accelerator authors using scratchpads may focus more on core algorithms and less on memory management. Two uses of FPGA scratchpads are analyzed: buffer management in an H.264 decoder and memory management within a processor microarchitecture timing model.

DSpace@MIT