Search CORE

866 research outputs found

A Scalable Unsegmented Multiport Memory for FPGA-Based Systems

Author: Attia Osama
Jones Phillip
Jones Phillip
Townsend Kevin
Zambreno Joseph
Zambreno Joseph
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/2015
Field of study

On-chip multiport memory cores are crucial primitives for many modern high-performance reconfigurable architectures and multicore systems. Previous approaches for scaling memory cores come at the cost of operating frequency, communication overhead, and logic resources without increasing the storage capacity of the memory. In this paper, we present two approaches for designing multiport memory cores that are suitable for reconfigurable accelerators with substantial on-chip memory or complex communication. Our design approaches tackle these challenges by banking RAM blocks and utilizing interconnect networks which allows scaling without sacrificing logic resources. With banking, memory congestion is unavoidable and we evaluate our multiport memory cores under different memory access patterns to gain insights about different design trade-offs. We demonstrate our implementation with up to 256 memory ports using a Xilinx Virtex-7 FPGA. Our experimental results report high throughput memories with resource usage that scales with the number of ports

Digital Repository @ Iowa State University (ISU)

Crossref

Directory of Open Access Journals

Multiport VNA Measurements

Author: Ferrero Andrea Pierenrico
Grossman B.
Martens J.
Ruttan T.G.
Teppati Valeria
Publication venue
Publication date: 01/01/2008
Field of study

This article presents some of the most recent multiport VNA measurement methodologies used to characterize these highspeed digital networks for signal integrity. There will be a discussion of the trends and measurement challenges of high-speed digital systems, followed by a presentation of the multiport VNA measurement system details, calibration, and measurement techniques, as well as some examples of interconnect device measurements. The intent here is to present some general concepts and trends for multiport VNA measurements as applied to computer system board-level interconnect structures, and not to promote any particular brand or produc

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

Multi-port Memory Design for Advanced Computer Architectures

Author: Zhao Yirong
Publication venue
Publication date: 24/09/2013
Field of study

In this thesis, we describe and evaluate novel memory designs for multi-port on-chip and off-chip use in advanced computer architectures. We focus on combining multi-porting and evaluating the performance over a range of design parameters. Multi-porting is essential for caches and shared-data systems, especially multi-core System-on-chips (SOC). It can significantly increase the memory access throughput. We evaluate FinFET voltage-mode multi-port SRAM cells using different metrics including leakage current, static noise margin and read/write performance. Simulation results show that single-ended multi-port FinFET SRAMs with isolated read ports offer improved read stability and flexibility over classical double-ended structures at the expense of write performance. By increasing the size of the access transistors, we show that the single-ended multi-port structures can achieve equivalent write performance to the classical double-ended multi-port structure for 9% area overhead. Moreover, compared with CMOS SRAM, FinFET SRAM has better stability and standby power. We also describe new methods for the design of FinFET current-mode multi-port SRAM cells. Current-mode SRAMs avoid the full-swing of the bitline, reducing dynamic power and access time. However, that comes at the cost of voltage drop, which compromises stability. The design proposed in this thesis utilizes the feature of Independent Gate (IG) mode FinFET, which can leverage threshold voltage by controlling the back gate voltage, to merge two transistors into one through high-Vt and low-Vt transistors. This design not only reduces the voltage drop, but it also reduces the area in multi-port current-mode SRAM design. For off-chip memory, we propose a novel two-port 1-read, 1-write (1R1W) phasechange memory (PCM) cell, which significantly reduces the probability of blocking at the bank levels. Different from the traditional PCM cell, the access transistors are at the top and connected to the bitline. We use Verilog-A to model the behavior of Ge2Sb2Te5 (GST: the storage component). We evaluate the performance of the two-port cell by transistor sizing and voltage pumping. Simulation results show that pMOS transistor is more practical than nMOS transistor as the access device when both area and power are considered. The estimated area overhead is 1.7�, compared to single-port PCM cell. In brief, the contribution we make in this thesis is that we propose and evaluate three different kinds of multi-port memories that are favorable for advanced computer architectures

D-Scholarship@Pitt

Scalability of broadcast performance in wireless network-on-chip

Author: Abadal Cavallé Sergi
Alarcón Cot Eduardo José
Cabellos Aparicio Alberto
González Colás Antonio María
Lee Heekwan
Mestres Sugrañes Albert
Nemirovsky Mario
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2016
Field of study

Networks-on-Chip (NoCs) are currently the paradigm of choice to interconnect the cores of a chip multiprocessor. However, conventional NoCs may not suffice to fulfill the on-chip communication requirements of processors with hundreds or thousands of cores. The main reason is that the performance of such networks drops as the number of cores grows, especially in the presence of multicast and broadcast traffic. This not only limits the scalability of current multiprocessor architectures, but also sets a performance wall that prevents the development of architectures that generate moderate-to-high levels of multicast. In this paper, a Wireless Network-on-Chip (WNoC) where all cores share a single broadband channel is presented. Such design is conceived to provide low latency and ordered delivery for multicast/broadcast traffic, in an attempt to complement a wireline NoC that will transport the rest of communication flows. To assess the feasibility of this approach, the network performance of WNoC is analyzed as a function of the system size and the channel capacity, and then compared to that of wireline NoCs with embedded multicast support. Based on this evaluation, preliminary results on the potential performance of the proposed hybrid scheme are provided, together with guidelines for the design of MAC protocols for WNoC.Peer ReviewedPostprint (published version

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Design Space Exploration of Algorithmic Multi-Port Memories in High-Performance Application-Specific Accelerators

Author: Sethi Khushal
Publication venue
Publication date: 18/07/2020
Field of study

Memory load/store instructions consume an important part in execution time and energy consumption in domain-specific accelerators. For designing highly parallel systems, available parallelism at each granularity is extracted from the workloads. The maximal use of parallelism at each granularity in these high-performance designs requires the utilization of multi-port memories. Currently, true multiport designs are less popular because there is no inherent EDA support for multiport memory beyond 2-ports, utilizing more ports requires circuit-level implementation and hence a high design time. In this work, we present a framework for Design Space Exploration of Algorithmic Multi-Port Memories (AMM) in ASICs. We study different AMM designs in the literature, discuss how we incorporate them in the Pre-RTL Aladdin Framework with different memory depth, port configurations and banking structures. From our analysis on selected applications from the MachSuite (accelerator benchmark suite), we understand and quantify the potential use of AMMs (as true multiport memories) for high performance in applications with low spatial locality in memory access patterns

arXiv.org e-Print Archive

Repeat-Until-Success Quantum Computing

Author: Almut Beige
L.-M. Duan
Leong Chuan Kwek
Yuan Liang Lim
Publication venue: 'American Physical Society (APS)'
Publication date: 20/04/2005
Field of study

We demonstrate the possibility to perform distributed quantum computing using only single photon sources (atom-cavity-like systems), linear optics and photon detectors. The qubits are encoded in stable ground states of the sources. To implement a universal two-qubit gate, two photons should be generated simultaneously and pass through a linear optics network, where a measurement is performed on them. Gate operations can be repeated until a success is heralded without destroying the qubits at any stage of the operation. In contrast to other schemes, this does not require explicit qubit-qubit interactions, a priori entangled ancillas nor the feeding of photons into photon sources.Comment: 5 pages, 2 figures, v3: substantially revised, v4: typos correcte

arXiv.org e-Print Archive

Crossref

FPGA based synchronous multi-port SRAM architecture for motion estimation

Author: Alves Luis Nero
Nalluri Purnachand
Navarro António
Publication venue
Publication date: 15/02/2013
Field of study

Very often in signal and video processing applications, there is a strong demand for accessing the same memory location through multiple read ports. For video processing applications like Motion Estimation (ME), the same pixel, as part of the search window, is used in many calculations of SAD (Sum of Absolute Differences). In a design for such applications, there is a trade-off between number of effective gates used and the maximum operating frequency. Particularly, in FPGAs, the existing block RAMs do not support multiple port access and the replication of DRAM (Distributed RAM) leads to significant increase in the number of used CLBs (Configurable Logic Blocks). The present paper analyses different approaches that were previously used to solve this problem (same location reading)and proposes an effective solution by using an efficient combinational logic to synchronously and simultaneously read the video pixel memory data through multiple read-ports

Repositório Institucional da Universidade de Aveiro

Concepts of Communication and Synchronization in FPGA-Based Embedded Multiprocessor Systems

Author: David Antonio-Torres
Publication venue: 'IntechOpen'
Publication date: 16/03/2012
Field of study

IntechOpen