11,914 research outputs found

Honeycomb: ordered key-value store acceleration on an FPGA-based SmartNIC

Author: Castro Miguel
Dragojevic Aleksandar
Flemming Shane
Kalia Anuj
Katsarakis Antonios
Korolija Dario
Liu Junyi
Ng Ho-cheung
Zablotchi Igor
Publication date: 06/04/2023

In-memory ordered key-value stores are an important building block in modern distributed applications. We present Honeycomb, a hybrid software-hardware system for accelerating read-dominated workloads on ordered key-value stores that provides linearizability for all operations including scans. Honeycomb stores a B-Tree in host memory, and executes SCAN and GET on an FPGA-based SmartNIC, and PUT, UPDATE and DELETE on the CPU. This approach enables large stores and simplifies the FPGA implementation but raises the challenge of data access and synchronization across the slow PCIe bus. We describe how Honeycomb overcomes this challenge with careful data structure design, caching, request parallelism with out-of-order request execution, wait-free read operations, and batching synchronization between the CPU and the FPGA. For read-heavy YCSB workloads, Honeycomb improves the throughput of a state-of-the-art ordered key-value store by at least 1.8x. For scan-heavy workloads inspired by cloud storage, Honeycomb improves throughput by more than 2x. The cost-performance, which is more important for large-scale deployments, is improved by at least 1.5x on these workloads

arXiv.org e-Print Archive

Symbol Synchronization for SDR Using a Polyphase Filterbank Based on an FPGA

Author: Fiala P.
Linhart R.
Publication venue: 'Brno University of Technology'
Publication date: 01/09/2015

This paper proposes a highly efficient symbol synchronization subsystem for Software Defined Radio. The proposed feedback phase-locked loop timing synchronizer is suitable for parallel implementation on an FPGA. The polyphase FIR filter simultaneously performs matched filtering and arbitrary interpolation between acquired samples. The proper sampling instant is determined by selecting a suitable polyphase filterbank branch using a derived index. This index is determined from the output of either the Zero-Crossing or the Gardner Timing Error Detector. The paper focuses extensively on simulation of the proposed synchronization system. On the basis of this simulation, a complete, fully pipelined VHDL description model is created. This model is composed of a fully parallel polyphase filterbank based on distributed arithmetic, a timing error detector and an interpolation control block. Finally, RTL synthesis on an Altera Cyclone IV FPGA is presented and resource utilization in comparison with a conventional model is analyzed.
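
The two ideas named in this abstract, a Gardner timing error detector and an index that selects a polyphase branch, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's VHDL design: the function names, the two-samples-per-symbol layout, the sign convention of the error, and the mapping from a fractional timing offset `mu` to a branch index are all hypothetical.

```python
import math


def gardner_error(prev_symbol, midpoint, curr_symbol):
    """Gardner timing error for real-valued samples at 2 samples/symbol.

    Uses the common form e = y(mid) * (y(curr) - y(prev)); the sign
    convention is an assumption and depends on the loop design.
    """
    return midpoint * (curr_symbol - prev_symbol)


def select_branch(mu, num_branches):
    """Map a fractional timing offset to a polyphase branch index.

    Only the fractional part of mu is used, so values outside [0, 1) wrap.
    The mapping itself is hypothetical, chosen for illustration.
    """
    frac = mu - math.floor(mu)
    return min(int(frac * num_branches), num_branches - 1)
```

With ideal timing on a -1 to +1 transition the midpoint sample is zero, so the error vanishes; a non-zero midpoint yields an error whose sign indicates the direction of the timing offset.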

Directory of Open Access Journals

Digital library of Brno University of Technology

MicroTCA implementation of synchronous Ethernet-Based DAQ systems for large scale experiments

Author: Autiero D.
Carlus B.
Gardien S.
Girerd C.
Marteau J.
Tromeur W.
Publication date: 01/01/2009

Large LAr TPCs are among the most powerful detectors to address open problems in particle and astro-particle physics, such as CP violation in the leptonic sector, neutrino properties and their astrophysical implications, proton decay searches, etc. The scale of such detectors imposes severe constraints on their readout and DAQ system. In this article we describe a data acquisition scheme for this new generation of large detectors. The main challenge is to propose a scalable and easy-to-use solution able to manage a large number of channels at the lowest cost. It is interesting to note that these constraints are very similar to those existing in the network telecommunication industry. We propose to study how emerging technologies like ATCA and μTCA could be used in neutrino experiments. We describe the design of an Advanced Mezzanine Card (AMC) including 32 ADC channels. This board receives 32 analog channels at the front panel and sends the formatted data through the μTCA backplane using a Gigabit Ethernet link. The gigabit switch of the MCH is used to centralize and to send the data to the event building computer. The core of this card is an FPGA (Arria GX from Altera) including the whole system except the memories. A hardware accelerator has been implemented using a NIOS II μP and a Gigabit MAC IP. In order to reconstruct the tracks from the events, a time synchronisation system is mandatory. We decided to implement the IEEE 1588 standard, also called Precision Time Protocol (PTP), another emerging and promising technology in the telecommunication industry. In this article we describe a Gigabit PTP implementation using the recovered clock of the gigabit link. By doing so the drift is directly cancelled and the PTP will be used only to evaluate and to correct the offset.

Comment: Talk presented at the 2009 Real Time Conference, Beijing, May '09, submitted to the proceeding
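
The offset evaluation mentioned at the end of this abstract can be illustrated with the standard IEEE 1588 delay request-response arithmetic. The sketch below is a generic illustration, not the authors' implementation: the function name and the example timestamps are hypothetical, and a symmetric path delay is assumed.

```python
def ptp_offset_and_delay(t1, t2, t3, t4):
    """Compute (offset, delay) from the four PTP timestamps.

    t1: master sends Sync          (master clock)
    t2: slave receives Sync        (slave clock)
    t3: slave sends Delay_Req      (slave clock)
    t4: master receives Delay_Req  (master clock)

    Assumes the path delay is the same in both directions.
    offset is the slave clock minus the master clock.
    """
    delay = ((t2 - t1) + (t4 - t3)) / 2
    offset = ((t2 - t1) - (t4 - t3)) / 2
    return offset, delay


# Hypothetical timestamps in nanoseconds: the slave runs 100 ns ahead
# of the master and the one-way path delay is 50 ns.
offset, delay = ptp_offset_and_delay(t1=1000, t2=1150, t3=2000, t4=1950)
```

Because the scheme in the abstract removes drift by using the recovered link clock, only the offset term would need to be applied as a correction on the slave side.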

arXiv.org e-Print Archive

CiteSeerX

HAL-IN2P3

Crossref

A proof-of-concept superregenerative QPSK transceiver

Author: Bonet Dalmau Jordi
Giralt Mas Ma. Rosa
López Riera Alexis
Moncunill Geniz Francesc Xavier
Palà Schönwälder Pere
Águila López Francisco del
Publication date: 01/01/2014

In this paper we present a description and experimental verification of an HF-band proof-of-concept superregenerative transceiver for QPSK signals. We describe a simple implementation of an all-digital, FPGA-based, QPSK transmitter section. On the receiver side, the quench signal is generated in the same FPGA with a minimum of analog circuitry. As the main novelty, we present a simple synchronization scheme suitable for packetized transmissions. Peer Reviewed. Postprint (author’s final draft)

Crossref

UPCommons. Portal del coneixement obert de la UPC

Efficient FPGA implementation of high-throughput mixed radix multipath delay commutator FFT processor for MIMO-OFDM

Author: A. AMIRA
A. GUESSOUM
Ayinala
Bingham
Boopal
Chen
Fu
Garrido
Garrido
Gesbert
Li
Lin
Lin
M. DALI
N. RAMZAN
R. M. GIBSON
Sampath
Shousheng He
Shousheng He
Song-Nien Tang
Swartzlander
Tang
Tsai
Uzun
Wang
Wang
Yang
Yu-Wei Lin
Publication venue: 'Universitatea Stefan cel Mare din Suceava'
Publication date: 01/01/2017

This article presents and evaluates pipelined architecture designs for an improved high-frequency Fast Fourier Transform (FFT) processor implemented on Field Programmable Gate Arrays (FPGA) for Multiple Input Multiple Output Orthogonal Frequency Division Multiplexing (MIMO-OFDM). The architecture presented is a Mixed-Radix Multipath Delay Commutator. The presented parallel architecture utilizes fewer hardware resources compared to the Radix-2 architecture, while maintaining the simple control and butterfly structures inherent to Radix-2 implementations. The high-frequency design enhances system throughput without requiring the additional parallel data paths common in other current approaches. The presented design can process two and four independent data streams in parallel and is suitable for scaling to any power-of-two FFT size N. FPGA implementation of the architecture demonstrated significant resource efficiency and high throughput in comparison to relevant current approaches within the literature. The proposed architecture designs were realized with Xilinx System Generator (XSG) and evaluated on both Virtex-5 and Virtex-7 FPGA devices. Post place and route results demonstrated maximum frequency values over 400 MHz and 470 MHz for Virtex-5 and Virtex-7 FPGA devices respectively.
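
To make the mixed-radix idea concrete, the following Python sketch decomposes a power-of-two FFT using radix-4 stages where the length allows and a radix-2 stage otherwise. It is a software illustration of the decomposition only, under the assumption of a plain recursive decimation-in-time scheme; it does not model the pipelined Multipath Delay Commutator hardware, and the function names are hypothetical.

```python
import cmath


def dft(x):
    """Reference O(N^2) discrete Fourier transform."""
    n = len(x)
    return [
        sum(x[k] * cmath.exp(-2j * cmath.pi * k * m / n) for k in range(n))
        for m in range(n)
    ]


def mixed_radix_fft(x):
    """Recursive decimation-in-time FFT for power-of-two lengths.

    Uses a radix-4 split when the length is divisible by 4,
    and a radix-2 split otherwise.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return list(x)
    radix = 4 if n % 4 == 0 else 2
    m = n // radix
    # Transform each of the `radix` interleaved sub-sequences.
    subs = [mixed_radix_fft(x[r::radix]) for r in range(radix)]
    # Combine sub-transforms with twiddle factors W_n^(r*k).
    return [
        sum(
            subs[r][k % m] * cmath.exp(-2j * cmath.pi * r * k / n)
            for r in range(radix)
        )
        for k in range(n)
    ]
```

For a length of 8, this performs one radix-4 stage followed by radix-2 transforms of length 2, which is the kind of stage mix the term "mixed radix" refers to.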

Crossref

Directory of Open Access Journals

Research Repository and Portal - University of the West of Scotland

ResearchOnline@GCU