Honeycomb: ordered key-value store acceleration on an FPGA-based SmartNIC
In-memory ordered key-value stores are an important building block in modern
distributed applications. We present Honeycomb, a hybrid software-hardware
system for accelerating read-dominated workloads on ordered key-value stores
that provides linearizability for all operations including scans. Honeycomb
stores a B-Tree in host memory, and executes SCAN and GET on an FPGA-based
SmartNIC, and PUT, UPDATE and DELETE on the CPU. This approach enables large
stores and simplifies the FPGA implementation but raises the challenge of data
access and synchronization across the slow PCIe bus. We describe how Honeycomb
overcomes this challenge with careful data structure design, caching, request
parallelism with out-of-order request execution, wait-free read operations, and
batching synchronization between the CPU and the FPGA. For read-heavy YCSB
workloads, Honeycomb improves the throughput of a state-of-the-art ordered
key-value store by at least 1.8x. For scan-heavy workloads inspired by cloud
storage, Honeycomb improves throughput by more than 2x. The cost-performance,
which is more important for large-scale deployments, is improved by at least
1.5x on these workloads.
Symbol Synchronization for SDR Using a Polyphase Filterbank Based on an FPGA
This paper proposes a highly efficient symbol synchronization subsystem for Software Defined Radio. The proposed feedback phase-locked loop timing synchronizer is suitable for parallel implementation on an FPGA. The polyphase FIR filter simultaneously performs matched filtering and arbitrary interpolation between acquired samples. The proper sampling instant is determined by selecting a suitable polyphase filterbank using a derived index. This index is determined from the output of either the Zero-Crossing or the Gardner Timing Error Detector. The paper focuses extensively on simulation of the proposed synchronization system. On the basis of this simulation, a complete, fully pipelined VHDL description model is created. This model is composed of a fully parallel polyphase filterbank based on distributed arithmetic, a timing error detector, and an interpolation control block. Finally, RTL synthesis on an Altera Cyclone IV FPGA is presented, and resource utilization is analyzed in comparison with a conventional model.
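As a rough illustration of the timing loop described in this abstract, the sketch below shows one common form of the Gardner timing error detector for real-valued samples at two samples per symbol, plus a mapping from a fractional timing offset to a filterbank index. The function names, the sign convention, and the `mu`-to-index mapping are assumptions made for illustration; the abstract does not specify the paper's actual formulation.

```python
def gardner_error(prev_strobe, midpoint, curr_strobe):
    """Gardner timing error for real samples at two samples per symbol.

    prev_strobe and curr_strobe are consecutive symbol-instant samples and
    midpoint is the sample halfway between them. Sign conventions differ
    between texts; this is one common choice, not necessarily the paper's.
    """
    return (prev_strobe - curr_strobe) * midpoint


def phase_index(mu, num_phases):
    """Map a fractional timing offset mu in [0, 1) to a filterbank index.

    Hypothetical mapping: each of the num_phases polyphase branches covers
    an equal slice of the sample interval.
    """
    return int(mu * num_phases) % num_phases
```

With correct timing, the midpoint sample of a symbol transition sits at the zero crossing and the error vanishes; a nonzero midpoint yields an error whose sign indicates the direction of the timing correction.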
MicroTCA implementation of synchronous Ethernet-Based DAQ systems for large scale experiments
Large LAr TPCs are among the most powerful detectors to address open problems
in particle and astro-particle physics, such as CP violation in the leptonic
sector, neutrino properties and their astrophysical implications, proton decay
search etc. The scale of such detectors implies severe constraints on their
readout and DAQ system. In this article we describe a data acquisition scheme
for this new generation of large detectors. The main challenge is to propose a
scalable and easy to use solution able to manage a large number of channels at
the lowest cost. These constraints are very similar to those found in the
network telecommunication industry. We propose to
study how emerging technologies like ATCA and μTCA could be used in
neutrino experiments. We describe the design of an Advanced Mezzanine Card
(AMC) including 32 ADC channels. This board receives 32 analog channels at
the front panel and sends the formatted data through the μTCA backplane
using a Gigabit Ethernet link. The gigabit switch of the MCH is used to
centralize and to send the data to the event building computer. The core of
this card is an FPGA (Arria GX from Altera) containing the whole system except the
memories. A hardware accelerator has been implemented using a NIOS II μP
and a Gigabit MAC IP. Obviously, in order to be able to reconstruct the tracks
from the events, a time synchronisation system is mandatory. We decided to
implement the IEEE 1588 standard, also called the Precision Time Protocol (PTP), another
emerging and promising technology in the telecommunication industry. In this
article we describe a Gigabit PTP implementation using the recovered clock of
the gigabit link. By doing so the drift is directly cancelled and the PTP will
be used only to evaluate and to correct the offset.
Comment: Talk presented at the 2009 Real Time Conference, Beijing, May '09,
submitted to the proceedings
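The offset correction mentioned in this abstract can be illustrated with the standard IEEE 1588 delay request-response arithmetic. The sketch below is not taken from the paper: the function name is hypothetical, and it assumes a symmetric path delay and four timestamps in the same unit (t1: Sync sent by the master, t2: Sync received by the slave, t3: Delay_Req sent by the slave, t4: Delay_Req received by the master).

```python
def ptp_offset_and_delay(t1, t2, t3, t4):
    """Return (offset, delay) of the slave clock relative to the master.

    Assumes the path delay is the same in both directions, which is the
    usual assumption behind the IEEE 1588 delay request-response exchange.
    """
    master_to_slave = t2 - t1  # one-way delay plus clock offset
    slave_to_master = t4 - t3  # one-way delay minus clock offset
    offset = (master_to_slave - slave_to_master) / 2
    delay = (master_to_slave + slave_to_master) / 2
    return offset, delay
```

For example, with a slave clock running 50 units ahead of the master and a one-way delay of 10 units, the timestamps (100, 160, 200, 160) give an offset of 50 and a delay of 10. When both ends share the recovered link clock, as the abstract describes, the offset does not drift between exchanges, so only this static value needs correcting.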
A proof-of-concept superregenerative QPSK transceiver
In this paper we present a description and experimental verification of an HF-band proof-of-concept superregenerative transceiver for QPSK signals. We describe a simple implementation of an all-digital, FPGA-based, QPSK transmitter section. On the receiver side, the quench signal is generated in the same FPGA with a minimum of analog circuitry. As the main novelty, we present a simple synchronization scheme suitable for packetized transmissions. Peer reviewed. Postprint (author's final draft).
Efficient FPGA implementation of high-throughput mixed radix multipath delay commutator FFT processor for MIMO-OFDM
This article presents and evaluates pipelined architecture designs for an improved high-frequency Fast Fourier
Transform (FFT) processor implemented on Field Programmable Gate Arrays (FPGA) for Multiple Input Multiple Output
Orthogonal Frequency Division Multiplexing (MIMO-OFDM). The architecture presented is a Mixed-Radix Multipath Delay
Commutator. The presented parallel architecture utilizes fewer hardware resources compared to a Radix-2 architecture,
while maintaining the simple control and butterfly structures inherent to Radix-2 implementations. The high-frequency
design presented enhances system throughput without requiring the additional parallel data paths common in
other current approaches. The presented design can process two and four independent data streams in parallel
and is suitable for scaling to any power-of-two FFT size N. FPGA implementation of the architecture demonstrated
significant resource efficiency and high throughput in comparison to relevant current approaches within the
literature. The proposed architecture designs were realized with Xilinx System Generator (XSG) and evaluated
on both Virtex-5 and Virtex-7 FPGA devices. Post-place-and-route results demonstrated maximum frequency
values over 400 MHz and 470 MHz for Virtex-5 and Virtex-7 FPGA devices, respectively.