11,914 research outputs found

    Honeycomb: ordered key-value store acceleration on an FPGA-based SmartNIC

    Full text link
    In-memory ordered key-value stores are an important building block in modern distributed applications. We present Honeycomb, a hybrid software-hardware system for accelerating read-dominated workloads on ordered key-value stores that provides linearizability for all operations including scans. Honeycomb stores a B-Tree in host memory, and executes SCAN and GET on an FPGA-based SmartNIC, and PUT, UPDATE and DELETE on the CPU. This approach enables large stores and simplifies the FPGA implementation but raises the challenge of data access and synchronization across the slow PCIe bus. We describe how Honeycomb overcomes this challenge with careful data structure design, caching, request parallelism with out-of-order request execution, wait-free read operations, and batching synchronization between the CPU and the FPGA. For read-heavy YCSB workloads, Honeycomb improves the throughput of a state-of-the-art ordered key-value store by at least 1.8x. For scan-heavy workloads inspired by cloud storage, Honeycomb improves throughput by more than 2x. The cost-performance, which is more important for large-scale deployments, is improved by at least 1.5x on these workloads

    Symbol Synchronization for SDR Using a Polyphase Filterbank Based on an FPGA

    Get PDF
    This paper is devoted to the proposal of a highly efficient symbol synchronization subsystem for Software Defined Radio. The proposed feedback phase-locked loop timing synchronizer is suitable for parallel implementation on an FPGA. The polyphase FIR filter simultaneously performs matched-filtering and arbitrary interpolation between acquired samples. Determination of the proper sampling instant is achieved by selecting a suitable polyphase filterbank using a derived index. This index is determined based on the output either the Zero-Crossing or Gardner Timing Error Detector. The paper will extensively focus on simulation of the proposed synchronization system. On the basis of this simulation, a complete, fully pipelined VHDL description model is created. This model is composed of a fully parallel polyphase filterbank based on distributed arithmetic, timing error detector and interpolation control block. Finally, RTL synthesis on an Altera Cyclone IV FPGA is presented and resource utilization in comparison with a conventional model is analyzed

    MicroTCA implementation of synchronous Ethernet-Based DAQ systems for large scale experiments

    Full text link
    Large LAr TPCs are among the most powerful detectors to address open problems in particle and astro-particle physics, such as CP violation in leptonic sector, neutrino properties and their astrophysical implications, proton decay search etc. The scale of such detector implies severe constraints on their readout and DAQ system. In this article we describe a data acquisition scheme for this new generation of large detectors. The main challenge is to propose a scalable and easy to use solution able to manage a large number of channels at the lowest cost. It is interesting to note that these constraints are very similar to those existing in Network Telecommunication Industry. We propose to study how emerging technologies like ATCA and Ό\muTCA could be used in neutrino experiments. We describe the design of an Advanced Mezzanine Board (AMC) including 32 ADC channels. This board receives 32 analogical channels at the front panel and sends the formatted data through the Ό\muTCA backplane using a Gigabit Ethernet link. The gigabit switch of the MCH is used to centralize and to send the data to the event building computer. The core of this card is a FPGA (ARIA-GX from ALTERA) including the whole system except the memories. A hardware accelerator has been implemented using a NIOS II Ό\muP and a Gigabit MAC IP. Obviously, in order to be able to reconstruct the tracks from the events a time synchronisation system is mandatory. We decided to implement the IEEE1588 standard also called Precision Timing Protocol, another emerging and promising technology in Telecommunication Industry. In this article we describe a Gigabit PTP implementation using the recovered clock of the gigabit link. By doing so the drift is directly cancelled and the PTP will be used only to evaluate and to correct the offset.Comment: Talk presented at the 2009 Real Time Conference, Beijing, May '09, submitted to the proceeding

    A proof-of-concept superregenerative QPSK transceiver

    Get PDF
    In this paper we present a description and experimental verification of an HF-band proof-of-concept superregenerative transceiver for QPSK signals. We describe a simple implementation of an all-digital, FPGA-based, QPSK transmitter section. On the receiver side, the quench signal is generated in the same FPGA with a minimum of analog circuitry. As the main novelty, we present a simple synchronization scheme suitable for packetized transmissions.Peer ReviewedPostprint (author’s final draft

    Efficient FPGA implementation of high-throughput mixed radix multipath delay commutator FFT processor for MIMO-OFDM

    Get PDF
    This article presents and evaluates pipelined architecture designs for an improved high-frequency Fast Fourier Transform (FFT) processor implemented on Field Programmable Gate Arrays (FPGA) for Multiple Input Multiple Output Orthogonal Frequency Division Multiplexing (MIMO-OFDM). The architecture presented is a Mixed-Radix Multipath Delay Commutator. The presented parallel architecture utilizes fewer hardware resources compared to Radix-2 architecture, while maintaining simple control and butterfly structures inherent to Radix-2 implementations. The high-frequency design presented allows enhancing system throughput without requiring additional parallel data paths common in other current approaches, the presented design can process two and four independent data streams in parallel and is suitable for scaling to any power of two FFT size N. FPGA implementation of the architecture demonstrated significant resource efficiency and high-throughput in comparison to relevant current approaches within literature. The proposed architecture designs were realized with Xilinx System Generator (XSG) and evaluated on both Virtex-5 and Virtex-7 FPGA devices. Post place and route results demonstrated maximum frequency values over 400 MHz and 470 MHz for Virtex-5 and Virtex-7 FPGA devices respectively
    • 

    corecore