6 research outputs found

    Achieving 10Gbps Line-rate Key-value Stores with FPGAs

    No full text
    Distributed in-memory key-value stores such as memcached have become a critical middleware application within current web infrastructure. However, typical x86-based systems yield limited performance scalability and high power consumption as their architecture with its optimization for single thread performance is not wellmatched towards the memory-intensive and parallel nature of this application. In this paper we present the design of a novel memcached architecture implemented on Field Programmable Gate Arrays (FPGAs) which is the first in literature to achieve 10Gbps line rate processing for all packet sizes. By transformation of the functionality into a dataflow architecture, the implementation can not only provide significant speed-up but also operate at a lower power consumption than any x86. More specifically, with our prototype we have measured an increase of up to a factor of 36x in requests per second per Watt that can be serviced in comparison to the best published numbers for regular servers with optimized software. Additionally, we show that through the tight integration of network interface, memory and compute, round trip latency can be reduced down to below 4.5 microseconds

    Design Space Exploration of LDPC Decoders Using High-Level Synthesis

    No full text
    International audienceToday, high-level synthesis (HLS) tools are being touted as a means to perform rapid prototyping and shortening the long development cycles needed to produce hardware designs in register transfer level (RTL). In this paper, we attempt to verify this claim by testing the productivity benefits offered by current HLS tools by using them to develop one of the most important and complex processing blocks of modern software-defined radio systems: the forward error correction unit that uses low density parity-check (LDPC) codes. More specifically, we consider three state-of-the-art HLS tools and demonstrate how they can enable users with little hardware design expertise to quickly explore a large design space and develop complex hardware designs that achieve performances that are within the same order of magnitude of handcrafted ones in RTL. Additionally, we discuss how the underlying computation model used in these HLS tools can constrain the microarchitecture of the generated designs and, consequently, impose limits on achievable performance. Our prototype LDPC decoders developed using HLS tools obtain throughputs ranging from a few Mbits/s up to Gbits/s and latencies as low as 5 ms. Based on these results, we provide insights that will help users to select the most suitable model for designing LDPC decoder blocks using these HLS tools. From a broader perspective, these results illustrate how well today’s HLS tools deliver upon their promise to lower the effort and cost of developing complex signal processing blocks, such as the LDPC block we have considered in this paper
    corecore