2,573 research outputs found

    Co-design Hardware and Algorithm for Vector Search

    Full text link
    Vector search has emerged as the foundation for large-scale information retrieval and machine learning systems, with search engines like Google and Bing processing tens of thousands of queries per second on petabyte-scale document datasets by evaluating vector similarities between encoded query texts and web documents. As performance demands for vector search systems surge, accelerated hardware offers a promising solution in the post-Moore's Law era. We introduce \textit{FANNS}, an end-to-end and scalable vector search framework on FPGAs. Given a user-provided recall requirement on a dataset and a hardware resource budget, \textit{FANNS} automatically co-designs hardware and algorithm, subsequently generating the corresponding accelerator. The framework also supports scale-out by incorporating a hardware TCP/IP stack in the accelerator. \textit{FANNS} attains up to 23.0×\times and 37.2×\times speedup compared to FPGA and CPU baselines, respectively, and demonstrates superior scalability to GPUs, achieving 5.5×\times and 7.6×\times speedup in median and 95\textsuperscript{th} percentile (P95) latency within an eight-accelerator configuration. The remarkable performance of \textit{FANNS} lays a robust groundwork for future FPGA integration in data centers and AI supercomputers.Comment: 11 page

    Runtime Adaptive Hybrid Query Engine based on FPGAs

    Get PDF
    This paper presents the fully integrated hardware-accelerated query engine for large-scale datasets in the context of Semantic Web databases. As queries are typically unknown at design time, a static approach is not feasible and not flexible to cover a wide range of queries at system runtime. Therefore, we introduce a runtime reconfigurable accelerator based on a Field Programmable Gate Array (FPGA), which transparently incorporates with the freely available Semantic Web database LUPOSDATE. At system runtime, the proposed approach dynamically generates an optimized hardware accelerator in terms of an FPGA configuration for each individual query and transparently retrieves the query result to be displayed to the user. During hardware-accelerated execution the host supplies triple data to the FPGA and retrieves the results from the FPGA via PCIe interface. The benefits and limitations are evaluated on large-scale synthetic datasets with up to 260 million triples as well as the widely known Billion Triples Challenge

    Efficient Real-Time Architectures and FPGA Implementations of Histogram-Based Median Filters for High Definition Videos

    Get PDF
    Digital filtering plays an important role in many signal processing applications. Filtering is performed to recover the original signal from its corrupted version. Median filter is a non-linear digital filter that replaces a sample in a given window by the median value of the samples in the window. For images corrupted with impulse noise, median filter provides a very high quality of filtered images. Several modifications of median filters have been proposed and implemented to achieve high image quality compared to that provided by conventional median filters. When these filters are implemented on hardware platforms such as FPGAs, the performance parameters, namely, the area, power and operating frequency should be taken into consideration in addition to the quality of the filtered image. Therefore, efficient implementation of median filters on FPGAs for image and video processing algorithms has been a topic of much interest. The existing hardware-based median filters for high definition video formats do not always satisfy the real-time throughput requirements or are inefficient with respect to hardware performance parameters, such as the area and frequency. This is due to the fact that most of the existing techniques use sorting-based median calculation, which results in a low hardware performance. In this thesis, architectures that use histogram-based median computation, which is a non-sorting-based operation, are designed with a view of efficient hardware implementation. This is carried out in two parts. We design and implement efficient architectures that satisfy the real-time throughput requirements of full high definition (FHD) videos in the first part and that of ultra high definition (UHD) videos in the second part. In the first part, an efficient real-time histogram-based median filter that uses the concept of bit-plane-slicing and adaptive switching median filter (ASMF) is designed and implemented on FPGAs. We term this architecture as hybrid architecture for median filtering (HAMF). The proposed HAMF computes an approximate median, since it uses only the most significant B-bits of the pixel values for median calculation. As a result, the algorithmic level implementation of the proposed HAMF results in a slight degradation in the filtered image quality compared to that provided by ASMF. The proposed HAMF provides a significant improvement over ASMF in terms of the area and operating frequency, when implemented on different generation FPGAs. Analysis of the different parameters, such as the number of bit-planes used in the computation of the median and the number of pipelining stages, is carried out to study the trade-off between the quality of the filtered image and hardware performance. Although the FPGA implementation of the proposed HAMF provides a very high operating frequency, the quality of the images filtered by its algorithmic level implementation decreases with increasing window size and noise density. This filter may be suitable for applications that require FHD filtering with cost constraints, but not for applications where the output image quality is as important as the hardware performance. Hence, in the second part, we design an efficient and real-time architecture of the hierarchical histogram-based median filter (HHMF). The proposed architecture is designed using a full synchronous pipeline, a synchronous accumulate-and-compare unit, and is scalable. The FPGA implementation of the proposed architecture of HHMF can perform real-time filtering of 4K and 8K UHD videos. The quality of the image filtered by HHMF is not compromised as in the case of HAMF, since HHMF uses all the bit-planes and computes the actual median. Although the FPGA implementation of HHMF results in more area utilization, the proposed implementation is more economical than a GPU-based HHMF implementation and provides a better throughput

    A scalable 32 channel neural recording and real-time FPGA based spike sorting system

    No full text
    This demo presents a scalable a 32-channel neural recording platform with real-time, on-node spike sorting ca- pability. The hardware consists of: an Intan RHD2132 neural amplifier; a low power Igloo ® nano FPGA; and an FX3 USB 3.0 controller. Graphical User Interfaces for controlling the system, displaying real-time data, and template generation with a modified form of WaveClus are demonstrated
    • …
    corecore