Search CORE

2,573 research outputs found

Design for scalability in 3D computer graphics architectures

Author: Holten-Lund Hans Erik
Publication venue
Publication date: 01/03/2002
Field of study

Co-design Hardware and Algorithm for Vector Search

Author: Alonso Gustavo
He Zhenhao
Hoefler Torsten
Jiang Wenqi
Li Shigang
Licht Johannes de Fine
Rekatsinas Theodoros
Renggli Cedric
Shi Runbin
Zhang Shuai
Zhu Yu
Publication venue
Publication date: 27/06/2023
Field of study

Vector search has emerged as the foundation for large-scale information retrieval and machine learning systems, with search engines like Google and Bing processing tens of thousands of queries per second on petabyte-scale document datasets by evaluating vector similarities between encoded query texts and web documents. As performance demands for vector search systems surge, accelerated hardware offers a promising solution in the post-Moore's Law era. We introduce \textit{FANNS}, an end-to-end and scalable vector search framework on FPGAs. Given a user-provided recall requirement on a dataset and a hardware resource budget, \textit{FANNS} automatically co-designs hardware and algorithm, subsequently generating the corresponding accelerator. The framework also supports scale-out by incorporating a hardware TCP/IP stack in the accelerator. \textit{FANNS} attains up to 23.0

\times

and 37.2

\times

speedup compared to FPGA and CPU baselines, respectively, and demonstrates superior scalability to GPUs, achieving 5.5

\times

and 7.6

\times

speedup in median and 95\textsuperscript{th} percentile (P95) latency within an eight-accelerator configuration. The remarkable performance of \textit{FANNS} lays a robust groundwork for future FPGA integration in data centers and AI supercomputers.Comment: 11 page

arXiv.org e-Print Archive

Runtime Adaptive Hybrid Query Engine based on FPGAs

Author: Christopher Blochwitz
Dennis Heinrich
Stefan Werner
Sven Groppe
Thilo Pionteck
Publication venue: RonPub
Publication date: 01/01/2016
Field of study

This paper presents the fully integrated hardware-accelerated query engine for large-scale datasets in the context of Semantic Web databases. As queries are typically unknown at design time, a static approach is not feasible and not flexible to cover a wide range of queries at system runtime. Therefore, we introduce a runtime reconfigurable accelerator based on a Field Programmable Gate Array (FPGA), which transparently incorporates with the freely available Semantic Web database LUPOSDATE. At system runtime, the proposed approach dynamically generates an optimized hardware accelerator in terms of an FPGA configuration for each individual query and transparently retrieves the query result to be displayed to the user. During hardware-accelerated execution the host supplies triple data to the FPGA and retrieves the results from the FPGA via PCIe interface. The benefits and limitations are evaluated on large-scale synthetic datasets with up to 260 million triples as well as the widely known Billion Triples Challenge

RonPub -- Research Online Publishing

Efficient Real-Time Architectures and FPGA Implementations of Histogram-Based Median Filters for High Definition Videos

Author: Goel Anish
Publication venue
Publication date: 01/10/2019
Field of study

Digital filtering plays an important role in many signal processing applications. Filtering is performed to recover the original signal from its corrupted version. Median filter is a non-linear digital filter that replaces a sample in a given window by the median value of the samples in the window. For images corrupted with impulse noise, median filter provides a very high quality of filtered images. Several modifications of median filters have been proposed and implemented to achieve high image quality compared to that provided by conventional median filters. When these filters are implemented on hardware platforms such as FPGAs, the performance parameters, namely, the area, power and operating frequency should be taken into consideration in addition to the quality of the filtered image. Therefore, efficient implementation of median filters on FPGAs for image and video processing algorithms has been a topic of much interest. The existing hardware-based median filters for high definition video formats do not always satisfy the real-time throughput requirements or are inefficient with respect to hardware performance parameters, such as the area and frequency. This is due to the fact that most of the existing techniques use sorting-based median calculation, which results in a low hardware performance. In this thesis, architectures that use histogram-based median computation, which is a non-sorting-based operation, are designed with a view of efficient hardware implementation. This is carried out in two parts. We design and implement efficient architectures that satisfy the real-time throughput requirements of full high definition (FHD) videos in the first part and that of ultra high definition (UHD) videos in the second part. In the first part, an efficient real-time histogram-based median filter that uses the concept of bit-plane-slicing and adaptive switching median filter (ASMF) is designed and implemented on FPGAs. We term this architecture as hybrid architecture for median filtering (HAMF). The proposed HAMF computes an approximate median, since it uses only the most significant B-bits of the pixel values for median calculation. As a result, the algorithmic level implementation of the proposed HAMF results in a slight degradation in the filtered image quality compared to that provided by ASMF. The proposed HAMF provides a significant improvement over ASMF in terms of the area and operating frequency, when implemented on different generation FPGAs. Analysis of the different parameters, such as the number of bit-planes used in the computation of the median and the number of pipelining stages, is carried out to study the trade-off between the quality of the filtered image and hardware performance. Although the FPGA implementation of the proposed HAMF provides a very high operating frequency, the quality of the images filtered by its algorithmic level implementation decreases with increasing window size and noise density. This filter may be suitable for applications that require FHD filtering with cost constraints, but not for applications where the output image quality is as important as the hardware performance. Hence, in the second part, we design an efficient and real-time architecture of the hierarchical histogram-based median filter (HHMF). The proposed architecture is designed using a full synchronous pipeline, a synchronous accumulate-and-compare unit, and is scalable. The FPGA implementation of the proposed architecture of HHMF can perform real-time filtering of 4K and 8K UHD videos. The quality of the image filtered by HHMF is not compromised as in the case of HAMF, since HHMF uses all the bit-planes and computes the actual median. Although the FPGA implementation of HHMF results in more area utilization, the proposed implementation is more economical than a GPU-based HHMF implementation and provides a better throughput

Concordia University Research Repository

A scalable 32 channel neural recording and real-time FPGA based spike sorting system

Author: Constandinou TG
Jackson A
Luan S
Williams I
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 10/08/2015
Field of study

This demo presents a scalable a 32-channel neural recording platform with real-time, on-node spike sorting ca- pability. The hardware consists of: an Intan RHD2132 neural amplifier; a low power Igloo ® nano FPGA; and an FX3 USB 3.0 controller. Graphical User Interfaces for controlling the system, displaying real-time data, and template generation with a modified form of WaveClus are demonstrated

Crossref

Spiral - Imperial College Digital Repository