199 research outputs found

HLS-ENABLED DYNAMIC STREAM PROCESSING

Author: Kritikakis Charalampos
Publication date: 31/12/2021

The University of Manchester - Institutional Repository

Runtime Management of Dynamic Dataflows with Partially Reconfigurable Pipelines on FPGAs

Author: Mätas Kaspar
Publication date: 31/12/2023

The University of Manchester - Institutional Repository

Runtime Adaptive Hybrid Query Engine based on FPGAs

Authors: Christopher Blochwitz; Dennis Heinrich; Stefan Werner; Sven Groppe; Thilo Pionteck
Publication venue: RonPub
Publication date: 01/01/2016

This paper presents a fully integrated hardware-accelerated query engine for large-scale datasets in the context of Semantic Web databases. Because queries are typically unknown at design time, a static approach is neither feasible nor flexible enough to cover a wide range of queries at system runtime. Therefore, we introduce a runtime-reconfigurable accelerator based on a Field Programmable Gate Array (FPGA), which integrates transparently with the freely available Semantic Web database LUPOSDATE. At system runtime, the proposed approach dynamically generates an optimized hardware accelerator, in the form of an FPGA configuration, for each individual query and transparently retrieves the query result to be displayed to the user. During hardware-accelerated execution, the host supplies triple data to the FPGA and retrieves the results from the FPGA via a PCIe interface. The benefits and limitations are evaluated on large-scale synthetic datasets with up to 260 million triples as well as the widely known Billion Triples Challenge.

RonPub -- Research Online Publishing

Wire-Speed Implementation of Sliding-Window Aggregate Operator over Out-of-Order Data Streams

Authors: Irie Hidetsugu; Kawashima Hideyuki; Miyoshi Takefumi; Oge Yasin; Yoshimi Masato; Yoshinaga Tsutomu
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/09/2013

This paper presents the design and evaluation of an FPGA-based accelerator for sliding-window aggregation over data streams with out-of-order data arrival. We propose an order-agnostic hardware implementation technique for windowing operators based on a one-pass query evaluation strategy called Window-ID, which was originally proposed for software implementation. The proposed implementation processes out-of-order data items, or tuples, at wire speed by evaluating overlapping sliding windows simultaneously. To verify the effectiveness of the proposed approach, we have also implemented an experimental system as a case study. Our experiments demonstrate that the proposed accelerator with a network interface achieves an effective throughput of around 760 Mbps, or equivalently nearly 6 million tuples per second, by fully utilizing the available bandwidth of the network interface.

Creative Repository of Electro-Communications

An Efficient and Scalable Implementation of Sliding-Window Aggregate Operator on FPGA

Authors: Irie Hidetsugu; Kawashima Hideyuki; Miyoshi Takefumi; Oge Yasin; Yoshimi Masato; Yoshinaga Tsutomu
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/12/2013

This paper presents an efficient and scalable implementation of an FPGA-based accelerator for sliding-window aggregates over disordered data streams. As the number of overlapping sliding windows increases, window aggregates face a serious scalability issue, especially when implemented in parallel processing hardware (e.g., FPGAs). To address the issue, we propose a resource-efficient, scalable, and order-agnostic hardware design and its implementation by examining and integrating two key concepts, called Window-ID and Pane, both of which were originally proposed for software implementation. Evaluation results show that the proposed implementation scales well compared to the previous FPGA implementation in terms of both resource consumption and performance. The proposed design is fully pipelined, and our implementation can process out-of-order data items, or tuples, at wire speed up to 200 million tuples per second.

Creative Repository of Electro-Communications

Packet Switched vs. Time Multiplexed FPGA Overlay Networks

Authors: Barnor Henry; DeHon André; deLorimier Michael; Kapre Nachiket; Mehta Nikil; Rubin Raphael; Wilson Michael J.; Wrighton Michael
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2006

Dedicated, spatially configured FPGA interconnect is efficient for applications that require high-throughput connections between processing elements (PEs) but with a limited degree of PE interconnectivity (e.g. wiring up gates and datapaths). Applications that virtualize PEs may require a large number of distinct PE-to-PE connections (e.g. using one PE to simulate 100s of operators, each requiring input data from thousands of other operators), but with each connection having low throughput compared with the PE’s operating cycle time. In these highly interconnected conditions, dedicating spatial interconnect resources for all possible connections is costly and inefficient. Alternatively, we can time-share physical network resources by virtualizing interconnect links, either by statically scheduling the sharing of resources prior to runtime or by dynamically negotiating resources at runtime. We explore the tradeoffs (e.g. area, route latency, route quality) between time-multiplexed and packet-switched networks overlaid on top of commodity FPGAs. We demonstrate modular and scalable networks which operate on a Xilinx XC2V6000-4 at 166 MHz. For our applications, time-multiplexed, offline scheduling offers up to a 63% performance increase over online, packet-switched scheduling for equivalent topologies. When applying designs to equivalent area, packet-switching is up to 2× faster for small-area designs, while time-multiplexing is up to 5× faster for larger-area designs. When limited to the capacity of an XC2V6000, if all communication is known, time-multiplexed routing outperforms packet-switching; however, when the active set of links drops below 40% of the potential links, packet-switched routing can outperform time-multiplexing.

CiteSeerX

Crossref

Caltech Authors

Resource Elastic Dynamic Stream Processing on FPGAs Exemplified on Database Acceleration

Author: Manev Kristiyan
Publication date: 01/08/2022

The University of Manchester - Institutional Repository