373 research outputs found

    低電力非同期回路の面積高効率化設計

    Get PDF
    Tohoku University亀山充隆課

    FASTCUDA: Open Source FPGA Accelerator & Hardware-Software Codesign Toolset for CUDA Kernels

    Get PDF
    Using FPGAs as hardware accelerators that communicate with a central CPU is becoming a common practice in the embedded design world but there is no standard methodology and toolset to facilitate this path yet. On the other hand, languages such as CUDA and OpenCL provide standard development environments for Graphical Processing Unit (GPU) programming. FASTCUDA is a platform that provides the necessary software toolset, hardware architecture, and design methodology to efficiently adapt the CUDA approach into a new FPGA design flow. With FASTCUDA, the CUDA kernels of a CUDA-based application are partitioned into two groups with minimal user intervention: those that are compiled and executed in parallel software, and those that are synthesized and implemented in hardware. A modern low power FPGA can provide the processing power (via numerous embedded micro-CPUs) and the logic capacity for both the software and hardware implementations of the CUDA kernels. This paper describes the system requirements and the architectural decisions behind the FASTCUDA approach

    Accelerating Hash-Based Query Processing Operations on FPGAs by a Hash Table Caching Technique

    Get PDF
    Extracting valuable information from the rapidly growing field of Big Data faces serious performance constraints, especially in the software-based database management systems (DBMS). In a query processing system, hash-based computational primitives such as the hash join and the group-by are the most time-consuming operations, as they frequently need to access the hash table on the high-latency off-chip memories and also to traverse whole the table. Subsequently, the hash collision is an inherent issue related to the hash tables, which can adversely degrade the overall performance. In order to alleviate this problem, in this paper, we present a novel pure hardware-based hash engine, implemented on the FPGA. In order to mitigate the high memory access latencies and also to faster resolve the hash collisions, we follow a novel design point. It is based on caching the hash table entries in the fast on-chip Block-RAMs of FPGA. Faster accesses to the correspondent hash table entries from the cache can lead to an improved overall performance. We evaluate the proposed approach by running hash-based table join and group-by operations of 5 TPC-H benchmark queries. The results show 2.9×–4.4× speedups over the cache-less FPGA-based baseline.The research leading to these results has received funding from the European Union’s Seventh Framework Program (FP7/2007-2013), for Advanced Analytics for Extremely Large European Databases (AXLE) project under grant agreement number 318633, and from the Ministry of Economy and Competitiveness of Spain under contract number TIN2015-65316-p.Peer ReviewedPostprint (author's final draft

    マルチノードFPGAによるストリームデータ分散結合処理

    Get PDF
     多彩なWebサービスの普及とセンサ技術の発達にともない,データセンタに収集されるデータの速度と量は増大を続けている.金融情報処理やネットワークトラフィック監視のようなアプリケーションでは途切れなく到着するストリームデータに対して厳しい時間的制約が課せられるため,従来のデータ処理モデルではインターフェイスがボトルネックになるなど十分な処理性能を実現できない. リアルタイム性が求められるアプリケーションを処理する仕組みであるData Stream Management System(DSMS)に対して性能向上を目的としてField Programmable Gate Array(FPGA)を利用した支援HWに関する研究が数多く行われている.その中でもストリームデータ結合処理の並列処理アルゴリズムであるHandshake JoinのHW実装では,CPU処理と比較して高い並列性とスループットを実現している.しかしFPGAで実行できる処理の規模を示すウィンドウサイズの拡大は単一のFPGAがもつリソース量に制約を持つという問題がある. 本研究報告では,ストリームデータの結合処理を行うHandshake JoinのFPGAアクセラレータをマルチノードに拡張する手法を提案し,その性能評価を報告する.マルチノード拡張は,データ通信の2つの工夫によって実現する.FPGA上に複数のJoin Core を実装すると共に,FPGAボード上のDRAMを介してFPGA間でJoin Coreを接続する仕組みを導入する.またデータ通信と結合演算をオーバラップすることで,通信に要するオーバヘッドを隠蔽する.これらのデータ配布方法の工夫により,単一のFPGAでは実装できなかった大きなウィンドウサイズの結合演算が可能となる.最大16ノードでの性能評価の結果,マルチノードに拡張した場合においても,結合演算のスループットが維持されるとともに,並列化効果が得られることが確認された.電気通信大学201

    HAL-ASOS - Linux com aceleração em hardware para sistemas operativos dedicados à aplicação

    Get PDF
    Programa doutoral em Engenharia Eletrónica e de Computadores (PDEEC) (especialidade de Informática Industrial e Sistemas Embebidos)O ecossistema de sistemas embebidos de hoje tornou-se enorme, cobrindo vários e diferentes sistemas, exigindo desempenho e mobilidade completa enquanto atingem autonomias de bateria cada vez maiores. Mas a crescente frequência de relógio que resultou em dispositivos cada vez mais rápidos começou a estagnar antes dos transístores pararem de encolher. Plataformas Field Programmable Gate Array (FPGA) são uma solução alternativa para a implementação de sistemas completos e reconfiguráveis. Fornecem desempenho e eficiência computacional para satisfazer requisitos da aplicação e do sistema embebido. Vários Sistemas Operativos (SO) assistidos por FPGA foram propostos, mas ao estreitar seu foco na síntese do datapath do acelerador de hardware, a grande maioria ignora a integração semântica destes no SO. Ambientes de síntese de alto nível (HLS) elevaram a abstração além da linguagem de transferência de registo (RTL), seguindo uma abordagem específica de domínio enquanto misturam software e abstrações de hardware ad hoc, que dificultam as otimizações. Além disso, os modelos de programação para software e hardware reconfigurável carecem de semelhanças, o que com o tempo dificultará a Exploração do Ambiente de Design (DSE) e diminuirá o potencial de reutilização de código. Para responder a estas necessidades, propomos HAL-ASOS, uma ferramenta para implementar sistemas embebidos baseados em Linux que fornece (1) elasticidade no design em conformidade com a natureza evolutiva deste SO, (2) integração semântica profunda de tarefas de hardware nos modelos de programação do Linux, (3) facilidade na gestão de complexidade através de metodologia e ferramentas para apoiar o design, verificação e implementação, (4) orientada por princípios de design híbridos e eficiência no sistema. Para avaliar as funcionalidades da ferramenta, foi implementado um aplicativo criptográfico que demonstra alcance de desempenho enquanto se emprega a metodologia de design. Novos níveis de desempenho são atingidos numa aplicação de Visão por Computador que explora recursos de programação assíncrona-síncrona. Os resultados demonstram uma abordagem flexível na reconfiguração entre hardware e software, e desempenho que aumenta consistentemente com acréscimo de recursos ou frequência de relógio.Today’s embedded systems ecosystem became huge while covering several and different computer-based systems, demanding for performance and complete mobility while experiencing longer battery lives. But the rampant frequency that resulted in faster devices began hitting a wall even before transistors stopped shrinking. Field Programmable Gate Array (FPGA) platforms are an alternative solution towards implementing complete reconfigurable systems. They provide computational power, efficiency, in a lightweight solution to serve the application requirements and increase performance in the overall system. Several FPGA-assisted Operating Systems (OS) have been proposed, but by narrowing their focus on datapath synthesis of the hardware accelerator, they completely ignore the deep semantic integration of these accelerators into the OS. State-of-the-art High-Level Synthesis (HLS) environments have raised the level of abstraction beyond Register Transfer Language (RTL) by following a domain-specific approach while mixing ad hoc software and hardware abstractions, making harder for performance optimizations. Furthermore, the programming models for software and reconfigurable hardware lack commonalities, which in time will hinder the Design Space Exploration (DSE) and lower the potential for code reuse. To overcome these issues, we propose HAL-ASOS, a framework to implement Linux-based Embedded systems which provides (1) elasticity by design to comply with the evolutive nature of Linux, (2) deep semantic integration of the hardware tasks in the Linux programming models, (3) easy complexity management using methodology and tools to fully support design, verification and deployment, (4) hybrid and efficiency-oriented design principles. To evaluate the framework functionalities, a cryptographic application was implemented and demonstrates performance achievements while using the promoted application-driven design methodology. To demonstrate new levels of performance that can be achieved, a Computer Vision application explores several mixed asynchronous-synchronous programming features. Experiments demonstrate a flexible design approach in terms of hardware and software reconfiguration, and significant performance that increases consistently with the rising in processing resources or clock frequencies.Financial support received from Portuguese Foundation for Science and Technology (FCT) with the PhD grant SFRH/BD/82732/2011
    corecore