2D hardware acceleration
The objective of the project is to develop an IP core that provides hardware acceleration for common 2D rendering operations in an embedded system. The requirements for graphical user interfaces (GUIs) on modern display- and touchscreen-based systems are increasing steadily. Rendering complex and attractive GUIs requires a lot of processing power. At the same time, the energy consumption of most of these embedded systems should decrease. Offloading processor-intensive tasks such as the rendering of 2D shapes to dedicated hardware greatly reduces rendering time and frees considerable processor resources, which leads to a faster GUI and a less power-hungry system.
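Line drawing is a typical example of the 2D primitives such a core might offload. The sketch below uses Bresenham's integer line algorithm, which needs only additions, comparisons and a doubling (a shift in hardware) per pixel, so each step maps onto a small amount of logic. This is an illustrative assumption: the project description does not say which primitives or algorithms the IP core implements, and the function name `bresenham_line` is hypothetical.

```python
def bresenham_line(x0, y0, x1, y1):
    """Return the pixels on the line from (x0, y0) to (x1, y1), both endpoints included.

    Integer-only, all-octant Bresenham: no multiplications or divisions per pixel.
    """
    points = []
    dx = abs(x1 - x0)
    dy = -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy                 # combined error term for both axes
    while True:
        points.append((x0, y0))
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err              # a left shift by one bit in hardware
        if e2 >= dy:              # error says: step along x
            err += dy
            x0 += sx
        if e2 <= dx:              # error says: step along y
            err += dx
            y0 += sy
    return points
```

For example, `bresenham_line(0, 0, 5, 2)` yields `[(0, 0), (1, 0), (2, 1), (3, 1), (4, 2), (5, 2)]`; every iteration emits exactly one pixel, which suits a pipelined rasteriser.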
Hardware Acceleration Using Functional Languages
The aim of this thesis is to research how the functional paradigm can be used for hardware acceleration, with an emphasis on data-parallel tasks. The level of abstraction of traditional hardware description languages, such as VHDL or Verilog, is becoming too low. High-level languages from the domains of software development and modeling, such as C/C++, SystemC or MATLAB, are experiencing a boom for hardware description at the algorithmic or behavioral level. Functional languages cannot match imperative ones in adoption and popularity among programmers, yet they surpass them in many properties, such as verifiability, the ability to capture inherent parallelism, and the compactness of code. Data-parallel tasks are often accelerated on FPGAs, GPUs and multicore processors. In this thesis, we use Accelerate, a library for general-purpose GPU programming, and extend it to produce VHDL.
Accelerate is a domain-specific language embedded into Haskell with a backend for the NVIDIA CUDA platform. We use the language and its frontend, and create a new backend for high-level synthesis of circuits in VHDL.
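In Accelerate, a dot product is commonly written as `fold (+) 0 (zipWith (*) xs ys)`. The Python sketch below, with hypothetical helper names `zip_with`, `tree_fold` and `dotp`, illustrates why such combinators expose parallelism to a backend: every `zip_with` element is independent, and a `fold` over an associative operator can be evaluated as a balanced tree of depth ceil(log2 n). It models only the evaluation order, not the VHDL backend described in the thesis.

```python
import operator


def zip_with(f, xs, ys):
    """Element-wise combination; every output element is independent of the others."""
    return [f(x, y) for x, y in zip(xs, ys)]


def tree_fold(op, identity, xs):
    """Reduce xs pairwise, level by level: ceil(log2 n) levels of independent operations.

    Matches a sequential fold only if op is associative and identity is its
    neutral element, which is the contract a parallel fold relies on.
    """
    level = list(xs) or [identity]
    while len(level) > 1:
        if len(level) % 2:
            level.append(identity)      # pad odd-sized levels with the neutral element
        level = [op(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def dotp(xs, ys):
    # Mirrors Accelerate's  fold (+) 0 (zipWith (*) xs ys)
    return tree_fold(operator.add, 0, zip_with(operator.mul, xs, ys))
```

In hardware, each level of the tree becomes a row of adders, so a reduction of n elements needs about log2 n adder stages instead of n sequential additions.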
Hardware Acceleration of Cipher Attack
Hardware acceleration is often a good tool for achieving significantly better performance when processing large amounts of data or running algorithms that parallelize well. The aim of this work is to demonstrate the results of using FPGA circuits to implement an algorithm of exponential complexity. The chosen example is a brute-force attack on the WEP encryption algorithm with a 40-bit key. The goal of this work is to compare the properties and performance of software and hardware implementations of the chosen algorithm.
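The sketch below shows the structure of such a brute-force search in Python, assuming a known-plaintext setting: 40-bit WEP feeds RC4 with the 24-bit per-packet IV followed by the 40-bit secret key, and a candidate key is accepted when its keystream reproduces the ciphertext of a few known plaintext bytes (here the 802.11 LLC/SNAP header `AA AA 03 00 00 00`). The CRC-32 integrity value is omitted, and the function names are hypothetical; the abstract does not state how candidates are verified. To keep the example runnable, the search covers only the last `unknown_bytes` bytes of the key; the full attack enumerates all 2^40 (about 1.1 × 10^12) keys.

```python
def rc4_keystream(key: bytes, n: int) -> bytes:
    """First n bytes of the RC4 keystream for the given key."""
    s = list(range(256))
    j = 0
    for i in range(256):                      # key-scheduling algorithm (KSA)
        j = (j + s[i] + key[i % len(key)]) & 0xFF
        s[i], s[j] = s[j], s[i]
    out = bytearray()
    i = j = 0
    for _ in range(n):                        # pseudo-random generation (PRGA)
        i = (i + 1) & 0xFF
        j = (j + s[i]) & 0xFF
        s[i], s[j] = s[j], s[i]
        out.append(s[(s[i] + s[j]) & 0xFF])
    return bytes(out)


def wep_encrypt(iv: bytes, secret: bytes, plaintext: bytes) -> bytes:
    """XOR the plaintext with RC4(IV || secret); the CRC-32 ICV is omitted here."""
    stream = rc4_keystream(iv + secret, len(plaintext))
    return bytes(p ^ k for p, k in zip(plaintext, stream))


def brute_force(iv: bytes, ciphertext: bytes, known_plain: bytes,
                known_prefix: bytes, unknown_bytes: int):
    """Try every value of the unknown trailing key bytes; return the key or None."""
    target = bytes(c ^ p for c, p in zip(ciphertext, known_plain))  # expected keystream
    for candidate in range(256 ** unknown_bytes):
        secret = known_prefix + candidate.to_bytes(unknown_bytes, "big")
        if rc4_keystream(iv + secret, len(target)) == target:
            return secret
    return None
```

Every candidate is tested independently, which is what makes the problem suit an FPGA: the key space can be split into disjoint ranges, each handled by its own RC4 unit.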
Hardware Acceleration of Neural Graphics
Rendering and inverse-rendering algorithms that drive conventional computer
graphics have recently been superseded by neural representations (NRs). NRs have
been used to learn the geometric and material properties of scenes and to use
that information to synthesize photorealistic imagery, thereby promising a
replacement for traditional rendering algorithms with scalable quality and
predictable performance. In this work we ask the question: does neural graphics
(NG) need hardware support? We studied representative NG applications and found
that rendering at 4K resolution and 60 FPS leaves a gap of 1.5X to 55X between
the desired performance and what current GPUs deliver. For AR/VR applications,
there is an even larger gap of two to four orders of magnitude between the
desired performance and the required system power. We identify the input
encoding and the MLP kernels as the performance bottlenecks, consuming 72%, 60%
and 59% of application time for multi-resolution hashgrid, multi-resolution
densegrid and low-resolution densegrid encodings, respectively.
respectively. We propose a NG processing cluster, a scalable and flexible
hardware architecture that directly accelerates the input encoding and MLP
kernels through dedicated engines and supports a wide range of NG applications.
We also accelerate the rest of the kernels by fusing them together in Vulkan,
which leads to 9.94X kernel-level performance improvement compared to un-fused
implementation of the pre-processing and the post-processing kernels. Our
results show that, NGPC gives up to 58X end-to-end application-level
performance improvement, for multi res. hashgrid encoding on average across the
four NG applications, the performance benefits are 12X,20X,33X and 39X for the
scaling factor of 8,16,32 and 64, respectively. Our results show that with
multi res. hashgrid encoding, NGPC enables the rendering of 4k res. at 30FPS
for NeRF and 8k res. at 120FPS for all our other NG applications
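The abstract names multi-resolution hashgrid encoding as a bottleneck but does not describe it. The Python sketch below follows the multiresolution hash encoding introduced with Instant NGP, under simplifications that are assumptions of this sketch: random values stand in for trained feature tables, every level is hashed (the original indexes coarse levels densely), and no MLP follows. The class and parameter names (`HashGridEncoding`, `n_levels`, `table_size`, `n_min`, `n_max`) are hypothetical. It shows the access pattern an encoding engine has to serve: each query point performs 8 hashed, effectively random table reads per level, followed by a trilinear interpolation.

```python
import math
import random

# Per-axis primes of the Instant NGP spatial hash; pi_1 = 1 leaves x unscrambled.
PRIMES = (1, 2654435761, 805459861)


def spatial_hash(ix, iy, iz, table_size):
    """Map an integer grid vertex to a slot in a feature table of size table_size."""
    return ((ix * PRIMES[0]) ^ (iy * PRIMES[1]) ^ (iz * PRIMES[2])) % table_size


class HashGridEncoding:
    """Simplified multiresolution hash encoding of a point in [0, 1]^3 (no MLP)."""

    def __init__(self, n_levels=4, table_size=2**14, n_features=2,
                 n_min=16, n_max=256, seed=0):
        rng = random.Random(seed)
        self.table_size = table_size
        # Geometric growth of grid resolution from n_min to roughly n_max (n_levels >= 2).
        growth = math.exp((math.log(n_max) - math.log(n_min)) / (n_levels - 1))
        self.resolutions = [math.floor(n_min * growth ** level)
                            for level in range(n_levels)]
        # Small random values stand in for trained feature vectors.
        self.tables = [[[rng.uniform(-1e-4, 1e-4) for _ in range(n_features)]
                        for _ in range(table_size)]
                       for _ in range(n_levels)]

    def encode(self, point):
        """Concatenate trilinearly interpolated features from every level."""
        out = []
        for res, table in zip(self.resolutions, self.tables):
            scaled = [c * res for c in point]
            base = [math.floor(s) for s in scaled]
            frac = [s - b for s, b in zip(scaled, base)]
            feat = [0.0] * len(table[0])
            # The 8 corners of the enclosing voxel: 8 hashed table reads per level.
            for dx in (0, 1):
                for dy in (0, 1):
                    for dz in (0, 1):
                        w = ((frac[0] if dx else 1.0 - frac[0])
                             * (frac[1] if dy else 1.0 - frac[1])
                             * (frac[2] if dz else 1.0 - frac[2]))
                        slot = spatial_hash(base[0] + dx, base[1] + dy,
                                            base[2] + dz, self.table_size)
                        for k, value in enumerate(table[slot]):
                            feat[k] += w * value
            out.extend(feat)
        return out
```

In the full method, the concatenated vector of length `n_levels * n_features` is the input to a small MLP.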
Hardware acceleration of photon mapping
PhD Thesis
The quest for realism in computer-generated graphics has yielded a range of algorithmic
techniques, the most advanced of which are capable of rendering images at close to photorealistic
quality. Due to the realism available, it is now commonplace that computer graphics are used in
the creation of movie sequences, architectural renderings, medical imagery and product
visualisations.
This work concentrates on the photon mapping algorithm [1, 2], a physically based global
illumination rendering algorithm. Photon mapping excels in producing highly realistic, physically
accurate images.
A drawback of photon mapping, however, is its rendering time, which can be significantly longer
than that of other, albeit less realistic, algorithms. Not surprisingly, this increase in execution time is
associated with a high computational cost. This computation is usually performed using the
general purpose central processing unit (CPU) of a personal computer (PC), with the algorithm
implemented as a software routine. Other options available for processing these algorithms
include desktop PC graphics processing units (GPUs) and custom designed acceleration hardware
devices.
GPUs tend to be efficient when dealing with less realistic rendering solutions such as rasterisation;
however, with their recent drive towards increased programmability, they can also be used to
process more realistic algorithms. A drawback to the use of GPUs is that these algorithms often
have to be reworked to make optimal use of the limited resources available.
There are very few custom hardware devices available for acceleration of the photon mapping
algorithm. Ray-tracing is the predecessor to photon mapping, and although not capable of
producing the same physical accuracy and therefore realism, there are similarities between the
algorithms. There have been several hardware prototypes, and at least one commercial offering,
created with the goal of accelerating ray-trace rendering [3]. However, properties making many of
these proposals suitable for the acceleration of ray-tracing are not shared by photon mapping.
There are even fewer proposals for acceleration of the additional functions found only in photon
mapping.
All of these approaches to algorithm acceleration offer limited scalability. GPUs are inherently
difficult to scale, while many of the custom hardware devices available thus far make use of large
processing elements and complex acceleration data structures.
In this work we make use of three novel approaches in the design of highly scalable specialised
hardware structures for the acceleration of the photon mapping algorithm. Increased scalability is
gained through:
• The use of a brute-force approach in place of the commonly used smart approach, thus
eliminating much of the data pre-processing, and many of the complex data structures and large
processing units, that are often required.
• The use of Logarithmic Number System (LNS) arithmetic computation, which facilitates a
reduction in processing area requirement.
• A novel redesign of the photon inclusion test, used within the photon search method of
the photon mapping algorithm. This allows an intelligent memory structure to be used for
the search.
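The area saving claimed for Logarithmic Number System arithmetic comes from the fact that, once a positive value is stored as its base-2 logarithm, multiplication, division and square root reduce to an integer add, a subtract and a shift, while addition needs the nonlinear function sb(d) = log2(1 + 2^d). The Python sketch below illustrates this with a fixed-point log; the word length `FRAC_BITS`, the helper names, and the omission of sign and zero handling are assumptions of the sketch, since the thesis summary does not give its number format.

```python
import math

FRAC_BITS = 16          # hypothetical fractional bits of the fixed-point log value
SCALE = 1 << FRAC_BITS


def to_lns(x):
    """Encode a positive real as round(log2(x) * 2^FRAC_BITS)."""
    return round(math.log2(x) * SCALE)


def from_lns(e):
    return 2.0 ** (e / SCALE)


def lns_mul(a, b):
    return a + b            # x * y  ->  log x + log y: one integer adder


def lns_div(a, b):
    return a - b            # x / y  ->  log x - log y


def lns_sqrt(a):
    return a >> 1           # sqrt(x) -> (log x) / 2: an arithmetic right shift


def lns_add(a, b):
    """x + y = hi + sb(lo - hi) in the log domain, with sb(d) = log2(1 + 2^d)."""
    hi, lo = max(a, b), min(a, b)
    d = (lo - hi) / SCALE   # d <= 0, so sb(d) lies in (0, 1]
    # In hardware sb is typically a lookup table or interpolator; here it is computed directly.
    return hi + round(math.log2(1.0 + 2.0 ** d) * SCALE)
```

Squared distances and scaling factors in a photon search are dominated by multiplications, which is where this representation pays off; additions are the expensive case.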
The design uses two hardware structures, both of which accelerate one core rendering function.
Renderings produced using field-programmable gate array (FPGA)-based prototypes are presented,
along with details of 90 nm synthesised versions of the designs, which show that close to an
order-of-magnitude speedup over a software implementation is possible. Due to the scalable nature of
the design, it is likely that any advantage can be maintained in the face of improving processor
speeds.
Significantly, due to the brute-force approach adopted, it is possible to eliminate an often-used
software acceleration method. This means that the device can interface almost directly to a front-end
modelling package, minimising much of the pre-processing required by most other proposals.
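The brute-force idea can be sketched as a fixed-radius photon density estimate that simply tests every photon against the query sphere, so no search structure (commonly a kd-tree in software photon mappers; the abstract does not name the method it eliminates) has to be built first. The Python sketch below uses a plain squared-distance inclusion test, not the redesigned test from the thesis, and assumes a single colour channel, a constant Lambertian BRDF and a fixed gather radius; `Photon`, `inside_sphere` and `radiance_estimate` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class Photon:
    position: tuple      # (x, y, z) hit point
    power: float         # flux carried by the photon (one colour channel here)


def inside_sphere(p, centre, r2):
    """Plain inclusion test: squared distance against squared radius (no square root)."""
    dx, dy, dz = p[0] - centre[0], p[1] - centre[1], p[2] - centre[2]
    return dx * dx + dy * dy + dz * dz <= r2


def radiance_estimate(photons, x, radius, brdf=1.0 / math.pi):
    """Fixed-radius estimate: brdf * (photon power inside the sphere) / (pi r^2).

    Every photon is tested, so no kd-tree or other search structure is built first.
    """
    r2 = radius * radius
    flux = sum(ph.power for ph in photons if inside_sphere(ph.position, x, r2))
    return brdf * flux / (math.pi * r2)
```

Because each inclusion test is independent and needs only a few multiply-adds and a comparison, photons can be streamed past many small test units in parallel.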
Binarized Convolutional Neural Networks with Separable Filters for Efficient Hardware Acceleration
State-of-the-art convolutional neural networks are enormously costly in both
compute and memory, demanding massively parallel GPUs for execution. Such
networks strain the computational capabilities and energy available to embedded
and mobile processing platforms, restricting their use in many important
applications. In this paper, we push the boundaries of hardware-efficient CNN
design by proposing binarized CNN (BCNN) with Separable Filters (BCNNw/SF), which
applies Singular Value Decomposition (SVD) to BCNN kernels to further reduce
computational and storage complexity. To enable its implementation, we provide
a closed form of the gradient over SVD to calculate the exact gradient with
respect to every binarized weight in backward propagation. We verify BCNNw/SF
on the MNIST, CIFAR-10, and SVHN datasets, and implement an accelerator for
CIFAR-10 on FPGA hardware. Our BCNNw/SF accelerator achieves a 17% memory saving
and a 31.3% reduction in execution time compared to BCNN, with only a minor loss
of accuracy.
Comment: 9 pages, 6 figures, accepted for Embedded Vision Workshop (CVPRW
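The saving from separable filters can be illustrated with a rank-1 SVD: a k×k kernel is approximated by alpha · bu bv^T with binary vectors bu and bv, so it stores 2k bits instead of k² and can be applied as a vertical 1-D pass followed by a horizontal one. The Python sketch below computes the leading singular triplet by power iteration and binarizes it with a mean-absolute-value scale; this binarization scheme and all function names are assumptions for illustration, not the paper's training procedure, which back-propagates exact gradients through the SVD.

```python
import math


def rank1_svd(K, iters=50):
    """Leading singular triplet (sigma, u, v) of matrix K via power iteration on K^T K."""
    rows, cols = len(K), len(K[0])
    v = [1.0] * cols
    for _ in range(iters):
        kv = [sum(K[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        v = [sum(K[i][j] * kv[i] for i in range(rows)) for j in range(cols)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    u = [sum(K[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    sigma = math.sqrt(sum(x * x for x in u))
    return sigma, [x / sigma for x in u], v


def binarize_separable(K):
    """Approximate K by alpha * outer(bu, bv) with bu, bv in {-1, +1}."""
    sigma, u, v = rank1_svd(K)
    bu = [1 if x >= 0 else -1 for x in u]
    bv = [1 if x >= 0 else -1 for x in v]
    # mean(|u|) is the least-squares scale for replacing u by sign(u); same for v.
    alpha = sigma * (sum(map(abs, u)) / len(u)) * (sum(map(abs, v)) / len(v))
    return alpha, bu, bv


def conv2d_valid(img, K):
    """Direct 2-D cross-correlation (CNN convolution), 'valid' region: k*k MACs per output."""
    kh, kw = len(K), len(K[0])
    return [[sum(K[a][b] * img[i + a][j + b] for a in range(kh) for b in range(kw))
             for j in range(len(img[0]) - kw + 1)]
            for i in range(len(img) - kh + 1)]


def conv_separable(img, col, row):
    """Same result as conv2d_valid(img, outer(col, row)) using two 1-D passes."""
    k, w = len(col), len(img[0])
    tmp = [[sum(col[a] * img[i + a][j] for a in range(k)) for j in range(w)]
           for i in range(len(img) - k + 1)]
    return [[sum(row[b] * tmp[i][j + b] for b in range(len(row)))
             for j in range(w - len(row) + 1)]
            for i in range(len(tmp))]
```

With ±1 factors, both 1-D passes reduce to additions and subtractions, which is what makes the separable binary form attractive for FPGA logic.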
Hardware Acceleration of the SUDOKU Game
This work deals with the implementation of a hardware-based SUDOKU solver. SUDOKU terminology is defined, and properties of the puzzle relevant to computer-based solvers are described. Solving techniques are introduced and the possibilities of implementing them in hardware are discussed. The main part describes the implementation of the SUDOKU solver and assesses the performance of the designed unit. The solver was also verified on a real hardware platform. In conclusion, possible extensions of the unit are proposed.