1,362 research outputs found
Hardware Accelerated Visual Tracking Algorithms. A Systematic Literature Review
Many industrial applications need object recognition and tracking capabilities. The algorithms developed for those purposes are computationally expensive, yet real-time performance, high accuracy, and low power consumption are essential requirements of such systems. When all these requirements are combined, hardware acceleration of these algorithms becomes a feasible solution. The purpose of this study is to analyze the current state of these hardware acceleration solutions: which algorithms have been implemented in hardware, and what modifications have been made to adapt these algorithms to hardware.
TreeBASIS Feature Descriptor and Its Hardware Implementation
This paper presents a novel feature descriptor called TreeBASIS that provides improvements in descriptor size, computation time, matching speed, and accuracy. This new descriptor uses a binary vocabulary tree that is computed using basis dictionary images and a test set of feature region images. To facilitate real-time implementation, a feature region image is binary quantized and the resulting quantized vector is passed into the BASIS vocabulary tree. A Hamming distance is then computed between the feature region image and the effectively descriptive basis dictionary image at a node to determine the branch taken, and the path the feature region image takes through the tree is saved as its descriptor. The TreeBASIS feature descriptor is an excellent candidate for hardware implementation because of its reduced descriptor size and the fact that descriptors can be created and features matched without the use of floating-point operations. The TreeBASIS descriptor is more computationally and space efficient than other descriptors such as BASIS, SIFT, and SURF. Moreover, it can be computed entirely in hardware without the support of a CPU for additional software-based computations. Experimental results and a hardware implementation show that the TreeBASIS descriptor compares well with other descriptors for frame-to-frame homography computation while requiring fewer hardware resources.
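The tree traversal described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Node` layout, the per-node `threshold`, the mean-based quantization, and the rule "branch right when the Hamming distance reaches the threshold" are all assumptions made for illustration, since the abstract does not specify them. The sketch only shows how a descriptor can be built from integer operations alone.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical tree node; the field layout is an assumption."""
    basis: int                      # binary basis image packed into an int
    threshold: int                  # assumed branching threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def quantize(pixels: List[int]) -> int:
    """Binary-quantize a patch: bit is 1 where a pixel exceeds the patch mean.

    Comparing p * n against the pixel sum avoids any division.
    """
    total = sum(pixels)
    n = len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p * n > total else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two packed binary vectors."""
    return bin(a ^ b).count("1")


def describe(patch_bits: int, root: Node) -> List[int]:
    """Walk the tree and record the branch taken at every node."""
    path = []
    node = root
    while node is not None:
        go_right = hamming(patch_bits, node.basis) >= node.threshold
        path.append(1 if go_right else 0)
        node = node.right if go_right else node.left
    return path


# Tiny two-level tree with made-up basis images.
tree = Node(basis=0b0101, threshold=2, left=Node(basis=0b1010, threshold=2))
patch = quantize([10, 200, 30, 220])
descriptor = describe(patch, tree)
```

Because the descriptor is just the sequence of branch decisions, its length is bounded by the tree depth, which is consistent with the small descriptor size the abstract reports.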
Low power and high performance heterogeneous computing on FPGAs
The abstract is in the attachment.
Reconfigurable acceleration of Recurrent Neural Networks
Recurrent Neural Networks (RNNs) have been successful in a wide range of applications involving temporal sequences such as natural language processing, speech recognition and video analysis. However, RNNs often require a significant amount of memory and computational resources. In addition, the recurrent nature and data dependencies in RNN computations can lead to system stall, resulting in low throughput and high latency.
This work describes novel parallel hardware architectures for accelerating RNN inference using Field-Programmable Gate Array (FPGA) technology. The architectures take into account the data dependencies and high computational costs of RNNs.
The first contribution of this thesis is a latency-hiding architecture that utilizes column-wise matrix-vector multiplication instead of the conventional row-wise operation to eliminate data dependencies and improve the throughput of RNN inference designs. This architecture is further enhanced by a configurable checkerboard tiling strategy which allows large dimensions of weight matrices, while supporting element-based parallelism and vector-based parallelism. The presented reconfigurable RNN designs show significant speedup over CPU, GPU, and other FPGA designs.
The second contribution of this thesis is a weight reuse approach for large RNN models with weights stored in off-chip memory, running with a batch size of one. A novel blocking-batching strategy is proposed to optimize the throughput of large RNN designs on FPGAs by reusing the RNN weights. Performance analysis is also introduced to enable FPGA designs to achieve the best trade-off between area, power consumption and performance. Promising improvements in power efficiency have been achieved, in addition to speedups over CPU and GPU designs.
The third contribution of this thesis is a low-latency design for RNNs based on a partially-folded hardware architecture. It also introduces a technique that balances the initiation interval in multi-layer RNN inference to increase hardware efficiency and throughput while reducing latency. The approach is evaluated on a variety of applications, including gravitational wave detection and Bayesian RNN-based ECG anomaly detection.
To facilitate the use of this approach, we open-source an RNN template which enables the generation of low-latency FPGA designs with efficient resource utilization using high-level synthesis tools.
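The column-wise matrix-vector multiplication named in the first contribution can be contrasted with the row-wise form in a short sketch. This is a software illustration only, under the assumption that the input vector arrives one element at a time (as a hidden state would when produced by the previous time step); the function names are hypothetical and nothing here reflects the thesis's actual hardware design.

```python
from typing import Iterable, List, Sequence


def matvec_row_wise(W: Sequence[Sequence[float]], x: Sequence[float]) -> List[float]:
    """Row-wise product: each output needs the whole of x before it can finish."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


def matvec_column_wise(W: Sequence[Sequence[float]], x_stream: Iterable[float]) -> List[float]:
    """Column-wise product: accumulate column j as soon as x[j] arrives.

    No output waits for the full input vector, so work on early elements
    can overlap with the production of later ones.
    """
    y = [0.0] * len(W)
    for j, xj in enumerate(x_stream):
        for i in range(len(W)):
            y[i] += W[i][j] * xj
    return y


W = [[1, 2], [3, 4]]
x = [5, 6]
row_result = matvec_row_wise(W, x)
col_result = matvec_column_wise(W, iter(x))
```

Both functions compute the same product; the difference lies in the order of accumulation, which is what lets the column-wise form start before the full input vector is available.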
Analysis and Hardware In the Loop Testing of ADCS Algorithm for the CubeSat 3AMADEUS
One of the main challenges with CubeSat ADCSs (Attitude Determination and Control Subsystems) is how heavy and power-hungry the most precise systems are. Developing lighter, lower-power alternatives is therefore of great importance. 3AMADEUS is a mission that aims to find a solution to this exact problem. Magnetic ADCS components are among the lightest, least power-consuming, and most reliable options in the CubeSat industry. However, due to their low precision, components of this kind cannot be used on their own in missions that require precise attitude control. One way to improve their precision is to use novel ADCS algorithms that maximize the performance of magnetic ADCSs. That is why 3AMADEUS aims not only to develop but also to test several of these algorithms in flight, in the hope that purely magnetic ADCSs can one day be widely adopted in nanosatellites.
To enable an analysis of which algorithms should be implemented in the 3AMADEUS mission, this work presents a satellite attitude model that allows for a SIL (Software In the Loop) simulation. Furthermore, a HIL (Hardware In the Loop) simulation is carried out to validate the use of an FPGA (Field Programmable Gate Array) for implementing this kind of algorithm, since the use of FPGAs in CubeSats has been rising significantly and is particularly attractive in a project where reprogrammability is useful.
With that in mind, and since the algorithms for this mission are still under development, a purely magnetic ADCS algorithm developed in another context is tested in a SIL environment, where its performance in terms of accuracy and stabilization, as well as its suitability for the 3AMADEUS mission, is analyzed under different conditions. Finally, one of these tests is repeated in a HIL simulation, without considering attitude determination. The results of this simulation are compared to those obtained in the SIL test, providing relevant data on the feasibility and performance of a
real-life ADCS algorithm implementation on an FPGA.