Search CORE

4 research outputs found

A systematic implementation of image processing algorithms on configurable computing hardware

Author: Levine Benjamin Alexander
Publication venue: TRACE: Tennessee Research and Creative Exchange
Publication date: 01/12/1999

Configurable computing hardware has many advantages over both general-purpose processors and application-specific hardware. However, the difficulty of using this type of hardware has limited its use. An automated system for implementing image processing applications in configurable hardware, called CHAMPION, is under development at the University of Tennessee. CHAMPION will map applications in the Khoros Cantata graphical programming environment to hardware. A relatively complex automatic target recognition (ATR) application was manually mapped from Cantata to a commercially available configurable computing platform. This manual implementation was done to assist in the development of function libraries and hardware for use in the CHAMPION system, as well as to develop procedures for performing the application mapping. The mapping techniques were developed so that they could serve as the basis for the automated system. Many important considerations for the mapping process were identified and included in the mapping algorithms. The manual mapping was successful, allowing the ATR application to run on a Wildforce-XL configurable computing board. The successful implementation validated the basic hardware design and mapping concepts to be used in CHAMPION. Nearly a tenfold performance increase was realized in the hardware implementation, and performance bottlenecks were identified, which should enable even greater performance improvements in the automated system. The manual implementation also helped to identify some of the challenges that must be overcome to complete the development of the automated system.

University of Tennessee, Knoxville: Trace

CRD : um co-processador reconfiguravel dinamicamente para a melhoria de desempenho

Author: Renon Felipe Joffre Romano
Publication venue: [s.n.]
Publication date: 04/08/2018

Advisor: Paulo Cesar Centoducatte. Dissertation (master's), Universidade Estadual de Campinas, Instituto de Computação.

Abstract: Performance is a recurring requirement for a large number of applications. However, the traditional ways of improving performance, such as raising the processor's operating frequency or using parallel processing, are not always technically or economically viable, particularly in a dedicated (embedded) system. An alternative is to identify the parts of the application that execute inefficiently in software and implement them directly in hardware. The natural candidates are inner loops, which are usually small, account for a large share of the execution time, and do not require a large silicon area when implemented in hardware. In this work we propose a memory-mapped reconfigurable coprocessor, called the Dynamically Reconfigurable Coprocessor (Co-processador Reconfigurável Dinamicamente, CRD), capable of executing code that is inefficient in software, such as inner loops (kernels), directly in hardware. To reduce the area occupied by the coprocessor, and thus the cost of the system, the CRD includes a reprogramming unit that allows the available resources to be reused to implement different program sections in hardware within the same execution. The program sections chosen for execution in hardware (on the CRD) are those responsible for most of the program's total execution time. This technique showed a total speedup of up to 20 times in the execution time of the DSPstone benchmark programs.

Mestrado (master's degree). Engenharia de Computação (Computer Engineering). Mestre em Computaçã

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositorio da Producao Cientifica e Intelectual da Unicamp

Ant colony optimization on runtime reconfigurable architectures

Author: Scheuermann Bernd
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2005

KITopen

Automated Target Recognition on SPLASH 2

Authors: Brad L. Hutchings, Michael Rencher

Automated target recognition is an application area that requires special-purpose hardware to achieve reasonable performance. FPGA-based platforms can provide a high level of performance for ATR systems if the implementation can be adapted to the limited FPGA and routing resources of these architectures. This paper discusses a mapping experiment where a linear-systolic implementation of an ATR algorithm is mapped to the Splash 2 platform. Simple column-oriented processors were used throughout the design to achieve high performance with limited nearest-neighbor communication. The distributed Splash 2 memories are also exploited to achieve a high degree of parallelism. The resulting design is scalable and can be spread across multiple Splash 2 boards with a linear increase in performance.

1 Introduction

Automated target recognition (ATR) is a computationally demanding application area that typically requires special-purpose hardware to achieve desirable performance. ASICs are not an opt..

CiteSeerX