    Real-Time Video Processing Using Native Programming on Android Platform

    As the smartphone industry grows rapidly, smartphone applications need to be faster and to run in real time. For this purpose, most smartphone platforms run programs written in a native language, or use a compiler that produces native code for the hardware. On the Android platform, however, which is based on the Java language, most software algorithms run in Java, which consumes more time to compile. In this paper, the performance of native programming and of high-level programming in Java is compared with respect to video processing speed. Eight image processing methods are applied to each frame of video captured by a smartphone running the Android platform. The efficiency of the two applications, written in different programming languages, is compared by observing their frame processing rates. The experimental results show that six of the eight image processing methods run faster with native programming than with Java programming, with a total average ratio of 0.41. An application of native programming to real-time object detection is also presented. The results show that with native programming on the Android platform, even a complicated object detection algorithm can run in real time.
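
The comparison rests on measuring how many frames per second each implementation sustains. The Python sketch below shows one way to structure such a measurement. The `invert` filter is a hypothetical stand-in for the eight (unnamed) image processing methods, frames are modelled as 2-D lists of grayscale values, and the definition used in `time_ratio` (candidate time per frame over baseline time per frame) is an assumption, since the abstract does not define its ratio.

```python
import time
from typing import Callable, List, Sequence

# A frame is modelled as a 2-D list of 8-bit grayscale values (assumption).
Frame = List[List[int]]

def invert(frame: Frame) -> Frame:
    """Hypothetical stand-in for one per-frame image processing method."""
    return [[255 - p for p in row] for row in frame]

def frame_rate(process: Callable[[Frame], Frame], frames: Sequence[Frame]) -> float:
    """Apply `process` to every frame and return the sustained frames per second."""
    start = time.perf_counter()
    for frame in frames:
        process(frame)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed if elapsed > 0 else float("inf")

def time_ratio(fps_candidate: float, fps_baseline: float) -> float:
    """Per-frame time of the candidate divided by per-frame time of the baseline
    (assumed definition); values below 1 mean the candidate is faster."""
    return fps_baseline / fps_candidate
```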

    H-SIMD machine: configurable parallel computing for data-intensive applications

    This dissertation presents a hierarchical single-instruction multiple-data (H-SIMD) configurable computing architecture that enables efficient execution of data-intensive applications on field-programmable gate arrays (FPGAs). H-SIMD targets data-intensive applications in FPGA-based system designs. The H-SIMD machine is associated with a hierarchical instruction set architecture (HISA) that is developed for each application. The main objectives of this work are to ease program development and to achieve high performance by simplifying the scheduling of operations and by overlapping communications with computations. The H-SIMD machine is composed of host, FPGA and nano-processor layers, which execute host SIMD instructions (HSIs), FPGA SIMD instructions (FSIs) and nano-processor instructions (NPIs), respectively. All HISA layers distinguish between communication and computation instructions. The H-SIMD machine also employs a memory switching scheme to bridge the large bandwidth gaps that are ubiquitous in configurable systems. To showcase the proposed high-performance approach, the conditions for fully overlapping communications with computations are investigated for important applications. The building blocks of the H-SIMD machine, such as high-performance and area-efficient register files, are presented in detail. The H-SIMD machine hierarchy is implemented on a Dell host workstation and an Annapolis Wildstar II FPGA board. Significant speedups are achieved for matrix multiplication (MM), the two-dimensional discrete cosine transform (2D DCT) and the two-dimensional fast Fourier transform (2D FFT), which are widely used in science and engineering. In another FPGA-based programming paradigm, a high-level language (HLL), here ANSI C, can be used to program the FPGAs in a mode similar to that of the H-SIMD machine, in the sense of trying to minimize the effect of overheads. More specifically, a multi-threaded overlapping scheme is proposed to reduce as much as possible, or even completely hide, runtime FPGA reconfiguration overheads. Although the HLL-enabled reconfigurable machine allows software developers to customize FPGA functions easily, special architecture techniques are needed to achieve high performance without a significant penalty in area and clock frequency. Two important high-performance applications, matrix multiplication and image edge detection, are tested on the SRC-6 reconfigurable machine. The implemented algorithms exploit the available data parallelism with independent functional units and application-specific cache support. Relevant performance and design tradeoffs are analyzed.
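
Overlapping communications with computations is commonly achieved with double (ping-pong) buffering: while one buffer is being computed on, the next block of data is transferred into the other. The sketch below is a simple timing model of that idea, assuming a constant transfer time `t_comm` and compute time `t_comp` per block and ignoring result write-back. It is an illustration only, not the dissertation's memory switching scheme or its derived overlap conditions.

```python
def time_without_overlap(n_blocks: int, t_comm: float, t_comp: float) -> float:
    """Each block is transferred, then computed; nothing overlaps."""
    return n_blocks * (t_comm + t_comp)

def time_with_double_buffering(n_blocks: int, t_comm: float, t_comp: float) -> float:
    """Two buffers alternate: while block i is computed from one buffer,
    block i+1 is transferred into the other. Only the first transfer and the
    last computation are exposed; every other stage costs max(t_comm, t_comp)."""
    if n_blocks == 0:
        return 0.0
    return t_comm + (n_blocks - 1) * max(t_comm, t_comp) + t_comp

def fully_overlapped(t_comm: float, t_comp: float) -> bool:
    """Transfers are completely hidden (apart from the first one)
    when each block's computation takes at least as long as its transfer."""
    return t_comp >= t_comm
```

For example, with 4 blocks, `t_comm = 2` and `t_comp = 3`, the sequential schedule takes 20 time units and the double-buffered one takes 14; because `t_comp >= t_comm`, every transfer after the first is hidden behind computation.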

    Control of a two-degree-of-freedom system with visual feedback through color segmentation on a reconfigurable platform

    Undergraduate thesis, Universidade de Brasília, Faculdade de Tecnologia, 2013. This work presents a reconfigurable implementation of a color segmentation system in the HSV color space (hue, saturation and value) and compares it with a segmentation system based on Euclidean distance in the RGB color space (red, green and blue intensities). Using the offset between the image center and the object center obtained from the segmentation, an interface is built between the FPGA (field-programmable gate array) and a mechanical and electronic system capable of aligning the camera that captures the images with the object of interest.
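
The two segmentation strategies and the center offset that drives the camera alignment can be sketched in Python as follows. The sketch works on a plain list-of-lists RGB image, and the hue, saturation, value and distance thresholds are hypothetical parameters; it models the computation in software and says nothing about how the FPGA implementation is organized.

```python
import colorsys
import math
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]  # 8-bit (R, G, B)
Image = List[List[Pixel]]
Mask = List[List[bool]]

def hsv_mask(img: Image, h_lo: float, h_hi: float, s_min: float, v_min: float) -> Mask:
    """Select pixels whose hue lies in [h_lo, h_hi] (0..1 scale) and whose
    saturation and value reach the given minimums. Hue wrap-around at red
    (h near 0 and near 1) is not handled, to keep the sketch short."""
    mask = []
    for row in img:
        out = []
        for r, g, b in row:
            h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
            out.append(h_lo <= h <= h_hi and s >= s_min and v >= v_min)
        mask.append(out)
    return mask

def rgb_distance_mask(img: Image, ref: Pixel, max_dist: float) -> Mask:
    """Select pixels within Euclidean distance max_dist of a reference RGB color."""
    return [[math.dist(p, ref) <= max_dist for p in row] for row in img]

def centroid_offset(mask: Mask) -> Optional[Tuple[float, float]]:
    """(dx, dy) from the image center to the centroid of the selected pixels,
    or None when nothing is selected."""
    xs, ys = [], []
    for y, row in enumerate(mask):
        for x, hit in enumerate(row):
            if hit:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    cx = (len(mask[0]) - 1) / 2
    cy = (len(mask) - 1) / 2
    return (sum(xs) / len(xs) - cx, sum(ys) / len(ys) - cy)
```

The signs of `dx` and `dy` indicate which way each of the two axes would have to move to center the object; how the thesis maps this offset to actuator commands is not described in the abstract.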

    OPTIMIZATION OF FPGA-BASED PROCESSOR ARCHITECTURE FOR SOBEL EDGE DETECTION OPERATOR

    This dissertation introduces an optimized processor architecture for the Sobel edge detection operator on field-programmable gate arrays (FPGAs). The processor is optimized with several techniques that aim to increase its throughput and to reduce its logic utilization and memory usage. The processor exploits the high level of parallelism that FPGAs offer to perform edge detection in parallel, which increases throughput and reduces logic utilization. To achieve this, the proposed processor consists of several Sobel instances that produce multiple output pixels in parallel. This parallelism enables data reuse within the processor block, and the processor's performance increases by a factor equal to the number of instances it contains. A processor with one row of Sobel instances reuses data within one image line when calculating the horizontal gradient. A processor with multiple rows of Sobel instances enables data reuse both within one image line and across multiple lines, so that both the horizontal and the vertical gradients are reused. With these optimization techniques, the proposed Sobel processor meets real-time performance constraints thanks to its high throughput, even at a considerably low clock frequency. In addition, when implemented on the Altera Cyclone II DE2-70 board, the processor's logic utilization is low compared with that of other Sobel processors.
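
The data reuse described above can be illustrated with a software model of the single-row configuration: `instances` Sobel units share one window of `instances + 2` columns per step, instead of each reading its own 3 columns. The sketch assumes grayscale input given as a list of lists and uses |Gx| + |Gy| as the gradient magnitude, a common hardware-friendly approximation; it is not the dissertation's RTL design.

```python
from typing import List

GX = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
GY = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))

def sobel_at(win: List[List[int]], x: int) -> int:
    """One Sobel instance: apply both 3x3 kernels to the window whose left
    column is x inside the 3-row strip `win`; return |Gx| + |Gy|."""
    gx = gy = 0
    for r in range(3):
        for c in range(3):
            p = win[r][x + c]
            gx += GX[r][c] * p
            gy += GY[r][c] * p
    return abs(gx) + abs(gy)

def sobel_row_parallel(img: List[List[int]], y: int, instances: int) -> List[int]:
    """Interior outputs for image row y (using rows y-1..y+1), produced
    `instances` pixels per step. Each step reads one (n + 2)-column slice
    that all n instances share, so n outputs need n + 2 column reads
    instead of 3n."""
    strip = img[y - 1:y + 2]
    width = len(img[0])
    out: List[int] = []
    x = 0
    while x < width - 2:
        n = min(instances, width - 2 - x)
        # One shared window of n + 2 columns feeds n Sobel instances.
        win = [row[x:x + n + 2] for row in strip]
        out.extend(sobel_at(win, i) for i in range(n))
        x += n
    return out
```

On a vertical step edge such as rows of `[0, 0, 0, 100, 100, 100]`, the output for row 1 is `[0, 400, 400, 0]` regardless of how many instances are used; only the number of steps, and hence the throughput, changes.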

    Integrated architectures for computer vision: automatic synthesis with three examples

    Computer-aided computer design is an open problem because computers are becoming more and more powerful, more and more complex and... smaller. We explain what "automatic (high-level) synthesis of integrated circuits" means. This technique is now feasible, and necessary, in particular for dedicated computer vision architectures. Since it requires optimization within a design space that is difficult to formalize and to delimit, we describe the experimental method followed, which aims at: 1) proving the existence of a solution for each application case; 2) finding and instantiating the optimization parameters, including the initial state; 3) effectively designing an integrated circuit; and 4) redesigning the solutions for progressively more complex architectures that still meet real-time constraints. The method is illustrated throughout the paper with three examples of increasing complexity.
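
Operation scheduling is one of the core steps of high-level synthesis. As a generic illustration (not the method used in the paper), the sketch below computes an as-soon-as-possible (ASAP) schedule for a small dataflow graph, assuming every operation takes one control step, and then counts how many functional units of each type that schedule would need. The operation names and the graph are hypothetical.

```python
from collections import Counter
from typing import Dict, FrozenSet, List

def asap_schedule(deps: Dict[str, List[str]]) -> Dict[str, int]:
    """Give each operation the earliest control step (from 0) after all of its
    predecessors, assuming every operation takes exactly one step."""
    step: Dict[str, int] = {}

    def visit(op: str, active: FrozenSet[str]) -> int:
        if op in step:
            return step[op]
        if op in active:
            raise ValueError(f"dependency cycle through {op!r}")
        preds = deps.get(op, [])
        s = 0 if not preds else 1 + max(visit(p, active | {op}) for p in preds)
        step[op] = s
        return s

    for op in deps:
        visit(op, frozenset())
    return step

def units_needed(schedule: Dict[str, int], kind: Dict[str, str]) -> Dict[str, int]:
    """Functional units of each type required: the largest number of operations
    of that type scheduled in the same control step."""
    per_step = Counter((kind[op], s) for op, s in schedule.items())
    need: Dict[str, int] = {}
    for (unit, _), count in per_step.items():
        need[unit] = max(need.get(unit, 0), count)
    return need
```

For `z = (a*b + c*d) * e`, modelled as `{"m1": [], "m2": [], "add": ["m1", "m2"], "m3": ["add"]}`, the two multiplications land in step 0, the addition in step 1 and the final multiplication in step 2, so this schedule needs two multipliers and one adder.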

    High quality framegrabber for an IR imaging camera

    Electronic design of a video digitizer (frame grabber) for an infrared camera, with potential applications in the field of "Gas Sensing".

    Computer vision algorithms on reconfigurable logic arrays


    CRD: a dynamically reconfigurable coprocessor for performance improvement

    Advisor: Paulo Cesar Centoducatte. Master's dissertation, Universidade Estadual de Campinas, Instituto de Computação. Performance has been a recurring requirement for a large number of applications. However, traditional ways of improving performance, such as raising the processor's operating frequency or using parallel processing, are not always technically or economically viable, especially for a dedicated (embedded) system. An alternative for improving performance in such systems is to identify the parts of the application that execute inefficiently in software and to implement them directly in hardware. The natural candidates for this approach are inner loops, which are normally small, account for a large part of the execution time and, when implemented in hardware, do not use a large silicon area. In this work we propose a memory-mapped reconfigurable coprocessor, called the Dynamically Reconfigurable Coprocessor (CRD, from the Portuguese Co-processador Reconfigurável Dinamicamente), capable of executing code segments that are inefficient in software, such as inner loops (kernels), directly in hardware. To reduce the area occupied by the coprocessor, and thus the cost of the system, the CRD includes a reprogramming unit that allows the available resources to be reused to implement different program segments in hardware within the same execution. The program segments chosen for execution in hardware (in the CRD) are those responsible for most of the program's total execution time. This technique achieved an overall speedup of up to 20 times in the execution time of the DSPstone benchmark programs. Master's degree (Mestrado), Computer Engineering, Master in Computing.
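
The choice of which code to move into hardware follows Amdahl's law: accelerating a region only helps in proportion to the share of run time it accounts for. The sketch below applies that standard formula together with a simple profile-based selection of candidate kernels; the profile values are hypothetical and the CRD's reconfiguration overhead is ignored.

```python
from typing import Dict, List, Tuple

def overall_speedup(fraction: float, kernel_speedup: float) -> float:
    """Amdahl's law: whole-program speedup when `fraction` of the run time
    is accelerated by `kernel_speedup` and the rest is unchanged."""
    return 1.0 / ((1.0 - fraction) + fraction / kernel_speedup)

def pick_kernels(profile: Dict[str, float], budget: int) -> List[Tuple[str, float]]:
    """Return the `budget` code regions with the largest share of run time,
    i.e. the natural candidates for hardware mapping."""
    return sorted(profile.items(), key=lambda kv: kv[1], reverse=True)[:budget]
```

Under this model, an overall speedup of 20 requires that at least 95% of the run time be spent in accelerated regions, because the speedup can never exceed 1 / (1 - fraction).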

    Efficient implementation of video processing algorithms on FPGA

    The work contained in this portfolio thesis was carried out as part of an Engineering Doctorate (Eng.D) programme from the Institute for System Level Integration. The work was sponsored by Thales Optronics, and focuses on issues surrounding the implementation of video processing algorithms on field programmable gate arrays (FPGA). A description is given of FPGA technology and the currently dominant methods of designing and verifying firmware. The problems of translating a description of behaviour into one of structure are discussed, and some of the latest methodologies for tackling this problem are introduced. A number of algorithms are then looked at, including methods of contrast enhancement, deconvolution, and image fusion. Algorithms are characterised according to the nature of their execution flow, and this is used as justification for some of the design choices that are made. An efficient method of performing large two-dimensional convolutions is also described. The portfolio also contains a discussion of an FPGA implementation of a PID control algorithm, an overview of FPGA dynamic reconfigurability, and the development of a demonstration platform for rapid deployment of video processing algorithms in FPGA hardware
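
The abstract does not say how the large two-dimensional convolutions are made efficient. One standard technique, shown here only as an illustration, applies when the kernel is separable (an outer product of a column vector and a row vector): the 2-D filter then becomes a horizontal pass followed by a vertical pass, reducing the work per output pixel from K×K to 2K multiply-accumulates for a K×K kernel. The sketch uses the correlation form common in image filtering (no kernel flip) and "valid" borders.

```python
from typing import List, Sequence

Matrix = List[List[float]]

def conv2d_direct(img: Matrix, kernel: Matrix) -> Matrix:
    """'Valid' 2-D correlation: kh * kw multiply-accumulates per output pixel."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(img), len(img[0])
    return [[sum(kernel[i][j] * img[y + i][x + j] for i in range(kh) for j in range(kw))
             for x in range(w - kw + 1)]
            for y in range(h - kh + 1)]

def conv2d_separable(img: Matrix, col: Sequence[float], row: Sequence[float]) -> Matrix:
    """Same result as conv2d_direct with kernel[i][j] = col[i] * row[j],
    computed as a horizontal pass then a vertical pass (kh + kw MACs per pixel)."""
    kh, kw = len(col), len(row)
    # Horizontal pass: filter every image row with the row vector.
    horiz = [[sum(row[j] * r[x + j] for j in range(kw)) for x in range(len(r) - kw + 1)]
             for r in img]
    # Vertical pass: filter every column of the intermediate result.
    return [[sum(col[i] * horiz[y + i][x] for i in range(kh)) for x in range(len(horiz[0]))]
            for y in range(len(horiz) - kh + 1)]
```

In hardware, the horizontal pass maps naturally onto the incoming pixel stream, while the vertical pass needs line buffers holding the previous kh - 1 intermediate rows.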