8,527 research outputs found

    Bit-level pipelined digit-serial array processors

    Get PDF
    A new architecture for high performance digit-serial vector inner product (VIP) which can be pipelined to the bit-level is introduced. The design of the digit-serial vector inner product is based on a new systematic design methodology using radix-2n arithmetic. The proposed architecture allows a high level of bit-level pipelining to increase the throughput rate with minimum initial delay and minimum area. This will give designers greater flexibility in finding the best tradeoff between hardware cost and throughput rate. It is shown that sub-digit pipelined digit-serial structure can achieve a higher throughput rate with much less area consumption than an equivalent bit-parallel structure. A twin-pipe architecture to double the throughput rate of digit-serial multipliers and consequently that of the digit-serial vector inner product is also presented. The effect of the number of pipelining levels and the twin-pipe architecture on the throughput rate and hardware cost are discussed. A two's complement digit-serial architecture which can operate on both negative and positive numbers is also presented

    Adapting the interior point method for the solution of LPs on serial, coarse grain parallel and massively parallel computers

    Get PDF
    In this paper we describe a unified scheme for implementing an interior point algorithm (IPM) over a range of computer architectures. In the inner iteration of the IPM a search direction is computed using Newton's method. Computationally this involves solving a sparse symmetric positive definite (SSPD) system of equations. The choice of direct and indirect methods for the solution of this system, and the design of data structures to take advantage of serial, coarse grain parallel and massively parallel computer architectures, are considered in detail. We put forward arguments as to why integration of the system within a sparse simplex solver is important and outline how the system is designed to achieve this integration

    Real-time human action recognition on an embedded, reconfigurable video processing architecture

    Get PDF
    Copyright @ 2008 Springer-Verlag.In recent years, automatic human motion recognition has been widely researched within the computer vision and image processing communities. Here we propose a real-time embedded vision solution for human motion recognition implemented on a ubiquitous device. There are three main contributions in this paper. Firstly, we have developed a fast human motion recognition system with simple motion features and a linear Support Vector Machine (SVM) classifier. The method has been tested on a large, public human action dataset and achieved competitive performance for the temporal template (eg. “motion history image”) class of approaches. Secondly, we have developed a reconfigurable, FPGA based video processing architecture. One advantage of this architecture is that the system processing performance can be reconfiured for a particular application, with the addition of new or replicated processing cores. Finally, we have successfully implemented a human motion recognition system on this reconfigurable architecture. With a small number of human actions (hand gestures), this stand-alone system is performing reliably, with an 80% average recognition rate using limited training data. This type of system has applications in security systems, man-machine communications and intelligent environments.DTI and Broadcom Ltd

    FPGA implementation of real-time human motion recognition on a reconfigurable video processing architecture

    Get PDF
    In recent years, automatic human motion recognition has been widely researched within the computer vision and image processing communities. Here we propose a real-time embedded vision solution for human motion recognition implemented on a ubiquitous device. There are three main contributions in this paper. Firstly, we have developed a fast human motion recognition system with simple motion features and a linear Support Vector Machine(SVM) classifier. The method has been tested on a large, public human action dataset and achieved competitive performance for the temporal template (eg. ``motion history image") class of approaches. Secondly, we have developed a reconfigurable, FPGA based video processing architecture. One advantage of this architecture is that the system processing performance can be reconfigured for a particular application, with the addition of new or replicated processing cores. Finally, we have successfully implemented a human motion recognition system on this reconfigurable architecture. With a small number of human actions (hand gestures), this stand-alone system is performing reliably, with an 80% average recognition rate using limited training data. This type of system has applications in security systems, man-machine communications and intelligent environments

    Architecture independent environment for developing engineering software on MIMD computers

    Get PDF
    Engineers are constantly faced with solving problems of increasing complexity and detail. Multiple Instruction stream Multiple Data stream (MIMD) computers have been developed to overcome the performance limitations of serial computers. The hardware architectures of MIMD computers vary considerably and are much more sophisticated than serial computers. Developing large scale software for a variety of MIMD computers is difficult and expensive. There is a need to provide tools that facilitate programming these machines. First, the issues that must be considered to develop those tools are examined. The two main areas of concern were architecture independence and data management. Architecture independent software facilitates software portability and improves the longevity and utility of the software product. It provides some form of insurance for the investment of time and effort that goes into developing the software. The management of data is a crucial aspect of solving large engineering problems. It must be considered in light of the new hardware organizations that are available. Second, the functional design and implementation of a software environment that facilitates developing architecture independent software for large engineering applications are described. The topics of discussion include: a description of the model that supports the development of architecture independent software; identifying and exploiting concurrency within the application program; data coherence; engineering data base and memory management

    Spectral element methods: Algorithms and architectures

    Get PDF
    Spectral element methods are high-order weighted residual techniques for partial differential equations that combine the geometric flexibility of finite element methods with the rapid convergence of spectral techniques. Spectral element methods are described for the simulation of incompressible fluid flows, with special emphasis on implementation of spectral element techniques on medium-grained parallel processors. Two parallel architectures are considered: the first, a commercially available message-passing hypercube system; the second, a developmental reconfigurable architecture based on Geometry-Defining Processors. High parallel efficiency is obtained in hypercube spectral element computations, indicating that load balancing and communication issues can be successfully addressed by a high-order technique/medium-grained processor algorithm-architecture coupling

    Preliminary Report on High-Performance Computational Structures for Robot Control

    Get PDF
    In this report we present some initial results of our work completed thus far on Computational Structures for Robot Control . A SIMD architecture with the crossbar interprocessor network which achieves the parallel processing execution time lower bound of o( [a1n ]), where a1 is a constant and n is the number of manipulator joints, for the computation of the inverse dynamics problem, is discussed. A novel SIMD task scheduling algorithm that optimizes the parallel processing performance on the indicated architecture is also delineated. Simulations performed on this architecture show speedup factor of 3.4 over previous related work completed for the evaluation of the specified problem, is achieved. Parallel processing of PUMA forward and inverse kinematics solutions is next investigated using a particular scheduling algorithm. In addition, a custom bit-serial array architecture is designed for the computation of the inverse dynamics problem within the bit-serial execution time lower bound of o(c1k + c2kn), where c1 and c2 are specified constants, k is the word length, and n is the number of manipulator joints. Finally, mapping of the Newton-Euler equations onto a fixed systolic array is investigated. A balanced architecture for the inverse dynamics problem which achieves the systolic execution time lower bound for the specified problem is depicted. Please note again that these results are only preliminary and improvements to our algorithms and architectures are currently still being made

    Novel sparse OBC based distributed arithmetic architecture for matrix transforms

    Get PDF
    Inner product (IP) forms the basis of a number of signal processing algorithms and applications such as transforms, filters, communication systems etc. Distributed arithmetic (DA) provides an effective methodology to implement IP of vectors and matrices using a simple combination of memory elements, adders and shifters instead of lumped multipliers. This bit level rearrangement results in much higher computational efficiencies and yields compact designs highly suited for high performance resource constrained applications. Offset binary coding (OBC) is an effective technique to further optimize the DA, and allows us to reduce the memory requirements by a factor of two, with minimum additional computational complexity. This makes OBC-DA attractive for applications that are both resource and memory constrained. In addition, sparse matrix factorization techniques can be exploited to further reduce the size of the DA-ROMs. In this paper, the design and implementation of a novel OBC based DA is demonstrated using a generic architecture for implementing discrete orthogonal transforms (DOTs). Implementation is performed on the Xilinx Virtex-II Pro field programmable gate array (FPGA), and a detailed comparison between conventional and OBC based DA is presented to highlight the trade offs in various design metrics including performance, area and power
    corecore