450 research outputs found

    A General-Purpose Graphics Processing Unit (GPGPU)-Accelerated Robotic Controller Using a Low Power Mobile Platform

    Get PDF
    Robotic controllers have to execute various complex independent tasks repeatedly. Massive processing power is required by the motion controllers to compute the solution of these computationally intensive algorithms. General-purpose graphics processing unit (GPGPU)-enabled mobile phones can be leveraged for acceleration of these motion controllers. Embedded GPUs can replace several dedicated computing boards by a single powerful and less power-consuming GPU. In this paper, the inverse kinematic algorithm based numeric controllers is proposed and realized using the GPGPU of a handheld mobile device. This work is the extension of a desktop GPU-accelerated robotic controller presented at DAS’16 where the comparative analysis of different sequential and concurrent controllers is discussed. First of all, the inverse kinematic algorithm is sequentially realized using Arduino-Due microcontroller and the field-programmable gate array (FPGA) is used for its parallel implementation. Execution speeds of these controllers are compared with two different GPGPU architectures (Nvidia Quadro K2200 and Nvidia Shield K1 Tablet), programmed with Compute Unified Device Architecture (CUDA) computing language. Experimental data shows that the proposed mobile platform-based scheme outperform s the FPGA by 5 and boasts a 100 speedup over the Arduino-based sequential implementation

    Exploring Processor and Memory Architectures for Multimedia

    Get PDF
    Multimedia has become one of the cornerstones of our 21st century society and, when combined with mobility, has enabled a tremendous evolution of our society. However, joining these two concepts introduces many technical challenges. These range from having sufficient performance for handling multimedia content to having the battery stamina for acceptable mobile usage. When taking a projection of where we are heading, we see these issues becoming ever more challenging by increased mobility as well as advancements in multimedia content, such as introduction of stereoscopic 3D and augmented reality. The increased performance needs for handling multimedia come not only from an ongoing step-up in resolution going from QVGA (320x240) to Full HD (1920x1080) a 27x increase in less than half a decade. On top of this, there is also codec evolution (MPEG-2 to H.264 AVC) that adds to the computational load increase. To meet these performance challenges there has been processing and memory architecture advances (SIMD, out-of-order superscalarity, multicore processing and heterogeneous multilevel memories) in the mobile domain, in conjunction with ever increasing operating frequencies (200MHz to 2GHz) and on-chip memory sizes (128KB to 2-3MB). At the same time there is an increase in requirements for mobility, placing higher demands on battery-powered systems despite the steady increase in battery capacity (500 to 2000mAh). This leaves negative net result in-terms of battery capacity versus performance advances. In order to make optimal use of these architectural advances and to meet the power limitations in mobile systems, there is a need for taking an overall approach on how to best utilize these systems. The right trade-off between performance and power is crucial. On top of these constraints, the flexibility aspects of the system need to be addressed. All this makes it very important to reach the right architectural balance in the system. The first goal for this thesis is to examine multimedia applications and propose a flexible solution that can meet the architectural requirements in a mobile system. Secondly, propose an automated methodology of optimally mapping multimedia data and instructions to a heterogeneous multilevel memory subsystem. The proposed methodology uses constraint programming for solving a multidimensional optimization problem. Results from this work indicate that using today’s most advanced mobile processor technology together with a multi-level heterogeneous on-chip memory subsystem can meet the performance requirements for handling multimedia. By utilizing the automated optimal memory mapping method presented in this thesis lower total power consumption can be achieved, whilst performance for multimedia applications is improved, by employing enhanced memory management. This is achieved through reduced external accesses and better reuse of memory objects. This automatic method shows high accuracy, up to 90%, for predicting multimedia memory accesses for a given architecture

    Application-Directed DVFS using Multiple Clock Domains on Graphics Hardware

    Get PDF
    As handheld devices have become increasingly popular, powerful programmable graphics hardware for mobile and handheld devices has been deployed. While many resources on mobile devices are limited, the predominant problem for mobile devices is their limited battery power. Several techniques have been proposed to increase the energy efficiency of mobile applications and improve battery life. In this thesis, we propose a new dynamic voltage and frequency scaling (DVFS) on Graphics Processing Units (GPU). In most cases, cues within the graphics appli- cation can be used to predict portions of a GPU that will be used or unused when the application is run. We partition the GPU into six clock domains that can be clocked at different rates. Specifically, each domain it has its own voltage and frequency set- ting based on its predicted workload to save energy without reducing applications frame rates. In addition, we propose an signature-based algorithm for predicting the workload offered to our six clock domains by a given application to decide voltage and frequency settings. We conduct experiments and compare the results of our new signature based workload prediction algorithm with some other traditional interval based workload prediction algorithms. Our results show that our signature-based prediction can save 30-50% energy without afecting application frame rates

    An efficient hardware logarithm generator with modified quasi-symmetrical approach for digital signal processing

    Get PDF
    This paper presents a low-error, low-area FPGA-based hardware logarithm generator for digital signal processing systems which require high-speed, real time logarithm operations. The proposed logarithm generator employs the modified quasi-symmetrical approach for an efficient hardware implementation. The error analysis and implementation results are also presented and discussed. The achieved results show that the proposed approach can reduce the approximation error and hardware area compared with traditional methods

    A Mini-History of Computing

    Get PDF
    This book was produced by George K. Thiruvathukal for the American Institute of Physics to promote interest in the interdisciplinary publication, Computing in Science and Engineering. It accompanied a limited edition set of playing cards that is no longer available (except in PDF). This book features a set of 54 significant computers by era/category, including ancient calculating instruments, pre-electronic mechanical calculators and computers, electronic era computers, and modern computing (minicomputers, maniframes, personal computers, devices, and gaming consoles)

    Homogeneous and heterogeneous MPSoC architectures with network-on-chip connectivity for low-power and real-time multimedia signal processing

    Get PDF
    Two multiprocessor system-on-chip (MPSoC) architectures are proposed and compared in the paper with reference to audio and video processing applications. One architecture exploits a homogeneous topology; it consists of 8 identical tiles, each made of a 32-bit RISC core enhanced by a 64-bit DSP coprocessor with local memory. The other MPSoC architecture exploits a heterogeneous-tile topology with on-chip distributed memory resources; the tiles act as application specific processors supporting a different class of algorithms. In both architectures, the multiple tiles are interconnected by a network-on-chip (NoC) infrastructure, through network interfaces and routers, which allows parallel operations of the multiple tiles. The functional performances and the implementation complexity of the NoC-based MPSoC architectures are assessed by synthesis results in submicron CMOS technology. Among the large set of supported algorithms, two case studies are considered: the real-time implementation of an H.264/MPEG AVC video codec and of a low-distortion digital audio amplifier. The heterogeneous architecture ensures a higher power efficiency and a smaller area occupation and is more suited for low-power multimedia processing, such as in mobile devices. The homogeneous scheme allows for a higher flexibility and easier system scalability and is more suited for general-purpose DSP tasks in power-supplied devices

    Video Processing Acceleration using Reconfigurable Logic and Graphics Processors

    No full text
    A vexing question is `which architecture will prevail as the core feature of the next state of the art video processing system?' This thesis examines the substitutive and collaborative use of the two alternatives of the reconfigurable logic and graphics processor architectures. A structured approach to executing architecture comparison is presented - this includes a proposed `Three Axes of Algorithm Characterisation' scheme and a formulation of perfor- mance drivers. The approach is an appealing platform for clearly defining the problem, assumptions and results of a comparison. In this work it is used to resolve the advanta- geous factors of the graphics processor and reconfigurable logic for video processing, and the conditions determining which one is superior. The comparison results prompt the exploration of the customisable options for the graphics processor architecture. To clearly define the architectural design space, the graphics processor is first identifed as part of a wider scope of homogeneous multi-processing element (HoMPE) architectures. A novel exploration tool is described which is suited to the investigation of the customisable op- tions of HoMPE architectures. The tool adopts a systematic exploration approach and a high-level parameterisable system model, and is used to explore pre- and post-fabrication customisable options for the graphics processor. A positive result of the exploration is the proposal of a reconfigurable engine for data access (REDA) to optimise graphics processor performance for video processing-specific memory access patterns. REDA demonstrates the viability of the use of reconfigurable logic as collaborative `glue logic' in the graphics processor architecture
    • …
    corecore