672 research outputs found

    A review of parallel processing approaches to robot kinematics and Jacobian

    Due to continuously increasing demands in the area of advanced robot control, it has become necessary to speed up the computation. One way to reduce the computation time is to distribute the computation onto several processing units. In this survey we present different approaches to parallel computation of robot kinematics and the Jacobian, covering both the forward and the reverse problem. We introduce a classification scheme and classify the references by this scheme.

    Efficient Parallel Algorithms and VLSI Architectures for Manipulator Jacobian Computation

    Real-time computations of the manipulator Jacobian are examined for execution on uniprocessor computers, parallel computers, and VLSI pipelines. The Jacobian equations are found to have the form of a first-order linear recurrence. The time lower bound for computing the first-order linear recurrence, and hence the Jacobian, is of order O(N) on uniprocessor computers and of order O(log2 N) on parallel SIMD computers, where N is the number of degrees of freedom of the manipulator. The Generalized-k method, which achieves the time lower bound on uniprocessor computers, is derived to compute the Jacobian at any desired reference coordinate frame k, from the base coordinate frame to the end-effector coordinate frame. We find that if the reference coordinate frame k is in the range [3, N-4], then the computational effort is at its minimum. To reduce the computational complexity from O(N) to O(log2 N), we derive the parallel forward and backward recursive doubling algorithm to compute the Jacobian on parallel computers. Again, any reference coordinate frame k can be used, and the minimum computation occurs at k = (N-1)/2. To further reduce the Jacobian computation complexity, we design two VLSI systolic pipelined architectures. A linear VLSI pipe, which uses the least number of modular processors, takes 3N floating-point operations to compute the Jacobian, and a parallel VLSI pipe takes 3 floating-point operations. We also show that if the reference coordinate frame is selected at k = (N-1)/2, then the parallel pipe will require the least number of modular processors, and the communication paths are much shorter.
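
The recursive doubling idea mentioned in the abstract above can be illustrated on a scalar first-order linear recurrence x_i = a_i * x_(i-1) + b_i. The following is a minimal sketch, not the paper's algorithm: the function names are hypothetical, the recurrence is scalar rather than the vector and matrix form used for the Jacobian, and the parallel passes are simulated sequentially.

```python
def sequential_recurrence(a, b, x0):
    """O(N) evaluation of x_i = a_i * x_(i-1) + b_i on one processor."""
    xs = []
    x = x0
    for ai, bi in zip(a, b):
        x = ai * x + bi
        xs.append(x)
    return xs


def recursive_doubling(a, b, x0):
    """Same recurrence via a prefix scan with ceil(log2 N) passes.

    Pair i stands for the affine map x -> a_i * x + b_i. After the scan,
    pair i holds the composition of maps 0..i, so x_i follows directly
    from x0. Within one pass every index i is independent, which is what
    a SIMD machine would exploit; here the inner loop simulates that.
    """
    n = len(a)
    pairs = list(zip(a, b))
    stride = 1
    passes = 0
    while stride < n:
        nxt = list(pairs)
        for i in range(stride, n):  # independent iterations
            a_hi, b_hi = pairs[i]
            a_lo, b_lo = pairs[i - stride]
            # Compose the two maps: apply the lower one first, then the upper.
            nxt[i] = (a_hi * a_lo, a_hi * b_lo + b_hi)
        pairs = nxt
        stride *= 2
        passes += 1
    return [ai * x0 + bi for ai, bi in pairs], passes


if __name__ == "__main__":
    a = [2, 3, 1, 4, 2, 1, 3, 2]
    b = [1, 0, 2, 1, 3, 1, 0, 2]
    print(sequential_recurrence(a, b, 1))
    print(recursive_doubling(a, b, 1))
```

For N = 8 the scan needs three passes, matching the log2 N bound quoted in the abstract, at the cost of more total arithmetic than the sequential loop.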

    Neuro-inspired system for real-time vision sensor tilt correction

    Neuromorphic engineering tries to mimic biological information processing. Address-Event-Representation (AER) is an asynchronous protocol for transferring the information of spiking neuro-inspired systems. Currently, AER systems are able to sense visual and auditory stimuli, to process information, to learn, to control robots, etc. In this paper we present an AER-based layer able to correct the tilt of an AER vision sensor in real time, using a high-speed algorithmic mapping layer. A codesign platform (the AER-Robot platform), with a Xilinx Spartan 3 FPGA and an 8051 USB microcontroller, has been used to implement the system. It was tested with the help of the USBAERmini2 board and the jAER software. Junta de Andalucía P06-TIC-01417; Ministerio de Educación y Ciencia TEC2006-11730-C03-02; Ministerio de Ciencia e Innovación TEC2009-10639-C04-0
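
One way to picture an algorithmic mapping layer that corrects tilt is as a rotation applied to each event address as it passes through. The sketch below is only an illustration under stated assumptions: the abstract does not describe the mapping itself, so the function name `remap_events`, the 128 x 128 array size, and the choice to drop events that rotate out of range are all hypothetical.

```python
import math


def remap_events(events, angle_deg, size=128):
    """Rotate (x, y) event addresses by angle_deg about the array centre.

    Assumptions (not from the source): a square size x size address space,
    nearest-pixel rounding, and silently dropping events that land outside
    the array after rotation.
    """
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    c = (size - 1) / 2.0  # geometric centre of the address grid
    remapped = []
    for x, y in events:
        dx, dy = x - c, y - c
        xr = round(c + dx * cos_t - dy * sin_t)
        yr = round(c + dx * sin_t + dy * cos_t)
        if 0 <= xr < size and 0 <= yr < size:
            remapped.append((xr, yr))
    return remapped


if __name__ == "__main__":
    print(remap_events([(0, 0), (5, 7)], 90))
```

Because each event is handled independently, a mapping of this kind can in principle be applied event by event without buffering a frame.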

    Embedding Multi-Task Address-Event- Representation Computation

    Address-Event-Representation (AER) is a communication protocol intended to transfer neuronal spikes between bioinspired chips. There are several AER tools that help to develop and test AER-based systems, which may consist of a hierarchical structure of several chips that transmit spikes among themselves in real time while performing some processing. Although these tools reach very high bandwidth at the AER communication level, they require a personal computer for the higher-level processing of the event information. We propose the use of an embedded platform based on a multi-task operating system to allow both AER communication and processing without requiring a laptop or other computer. In this paper, we present and study the performance of an embedded multi-task AER tool, connecting and programming it to process Address-Event information from a spiking generator. Ministerio de Ciencia e Innovación TEC2006-11730-C03-0

    A parallel algorithm and architecture for the control of kinematically redundant manipulators

    Includes bibliographical references (pages 413-414). Kinematically redundant manipulators are inherently capable of more dexterous manipulation due to their additional degrees of freedom. To achieve this dexterity, however, one must be able to efficiently calculate the most desirable configuration from the infinite number of possible configurations that satisfy the end-effector constraint. It has been previously shown that the singular value decomposition (SVD) plays a crucial role in doing such calculations. In this work, a parallel algorithm for calculating the SVD is incorporated into a computational scheme for solving the equations of motion for kinematically redundant systems. This algorithm, which generalizes the damped least squares formulation to include solutions that utilize null-space projections and task prioritization as well as augmented or extended Jacobians, is then implemented on a simple linear array of processing elements. By taking advantage of the error bounds on the perturbation of the SVD, it is shown that an array of only four AT&T DSP chips can result in control cycle times of less than 3 ms for a seven degree-of-freedom manipulator.
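
The damped least squares formulation that the abstract above builds on can be sketched for a small case. This is a minimal illustration, assuming a 2 x n Jacobian so that the closed-form 2 x 2 inverse can replace the SVD; it does not reproduce the paper's parallel SVD scheme, and the function name is hypothetical.

```python
def damped_least_squares(J, dx, damping):
    """Return dq = J^T (J J^T + damping^2 I)^-1 dx for a 2 x n Jacobian J.

    J is a list of two rows, dx a desired 2-element end-effector step.
    The closed-form 2 x 2 inverse is a stand-in for the SVD used in the
    literature; both give the same dq when damping is nonzero.
    """
    n = len(J[0])
    # A = J J^T + damping^2 I (symmetric 2 x 2)
    a00 = sum(v * v for v in J[0]) + damping ** 2
    a11 = sum(v * v for v in J[1]) + damping ** 2
    a01 = sum(u * v for u, v in zip(J[0], J[1]))
    det = a00 * a11 - a01 * a01  # strictly positive when damping != 0
    # y = A^-1 dx
    y0 = (a11 * dx[0] - a01 * dx[1]) / det
    y1 = (a00 * dx[1] - a01 * dx[0]) / det
    # dq = J^T y
    return [J[0][k] * y0 + J[1][k] * y1 for k in range(n)]


if __name__ == "__main__":
    # Rank-deficient Jacobian: the undamped pseudoinverse would need special
    # handling, but the damped solution stays finite.
    print(damped_least_squares([[1, 1, 0], [1, 1, 0]], [1, 1], 1))
```

The damping term trades tracking accuracy for bounded joint steps near singular configurations, which is why the formulation is commonly paired with redundant manipulators.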

    Robot manipulator prototyping (Complete Design Review)

    Journal Article. Prototyping is an important activity in engineering. Prototype development is a good test for checking the viability of a proposed system. Prototypes can also help in determining system parameters, ranges, or in designing better systems. The interaction between several modules (e.g., S/W, VLSI, CAD, CAM, Robotics, and Control) illustrates an interdisciplinary prototyping environment that includes radically different types of information, combined in a coordinated way. Developing an environment that enables optimal and flexible design of robot manipulators using reconfigurable links, joints, actuators, and sensors is an essential step for efficient robot design and prototyping. Such an environment should have the right "mix" of software and hardware components for designing the physical parts and the controllers, and for the algorithmic control of the robot modules (kinematics, inverse kinematics, dynamics, trajectory planning, analog control and digital computer control). Specifying object-based communications and catalog mechanisms between the software modules, controllers, physical parts, CAD designs, and actuator and sensor components is a necessary step in the prototyping activities. We propose a flexible prototyping environment for robot manipulators with the required subsystems and interfaces between the different components of this environment.

    NullHop: A Flexible Convolutional Neural Network Accelerator Based on Sparse Representations of Feature Maps

    Convolutional neural networks (CNNs) have become the dominant neural network architecture for solving many state-of-the-art (SOA) visual processing tasks. Even though Graphical Processing Units (GPUs) are most often used in training and deploying CNNs, their power efficiency is less than 10 GOp/s/W for single-frame runtime inference. We propose a flexible and efficient CNN accelerator architecture called NullHop that implements SOA CNNs useful for low-power and low-latency application scenarios. NullHop exploits the sparsity of neuron activations in CNNs to accelerate the computation and reduce memory requirements. The flexible architecture allows high utilization of available computing resources across kernel sizes ranging from 1x1 to 7x7. NullHop can process up to 128 input and 128 output feature maps per layer in a single pass. We implemented the proposed architecture on a Xilinx Zynq FPGA platform and present results showing how our implementation reduces external memory transfers and compute time in five different CNNs ranging from small ones up to the widely known large VGG16 and VGG19 CNNs. Post-synthesis simulations using Mentor Modelsim in a 28 nm process with a clock frequency of 500 MHz show that the VGG19 network achieves over 450 GOp/s. By exploiting sparsity, NullHop achieves an efficiency of 368%, maintains over 98% utilization of the MAC units, and achieves a power efficiency of over 3 TOp/s/W in a core area of 6.3 mm^2. As further proof of NullHop's usability, we interfaced its FPGA implementation with a neuromorphic event camera for real-time interactive demonstrations.
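
To make the idea of exploiting activation sparsity concrete, the sketch below encodes a row of activations as a binary mask plus a list of nonzero values and skips the multiply-accumulate for every zero. The encoding and the function names are assumptions chosen for illustration; the abstract above does not specify the accelerator's actual format.

```python
def encode_sparse(activations):
    """Split activations into a 0/1 mask and the list of nonzero values."""
    mask = [1 if v != 0 else 0 for v in activations]
    nonzero = [v for v in activations if v != 0]
    return mask, nonzero


def sparse_dot(mask, nonzero, weights):
    """Dot product that performs a MAC only where the mask bit is set.

    Returns the result and the number of MACs actually performed, so the
    saving over a dense dot product (len(weights) MACs) is visible.
    """
    total = 0
    macs = 0
    values = iter(nonzero)
    for bit, w in zip(mask, weights):
        if bit:
            total += next(values) * w
            macs += 1
    return total, macs


if __name__ == "__main__":
    mask, nonzero = encode_sparse([0, 3, 0, 0, 2, 0, 1, 0])
    print(mask, nonzero)
    print(sparse_dot(mask, nonzero, [1, 2, 3, 4, 5, 6, 7, 8]))
```

In this toy case only three of eight positions need a multiply, and only three values plus an 8-bit mask need to be stored, which is the kind of compute and memory saving the abstract attributes to sparsity.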

    Hardware neural systems for applications: a pulsed analog approach


    Vector processing-aware advanced clock-gating techniques for low-power fused multiply-add

    The need for power efficiency is driving a rethink of design decisions in processor architectures. While vector processors succeeded in the high-performance market in the past, they need a retailoring for the mobile market that they are entering now. Floating-point (FP) fused multiply-add (FMA), being a functional unit with high power consumption, deserves special attention. Although clock gating is a well-known method to reduce switching power in synchronous designs, there are unexplored opportunities for its application to vector processors, especially when considering active operating mode. In this research, we comprehensively identify, propose, and evaluate the most suitable clock-gating techniques for vector FMA units (VFUs). These techniques ensure power savings without jeopardizing the timing. We evaluate the proposed techniques using both synthetic and “real-world” application-based benchmarking. Using vector masking and vector multilane-aware clock gating, we report power reductions of up to 52%, assuming active VFU operating at the peak performance. Among other findings, we observe that vector instruction-based clock-gating techniques achieve power savings for all vector FP instructions. Finally, when evaluating all techniques together, using “real-world” benchmarking, the power reductions are up to 80%. Additionally, in accordance with processor design trends, we perform this research in a fully parameterizable and automated fashion. The research leading to these results has received funding from the RoMoL ERC Advanced Grant GA 321253 and is supported in part by the European Union (FEDER funds) under contract TTIN2015-65316-P. The work of I. Ratkovic was supported by a FPU research grant from the Spanish MECD. Peer Reviewed. Postprint (author's final draft)