672 research outputs found
A review of parallel processing approaches to robot kinematics and Jacobian
Due to continuously increasing demands in the area of advanced robot
control, it became necessary to speed up the computation. One way to
reduce the computation time is to distribute the computation onto
several processing units. In this survey we present different approaches
to parallel computation of robot kinematics and Jacobian. Thereby, we
discuss both the forward and the reverse problem. We introduce a
classification scheme and classify the references by this scheme
Efficient Parallel Algorithms and VLSI Architectures for Manipulator Jacobian Computation
Real-time computations of manipulator Jacobian are examined for executing on uniprocessor computers, parallel computers, and VLSI pipelines. The characteristics of the Jacobian equations are found to be in the form of the first-order linear recurrence. The time lower bound of computing the first-order linear recurrence, and hence the Jacobian, is of order O(N) on uniprocessor computers, and of order O(log2N) on parallel SIMD computers, where TV is the number of degrees-of-freedom of the manipulator. The Generalized-^ method, which achieves the time lower bound on uniprocessor computers, is derived to compute the Jacobian at any desired reference coordinate frame A; from the base coordinate frame to the end-effector coordinate frame. We find that if the reference coordinate frame k is in the range [3 , N—4], then the computational effort is the minimum. To reduce the computational complexity from the order of O (N) to O (log2N), we derive the parallel forward and backward recursive doubling algorithm to compute the Jacobian on parallel computers. Again, any reference coordinate frame k can be used, and the minimum computation occurs at k = (N—1)/2. To further reduce the Jacobian computation complexity, we design two VLSI systolic pipelined architectures. A linear VLSI pipe, which uses the least number of modular processors, takes 3N floating-point operations to compute the Jacobian, and a parallel VLSI pipe takes 3 floating-point operations. We also show that if the reference coordinate frame is selected at k — (N—1)/2, then the parallel pipe will require the least number of modular processors, and the communication paths are much shorter
Neuro-inspired system for real-time vision sensor tilt correction
Neuromorphic engineering tries to mimic biological
information processing. Address-Event-Representation (AER)
is an asynchronous protocol for transferring the information of
spiking neuro-inspired systems. Currently AER systems are able
sense visual and auditory stimulus, to process information, to
learn, to control robots, etc. In this paper we present an AER
based layer able to correct in real time the tilt of an AER vision
sensor, using a high speed algorithmic mapping layer. A codesign
platform (the AER-Robot platform), with a Xilinx
Spartan 3 FPGA and an 8051 USB microcontroller, has been
used to implement the system. Testing it with the help of the
USBAERmini2 board and the jAER software.Junta de Andalucía P06-TIC-01417Ministerio de Educación y Ciencia TEC2006-11730-C03-02Ministerio de Ciencia e Innovación TEC2009-10639-C04-0
Embedding Multi-Task Address-Event- Representation Computation
Address-Event-Representation, AER, is a communication protocol that is
intended to transfer neuronal spikes between bioinspired chips. There are
several AER tools to help to develop and test AER based systems, which may
consist of a hierarchical structure with several chips that transmit spikes
among them in real-time, while performing some processing. Although these
tools reach very high bandwidth at the AER communication level, they require
the use of a personal computer to allow the higher level processing of the
event information. We propose the use of an embedded platform based on a
multi-task operating system to allow both, the AER communication and
processing without the requirement of either a laptop or a computer. In this
paper, we present and study the performance of an embedded multi-task AER
tool, connecting and programming it for processing Address-Event
information from a spiking generator.Ministerio de Ciencia e Innovación TEC2006-11730-C03-0
Parallel algorithm and architecture for the control of kinematically redundant manipulators, A
Includes bibliographical references (pages 413-414).Kinematically redundant manipulators are inherently capable of more dexterous manipulation due to their additional degrees of freedom. To achieve this dexterity, however, one must be able to efficiently calculate the most desirable configuration from the infinite number of possible configurations that satisfy the end-effector constraint. It has been previously shown that the singular value decomposition (SVD) plays a crucial role in doing such calculations. In this work, a parallel algorithm for calculating the SVD is incorporated into a computational scheme for solving the equations of motion for kinematically redundant systems. This algorithm, which generalizes the damped least squares formulation to include solutions that utilize null-space projections and task prioritization as well as augmented or extended Jacobians, is then implemented on a simple linear array of processing elements. By taking advantage of the error bounds on the perturbation of the SVD, it is shown that an array of only four AT&T DSP chips can result in control cycle times of less than 3 ms for a seven degree-of-freedom manipulator
Robot manipulator prototyping (Complete Design Review)
Journal ArticlePrototyping is an important activity in engineering. Prototype development is a good test for checking the viability of a proposed system. Prototypes can also help in determining system parameters, ranges, or in designing better systems. The interaction between several modules (e.g., S/W, VLSI, CAD, CAM, Robotics, and Control) illustrates an interdisciplinary prototyping environment that includes radically different types of information, combined in a coordinated way. Developing an environment that enables optimal and flexible design of robot manipulators using reconfigurable links, joints, actuators, and sensors is an essential step for efficient robot design and prototyping. Such an environment should have the right "mix" of software and hardware components for designing the physical parts and the controllers, and for the algorithmic control of the robot modules (kinematics, inverse kinematics, dynamics, trajectory planning, analog control and digital computer control). Specifying object-based communications and catalog mechanisms between the software modules, controllers, physical parts, CAD designs, and actuator and sensor components is a necessary step in the prototyping activities. We propose a flexible prototyping environment for robot manipulators with the required subsystems and interfaces between the different components of this environment
NullHop: A Flexible Convolutional Neural Network Accelerator Based on Sparse Representations of Feature Maps
Convolutional neural networks (CNNs) have become the dominant neural network
architecture for solving many state-of-the-art (SOA) visual processing tasks.
Even though Graphical Processing Units (GPUs) are most often used in training
and deploying CNNs, their power efficiency is less than 10 GOp/s/W for
single-frame runtime inference. We propose a flexible and efficient CNN
accelerator architecture called NullHop that implements SOA CNNs useful for
low-power and low-latency application scenarios. NullHop exploits the sparsity
of neuron activations in CNNs to accelerate the computation and reduce memory
requirements. The flexible architecture allows high utilization of available
computing resources across kernel sizes ranging from 1x1 to 7x7. NullHop can
process up to 128 input and 128 output feature maps per layer in a single pass.
We implemented the proposed architecture on a Xilinx Zynq FPGA platform and
present results showing how our implementation reduces external memory
transfers and compute time in five different CNNs ranging from small ones up to
the widely known large VGG16 and VGG19 CNNs. Post-synthesis simulations using
Mentor Modelsim in a 28nm process with a clock frequency of 500 MHz show that
the VGG19 network achieves over 450 GOp/s. By exploiting sparsity, NullHop
achieves an efficiency of 368%, maintains over 98% utilization of the MAC
units, and achieves a power efficiency of over 3TOp/s/W in a core area of
6.3mm. As further proof of NullHop's usability, we interfaced its FPGA
implementation with a neuromorphic event camera for real time interactive
demonstrations
Vector processing-aware advanced clock-gating techniques for low-power fused multiply-add
The need for power efficiency is driving a rethink of design decisions in processor architectures. While vector processors succeeded in the high-performance market in the past, they need a retailoring for the mobile market that they are entering now. Floating-point (FP) fused multiply-add (FMA), being a functional unit with high power consumption, deserves special attention. Although clock gating is a well-known method to reduce switching power in synchronous designs, there are unexplored opportunities for its application to vector processors, especially when considering active operating mode. In this research, we comprehensively identify, propose, and evaluate the most suitable clock-gating techniques for vector FMA units (VFUs). These techniques ensure power savings without jeopardizing the timing. We evaluate the proposed techniques using both synthetic and “real-world” application-based benchmarking. Using vector masking and vector multilane-aware clock gating, we report power reductions of up to 52%, assuming active VFU operating at the peak performance. Among other findings, we observe that vector instruction-based clock-gating techniques achieve power savings for all vector FP instructions. Finally, when evaluating all techniques together, using “real-world” benchmarking, the power reductions are up to 80%. Additionally, in accordance with processor design trends, we perform this research in a fully parameterizable and automated fashion.The research leading to these results has received funding from the RoMoL ERC Advanced Grant GA 321253 and is supported in part by the European Union (FEDER funds) under contract TTIN2015-65316-P.
The work of I. Ratkovic was supported by a FPU research grant from the Spanish MECD.Peer ReviewedPostprint (author's final draft
- …