5 research outputs found

    Evaluation of a SoC for Real-time 3D SLAM

    SLAM, or Simultaneous Localization and Mapping, is the combined problem of constructing a map of an agent's environment while simultaneously localizing, or tracking the pose of, that same agent. It is among the most challenging and fundamental tasks in computer vision, with applications ranging from augmented reality to robotic navigation. With the increasing capability and ubiquity of mobile computers such as cell phones, portable 3D SLAM systems are becoming feasible for widespread use. Microsoft HoloLens, Google Project Tango, and other 3D-aware devices are modern-day examples of both the potential of SLAM and the challenges it has yet to overcome. ICP, or the Iterative Closest Point algorithm, is a popular method for recovering the relative transformation between two scans of the same object. It has seen a resurgence in popularity due to the rise of affordable depth sensors such as the Kinect in robotics and augmented reality research. While ICP gives high confidence in the correctness of its result when the point clouds are similar, it is challenging to run in real time because of its computational complexity. In this thesis, a basic 3D SLAM algorithm is implemented and evaluated, and two FPGA architectures are proposed to accelerate the Nearest Neighbor component of ICP on a mobile ARM-based System-on-Chip (SoC). These architectures are predicted to achieve speedups of up to 7.89x and 17.22x, respectively, over a naive embedded software implementation.
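    To show where the Nearest Neighbor component sits inside ICP, here is a minimal 2D point-to-point ICP sketch in plain Python. It is not the thesis's implementation (which is 3D and targets an FPGA); all function names are hypothetical. It assumes the two point sets overlap well and start close enough that nearest neighbors give correct correspondences. The brute-force search costs O(N·M) per iteration, the kind of naive software loop the proposed architectures aim to accelerate, while the transform update uses the standard closed-form 2D least-squares rotation and translation.

```python
import math


def nearest_neighbors(src, dst):
    """Brute-force nearest-neighbour search: for each point in src, return the
    index of the closest point in dst. Cost is O(len(src) * len(dst))."""
    indices = []
    for sx, sy in src:
        best_i, best_d = -1, math.inf
        for i, (dx, dy) in enumerate(dst):
            d = (sx - dx) ** 2 + (sy - dy) ** 2  # squared distance; no sqrt needed
            if d < best_d:
                best_i, best_d = i, d
        indices.append(best_i)
    return indices


def best_rigid_transform(src, dst):
    """Closed-form least-squares 2D rotation + translation mapping src[i] onto dst[i]."""
    n = len(src)
    pcx = sum(x for x, _ in src) / n
    pcy = sum(y for _, y in src) / n
    qcx = sum(x for x, _ in dst) / n
    qcy = sum(y for _, y in dst) / n
    s_dot = s_cross = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        ax, ay = px - pcx, py - pcy  # centred source point
        bx, by = qx - qcx, qy - qcy  # centred matched point
        s_dot += ax * bx + ay * by
        s_cross += ax * by - ay * bx
    theta = math.atan2(s_cross, s_dot)
    c, s = math.cos(theta), math.sin(theta)
    # Translation moves the rotated source centroid onto the target centroid.
    return theta, qcx - (c * pcx - s * pcy), qcy - (s * pcx + c * pcy)


def transform_points(theta, tx, ty, pts):
    """Rotate every point by theta, then translate by (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in pts]


def icp_2d(src, dst, iterations=20):
    """Point-to-point ICP; returns the accumulated (theta, tx, ty) aligning src to dst."""
    theta_acc, tx_acc, ty_acc = 0.0, 0.0, 0.0
    current = list(src)
    for _ in range(iterations):
        idx = nearest_neighbors(current, dst)  # dominant cost per iteration
        matched = [dst[i] for i in idx]
        theta, tx, ty = best_rigid_transform(current, matched)
        current = transform_points(theta, tx, ty, current)
        # Compose the step with the running estimate: R(theta)(R_acc p + t_acc) + t.
        c, s = math.cos(theta), math.sin(theta)
        tx_acc, ty_acc = c * tx_acc - s * ty_acc + tx, s * tx_acc + c * ty_acc + ty
        theta_acc += theta
    return theta_acc, tx_acc, ty_acc
```

    A real system would use 3D points, an SVD-based rotation estimate, outlier rejection, and a convergence test instead of a fixed iteration count. The nearest-neighbor loop stays the hot spot either way.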

    Codesign Tradeoffs for High-Performance, Low-Power Linear Algebra Architectures

    Energy area and speed optimized signal processing on FPGA

    Matrix multiplication and the Fast Fourier Transform (FFT) are two computationally intensive DSP functions widely used as kernel operations in applications such as graphics, imaging, and wireless communication. Traditionally, the performance metrics for signal processing have been latency and throughput. Energy efficiency has become increasingly important with the proliferation of portable mobile devices, as in software-defined radio. An FPGA-based system is a viable solution to the requirements of adaptability and high computational power. However, FPGA resources are limited, so energy, area, and latency must be traded off against one another. There are numerous ways to map an algorithm onto an FPGA, and determining the design parameters by low-level simulation of every possible design is very time-consuming. A high-level energy model is therefore needed, in which parameters can be determined at the algorithm and architecture level rather than by low-level simulation. In this dissertation, matrix multiplication algorithms are implemented with pipelining and parallel processing to increase throughput and reduce latency, thereby reducing energy dissipation. This, however, increases area because of the larger number of processing elements. Most of the design area is taken up by multipliers, whose size grows with the input word width, which makes VLSI implementation difficult. A word-width decomposition technique is therefore used with these algorithms to keep the multiplier size fixed irrespective of the input data width. FFT algorithms are implemented with pipelining to increase throughput. To reduce the energy and area of the complex multipliers needed for multiplication by twiddle factors, distributed arithmetic is used to provide a multiplierless architecture. To compensate for the resulting loss in speed, parallel distributed arithmetic models are used.
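    The two arithmetic techniques named above can be illustrated in software. The sketch below is a behavioral model in plain Python, not the dissertation's hardware design, and all function names are hypothetical. `decomposed_multiply` shows word-width decomposition: operands are split into fixed k-bit digits so only k-bit x k-bit multiplies are needed, followed by shifts and adds. `da_inner_product` shows bit-serial distributed arithmetic for an inner product sum(c_m * x_m). It assumes integer (fixed-point) coefficients and unsigned inputs; signed two's-complement inputs would need the MSB term subtracted. A precomputed table replaces all multipliers, and each step handles one bit-plane of the inputs. A parallel DA variant would, roughly, process several bit-planes per step using replicated tables.

```python
def split_words(x, k, n_chunks):
    """Split a non-negative integer into n_chunks little-endian k-bit digits."""
    mask = (1 << k) - 1
    return [(x >> (k * i)) & mask for i in range(n_chunks)]


def decomposed_multiply(a, b, k=8):
    """Multiply non-negative integers using only k-bit x k-bit partial products,
    so the multiplier width stays fixed whatever the input word width."""
    width = max(a.bit_length(), b.bit_length(), 1)
    n_chunks = -(-width // k)  # ceiling division
    a_parts = split_words(a, k, n_chunks)
    b_parts = split_words(b, k, n_chunks)
    total = 0
    for i, ai in enumerate(a_parts):
        for j, bj in enumerate(b_parts):
            total += (ai * bj) << (k * (i + j))  # fixed-size multiply, then shift-add
    return total


def build_da_lut(coeffs):
    """Precompute the sum of coefficients selected by every M-bit address."""
    return [sum(c for m, c in enumerate(coeffs) if (addr >> m) & 1)
            for addr in range(1 << len(coeffs))]


def da_inner_product(coeffs, xs, bits=8):
    """Multiplierless inner product sum(c * x) via distributed arithmetic.
    Assumes integer coefficients and unsigned inputs that fit in `bits` bits."""
    if any(x < 0 or x >= (1 << bits) for x in xs):
        raise ValueError("inputs must be unsigned and fit in `bits` bits")
    lut = build_da_lut(coeffs)
    acc = 0
    for b in range(bits):  # one input bit-plane per step
        addr = 0
        for m, x in enumerate(xs):
            addr |= ((x >> b) & 1) << m  # gather bit b of every input
        acc += lut[addr] << b  # table lookup + shift-accumulate
    return acc
```

    For example, `da_inner_product([3, 5, 7], [10, 20, 30])` computes 3*10 + 5*20 + 7*30 with table lookups and shifts only. The table has 2^M entries, so in practice long inner products are usually split across several smaller tables.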
    This dissertation also proposes a method for optimizing the design parameters of these two kernel applications at a high level, by constructing a high-level energy model from the specified algorithms and architectures. Results obtained from the model are compared with those obtained from low-level simulation to estimate the model's error.

    Designing parallel programs of parameterized granularity
