129 research outputs found

    Building Models and Building Modelling

    Get PDF

    Design and Development of an FPGA-based Hardware Accelerator for Corner Feature Extraction and Genetic Algorithm-based SLAM System

    Get PDF
    Simultaneous Localization and Mapping (SLAM) systems are crucial parts of mobile robots. These systems require a large number of computing units, have significant real-time requirements and are also a vital factor which can determine the stability, operability and power consumption of robots. This thesis aims to improve the calculation speed of a lidar-based SLAM system in domestic scenes, reduce the power consumption of the SLAM algorithm, and reduce the overall cost of the whole platform. Lightweight, low-power and parallel optimization of SLAM algorithms are researched. In the thesis, two SLAM systems are designed and developed with a focus on energy-efficient and fast hardware-level design: a geometric method based on corner extraction and a genetic algorithm-based approach. Finally, an FPGA-based hardware accelerated SLAM is implemented and realized, and compared to a software-based system. As for the front-end SLAM system, a method of using a Corner Feature Extraction (CFE) algorithm on FPGA platforms is first proposed to improve the speed of the feature extraction. Considering building a back-end SLAM system with low power consumption, a SLAM system based on genetic algorithm combined with algorithms such as Extended Kalman Filter (EKF) and FastSLAM to reduce the amount of calculation in the SLAM system is also proposed. Finally, the thesis also proposes and implements an adaptive feature map which can replace a grid point map to reduce the amount of calculation and utilization of hardware resources. In this thesis, the lidar SLAM system with front-end and back-end parts mentioned above is implemented on the Xilinx PYNQ Z2 Platform. The implementation is operated on a mobile robot prototype and evaluated in real scenes. Compared with the implementation on the Raspberry Pi 3B+, the implementation in this thesis can save 86.25% of power consumption. The lidar SLAM system only takes 20 ms for location calculation in each scan which is 5.31 times faster compared with the software implementation with EKF

    Doctor of Philosophy

    Get PDF
    dissertationDataflow pipeline models are widely used in visualization systems. Despite recent advancements in parallel architecture, most systems still support only a single CPU or a small collection of CPUs such as a SMP workstation. Even for systems that are specifically tuned towards parallel visualization, their execution models only provide support for data-parallelism while ignoring taskparallelism and pipeline-parallelism. With the recent popularization of machines equipped with multicore CPUs and multi-GPU units, these visualization systems are undoubtedly falling further behind in reaching maximum efficiency. On the other hand, there exist several libraries that can schedule program executions on multiple CPUs and/or multiple GPUs. However, due to differences in executing a task graph and a pipeline along with their APIs being considerably low-level, it still remains a challenge to integrate these run-time libraries into current visualization systems. Thus, there is a need for a redesigned dataflow architecture to fully support and exploit the power of highly parallel machines in large-scale visualization. The new design must be able to schedule executions on heterogeneous platforms while at the same time supporting arbitrarily large datasets through the use of streaming data structures. The primary goal of this dissertation work is to develop a parallel dataflow architecture for streaming large-scale visualizations. The framework includes supports for platforms ranging from multicore processors to clusters consisting of thousands CPUs and GPUs. We achieve this in our system by introducing the notion of Virtual Processing Elements and Task-Oriented Modules along with a highly customizable scheduler that controls the assignment of tasks to elements dynamically. This creates an intuitive way to maintain multiple CPU/GPU kernels yet still provide coherency and synchronization across module executions. We have implemented these techniques into HyperFlow which is made of an API with all basic dataflow constructs described in the dissertation, and a distributed run-time library that can be used to deploy those pipelines on multicore, multi-GPU and cluster-based platforms

    GPU Accelerated protocol analysis for large and long-term traffic traces

    Get PDF
    This thesis describes the design and implementation of GPF+, a complete general packet classification system developed using Nvidia CUDA for Compute Capability 3.5+ GPUs. This system was developed with the aim of accelerating the analysis of arbitrary network protocols within network traffic traces using inexpensive, massively parallel commodity hardware. GPF+ and its supporting components are specifically intended to support the processing of large, long-term network packet traces such as those produced by network telescopes, which are currently difficult and time consuming to analyse. The GPF+ classifier is based on prior research in the field, which produced a prototype classifier called GPF, targeted at Compute Capability 1.3 GPUs. GPF+ greatly extends the GPF model, improving runtime flexibility and scalability, whilst maintaining high execution efficiency. GPF+ incorporates a compact, lightweight registerbased state machine that supports massively-parallel, multi-match filter predicate evaluation, as well as efficient arbitrary field extraction. GPF+ tracks packet composition during execution, and adjusts processing at runtime to avoid redundant memory transactions and unnecessary computation through warp-voting. GPF+ additionally incorporates a 128-bit in-thread cache, accelerated through register shuffling, to accelerate access to packet data in slow GPU global memory. GPF+ uses a high-level DSL to simplify protocol and filter creation, whilst better facilitating protocol reuse. The system is supported by a pipeline of multi-threaded high-performance host components, which communicate asynchronously through 0MQ messaging middleware to buffer, index, and dispatch packet data on the host system. The system was evaluated using high-end Kepler (Nvidia GTX Titan) and entry level Maxwell (Nvidia GTX 750) GPUs. The results of this evaluation showed high system performance, limited only by device side IO (600MBps) in all tests. GPF+ maintained high occupancy and device utilisation in all tests, without significant serialisation, and showed improved scaling to more complex filter sets. Results were used to visualise captures of up to 160 GB in seconds, and to extract and pre-filter captures small enough to be easily analysed in applications such as Wireshark

    Proceedings of SAT Competition 2021 : Solver and Benchmark Descriptions

    Get PDF
    Non peer reviewe

    The hArtes Tool Chain

    Get PDF
    This chapter describes the different design steps needed to go from legacy code to a transformed application that can be efficiently mapped on the hArtes platform

    Scientific Grand Challenges: Crosscutting Technologies for Computing at the Exascale - February 2-4, 2010, Washington, D.C.

    Full text link
    The goal of the "Scientific Grand Challenges - Crosscutting Technologies for Computing at the Exascale" workshop in February 2010, jointly sponsored by the U.S. Department of Energy’s Office of Advanced Scientific Computing Research and the National Nuclear Security Administration, was to identify the elements of a research and development agenda that will address these challenges and create a comprehensive exascale computing environment. This exascale computing environment will enable the science applications identified in the eight previously held Scientific Grand Challenges Workshop Series
    corecore