349 research outputs found

    Predictive control using an FPGA with application to aircraft control

    Get PDF
    Alternative and more efficient computational methods can extend the applicability of MPC to systems with tight real-time requirements. This paper presents a “system-on-a-chip” MPC system, implemented on a field programmable gate array (FPGA), consisting of a sparse structure-exploiting primal dual interior point (PDIP) QP solver for MPC reference tracking and a fast gradient QP solver for steady-state target calculation. A parallel reduced precision iterative solver is used to accelerate the solution of the set of linear equations forming the computational bottleneck of the PDIP algorithm. A numerical study of the effect of reducing the number of iterations highlights the effectiveness of the approach. The system is demonstrated with an FPGA-inthe-loop testbench controlling a nonlinear simulation of a large airliner. This study considers many more manipulated inputs than any previous FPGA-based MPC implementation to date, yet the implementation comfortably fits into a mid-range FPGA, and the controller compares well in terms of solution quality and latency to state-of-the-art QP solvers running on a standard PC

    Predictive control using an FPGA with application to aircraft control

    Get PDF
    Alternative and more efficient computational methods can extend the applicability of MPC to systems with tight real-time requirements. This paper presents a ``system-on-a-chip'' MPC system, implemented on a field programmable gate array (FPGA), consisting of a sparse structure-exploiting primal dual interior point (PDIP) QP solver for MPC reference tracking and a fast gradient QP solver for steady-state target calculation. A parallel reduced precision iterative solver is used to accelerate the solution of the set of linear equations forming the computational bottleneck of the PDIP algorithm. A numerical study of the effect of reducing the number of iterations highlights the effectiveness of the approach. The system is demonstrated with an FPGA-in-the-loop testbench controlling a nonlinear simulation of a large airliner. This study considers many more manipulated inputs than any previous FPGA-based MPC implementation to date, yet the implementation comfortably fits into a mid-range FPGA, and the controller compares well in terms of solution quality and latency to state-of-the-art QP solvers running on a standard PC.This work was supported by EPSRC (Grants EP/G030308/1, EP/G031576/1 and EP/I012036/1) and the EU FP7 Project EMBOCON grant agreement number FP7-ICT-2009-4 248940, as well as industrial support from Xilinx, the Mathworks, and the European Space Agency.This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication. The final version of record is available at: http://dx.doi.org/10.1109/TCST.2013.2271791. Copyright (c) 2014 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing [email protected]

    O(N) methods in electronic structure calculations

    Full text link
    Linear scaling methods, or O(N) methods, have computational and memory requirements which scale linearly with the number of atoms in the system, N, in contrast to standard approaches which scale with the cube of the number of atoms. These methods, which rely on the short-ranged nature of electronic structure, will allow accurate, ab initio simulations of systems of unprecedented size. The theory behind the locality of electronic structure is described and related to physical properties of systems to be modelled, along with a survey of recent developments in real-space methods which are important for efficient use of high performance computers. The linear scaling methods proposed to date can be divided into seven different areas, and the applicability, efficiency and advantages of the methods proposed in these areas is then discussed. The applications of linear scaling methods, as well as the implementations available as computer programs, are considered. Finally, the prospects for and the challenges facing linear scaling methods are discussed.Comment: 85 pages, 15 figures, 488 references. Resubmitted to Rep. Prog. Phys (small changes

    Genetic Improvement of Software: a Comprehensive Survey

    Get PDF
    Genetic improvement (GI) uses automated search to find improved versions of existing software. We present a comprehensive survey of this nascent field of research with a focus on the core papers in the area published between 1995 and 2015. We identified core publications including empirical studies, 96% of which use evolutionary algorithms (genetic programming in particular). Although we can trace the foundations of GI back to the origins of computer science itself, our analysis reveals a significant upsurge in activity since 2012. GI has resulted in dramatic performance improvements for a diverse set of properties such as execution time, energy and memory consumption, as well as results for fixing and extending existing system functionality. Moreover, we present examples of research work that lies on the boundary between GI and other areas, such as program transformation, approximate computing, and software repair, with the intention of encouraging further exchange of ideas between researchers in these fields

    Compilation Techniques for High-Performance Embedded Systems with Multiple Processors

    Get PDF
    Institute for Computing Systems ArchitectureDespite the progress made in developing more advanced compilers for embedded systems, programming of embedded high-performance computing systems based on Digital Signal Processors (DSPs) is still a highly skilled manual task. This is true for single-processor systems, and even more for embedded systems based on multiple DSPs. Compilers often fail to optimise existing DSP codes written in C due to the employed programming style. Parallelisation is hampered by the complex multiple address space memory architecture, which can be found in most commercial multi-DSP configurations. This thesis develops an integrated optimisation and parallelisation strategy that can deal with low-level C codes and produces optimised parallel code for a homogeneous multi-DSP architecture with distributed physical memory and multiple logical address spaces. In a first step, low-level programming idioms are identified and recovered. This enables the application of high-level code and data transformations well-known in the field of scientific computing. Iterative feedback-driven search for “good” transformation sequences is being investigated. A novel approach to parallelisation based on a unified data and loop transformation framework is presented and evaluated. Performance optimisation is achieved through exploitation of data locality on the one hand, and utilisation of DSP-specific architectural features such as Direct Memory Access (DMA) transfers on the other hand. The proposed methodology is evaluated against two benchmark suites (DSPstone & UTDSP) and four different high-performance DSPs, one of which is part of a commercial four processor multi-DSP board also used for evaluation. Experiments confirm the effectiveness of the program recovery techniques as enablers of high-level transformations and automatic parallelisation. Source-to-source transformations of DSP codes yield an average speedup of 2.21 across four different DSP architectures. The parallelisation scheme is – in conjunction with a set of locality optimisations – able to produce linear and even super-linear speedups on a number of relevant DSP kernels and applications

    Custom optimization algorithms for efficient hardware implementation

    No full text
    The focus is on real-time optimal decision making with application in advanced control systems. These computationally intensive schemes, which involve the repeated solution of (convex) optimization problems within a sampling interval, require more efficient computational methods than currently available for extending their application to highly dynamical systems and setups with resource-constrained embedded computing platforms. A range of techniques are proposed to exploit synergies between digital hardware, numerical analysis and algorithm design. These techniques build on top of parameterisable hardware code generation tools that generate VHDL code describing custom computing architectures for interior-point methods and a range of first-order constrained optimization methods. Since memory limitations are often important in embedded implementations we develop a custom storage scheme for KKT matrices arising in interior-point methods for control, which reduces memory requirements significantly and prevents I/O bandwidth limitations from affecting the performance in our implementations. To take advantage of the trend towards parallel computing architectures and to exploit the special characteristics of our custom architectures we propose several high-level parallel optimal control schemes that can reduce computation time. A novel optimization formulation was devised for reducing the computational effort in solving certain problems independent of the computing platform used. In order to be able to solve optimization problems in fixed-point arithmetic, which is significantly more resource-efficient than floating-point, tailored linear algebra algorithms were developed for solving the linear systems that form the computational bottleneck in many optimization methods. These methods come with guarantees for reliable operation. We also provide finite-precision error analysis for fixed-point implementations of first-order methods that can be used to minimize the use of resources while meeting accuracy specifications. The suggested techniques are demonstrated on several practical examples, including a hardware-in-the-loop setup for optimization-based control of a large airliner.Open Acces

    Methods for variational computation of molecular properties on near term quantum computers

    Get PDF
    In this thesis we explore the near term applications of quantum computing to Quantum Chemistry problems, with a focus on electronic structure calculations. We begin by discussing the core subroutine of near-term quantum computing methods: the variational quantum eigensolver (VQE). By drawing upon the literature, we discuss the relevance of the method in computing electronic structure properties, compare it to alternative conventional or quantum methods and outline best practices. We then discuss the key limitations of this method, namely: the exploding number of measurements required, showing that parallelisation will be relevant for VQE to compete with conventional methods; the barren plateau problem; and the management of errors through error mitigation - we present a light touch error mitigation technique which is used to improve the results of experiments presented later in the thesis. From this point, we propose three methods for near term applications of quantum computing, with a focus on limiting the requirements on quantum resources. The first two methods concern the computation of ground state energy. We adapt the conventional methods of complete active space self consistent field (CASSCF) and energy-weighted density matrix embedding theory (EwDMET) by integrating a VQE subroutine to compute the electronic wavefunctions from which reduced density matrices are sampled. These method allow recovering additional electron correlation energy for a given number of qubits and are tested on quantum devices. The last method is focused on computing excited electronic states and uses techniques inspired from the generative adversarial machine learning literature. It is a fully variational method, which is shown to work on current quantum devices

    Addressing concerns in performance prediction : the impact of data dependencies and denormal arithmetic in scientific codes

    Get PDF
    To meet the increasing computational requirements of the scientific community, the use of parallel programming has become commonplace, and in recent years distributed applications running on clusters of computers have become the norm. Both parallel and distributed applications face the problem of predictive uncertainty and variations in runtime. Modern scientific applications have varying I/O, cache, and memory profiles that have significant and difficult to predict effects on their runtimes. Data-dependent sensitivities such as the costs of denormal floating point calculations introduce more variations in runtime, further hindering predictability. Applications with unpredictable performance or which have highly variable runtimes can cause several problems. If the runtime of an application is unknown or varies widely, workflow schedulers cannot e�ciently allocate them to compute nodes, leading to the under-utilisation of expensive resources. Similarly, a lack of accurate knowledge of the performance of an application on new hardware can lead to misguided procurement decisions. In heavily parallel applications, minor variations in runtime on individual nodes can have disproportionate effects on the overall application runtime. Even on a smaller scale, a lack of certainty about an application's runtime can preclude its use in real-time or time-critical applications such as clinical diagnosis. This thesis investigates two sources of data-dependent performance variability. The first source is algorithmic and is seen in a state-of-the-art C++ biomedical imaging application. It identifies the cause of the variability in the application and develops a means of characterising the variability. This 'probe task' based model is adapted for use with a workflow scheduler, and the scheduling improvements it brings are examined. The second source of variability is more subtle as it is micro-architectural in nature. Depending on the input data, two runs of an application executing exactly the same sequence of instructions and with exactly the same memory access patterns can have large differences in runtime due to deficiencies in common hardware implementations of denormal arithmetic1. An exception-based profiler is written to detect occurrences of denormal arithmetic and it is shown how this is insufficient to isolate the sources of denormal arithmetic in an application. A novel tool based on theValgrind binary instrumentation framework is developed which can trace the origins of denormal values and the frequency of their occurrence in an application's data structures. This second tool is used to isolate and remove the cause of denormal arithmetic both from a simple numerical code, and then from a face recognition application

    GPU accelerated multispectral EO imagery optimised CCSDS-123 lossless compression implementation

    Get PDF
    Continual advancements in Earth Observation (EO) optical imager payloads has led to a significant increase in the volume of multispectral data generated onboard EO satellites. As a result, a growing onboard data bottleneck need to be alleviated. One technique commonly used is onboard image compression. However, the performance of traditional space qualified processors, such as radiation hardened FPGAs, are not able to meet current nor future onboard data processing requirements. Therefore, a new high capability hardware architecture is required. In previous work a new GPU accelerated scalable heterogeneous hardware architecture for onboard data processing was proposed. In this paper, two new CUDA GPU implementations of the state-of-the-art lossless multidimensional image compression algorithm CCSDS-123, are discussed. The first implementation is a generic CUDA implementation of the CCSDS-123 algorithm whilst the second is optimised specifically for multispectral EO imagery. Both implementations utilise image tiling to leverage an additional axis for algorithm parallelisation to increase processing throughput. The CUDA implementation and optimisation techniques deployed are discussed in the paper. In addition, compression ratio and throughput performance results are presented for each implementation. Further experimental studies into the relationships between algorithm user definable compression parameters, tile sizes, tile dimensions and the achieved compression ratio and throughput, were performed
    • …
    corecore