4 research outputs found

    Design and implementation of an FPGA-based piecewise affine Kalman Filter for Cyber-Physical Systems

    Get PDF
    The Kalman Filter is a robust tool often employed as a process observer in Cyber-Physical Systems. However, in the general case the high computational cost, especially for large plant models or fast sample rates, makes it an impractical choice for typical low-power microcontrollers. Furthermore, although industry trends towards tighter integration are supported by powerful high-end System-on-Chip software processors, this consolidation complicates the ability for a controls engineer to verify correct behavior of the system under all conditions, which is important in safety-critical systems and systems demanding a high degree of reliability. Dedicated Field-Programmable Gate Array (FPGA) hardware can provide application speedup, design partitioning in mixed-criticality systems, and fully deterministic timing, which helps ensure a control system behaves identically to offline simulations. This dissertation presents a new design methodology which can be leveraged to yield such benefits. Although this dissertation focuses on the Kalman Filter, the method is general enough to be extended to other compute-intensive algorithms which rely on state-space modeling. For the first part, the core idea is that decomposing the Kalman Filter algorithm from a strictly linear perspective leads to a more generalized architecture with increased performance compared to approaches which focus on nonlinear filters (e.g. Extended Kalman Filter). Our contribution is a broadly-applicable hardware-software architecture for a linear Kalman Filter whose operating domain is extended through online model swapping. A supporting application-agnostic performance and resource analysis is provided. For the second part, we identify limitations of the mixed hardware-software method and demonstrate how to leverage hardware-based region identification in order to develop a strictly hardware-only Kalman Filter which maintains a large operating domain. The resulting hardware processor is partitioned from low criticality software tasks running on a supervising software processor and enables vastly simplified timing validation

    A statically scheduling compiler for a parameterized numerical accelerator

    Get PDF
    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2013.Cataloged from PDF version of thesis.Includes bibliographical references (p. 99-100).In this work, I present a statically scheduling compiler for a numerical accelerator that parallelizes and maps algorithms to instances of a processor template. The processor template that makes up the numerical accelerator is a collection of floating point units (FPUs) connected to memories through an interconnect structure. The task of the compiler is to create schedules for the interconnect structure and the memories to perform the desired algorithm as fast as possible. The compiler does this by representing the algorithm as a data flow graph (DFG) and scheduling the graph using depth-first list scheduling. The compiler then assigns memory addresses to the intermediate values of the DFG through a graph coloring heuristic to avoid structural hazards. The final result of the compiler is a set of instructions that can be loaded onto an instance of the processor template to create the desired numerical accelerator. This work also covers how algorithms are inputted into the compiler. One of the methods of algorithm input uses C++ function templates that express the numerical algorithm on a template data type. That type can be replaced with the graphMaker class to create a DFG, or it can be replaced with float, double, or int to check the correctness and the performance of the algorithm with different precisions. This enables algorithm designers to create a single version of the algorithm for both simulation and compilation. Multiple algorithms were compiled with this custom compiler to show how schedules change with the size of the processor, to show how the distribution of units within processors can be specialized for algorithms, and to show how algorithms can be optimized by the compiler to rival hand optimization of algorithms.by Andrew Charles Wright.S.M

    Template-based hardware-software codesign for high-performance embedded numerical accelerators

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2013.Cataloged from PDF version of thesis.Includes bibliographical references (pages 129-132).Sophisticated algorithms for control, state estimation and equalization have tremendous potential to improve performance and create new capabilities in embedded and mobile systems. Traditional implementation approaches are not well suited for porting these algorithmic solutions into practical implementations within embedded system constraints. Most of the technical challenges arise from design approach that manipulates only one level in the design stack, thus being forced to conform to constraints imposed by other levels without question. In tightly constrained environments, like embedded and mobile systems, such approaches have a hard time efficiently delivering and delivering efficiency. In this work we offer a solution that cuts through all the design stack layers. We build flexible structures at the hardware, software and algorithm level, and approach the solution through design space exploration. To do this efficiently we use a template-based hardware-software development flow. The main incentive for template use is, as in software development, to relax the generality vs. efficiency/performance type tradeoffs that appear in solutions striving to achieve run-time flexibility. As a form of static polymorphism, templates typically incur very little performance overhead once the design is instantiated, thus offering the possibility to defer many design decisions until later stages when more is known about the overall system design. However, simply including templates into design flow is not sufficient to result in benefits greater than some level of code reuse. In our work we propose using templates as flexible interfaces between various levels in the design stack. As such, template parameters become the common language that designers at different levels of design hierarchy can use to succinctly express their assumptions and ideas. Thus, it is of great benefit if template parameters map directly and intuitively into models at every level. To showcase the approach we implement a numerical accelerator for embedded Model Predictive Control (MPC) algorithm. While most of this work and design flow are quite general, their full power is realized in search for good solutions to a specific problem. This is best understood in direct comparison with recent works on embedded and high-speed MPC implementations. The controllers we generate outperform published works by a handsome margin in both speed and power consumption, while taking very little time to generate.by Ranko Radovin Sredojević.Ph.D

    Custom optimization algorithms for efficient hardware implementation

    No full text
    The focus is on real-time optimal decision making with application in advanced control systems. These computationally intensive schemes, which involve the repeated solution of (convex) optimization problems within a sampling interval, require more efficient computational methods than currently available for extending their application to highly dynamical systems and setups with resource-constrained embedded computing platforms. A range of techniques are proposed to exploit synergies between digital hardware, numerical analysis and algorithm design. These techniques build on top of parameterisable hardware code generation tools that generate VHDL code describing custom computing architectures for interior-point methods and a range of first-order constrained optimization methods. Since memory limitations are often important in embedded implementations we develop a custom storage scheme for KKT matrices arising in interior-point methods for control, which reduces memory requirements significantly and prevents I/O bandwidth limitations from affecting the performance in our implementations. To take advantage of the trend towards parallel computing architectures and to exploit the special characteristics of our custom architectures we propose several high-level parallel optimal control schemes that can reduce computation time. A novel optimization formulation was devised for reducing the computational effort in solving certain problems independent of the computing platform used. In order to be able to solve optimization problems in fixed-point arithmetic, which is significantly more resource-efficient than floating-point, tailored linear algebra algorithms were developed for solving the linear systems that form the computational bottleneck in many optimization methods. These methods come with guarantees for reliable operation. We also provide finite-precision error analysis for fixed-point implementations of first-order methods that can be used to minimize the use of resources while meeting accuracy specifications. The suggested techniques are demonstrated on several practical examples, including a hardware-in-the-loop setup for optimization-based control of a large airliner.Open Acces
    corecore