1,109 research outputs found

    Custom optimization algorithms for efficient hardware implementation

    The focus is on real-time optimal decision making with application in advanced control systems. These computationally intensive schemes, which involve the repeated solution of (convex) optimization problems within a sampling interval, require more efficient computational methods than currently available for extending their application to highly dynamical systems and setups with resource-constrained embedded computing platforms. A range of techniques are proposed to exploit synergies between digital hardware, numerical analysis and algorithm design. These techniques build on top of parameterisable hardware code generation tools that generate VHDL code describing custom computing architectures for interior-point methods and a range of first-order constrained optimization methods. Since memory limitations are often important in embedded implementations we develop a custom storage scheme for KKT matrices arising in interior-point methods for control, which reduces memory requirements significantly and prevents I/O bandwidth limitations from affecting the performance in our implementations. To take advantage of the trend towards parallel computing architectures and to exploit the special characteristics of our custom architectures we propose several high-level parallel optimal control schemes that can reduce computation time. A novel optimization formulation was devised for reducing the computational effort in solving certain problems independent of the computing platform used. In order to be able to solve optimization problems in fixed-point arithmetic, which is significantly more resource-efficient than floating-point, tailored linear algebra algorithms were developed for solving the linear systems that form the computational bottleneck in many optimization methods. These methods come with guarantees for reliable operation. 
We also provide finite-precision error analysis for fixed-point implementations of first-order methods that can be used to minimize the use of resources while meeting accuracy specifications. The suggested techniques are demonstrated on several practical examples, including a hardware-in-the-loop setup for optimization-based control of a large airliner.
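
As a rough illustration of how finite precision affects a first-order method, the sketch below runs a projected gradient iteration on a small box-constrained quadratic program and compares a double-precision run with runs whose gradient and iterate are rounded to a fixed number of fractional bits. Everything here is an assumption made for illustration: the problem data, the `quantize` helper, and the emulation of fixed point by rounding floats. It is not the thesis's hardware architecture or its error analysis.

```python
import numpy as np

def quantize(x, frac_bits):
    """Round to the nearest multiple of 2**-frac_bits (emulated fixed point)."""
    scale = 2.0 ** frac_bits
    return np.round(x * scale) / scale

def projected_gradient(H, q, lo, hi, n_iters=200, frac_bits=None):
    """Minimize 0.5*x'Hx + q'x subject to lo <= x <= hi.

    If frac_bits is given, the gradient and the iterate are rounded after
    every step to mimic a fixed-point datapath (illustrative assumption).
    """
    fx = (lambda v: v) if frac_bits is None else (lambda v: quantize(v, frac_bits))
    step = 1.0 / np.linalg.eigvalsh(H).max()  # 1/L, L = largest eigenvalue of H
    x = np.zeros_like(q)
    for _ in range(n_iters):
        grad = fx(H @ x + q)                      # rounded gradient
        x = fx(np.clip(x - step * grad, lo, hi))  # rounded projected step
    return x

# Hypothetical toy problem; its exact solution is x = (0.375, 0.5).
H = np.array([[2.0, 0.5], [0.5, 1.0]])
q = np.array([-1.0, -1.0])

x_float = projected_gradient(H, q, 0.0, 0.5)
x_fixed = {bits: projected_gradient(H, q, 0.0, 0.5, frac_bits=bits)
           for bits in (8, 12, 16)}
errors = {bits: float(np.linalg.norm(x - x_float)) for bits, x in x_fixed.items()}

for bits, err in errors.items():
    print(f"{bits:2d} fractional bits: distance to float solution = {err:.2e}")
```

Sweeping `frac_bits` in this way gives a crude empirical view of the trade-off between word length and solution accuracy that a formal finite-precision analysis would bound in advance.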

    Architecture and performance of Devito, a system for automated stencil computation

    Stencil computations are a key part of many high-performance computing applications, such as image processing, convolutional neural networks, and finite-difference solvers for partial differential equations. Devito is a framework capable of generating highly optimized code given symbolic equations expressed in Python, specialized in, but not limited to, affine (stencil) codes. The lowering process, from mathematical equations down to C++ code, is performed by the Devito compiler through a series of intermediate representations. Several performance optimizations are introduced, including advanced common sub-expression elimination, tiling and parallelization. Some of these are obtained through well-established stencil optimizers, integrated in the back-end of the Devito compiler. The architecture of the Devito compiler, as well as the performance optimizations that are applied when generating code, are presented. The effectiveness of such performance optimizations is demonstrated using operators drawn from seismic imaging applications.
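
To make the term "stencil computation" concrete, here is a hand-written NumPy sketch of a 5-point finite-difference Laplacian and one explicit heat-equation update. It does not use the Devito API and is not code generated by Devito. The function names, grid size, and time step are assumptions chosen for illustration.

```python
import numpy as np

def laplacian_5pt(u, h):
    """5-point finite-difference Laplacian on interior points (boundary left at 0)."""
    lap = np.zeros_like(u)
    lap[1:-1, 1:-1] = (
        u[2:, 1:-1] + u[:-2, 1:-1] + u[1:-1, 2:] + u[1:-1, :-2]
        - 4.0 * u[1:-1, 1:-1]
    ) / h**2
    return lap

def heat_step(u, alpha, h, dt):
    """One explicit Euler step of u_t = alpha * (u_xx + u_yy)."""
    return u + dt * alpha * laplacian_5pt(u, h)

# Illustrative grid on the unit square.
n = 11
h = 1.0 / (n - 1)
x, y = np.meshgrid(np.linspace(0, 1, n), np.linspace(0, 1, n), indexing="ij")
u = x**2 + y**2                 # the exact Laplacian of this field is 4
lap = laplacian_5pt(u, h)
u_next = heat_step(u, alpha=1.0, h=h, dt=0.2 * h**2)  # dt within h**2 / (4 * alpha)
```

Every interior point is updated from the same fixed pattern of neighbours, which is the regular access structure that tiling and parallelization exploit.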

    Approximate logic synthesis: a survey

    Approximate computing is an emerging paradigm that, by relaxing the requirement for full accuracy, offers benefits in terms of design area and power consumption. This paradigm is particularly attractive in applications where the underlying computation has inherent resilience to small errors. Such applications are abundant in many domains, including machine learning, computer vision, and signal processing. In circuit design, a major challenge is the capability to synthesize the approximate circuits automatically without relying on the manual expertise of designers. In this work, we review methods devised to synthesize approximate circuits, given their exact functionality and an approximability threshold. We summarize strategies for evaluating the error that circuit simplification can induce on the output, which guides synthesis techniques in choosing the circuit transformations that lead to the largest benefit for a given amount of induced error. We then review circuit simplification methods that operate at the gate or Boolean level, including those that leverage classical Boolean synthesis techniques to realize the approximations. We also summarize strategies that take high-level descriptions, such as C or behavioral Verilog, and synthesize approximate circuits from these descriptions.
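
One simple way to evaluate the error an approximation induces is exhaustive simulation over all inputs, which is feasible only for small bit widths. The sketch below does this for a hypothetical approximate adder whose lower `k` bits are computed with a bitwise OR and whose upper bits are added exactly with no carry-in. The adder, the metric names, and the threshold check are illustrative assumptions rather than a method taken from the survey.

```python
def approx_add(a, b, k):
    """Approximate adder: lower k bits via bitwise OR, upper bits added exactly."""
    mask = (1 << k) - 1
    low = (a | b) & mask                   # cheap approximation of the low part
    high = ((a >> k) + (b >> k)) << k      # exact sum of the high parts
    return high | low

def error_metrics(n_bits, k):
    """Exhaustively compare approx_add against exact addition.

    Returns (error_rate, mean_error_distance, worst_case_error) over all
    pairs of n_bits-wide unsigned inputs, assumed equally likely.
    """
    total = errors = sum_dist = worst = 0
    for a in range(1 << n_bits):
        for b in range(1 << n_bits):
            dist = abs((a + b) - approx_add(a, b, k))
            total += 1
            errors += dist > 0
            sum_dist += dist
            worst = max(worst, dist)
    return errors / total, sum_dist / total, worst

def within_threshold(n_bits, k, max_error):
    """Check a worst-case error bound, standing in for an approximability threshold."""
    return error_metrics(n_bits, k)[2] <= max_error

rate, mean_dist, worst = error_metrics(n_bits=4, k=2)
print(f"error rate={rate}, mean error distance={mean_dist}, worst case={worst}")
```

A synthesis flow would use such metrics to accept or reject a candidate simplification. For realistic bit widths, exhaustive enumeration would have to give way to sampling or formal analysis.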

    Three-dimensional modelling and inversion of controlled source electromagnetic data

    The marine Controlled Source Electromagnetic (CSEM) method is an important and almost self-contained discipline in the toolkit of methods used by geophysicists for probing the earth. It has increasingly attracted attention from industry during the past decade due to its potential in detecting valuable natural resources such as oil and gas. A method for three-dimensional CSEM modelling in the frequency domain is presented. The electric field is decomposed into primary and secondary components, as this leads to a more stable solution near the source position. The primary field is computed using a resistivity model for which a closed-form solution exists, for example a homogeneous or layered resistivity model. The secondary electric field is computed by discretizing a second-order partial differential equation for the electric field, also referred to in the literature as the vector Helmholtz equation, using the edge finite element method. A range of methods for the solution of the linear system derived from the edge finite element discretization is investigated. The magnetic field is computed subsequently, from the solution for the electric field, using a local finite difference approximation of Faraday’s law and an interpolation method. Tests that compare the solution obtained using the presented method with the solution computed using alternative codes for 1D and 3D synthetic models show that the implemented approach is suitable for CSEM forward modelling and is an alternative to existing codes. An algorithm for 3D inversion of CSEM data in the frequency domain was developed and implemented. The inverse problem is solved using the L-BFGS method and is regularized with a smoothing constraint. The inversion algorithm uses the presented forward modelling scheme for the computation of the field responses and the adjoint field for the computation of the gradient of the misfit function. 
The presented algorithm was tested on a synthetic example, showing that it is capable of reconstructing a resistivity model which fits the synthetic data and is close to the original resistivity model in the least-squares sense. Inversion of CSEM data is known to lead to images with low spatial resolution. It is well known that integration with complementary data sets mitigates this problem. An algorithm is presented for integrating an acoustic velocity model, known a priori, into the inversion scheme. The algorithm was tested on a synthetic example and the results demonstrate that the presented methodology is promising for the improvement of resistivity models obtained from CSEM data.
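
The structure of a smoothing-regularized misfit and its adjoint-based gradient can be sketched on a toy problem. The example below replaces the CSEM forward model with a hypothetical dense linear operator `F`, so that the "adjoint" is simply the transpose. That is a strong simplification of the finite-element setting described above. The names `F`, `D`, and `lam`, and the random test data, are assumptions for illustration. The function returns the objective value and its gradient, the pair that a quasi-Newton routine such as L-BFGS would consume at each iteration.

```python
import numpy as np

def first_difference(n):
    """(n-1) x n matrix D with (D m)[i] = m[i+1] - m[i] (roughness operator)."""
    D = np.zeros((n - 1, n))
    idx = np.arange(n - 1)
    D[idx, idx] = -1.0
    D[idx, idx + 1] = 1.0
    return D

def objective_and_gradient(m, F, d, D, lam):
    """phi(m) = 0.5*||F m - d||^2 + 0.5*lam*||D m||^2 and its gradient.

    The data-misfit gradient applies the adjoint (here, the transpose) of the
    forward operator to the residual. F is a linear stand-in for the forward
    model (assumption).
    """
    residual = F @ m - d
    rough = D @ m
    phi = 0.5 * (residual @ residual) + 0.5 * lam * (rough @ rough)
    grad = F.T @ residual + lam * (D.T @ rough)
    return phi, grad

# Synthetic toy data (hypothetical sizes and values).
rng = np.random.default_rng(0)
n_data, n_model = 8, 5
F = rng.standard_normal((n_data, n_model))
m_true = np.linspace(1.0, 2.0, n_model)
d = F @ m_true
D = first_difference(n_model)

m0 = np.ones(n_model)
phi0, grad0 = objective_and_gradient(m0, F, d, D, lam=0.1)

# Central finite differences as an independent check of the gradient.
eps = 1e-6
fd = np.array([
    (objective_and_gradient(m0 + eps * e, F, d, D, 0.1)[0]
     - objective_and_gradient(m0 - eps * e, F, d, D, 0.1)[0]) / (2 * eps)
    for e in np.eye(n_model)
])
```

Comparing `grad0` with `fd` is a common sanity check before handing the gradient to an optimizer, since an inconsistent adjoint tends to show up as stalled or erratic convergence.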

    Group implicit concurrent algorithms in nonlinear structural dynamics

    During the 1970s and 1980s, considerable effort was devoted to developing efficient and reliable time stepping procedures for transient structural analysis. Mathematically, the equations governing this type of problem are generally stiff, i.e., they exhibit a wide spectrum in the linear range. The algorithms best suited to this type of application are those which accurately integrate the low frequency content of the response without necessitating the resolution of the high frequency modes. This means that the algorithms must be unconditionally stable, which in turn rules out explicit integration. The most exciting possibility in algorithm development in recent years has been the advent of parallel computers with multiprocessing capabilities. This work is therefore mainly concerned with the development of parallel algorithms in the area of structural dynamics. A primary objective is to devise unconditionally stable and accurate time stepping procedures which lend themselves to an efficient implementation on concurrent machines. Some features of the new computer architecture are summarized. A brief survey of current efforts in the area is presented. A new class of concurrent procedures, called Group Implicit (GI) algorithms, is introduced and analyzed. The numerical simulation shows that GI algorithms hold considerable promise for application in coarse-grain as well as medium-grain parallel computers.
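
The stability argument can be illustrated on a single undamped oscillator u'' + omega^2 u = 0 integrated with a time step far larger than the period of the mode. The sketch below compares the implicit Newmark average-acceleration scheme (beta = 1/4, gamma = 1/2), which is unconditionally stable for linear problems, with the explicit central-difference scheme, which is stable only for omega * dt <= 2. These are standard textbook integrators used as stand-ins. This is not the Group Implicit algorithm, and the parameter values are assumptions.

```python
def newmark_average_acceleration(omega, u0, v0, dt, n_steps):
    """Implicit Newmark scheme (beta=1/4, gamma=1/2) for u'' + omega^2 u = 0."""
    u, v = u0, v0
    a = -omega**2 * u
    for _ in range(n_steps):
        # Solve the (scalar) implicit equation for the new acceleration.
        a_new = -omega**2 * (u + dt * v + 0.25 * dt**2 * a) / (1.0 + 0.25 * (omega * dt) ** 2)
        u = u + dt * v + 0.25 * dt**2 * (a + a_new)
        v = v + 0.5 * dt * (a + a_new)
        a = a_new
    return u, v

def central_difference(omega, u0, v0, dt, n_steps):
    """Explicit central-difference scheme; returns the displacement after n_steps."""
    u_prev = u0
    u = u0 + dt * v0 - 0.5 * (omega * dt) ** 2 * u0  # Taylor start for the first step
    for _ in range(n_steps - 1):
        u, u_prev = 2.0 * u - u_prev - (omega * dt) ** 2 * u, u
    return u

def energy(omega, u, v):
    """Total energy per unit mass of the oscillator."""
    return 0.5 * v**2 + 0.5 * (omega * u) ** 2

# A stiff mode sampled far too coarsely: omega * dt = 20.
omega, dt, n_steps = 100.0, 0.2, 50
u_imp, v_imp = newmark_average_acceleration(omega, 1.0, 0.0, dt, n_steps)
u_exp = central_difference(omega, 1.0, 0.0, dt, n_steps)

print("implicit energy drift:", energy(omega, u_imp, v_imp) - energy(omega, 1.0, 0.0))
print("explicit displacement magnitude:", abs(u_exp))
```

The implicit scheme keeps the unresolved high-frequency mode bounded, although it does not track its phase accurately, while the explicit scheme diverges. The price is an equation solve at every step, which is the part that parallel formulations aim to distribute.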

    Matrix Polynomials and their Lower Rank Approximations

    This thesis is a wide-ranging work on computing a “lower-rank” approximation of a matrix polynomial using second-order non-linear optimization techniques. Two notions of rank are investigated. The first is the rank as the number of linearly independent rows or columns, which is the classical definition. The other notion considered is the lowest rank of a matrix polynomial when evaluated at a complex number, or the McCoy rank. Together, these two notions of rank allow one to compute a nearby matrix polynomial where the structure of both the left and right kernels is prescribed, along with the structure of both the infinite and finite eigenvalues. The computational theory of the calculus of matrix polynomial valued functions is developed and used in optimization algorithms based on second-order approximations. Special functions studied with a detailed error analysis are the determinant and adjoint of matrix polynomials. The unstructured and structured variants of matrix polynomials are studied in a very general setting in the context of an equality constrained optimization problem. The most general instances of these optimization problems are NP-hard to approximate in a global setting. In most instances we are able to prove that solutions to our optimization problems exist (possibly at infinity) and discuss techniques in conjunction with an implementation to compute local minimizers to the problem. Most of the analysis of these problems is local and done through the Karush-Kuhn-Tucker optimality conditions for constrained optimization problems. We show that most formulations of the problems studied satisfy regularity conditions and admit Lagrange multipliers. Furthermore, we show that under some formulations the second-order sufficient condition holds for instances of interest of the optimization problems in question. When Lagrange multipliers do not exist, we discuss why and, if it is reasonable to do so, how to regularize the problem. 
In several instances closed-form expressions for the derivatives of matrix polynomial valued functions are derived to assist in analysis of the optimality conditions around a solution. From this analysis it is shown that variants of Newton’s method will have a local rate of convergence that is quadratic with a suitable initial guess for many problems. The implementations are demonstrated on some examples from the literature and several examples are cross-validated with different optimization formulations of the same mathematical problem. We conclude with a special application of the theory developed in this thesis: computing a nearby pair of differential polynomials with a non-trivial greatest common divisor, a non-commutative symbolic-numeric computation problem. We formulate this problem as finding a nearby structured matrix polynomial that is rank deficient in the classical sense.
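
The difference between the two notions of rank shows up on a small example. The sketch below evaluates the matrix polynomial P(t) = diag(t, t - 1) at a generic complex point, where it has full rank 2, and at t = 0 and t = 1, where the rank drops to 1. The coefficient-list representation, the helper names, and the hand-picked evaluation points are assumptions for illustration. Finding the rank-dropping points of a general matrix polynomial is itself a nontrivial computation that this sketch does not attempt.

```python
import numpy as np

def evaluate(coeffs, w):
    """Horner evaluation of P(w) = coeffs[0] + coeffs[1]*w + coeffs[2]*w**2 + ..."""
    result = np.zeros_like(coeffs[0], dtype=complex)
    for C in reversed(coeffs):
        result = result * w + C
    return result

def rank_at(coeffs, w, tol=1e-10):
    """Numerical rank of the constant matrix P(w)."""
    return int(np.linalg.matrix_rank(evaluate(coeffs, w), tol=tol))

# P(t) = diag(t, t - 1), stored as [P0, P1] with P(t) = P0 + P1 * t.
P = [np.array([[0.0, 0.0], [0.0, -1.0]]), np.eye(2)]

# A random complex point gives the classical (generic) rank with probability 1.
rng = np.random.default_rng(0)
w_generic = complex(rng.standard_normal(), rng.standard_normal())
generic_rank = rank_at(P, w_generic)

# Hand-picked points where det P vanishes; the rank drops there.
ranks_at_roots = {w: rank_at(P, w) for w in (0.0, 1.0)}
lowest_rank = min(ranks_at_roots.values())

print("generic rank:", generic_rank)
print("lowest rank at the chosen points:", lowest_rank)
```

Here the polynomial has full rank in the classical sense yet loses rank at isolated evaluation points, which is the distinction between the two notions described in the abstract.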

    Paved with Good Intentions: Analysis of a Randomized Block Kaczmarz Method

    The block Kaczmarz method is an iterative scheme for solving overdetermined least-squares problems. At each step, the algorithm projects the current iterate onto the solution space of a subset of the constraints. This paper describes a block Kaczmarz algorithm that uses a randomized control scheme to choose the subset at each step. This algorithm is the first block Kaczmarz method with an (expected) linear rate of convergence that can be expressed in terms of the geometric properties of the matrix and its submatrices. The analysis reveals that the algorithm is most effective when it is given a good row paving of the matrix, a partition of the rows into well-conditioned blocks. The operator theory literature provides detailed information about the existence and construction of good row pavings. Together, these results yield an efficient block Kaczmarz scheme that applies to many overdetermined least-squares problems.
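
A minimal sketch of the iteration described above, assuming a consistent system and a row partition chosen at random rather than a carefully constructed paving. At each step one block is drawn uniformly at random and the iterate is projected onto the solutions of that block's equations, using the minimum-norm least-squares update. The function name, problem sizes, and block count are illustrative assumptions, not the paper's reference implementation.

```python
import numpy as np

def randomized_block_kaczmarz(A, b, blocks, n_iters=500, seed=0):
    """Randomized block Kaczmarz for A x = b.

    blocks is a list of row-index arrays partitioning the rows of A. Each
    step projects x onto {z : A[tau] z = b[tau]} for a block tau drawn
    uniformly at random.
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(A.shape[1])
    for _ in range(n_iters):
        tau = blocks[rng.integers(len(blocks))]
        A_tau = A[tau]
        residual = b[tau] - A_tau @ x
        # Minimum-norm solution of A_tau * dx = residual, i.e. pinv(A_tau) @ residual.
        x = x + np.linalg.lstsq(A_tau, residual, rcond=None)[0]
    return x

# Consistent overdetermined toy system (hypothetical sizes).
rng = np.random.default_rng(1)
A = rng.standard_normal((60, 10))
x_true = rng.standard_normal(10)
b = A @ x_true

# Random partition of the 60 rows into 12 blocks of 5 rows each (assumption).
blocks = np.array_split(rng.permutation(60), 12)

x_hat = randomized_block_kaczmarz(A, b, blocks)
print("error:", np.linalg.norm(x_hat - x_true))
```

Each step costs a small least-squares solve on one block, so the conditioning of the individual blocks directly affects both the per-step cost and the convergence rate. This is why the quality of the row paving matters.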