
    A Householder-based algorithm for Hessenberg-triangular reduction

    The QZ algorithm for computing eigenvalues and eigenvectors of a matrix pencil $A - \lambda B$ requires that the matrices first be reduced to Hessenberg-triangular (HT) form. The current method of choice for HT reduction relies entirely on Givens rotations regrouped and accumulated into small dense matrices which are subsequently applied using matrix multiplication routines. A non-vanishing fraction of the total flop count must nevertheless still be performed as sequences of overlapping Givens rotations alternately applied from the left and from the right. The many data dependencies associated with this computational pattern lead to inefficient use of the processor and poor scalability. In this paper, we therefore introduce a fundamentally different approach that relies entirely on (large) Householder reflectors partially accumulated into block reflectors using (compact) WY representations. Even though the new algorithm requires more floating-point operations than the state-of-the-art algorithm, extensive experiments on both real and synthetic data indicate that it is still competitive, even in a sequential setting. The new algorithm is conjectured to have better parallel scalability, an idea which is partially supported by early small-scale experiments using multi-threaded BLAS. The design and evaluation of a parallel formulation is future work.
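    As a rough illustration of the building block this abstract relies on, the sketch below accumulates a few Householder reflectors into a compact WY block reflector and checks it against the explicit product. It is only a minimal NumPy sketch of the WY representation itself, not the paper's Hessenberg-triangular reduction; the function names and the toy data are invented for the example.

```python
import numpy as np

def householder(x):
    """Return (v, tau) such that (I - tau * v v^T) x is a multiple of e_1."""
    v = np.array(x, dtype=float)
    sigma = np.linalg.norm(v)
    if sigma == 0.0:
        return v, 0.0
    alpha = -np.copysign(sigma, v[0])   # sign chosen to avoid cancellation
    v[0] -= alpha
    return v, 2.0 / (v @ v)

def accumulate_wy(vs, taus):
    """Compact WY form: H_1 H_2 ... H_k = I - V T V^T (Schreiber & Van Loan)."""
    m, k = len(vs[0]), len(vs)
    V = np.zeros((m, k))
    T = np.zeros((k, k))
    for j, (v, tau) in enumerate(zip(vs, taus)):
        V[:, j] = v
        T[:j, j] = -tau * (T[:j, :j] @ (V[:, :j].T @ v))  # new last column of T
        T[j, j] = tau
    return V, T

# Sanity check: the block reflector equals the explicit product of reflectors.
rng = np.random.default_rng(0)
vs, taus = [], []
Q_explicit = np.eye(6)
for _ in range(3):
    v, tau = householder(rng.standard_normal(6))
    vs.append(v)
    taus.append(tau)
    Q_explicit = Q_explicit @ (np.eye(6) - tau * np.outer(v, v))
V, T = accumulate_wy(vs, taus)
print(np.allclose(Q_explicit, np.eye(6) - V @ T @ V.T))  # True
```

    Accumulating reflectors this way is what lets the bulk of the work be applied as matrix-matrix products rather than as long sequences of dependent rotations.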

    Automatic Verification Of Linear Controller Software

    Many safety-critical cyber-physical systems have a software-based controller at their core. Since the system behavior relies on the operation of the controller, it is imperative to ensure the correctness of the controller to have high assurance for such systems. Nowadays, controllers are developed in a model-based fashion: controller models are designed, and their performance is analyzed first at the model level. Once the control design is complete, a software implementation is automatically generated from the mathematical model of the controller by a code generator. To assure the correctness of the controller implementation, it is necessary to check that the code generation is done correctly. Commercial code generators are complex black-box software that are generally not formally verified, and subtle bugs have been found in commercially available code generators that consequently generate incorrect code. In the absence of verified code generators, it is desirable to verify instances of implementations against their original models. Such verification should be performed from the input-output perspective, because correct implementations may use different state representations for several reasons (e.g., the code generator's choice of state representation, optimizations used in the code generator, and code transformations). In this dissertation, we propose several methods to verify a given controller implementation against its model from the input-output perspective. First, we propose a method that derives assertions from the controller model and checks whether the assertions are invariants of the controller implementation via a toolchain built on a popular deductive program verification framework. Moreover, we propose an alternative, more scalable method that extracts a model from the controller implementation using symbolic execution and compares the extracted model to the original controller model using state-of-the-art constraint solvers. Lastly, we extend the latter method to correctly account for rounding errors in the floating-point computation of the controller implementation. We demonstrate the scalability of our proposed approaches through evaluation with randomly generated controller specifications of realistic size.
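    To illustrate the input-output perspective discussed above, the sketch below checks whether two state-space realizations of a linear controller have the same input-output behavior by comparing a sufficient number of Markov parameters. This is a standard textbook criterion, not the dissertation's toolchain (which uses deductive verification, symbolic execution, and constraint solvers); all matrices and names are made up for the example.

```python
import numpy as np

def markov_parameters(A, B, C, D, n_terms):
    """First n_terms Markov parameters D, CB, CAB, CA^2B, ... of a discrete LTI system."""
    params = [D]
    Ak_B = B
    for _ in range(n_terms - 1):
        params.append(C @ Ak_B)
        Ak_B = A @ Ak_B
    return params

def io_equivalent(sys1, sys2, tol=1e-9):
    """Input-output equivalence of two realizations: compare enough Markov parameters
    (n1 + n2 + 1 terms suffice for realizations of dimensions n1 and n2)."""
    (A1, B1, C1, D1), (A2, B2, C2, D2) = sys1, sys2
    n_terms = A1.shape[0] + A2.shape[0] + 1
    p1 = markov_parameters(A1, B1, C1, D1, n_terms)
    p2 = markov_parameters(A2, B2, C2, D2, n_terms)
    return all(np.allclose(m1, m2, atol=tol) for m1, m2 in zip(p1, p2))

# Example: the same controller in two different state representations
# (related by an invertible similarity transform T), as a code generator might emit.
A = np.array([[0.5, 0.1], [0.0, 0.8]])
B = np.array([[1.0], [0.5]])
C = np.array([[1.0, -1.0]])
D = np.array([[0.0]])
T = np.array([[2.0, 1.0], [0.0, 1.0]])
Ti = np.linalg.inv(T)
sys_model = (A, B, C, D)
sys_impl = (T @ A @ Ti, T @ B, C @ Ti, D)
print(io_equivalent(sys_model, sys_impl))  # True: states differ, input-output behavior does not
```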

    Up-to-date Interval Arithmetic: From Closed Intervals to Connected Sets of Real Numbers

    We consider biperiodic integral equations of the second kind with weakly singular kernels, such as those arising in boundary integral equation methods. The equations are solved numerically using a collocation scheme based on trigonometric polynomials. The weak singularity is removed by a local change to polar coordinates. The resulting operators have smooth kernels and are discretized using the tensor-product composite trapezoidal rule. We prove stability and convergence of the scheme under suitable parameter choices, achieving algebraic convergence of any order under appropriate regularity assumptions. The method can be applied to typical boundary value problems such as potential and scattering problems, both for bounded obstacles and for periodic surfaces. We present numerical results demonstrating that the expected convergence rates can be observed in practice.
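    One ingredient mentioned above, the composite trapezoidal rule, converges extremely fast once the integrand is smooth and periodic, which is why removing the kernel singularity first pays off. The small sketch below shows this on a scalar 1-D periodic integral; it is a textbook illustration, not the paper's 2-D tensor-product scheme.

```python
import numpy as np

def periodic_trapezoid(f, n):
    """Composite trapezoidal rule for a 2*pi-periodic function on [0, 2*pi]."""
    t = 2.0 * np.pi * np.arange(n) / n   # equispaced nodes; endpoint omitted by periodicity
    return 2.0 * np.pi / n * np.sum(f(t))

f = lambda t: np.exp(np.cos(t))          # smooth and 2*pi-periodic
exact = 2.0 * np.pi * np.i0(1.0)         # integral equals 2*pi*I_0(1) (modified Bessel function)
for n in (4, 8, 16, 32):
    print(n, abs(periodic_trapezoid(f, n) - exact))  # error drops far faster than any fixed power of 1/n
```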

    Astaroth: A Software Library for Stencil Computations on Graphics Processing Units

    Graphics processing units (GPUs) are coprocessors that offer higher throughput and better power efficiency than central processing units in data-parallel tasks. For this reason, graphics processors provide a good platform for high-performance computing. However, programming GPUs such that all the available performance is utilized requires in-depth knowledge of the hardware architecture. Additionally, the problem of high-order stencil computations on GPUs in challenging multiphysics applications has not been adequately explored in previous work. In this thesis, we address these issues by presenting a library, an efficient algorithm, and a domain-specific language for solving stencil computations within a structured grid. We tested our implementation by simulating magnetohydrodynamics, which involved the computation of first, second, and cross partial derivatives using second-, fourth-, sixth-, and eighth-order finite differences with single and double precision. The running time of our integration kernel was 2.8–9.1 times the theoretical minimum time it would take to read the computational domain and write it back to device memory exactly once, without taking into account the effects of finite caches or arithmetic operations on performance. Additionally, we made a performance comparison with a CPU solver widely used for scientific computations, which we benchmarked on a total of 24 cores of two Intel Xeon E5-2690 v3 processors. Our solver, benchmarked on a Tesla P100 PCIe GPU, outperformed the CPU solver by factors of 6.7 and 10.4 when using single and double precision, respectively.
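    As a rough illustration of the kind of high-order stencil the thesis evaluates, the sketch below applies a sixth-order central-difference first-derivative stencil to a 1-D periodic field in NumPy. It is not Astaroth's CUDA/DSL implementation; the names and the periodic test function are invented for the example.

```python
import numpy as np

# Sixth-order central-difference coefficients for the first derivative
# (offsets -3..3); sixth order is one of the orders mentioned in the abstract.
COEFFS = np.array([-1.0, 9.0, -45.0, 0.0, 45.0, -9.0, 1.0]) / 60.0
OFFSETS = np.arange(-3, 4)

def ddx_periodic(f, dx):
    """First derivative of a 1-D periodic field using a 7-point stencil."""
    out = np.zeros_like(f)
    for c, k in zip(COEFFS, OFFSETS):
        if c != 0.0:
            out += c * np.roll(f, -k)   # np.roll(f, -k)[i] == f[(i + k) % n]
    return out / dx

# Accuracy check against an analytic derivative on a periodic grid.
n = 64
x = 2.0 * np.pi * np.arange(n) / n
f = np.sin(x)
err = np.max(np.abs(ddx_periodic(f, x[1] - x[0]) - np.cos(x)))
print(err)   # about 1e-8: the truncation error of a sixth-order stencil at this resolution
```

    A GPU implementation would evaluate the same weighted sums per grid point, with the performance challenge lying in reusing neighboring values through caches or shared memory rather than in the arithmetic itself.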

    Unmasking Clever Hans Predictors and Assessing What Machines Really Learn

    Current learning machines have successfully solved hard application problems, reaching high accuracy and displaying seemingly "intelligent" behavior. Here we apply recent techniques for explaining decisions of state-of-the-art learning machines and analyze various tasks from computer vision and arcade games. This showcases a spectrum of problem-solving behaviors ranging from naive and short-sighted to well-informed and strategic. We observe that standard performance evaluation metrics can be oblivious to distinguishing these diverse problem-solving behaviors. Furthermore, we propose our semi-automated Spectral Relevance Analysis, which provides a practically effective way of characterizing and validating the behavior of nonlinear learning machines. This helps to assess whether a learned model indeed delivers reliably for the problem it was conceived for. Furthermore, our work intends to add a voice of caution to the ongoing excitement about machine intelligence and pledges to evaluate and judge some of these recent successes in a more nuanced manner.
    Comment: Accepted for publication in Nature Communications.
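    Spectral Relevance Analysis builds on clustering relevance heatmaps; the sketch below shows only that clustering step on placeholder data, assuming the heatmaps were already produced by an explanation method such as LRP. It is a hedged illustration of the idea, not the authors' released pipeline.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

# Placeholder relevance heatmaps; in practice these would come from an explainer
# (e.g. layer-wise relevance propagation) applied to each training sample.
rng = np.random.default_rng(0)
heatmaps = rng.random((200, 32, 32))          # 200 relevance maps of size 32x32 (synthetic)
X = heatmaps.reshape(len(heatmaps), -1)       # flatten each map into a feature vector

# Spectral clustering groups samples whose relevance patterns look alike.
labels = SpectralClustering(n_clusters=4, affinity="rbf", random_state=0).fit_predict(X)

# Small or unusual clusters are candidate "Clever Hans" strategies
# (e.g. a model keying on a watermark) and warrant manual inspection.
for c in np.unique(labels):
    print(f"cluster {c}: {np.sum(labels == c)} samples")
```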

    Computational error bounds for multiple or nearly multiple eigenvalues

    In this paper, bounds for clusters of eigenvalues of non-selfadjoint matrices are investigated. We describe a method for the computation of rigorous error bounds for multiple or nearly multiple eigenvalues, and for a basis of the corresponding invariant subspaces. The input matrix may be real or complex, dense or sparse. The method is based on a quadratically convergent Newton-like method; it includes the case of defective eigenvalues, uncertain input matrices, and the generalized eigenvalue problem. Computational results show that verified bounds are still computed even if other eigenvalues or clusters are near the eigenvalues under consideration.
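    For intuition about residual-based eigenvalue bounds, the sketch below uses the fact that for a symmetric (hence normal) matrix the residual norm of an approximate eigenpair bounds the distance to the nearest true eigenvalue. This is a far simpler tool than the paper's verified Newton-like method: it is plain floating-point arithmetic, handles neither clusters nor defective eigenvalues rigorously, and all data is synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8
A = rng.standard_normal((n, n))
A = (A + A.T) / 2.0                      # make A symmetric, hence normal

# Crude approximate eigenpair from a few power-iteration steps.
x_hat = rng.standard_normal(n)
for _ in range(5):
    x_hat = A @ x_hat
    x_hat /= np.linalg.norm(x_hat)
lam_hat = x_hat @ A @ x_hat              # Rayleigh quotient (||x_hat|| = 1)

residual = np.linalg.norm(A @ x_hat - lam_hat * x_hat)
true_eigs = np.linalg.eigvalsh(A)
print("residual bound :", residual)
print("actual distance:", np.min(np.abs(true_eigs - lam_hat)))  # never exceeds the bound
```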