    Computing the Conditioning of the Components of a Linear Least Squares Solution

    In this paper, we address the accuracy of the results for the overdetermined full rank linear least squares problem. We recall theoretical results obtained in Arioli, Baboulin and Gratton, SIMAX 29(2):413-433, 2007, on the conditioning of the least squares solution and of the components of the solution when the matrix perturbations are measured in the Frobenius or spectral norms. We then define computable estimates for these condition numbers and interpret them in terms of statistical quantities. In particular, we show that, in the classical linear statistical model, the ratio of the variance of one component of the solution to the variance of the right-hand side is exactly the condition number of this solution component when perturbations on the right-hand side are considered. We also provide code fragments using LAPACK routines to compute the variance-covariance matrix and the least squares conditioning, and we give the corresponding computational cost. Finally, we present a small historical numerical example that was used by Laplace in Théorie Analytique des Probabilités (1820) for computing the mass of Jupiter, as well as experiments from the space industry with real physical data.
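
    As an illustration of the statistical interpretation, here is a minimal NumPy sketch. The paper itself uses LAPACK code fragments, and the function names lls_covariance and component_sensitivity are hypothetical. It assumes A has full column rank and the classical model b = Ax + e with uncorrelated errors of common variance sigma^2, so that the variance-covariance matrix of the solution is sigma^2 (A^T A)^{-1} = sigma^2 R^{-1} R^{-T}, where A = QR. The diagonal entry [(A^T A)^{-1}]_ii = Var(x_i)/sigma^2 coincides with the squared 2-norm of the i-th row of A^+, which measures how strongly x_i responds to perturbations of b. The exact normalization of the condition number used in the paper is not reproduced here.

```python
import numpy as np

def lls_covariance(A, b):
    """Solve min ||Ax - b||_2 via QR and return x, the noise variance estimate
    sigma^2, the covariance sigma^2 (A^T A)^{-1}, and (A^T A)^{-1} itself."""
    m, n = A.shape
    Q, R = np.linalg.qr(A)                 # reduced QR: Q is m x n, R is n x n
    x = np.linalg.solve(R, Q.T @ b)        # R x = Q^T b
    r = b - A @ x
    sigma2 = (r @ r) / (m - n)             # unbiased estimate of the noise variance
    Rinv = np.linalg.solve(R, np.eye(n))   # R^{-1} (explicit only for brevity)
    C = Rinv @ Rinv.T                      # (A^T A)^{-1} = R^{-1} R^{-T}
    return x, sigma2, sigma2 * C, C

def component_sensitivity(A):
    """2-norm of each row of A^+: sensitivity of x_i to perturbations of b."""
    return np.linalg.norm(np.linalg.pinv(A), axis=1)
```

    For a random full-rank A, np.diag(C) and component_sensitivity(A) ** 2 agree to rounding error, since row i of A^+ is e_i^T (A^T A)^{-1} A^T and its squared norm is e_i^T (A^T A)^{-1} e_i.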

    Solving large dense linear least squares problems on parallel distributed computers. Application to the Earth's gravity field computation.

    In this thesis, we present our research in high performance scientific computing for linear least squares. More precisely, we are concerned with developing efficient parallel software that can solve very large dense linear least squares problems, and with providing numerical tools that can assess the quality of the solution. This thesis is also a contribution to the GOCE (Gravity field and steady-state Ocean Circulation Explorer) mission, which strives for a very accurate model of the Earth's gravity field. This satellite is scheduled for launch in 2007 and, in this respect, our work represents a step in the definition of algorithms for the project. We present an overview of the numerical strategies that can be used for updating the solution with new observations coming from GOCE measurements. Then we describe a parallel distributed solver that we implemented for use in the software package of CNES (Centre National d'Études Spatiales, the French space agency) for orbit determination and gravity field computation. This solver compares well in terms of performance with the standard parallel libraries ScaLAPACK and PLAPACK on the operational platforms used in the space industry, while saving about half the memory by taking into account the symmetry of the problem. In order to improve the scalability and the portability of our solver, we define a packed distributed format that is based on ScaLAPACK kernel routines. This approach is a significant improvement since there is no packed distributed format available for symmetric or triangular matrices in the existing dense parallel libraries.
Examples are given for the Cholesky factorization and for the updating of a QR factorization. This format can easily be extended to other linear algebra calculations. This thesis also contains new results in the area of sensitivity analysis for linear least squares problems resulting from parameter estimation. Specifically, we provide a closed formula, bounds of the correct order of magnitude, and statistical estimates that enable us to evaluate the condition number of linear functionals of a least squares solution. The choice between the different expressions will depend on the problem size and on the desired level of accuracy.
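
    One standard way to update a least squares solution when a new block of observations arrives is to re-triangularize the previous triangular factor stacked on top of the new rows. The sequential NumPy sketch below illustrates that idea under the assumption that only R and d = Q^T b from the previous step are kept. It is not the parallel distributed solver or the packed format described in the thesis, and the names initial_factor and qr_update are hypothetical.

```python
import numpy as np

def initial_factor(A, b):
    """Triangular factor R and reduced right-hand side d = Q^T b of A = QR."""
    Q, R = np.linalg.qr(A)
    return R, Q.T @ b

def qr_update(R, d, A_new, b_new):
    """Fold new observations (A_new, b_new) into (R, d).

    Since ||Ax - b||^2 = ||Rx - d||^2 + const, minimizing over all data is
    equivalent to minimizing ||[R; A_new] x - [d; b_new]||, so we re-triangularize
    the augmented matrix [[R, d], [A_new, b_new]]."""
    n = R.shape[0]
    stacked = np.vstack([np.column_stack([R, d]),
                         np.column_stack([A_new, b_new])])
    _, T = np.linalg.qr(stacked)    # (n+1) x (n+1) upper triangular
    return T[:n, :n], T[:n, n]      # new R and new d
```

    After the update, solving R' x = d' gives the same solution as solving the full stacked problem from scratch, while the update cost depends only on n and on the number of new rows.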

    Efficient computation of condition estimates for linear least squares problems

    Linear least squares (LLS) is a classical linear algebra problem in scientific computing, arising for instance in many parameter estimation problems. In addition to efficiently computing LLS solutions, an important issue is to assess the numerical quality of the computed solution. The notion of conditioning provides a theoretical framework that can be used to measure the numerical sensitivity of a problem solution to perturbations in its data. We recall some results on least squares conditioning and derive a statistical estimate for the conditioning of an LLS solution. We present numerical experiments comparing exact values and statistical estimates. We also present performance results using new routines built on top of the multicore-GPU library MAGMA. This set of routines is based on an efficient computation of the variance-covariance matrix, for which, to our knowledge, there is no implementation in the current public-domain libraries LAPACK and ScaLAPACK.
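
    To make the idea of a statistical condition estimate concrete, the sketch below uses a generic Gaussian sampling estimator: for z with independent N(0, 1) entries, E||Mz||^2 = ||M||_F^2. Applied to M = A^+, whose products are cheap once the QR factors are available, it estimates ||A^+||_F, a quantity that governs the sensitivity of x to perturbations of b. This particular estimator and the name frobenius_estimate are assumptions for illustration. The estimate derived in the paper may differ, for instance by also accounting for perturbations of A.

```python
import numpy as np

def frobenius_estimate(apply_M, ncols, nsamples, rng):
    """Estimate ||M||_F from products with Gaussian vectors, using
    E||M z||^2 = ||M||_F^2 when z has i.i.d. N(0, 1) entries."""
    total = 0.0
    for _ in range(nsamples):
        z = rng.standard_normal(ncols)
        total += np.sum(apply_M(z) ** 2)
    return np.sqrt(total / nsamples)

# Example: M = A^+ applied through the reduced QR factors, A^+ z = R^{-1} Q^T z.
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 10))
Q, R = np.linalg.qr(A)

def apply_pinv(z):
    return np.linalg.solve(R, Q.T @ z)

estimate = frobenius_estimate(apply_pinv, A.shape[0], 5, rng)
exact = np.linalg.norm(np.linalg.pinv(A), "fro")
```

    A handful of samples usually gives the right order of magnitude, which is often all that is needed from a condition estimate; each sample costs one product with Q^T and one triangular solve.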

    Accelerating linear system solutions using randomization technique

    We illustrate how linear algebra calculations can be enhanced by statistical techniques in the case of a square linear system Ax = b. We study a random transformation of A that enables us to avoid pivoting and thus to reduce the amount of communication. Numerical experiments show that this randomization can be performed at a very affordable computational price while providing satisfactory accuracy compared to partial pivoting. This random transformation, called the Partial Random Butterfly Transformation (PRBT), is optimized in terms of data storage and flop count. We propose a solver in which PRBT and the LU factorization with no pivoting take advantage of current hybrid multicore/GPU machines, and we compare its Gflop/s performance with a solver implemented in a current parallel library.
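
    The sketch below illustrates the general principle behind randomized butterfly transformations: solve (U^T A V) y = U^T b with LU without pivoting, then recover x = V y. It builds a plain recursive butterfly of depth d from blocks B = (1/sqrt(2)) [[R0, R1], [R0, -R1]] with random diagonal R0 and R1. The choice of diagonal entries exp(r/10) with r uniform in [-1/2, 1/2] is an assumption, n must be divisible by 2^d, and dense matrices are formed for readability. It does not reproduce the storage- and flop-optimized PRBT or any GPU code, and all function names are hypothetical.

```python
import numpy as np

def butterfly(n, rng):
    """n x n butterfly (1/sqrt(2)) [[R0, R1], [R0, -R1]], n even.
    Assumption: diagonal entries exp(r/10), r uniform in [-1/2, 1/2]."""
    h = n // 2
    r0 = np.diag(np.exp(rng.uniform(-0.5, 0.5, h) / 10))
    r1 = np.diag(np.exp(rng.uniform(-0.5, 0.5, h) / 10))
    return np.block([[r0, r1], [r0, -r1]]) / np.sqrt(2)

def recursive_butterfly(n, depth, rng):
    """Product of `depth` block-diagonal butterfly levels (n divisible by 2**depth)."""
    W = np.eye(n)
    for level in range(depth):
        size = n // 2 ** level
        level_mat = np.zeros((n, n))
        for start in range(0, n, size):
            blk = slice(start, start + size)
            level_mat[blk, blk] = butterfly(size, rng)
        W = level_mat @ W          # finer levels end up on the left
    return W

def lu_nopiv(M):
    """LU without pivoting on a copy of M; unit L (strict lower part) and U packed."""
    M = M.astype(float).copy()
    for k in range(M.shape[0] - 1):
        M[k+1:, k] /= M[k, k]
        M[k+1:, k+1:] -= np.outer(M[k+1:, k], M[k, k+1:])
    return M

def lu_solve(LU, c):
    """Forward substitution with unit L, then back substitution with U."""
    y = np.array(c, dtype=float)
    n = len(y)
    for i in range(n):
        y[i] -= LU[i, :i] @ y[:i]
    for i in range(n - 1, -1, -1):
        y[i] = (y[i] - LU[i, i+1:] @ y[i+1:]) / LU[i, i]
    return y

def rbt_solve(A, b, depth=2, seed=0):
    """Solve Ax = b via (U^T A V) y = U^T b with no pivoting, then x = V y."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    U = recursive_butterfly(n, depth, rng)
    V = recursive_butterfly(n, depth, rng)
    y = lu_solve(lu_nopiv(U.T @ A @ V), U.T @ b)
    return V @ y
```

    For example, if A[0, 0] = 0, lu_nopiv(A) breaks down at the first step (division by zero), while rbt_solve(A, b) typically still returns an accurate solution because the transformed matrix mixes all entries of A.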

    Using dual techniques to derive componentwise and mixed condition numbers for a linear functional of a linear least squares solution

    We prove duality results for adjoint operators and product norms in the framework of Euclidean spaces. We show how these results can be used to derive condition numbers, especially when perturbations on the data are measured componentwise relative to the original data. We apply this technique to obtain formulas for componentwise and mixed condition numbers for a linear functional of a linear least squares solution. These expressions are in closed form when perturbations of the solution are measured using a componentwise norm or the infinity norm, and we obtain an upper bound for the Euclidean norm.
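
    To show what such a closed-form expression can look like, here is a first-order derivation coded in NumPy. It is a direct derivation, not the dual-technique derivation of the paper. For x = A^+ b, f = L^T x and r = b - Ax, one has df = L^T A^+ db + L^T (A^T A)^{-1} dA^T r - L^T A^+ dA x. Under componentwise relative perturbations |dA| <= eps|A| and |db| <= eps|b|, the maximal first-order change of f_k is eps times sum_ij |A_ij| |r_i v_kj - u_ki x_j| + sum_i |u_ki| |b_i|, where u_k^T and v_k^T are the k-th rows of L^T A^+ and L^T (A^T A)^{-1}. Dividing by ||f||_inf or by |f_k| gives mixed and componentwise condition numbers in the infinity norm. The normalization used in the paper may differ, the explicit inverse of A^T A is formed only for readability, and the function name is hypothetical.

```python
import numpy as np

def lls_functional_conditioning(A, b, L):
    """First-order mixed and componentwise condition numbers (infinity norm) of
    f = L^T x, x = argmin ||Ax - b||_2, for |dA| <= eps|A| and |db| <= eps|b|."""
    x = np.linalg.lstsq(A, b, rcond=None)[0]
    r = b - A @ x
    C = np.linalg.inv(A.T @ A)      # (A^T A)^{-1}, explicit only for readability
    U = L.T @ C @ A.T               # row k: u_k^T = L_k^T A^+
    V = L.T @ C                     # row k: v_k^T = L_k^T (A^T A)^{-1}
    bound = np.empty(L.shape[1])
    for k in range(L.shape[1]):
        # df_k = u_k^T db + sum_ij dA_ij (r_i v_kj - u_ki x_j)
        coeff = np.outer(r, V[k]) - np.outer(U[k], x)
        bound[k] = np.sum(np.abs(A) * np.abs(coeff)) + np.abs(U[k]) @ np.abs(b)
    f = L.T @ x
    mixed = np.max(bound) / np.max(np.abs(f))
    componentwise = np.max(bound / np.abs(f))
    return mixed, componentwise, bound
```

    Choosing the signs of dA and db to match those inside the absolute values attains the bound for a given k, which is why, to first order, the expression is exact rather than only an upper bound.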

    Fast and reliable solutions for numerical linear algebra solvers in high performance computing (Résolutions rapides et fiables pour les solveurs d'algèbre linéaire numérique en calcul haute performance)

    Towards dense linear algebra for hybrid GPU accelerated manycore systems

    If multicore is a disruptive technology, try to imagine hybrid multicore systems enhanced with accelerators! This is happening today as accelerators, in particular Graphics Processing Units (GPUs), are steadily making their way into the high performance computing (HPC) world. We highlight the trends leading to the idea of hybrid manycore/GPU systems, and we present a set of techniques that can be used to program them efficiently. The presentation is in the context of Dense Linear Algebra (DLA), a major building block for many scientific computing applications. We motivate the need for new algorithms that split the computation in a way that fully exploits the power that each of the hybrid components offers. As the area of hybrid multicore/GPU computing is still in its infancy, we also argue for its importance in view of what future architectures may look like. We therefore envision the need for a DLA library similar to LAPACK but for hybrid manycore/GPU systems. We illustrate the main ideas with an LU factorization algorithm where particular techniques are used to reduce the amount of pivoting, resulting in an algorithm achieving up to 388 GFlop/s in single precision and up to 99.4 GFlop/s in double precision on a hybrid Intel Xeon (2x4 cores @ 2.33 GHz) / NVIDIA GeForce GTX 280 (240 cores @ 1.30 GHz) system.
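
    The split between a small, latency-bound panel factorization and a large, compute-bound trailing update is what hybrid CPU/GPU LU algorithms exploit. The NumPy sketch below shows this structure for a right-looking blocked LU. It uses no pivoting at all, so it assumes a matrix for which that is safe (for example, diagonally dominant), and it does not reproduce the specific pivoting-reduction techniques of the paper. The CPU/GPU placement given in the comments is illustrative only, and the function name is hypothetical.

```python
import numpy as np

def blocked_lu_nopiv(A, nb):
    """Right-looking blocked LU without pivoting, block size nb.
    Returns unit L (strict lower part) and U packed in one array."""
    A = A.astype(float).copy()
    n = A.shape[0]
    for k in range(0, n, nb):
        e = min(k + nb, n)
        # Panel factorization: small and latency-bound (a natural CPU task).
        for j in range(k, e):
            A[j+1:, j] /= A[j, j]
            A[j+1:, j+1:e] -= np.outer(A[j+1:, j], A[j, j+1:e])
        if e < n:
            # Block row of U: U12 = L11^{-1} A12 (a TRSM).
            L11 = np.tril(A[k:e, k:e], -1) + np.eye(e - k)
            A[k:e, e:] = np.linalg.solve(L11, A[k:e, e:])
            # Trailing update A22 -= L21 U12: one large GEMM (a natural GPU task).
            A[e:, e:] -= A[e:, k:e] @ A[k:e, e:]
    return A
```

    Almost all flops fall in the trailing GEMM, which is why offloading it to the accelerator while the CPU factors the next panel is attractive.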

    Some issues in dense linear algebra for multicore and special purpose architectures

    We address some key issues in designing dense linear algebra (DLA) algorithms that are common to both multi/many-core and special purpose architectures (in particular GPUs). We present them in the context of an LU factorization algorithm, where randomization techniques are used as an alternative to pivoting. This approach yields an algorithm based entirely on a collection of small Level 3 BLAS type computational tasks, which has emerged as a common goal in designing DLA algorithms for new architectures. Other common trends, also considered here, are block asynchronous task execution and "Block" layouts for the data associated with the separate tasks. We present numerical results and other specific experiments with DLA algorithms on NVIDIA GPUs using CUDA. The GPU results are also of interest in themselves, as we show a performance of up to 160 GFlop/s on a single Quadro FX 5600 card.
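
    The following sketch illustrates the "Block" layout and the decomposition of LU into small Level 3 BLAS-like tasks: GETRF on a diagonal tile, TRSM on the tiles of the current block row and block column, and GEMM on the trailing tiles. It assumes that pivoting is not needed, for instance because the matrix has already been randomized or is diagonally dominant, and that n is a multiple of the tile size nb. Tasks run sequentially in a valid order here, whereas an asynchronous runtime could start each task as soon as its input tiles are ready. All names are hypothetical.

```python
import numpy as np

def to_tiles(A, nb):
    """Copy A into a block layout: a dict of contiguous nb x nb tiles."""
    t = A.shape[0] // nb
    tiles = {(i, j): A[i*nb:(i+1)*nb, j*nb:(j+1)*nb].copy()
             for i in range(t) for j in range(t)}
    return tiles, t

def lu_nopiv(T):
    """Unpivoted LU of one tile; unit L (strict lower part) and U packed."""
    T = T.copy()
    for j in range(T.shape[0] - 1):
        T[j+1:, j] /= T[j, j]
        T[j+1:, j+1:] -= np.outer(T[j+1:, j], T[j, j+1:])
    return T

def tile_lu_tasks(t):
    """Enumerate tile-LU tasks (op, k, i, j) in a valid sequential order."""
    for k in range(t):
        yield ("getrf", k, k, k)
        for j in range(k + 1, t):
            yield ("trsm_l", k, k, j)       # U_kj = L_kk^{-1} A_kj
        for i in range(k + 1, t):
            yield ("trsm_u", k, i, k)       # L_ik = A_ik U_kk^{-1}
        for i in range(k + 1, t):
            for j in range(k + 1, t):
                yield ("gemm", k, i, j)     # A_ij -= L_ik U_kj

def run_tile_lu(tiles, t):
    for op, k, i, j in tile_lu_tasks(t):
        if op == "getrf":
            tiles[k, k] = lu_nopiv(tiles[k, k])
        elif op == "trsm_l":
            Lkk = np.tril(tiles[k, k], -1) + np.eye(len(tiles[k, k]))
            tiles[k, j] = np.linalg.solve(Lkk, tiles[k, j])
        elif op == "trsm_u":
            Ukk = np.triu(tiles[k, k])
            tiles[i, k] = np.linalg.solve(Ukk.T, tiles[i, k].T).T
        else:
            tiles[i, j] -= tiles[i, k] @ tiles[k, j]
    return tiles
```

    Each task reads and writes only whole tiles, so the dependencies between tasks can be read directly from their (k, i, j) indices, which is what makes asynchronous scheduling possible.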


