262 research outputs found

    Parallelization of the QR Decomposition with Column Pivoting Using Column Cyclic Distribution on Multicore and GPU Processors

    The QR decomposition with column pivoting (QRP) of a matrix is widely used for rank revealing. The performance of the LAPACK implementation (DGEQP3) of the Householder QRP algorithm is limited by the Level 2 BLAS operations required for updating the column norms. In this paper, we propose an implementation of the QRP algorithm that distributes the matrix columns in a round-robin fashion for better data locality and parallel memory bus utilization on multicore architectures. Our performance results show a 60% improvement over the routine DGEQP3 of Intel MKL (version 10.3) on a 12-core Intel Xeon X5670 machine. In addition, we show that the same data distribution is also suitable for general-purpose GPU processors, where our implementation obtains up to 90 GFlops on an NVIDIA GeForce GTX480. This is about 2 times faster than the QRP implementation of MAGMA (version 1.2.1).
    Tomás and Bai were supported in part by the U.S. DOE SciDAC grant DOE-DE-FC0206ER25793 and NSF grant PHY1005502. This research used resources of the National Energy Research Scientific Computing Center, which is supported by the Office of Science of the U.S. DOE under Contract No. DE-AC02-05CH11231.
    Tomás Domínguez, AE.; Bai, Z.; Hernández García, V. (2013). Parallelization of the QR Decomposition with Column Pivoting Using Column Cyclic Distribution on Multicore and GPU Processors. In High Performance Computing for Computational Science - VECPAR 2012. Springer Verlag (Germany): Series. 50-58. https://doi.org/10.1007/978-3-642-38718-0_8
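    The round-robin (column-cyclic) distribution mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `cyclic_columns` and `pivot_column` are hypothetical, the workers are simulated sequentially, and only the pivot-selection step of QRP (choosing the column of largest Euclidean norm) is shown.

```python
import math


def cyclic_columns(n_cols, n_workers):
    """Return, for each worker, the column indices it owns (round-robin)."""
    return [list(range(w, n_cols, n_workers)) for w in range(n_workers)]


def pivot_column(matrix, n_workers):
    """Pick the column with the largest Euclidean norm.

    Each simulated worker scans only the columns it owns; the local
    winners are then reduced to a single global winner.
    """
    n_cols = len(matrix[0])
    local_best = []
    for cols in cyclic_columns(n_cols, n_workers):
        if not cols:
            continue  # more workers than columns
        norms = [(math.sqrt(sum(row[j] ** 2 for row in matrix)), j) for j in cols]
        local_best.append(max(norms))
    return max(local_best)[1]


if __name__ == "__main__":
    print(cyclic_columns(5, 2))  # [[0, 2, 4], [1, 3]]
    print(pivot_column([[1.0, 0.0, 3.0], [0.0, 2.0, 4.0]], 2))  # 2
```

    Because consecutive columns go to different workers, the columns still active after each elimination step stay spread roughly evenly across workers, which is one plausible reading of the locality argument in the abstract.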

    Interval Slopes as Numerical Abstract Domain for Floating-Point Variables

    The design of embedded control systems is mainly done with model-based tools such as Matlab/Simulink. Numerical simulation is the central technique for development and verification in such tools. Floating-point arithmetic, which is well known to provide only approximate results, is omnipresent in this activity. In order to validate the behaviors of numerical simulations using abstract interpretation-based static analysis, we present, theoretically and with experiments, a new partially relational abstract domain dedicated to floating-point variables. It is derived from the interval expansion of non-linear functions using slopes and is able to mimic all the behaviors of floating-point arithmetic. Hence it is suited to proving the absence of run-time errors or to analyzing the numerical precision of embedded control systems.
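    As a rough illustration of what an interval expansion using slopes looks like, the sketch below encloses f(x) = x - x^2 over an interval X via f(c) + S·(X - c), where S encloses the slope (f(x) - f(c))/(x - c) = 1 - (x + c). This is a generic textbook construction, not the abstract domain of the paper: the helper names are hypothetical, and floating-point rounding (which the paper's domain accounts for) is ignored.

```python
def i_add(a, b):
    """Interval addition; intervals are (lo, hi) tuples."""
    return (a[0] + b[0], a[1] + b[1])


def i_sub(a, b):
    """Interval subtraction."""
    return (a[0] - b[1], a[1] - b[0])


def i_mul(a, b):
    """Interval multiplication."""
    products = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    return (min(products), max(products))


def f(x):
    return x - x * x


def natural_extension(X):
    """Evaluate x - x*x directly in interval arithmetic."""
    return i_sub(X, i_mul(X, X))


def slope_enclosure(X, c):
    """Enclose f over X as f(c) + S * (X - c), with S = 1 - (X + c)."""
    C = (c, c)
    S = i_sub((1.0, 1.0), i_add(X, C))
    return i_add((f(c), f(c)), i_mul(S, i_sub(X, C)))


if __name__ == "__main__":
    X = (0.0, 1.0)
    print(natural_extension(X))     # (-1.0, 1.0)
    print(slope_enclosure(X, 0.5))  # (0.0, 0.5)
```

    On this example the true range is [0, 0.25]; the slope form is noticeably tighter than the direct interval evaluation because it keeps track of how f varies relative to the reference point c.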

    A weakly stable algorithm for general Toeplitz systems

    We show that a fast algorithm for the QR factorization of a Toeplitz or Hankel matrix A is weakly stable in the sense that R^T.R is close to A^T.A. Thus, when the algorithm is used to solve the semi-normal equations R^T.Rx = A^Tb, we obtain a weakly stable method for the solution of a nonsingular Toeplitz or Hankel linear system Ax = b. The algorithm also applies to the solution of the full-rank Toeplitz or Hankel least squares problem.
    Comment: 17 pages. An old Technical Report with postscript added. For further details, see http://wwwmaths.anu.edu.au/~brent/pub/pub143.htm
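    The semi-normal equations R^T.Rx = A^Tb from this abstract reduce to two triangular solves once R is known. The sketch below assumes an upper-triangular R is already available (the fast Toeplitz factorization itself is not shown); `solve_seminormal` is a hypothetical name, and the small dense example is chosen only so the arithmetic can be checked by hand.

```python
def solve_seminormal(A, R, b):
    """Solve R^T R x = A^T b for upper-triangular R (lists of lists)."""
    n = len(R)
    # Right-hand side A^T b.
    rhs = [sum(A[i][j] * b[i] for i in range(len(A))) for j in range(n)]
    # Forward substitution: R^T y = rhs (R^T is lower triangular).
    y = [0.0] * n
    for i in range(n):
        s = sum(R[k][i] * y[k] for k in range(i))
        y[i] = (rhs[i] - s) / R[i][i]
    # Back substitution: R x = y.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(R[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (y[i] - s) / R[i][i]
    return x


if __name__ == "__main__":
    A = [[3.0, 0.0], [4.0, 5.0]]
    R = [[5.0, 4.0], [0.0, 3.0]]  # satisfies R^T R = A^T A = [[25, 20], [20, 25]]
    b = [3.0, 9.0]                # equals A applied to [1, 1]
    print(solve_seminormal(A, R, b))  # [1.0, 1.0]
```

    Note that Q never appears: only R and the original matrix A are needed, which is why the method pairs naturally with a factorization that produces R alone.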

    Measurement of the polarisation of W bosons produced with large transverse momentum in pp collisions at sqrt(s) = 7 TeV with the ATLAS experiment

    This paper describes an analysis of the angular distribution of W->enu and W->munu decays, using data from pp collisions at sqrt(s) = 7 TeV recorded with the ATLAS detector at the LHC in 2010, corresponding to an integrated luminosity of about 35 pb^-1. Using the decay lepton transverse momentum and the missing transverse energy, the W decay angular distribution projected onto the transverse plane is obtained and analysed in terms of helicity fractions f0, fL and fR over two ranges of W transverse momentum (ptw): 35 < ptw < 50 GeV and ptw > 50 GeV. Good agreement is found with theoretical predictions. For ptw > 50 GeV, the values of f0 and fL-fR, averaged over charge and lepton flavour, are measured to be f0 = 0.127 +/- 0.030 +/- 0.108 and fL-fR = 0.252 +/- 0.017 +/- 0.030, where the first uncertainties are statistical and the second include all systematic effects.
    Comment: 19 pages plus author list (34 pages total), 9 figures, 11 tables, revised author list, matches European Journal of Physics C version

    Observation of a new chi_b state in radiative transitions to Upsilon(1S) and Upsilon(2S) at ATLAS

    The chi_b(nP) quarkonium states are produced in proton-proton collisions at the Large Hadron Collider (LHC) at sqrt(s) = 7 TeV and recorded by the ATLAS detector. Using a data sample corresponding to an integrated luminosity of 4.4 fb^-1, these states are reconstructed through their radiative decays to Upsilon(1S,2S) with Upsilon->mu+mu-. In addition to the mass peaks corresponding to the decay modes chi_b(1P,2P)->Upsilon(1S)gamma, a new structure centered at a mass of 10.530+/-0.005 (stat.)+/-0.009 (syst.) GeV is also observed, in both the Upsilon(1S)gamma and Upsilon(2S)gamma decay modes. This is interpreted as the chi_b(3P) system.
    Comment: 5 pages plus author list (18 pages total), 2 figures, 1 table, corrected author list, matches final version in Physical Review Letters

    Search for displaced vertices arising from decays of new heavy particles in 7 TeV pp collisions at ATLAS

    We present the results of a search for new, heavy particles that decay at a significant distance from their production point into a final state containing charged hadrons in association with a high-momentum muon. The search is conducted in a pp-collision data sample with a center-of-mass energy of 7 TeV and an integrated luminosity of 33 pb^-1 collected in 2010 by the ATLAS detector operating at the Large Hadron Collider. Production of such particles is expected in various scenarios of physics beyond the standard model. We observe no signal and place limits on the production cross-section of supersymmetric particles in an R-parity-violating scenario as a function of the neutralino lifetime. Limits are presented for different squark and neutralino masses, enabling extension of the limits to a variety of other models.
    Comment: 8 pages plus author list (20 pages total), 8 figures, 1 table, final version to appear in Physics Letters