12 research outputs found

    Direct QR factorizations for tall-and-skinny matrices in MapReduce architectures

    The QR factorization and the SVD are two fundamental matrix decompositions with applications throughout scientific computing and data analysis. For matrices with many more rows than columns, so-called "tall-and-skinny matrices," there is a numerically stable, efficient, communication-avoiding algorithm for computing the QR factorization. It has been used in traditional high performance computing and grid computing environments. For MapReduce environments, existing methods to compute the QR decomposition use a numerically unstable approach that relies on indirectly computing the Q factor. In the best case, these methods require only two passes over the data. In this paper, we describe how to compute a stable tall-and-skinny QR factorization on a MapReduce architecture in only slightly more than two passes over the data. We can compute the SVD with only a small change and no difference in performance. We present a performance comparison between our new direct TSQR method, a standard unstable implementation for MapReduce (Cholesky QR), and the classic stable algorithm implemented for MapReduce (Householder QR). We find that our new stable method has a large performance advantage over the Householder QR method. This holds both in a theoretical performance model and in an actual implementation.
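The map/reduce structure described in the abstract can be illustrated in a few lines of NumPy: each row block is factored independently (the map step), the stacked R factors are factored once more (the reduce step), and Q is then recovered directly rather than via the normal equations. This is an illustrative single-process sketch of the TSQR idea, not the paper's MapReduce implementation; the block count and matrix sizes here are arbitrary choices.

```python
import numpy as np

def tsqr(A, num_blocks=4):
    """Communication-avoiding TSQR sketch for a tall-and-skinny matrix A.

    Assumes each row block still has at least as many rows as A has
    columns, so every local R factor is square.
    """
    m, n = A.shape
    blocks = np.array_split(A, num_blocks, axis=0)

    # Map step: independent local QR factorizations of each row block.
    local = [np.linalg.qr(B) for B in blocks]           # (Q_i, R_i) pairs

    # Reduce step: one QR factorization of the stacked n-by-n R factors.
    R_stack = np.vstack([R_i for _, R_i in local])      # (num_blocks*n, n)
    Q2, R = np.linalg.qr(R_stack)

    # Direct Q reconstruction: distribute the rows of Q2 back to the
    # blocks and multiply, avoiding the unstable normal-equations route.
    Q2_blocks = np.array_split(Q2, num_blocks, axis=0)
    Q = np.vstack([Q_i @ Q2_i for (Q_i, _), Q2_i in zip(local, Q2_blocks)])
    return Q, R

rng = np.random.default_rng(0)
A = rng.normal(size=(4000, 8))
Q, R = tsqr(A)
```

In a real MapReduce job the local factorizations run on separate workers and only the small R factors are communicated, which is where the pass-efficiency comes from.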

    Tall-and-skinny QR factorization with approximate Householder reflectors on graphics processors

    We present a novel method for the QR factorization of large tall-and-skinny matrices that introduces an approximation technique for computing the Householder vectors. This approach is very competitive on a hybrid platform equipped with a graphics processor, with a performance advantage over the conventional factorization due to the reduced amount of data transfers between the graphics accelerator and the main memory of the host. Our experiments show that, for tall-and-skinny matrices, the new approach outperforms the code in MAGMA by a large margin, while it is very competitive for square matrices when the memory transfers and CPU computations are the bottleneck of the Householder QR factorization. This research was supported by Project TIN2017-82972-R from MINECO (Spain) and EU H2020 Project 732631 "OPRECOMP. Open Transprecision Computing".
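For context, the conventional Householder QR factorization that such GPU methods accelerate eliminates one column per step with a reflector H = I - 2vvᵀ. The following is a minimal dense NumPy sketch of that baseline, not the paper's approximate-reflector or blocked GPU variant:

```python
import numpy as np

def householder_qr(A):
    """Unblocked Householder QR: returns reduced Q (m-by-n) and
    upper-triangular R (n-by-n) with A = Q @ R."""
    m, n = A.shape
    R = A.astype(float).copy()
    Q = np.eye(m)
    for k in range(n):
        # Build the reflector that zeros column k below the diagonal.
        v = R[k:, k].copy()
        # Sign choice avoids cancellation when forming v.
        v[0] += np.copysign(np.linalg.norm(v), v[0])
        norm_v = np.linalg.norm(v)
        if norm_v == 0.0:            # column already zero below the diagonal
            continue
        v /= norm_v
        # Apply H = I - 2 v v^T from the left to R, and accumulate it
        # into Q from the right.
        R[k:, :] -= 2.0 * np.outer(v, v @ R[k:, :])
        Q[:, k:] -= 2.0 * np.outer(Q[:, k:] @ v, v)
    return Q[:, :n], np.triu(R[:n, :])

rng = np.random.default_rng(1)
A = rng.normal(size=(60, 6))
Q, R = householder_qr(A)
```

The per-column dependency visible in the loop is exactly why high-performance implementations block the reflectors (compact WY) and why reducing the associated host-accelerator traffic matters on GPUs.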

    Fully Distributed Robust Singular Value Decomposition

    The Articulated Funiculator is a new and innovative concept developed by Tyréns for achieving more efficient vertical transportation with higher space utilization. Having a variety of merits in contrast to the rotary induction motor, i.e. simple construction, direct electromagnetic thrust propulsion, and high safety and reliability, the linear induction motor (LIM) is considered as the propulsion system for the Articulated Funiculator. This thesis is carried out with the purpose of determining the feasibility of this study case by designing LIMs that meet some specific requirements. The detailed requirements include: a set of identical LIMs is required to jointly produce the thrust sufficient to vertically accelerate the moving system at up to 2 m/s²; the size of the LIMs cannot exceed the specification of the funiculator; the maximum flux density in the air gap for each LIM is kept slightly below 0.6 T; no iron saturation of any part of the LIMs is allowed. In this thesis report, an introduction to the LIM is first presented. Following the introduction, relevant literature is reviewed to strengthen the theoretical fundamentals and give a better understanding of the LIM's history and applications. A general classification of LIMs is subsequently introduced. In addition, an analytical model of the single-sided linear induction motor (SLIM) is built based on an approximate equivalent circuit, and the preliminary geometry of the SLIM is thereby obtained. In order to acquire a more comprehensive understanding of the machine characteristics and a more precise SLIM design, a two-dimensional finite element method (2D-FEM) analysis is initially performed according to the preliminary geometry. The results, unfortunately, show that the iron is severely saturated in the teeth and yoke and that the maximum value of the air-gap flux density is excessive. To address these problems, different parameters of the SLIM are marginally adjusted and a series of design scenarios are run in Flux2D for 8-pole and 6-pole SLIMs. The results are compared and the final solution is chosen among them.

    Random projections for Bayesian regression

    This article deals with random projections applied as a data reduction technique for Bayesian regression analysis. We show sufficient conditions under which the entire d-dimensional distribution is approximately preserved under random projections by reducing the number of data points from n to k ∈ O(poly(d/ε)) in the case n ≫ d. Under mild assumptions, we prove that evaluating a Gaussian likelihood function based on the projected data instead of the original data yields a (1+O(ε))-approximation in terms of the ℓ₂ Wasserstein distance. Our main result shows that the posterior distribution of Bayesian linear regression is approximated up to a small error depending on only an ε-fraction of its defining parameters. This holds when using arbitrary Gaussian priors or the degenerate case of uniform distributions over ℝ^d for β. Our empirical evaluations involve different simulated settings of Bayesian linear regression. Our experiments underline that the proposed method is able to recover the regression model up to small error while considerably reducing the total running time.
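The sketch-and-infer idea in the abstract can be illustrated in a few lines: draw a random projection S with k ≪ n rows, replace (X, y) by (SX, Sy), and evaluate the usual Gaussian posterior on the projected data. This is only an illustrative sketch, assuming a standard conjugate Gaussian prior and an i.i.d. Gaussian projection; the paper's formal guarantees are stated for more general constructions and in Wasserstein distance.

```python
import numpy as np

def bayes_posterior(X, y, sigma2=1.0, tau2=10.0):
    """Posterior N(mu, Sigma) of beta for y ~ N(X beta, sigma2*I)
    under the Gaussian prior beta ~ N(0, tau2*I)."""
    d = X.shape[1]
    Sigma = np.linalg.inv(X.T @ X / sigma2 + np.eye(d) / tau2)
    mu = Sigma @ (X.T @ y) / sigma2
    return mu, Sigma

rng = np.random.default_rng(0)
n, d, k = 5000, 10, 1000                      # reduce n data points to k
X = rng.normal(size=(n, d))
beta = rng.normal(size=d)
y = X @ beta + 0.1 * rng.normal(size=n)

# Gaussian random projection with i.i.d. N(0, 1/k) entries, so that
# (S X)^T (S X) is an unbiased estimate of X^T X.
S = rng.normal(scale=1.0 / np.sqrt(k), size=(k, n))

mu_full, _ = bayes_posterior(X, y)            # posterior from all n points
mu_proj, _ = bayes_posterior(S @ X, S @ y)    # posterior from k projected rows
```

Because the likelihood depends on the data only through XᵀX and Xᵀy, a projection that approximately preserves these Gram-matrix quantities yields a posterior close to the one computed from the full data, at roughly k/n of the cost.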