12 research outputs found

    Direct QR factorizations for tall-and-skinny matrices in MapReduce architectures

    The QR factorization and the SVD are two fundamental matrix decompositions with applications throughout scientific computing and data analysis. For matrices with many more rows than columns, so-called "tall-and-skinny matrices," there is a numerically stable, efficient, communication-avoiding algorithm for computing the QR factorization. It has been used in traditional high performance computing and grid computing environments. For MapReduce environments, existing methods to compute the QR decomposition use a numerically unstable approach that relies on indirectly computing the Q factor. In the best case, these methods require only two passes over the data. In this paper, we describe how to compute a stable tall-and-skinny QR factorization on a MapReduce architecture in only slightly more than two passes over the data. We can compute the SVD with only a small change and no difference in performance. We present a performance comparison between our new direct TSQR method, a standard unstable implementation for MapReduce (Cholesky QR), and the classic stable algorithm implemented for MapReduce (Householder QR). We find that our new stable method has a large performance advantage over the Householder QR method. This holds both in a theoretical performance model and in an actual implementation.
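The map/reduce structure described in the abstract can be illustrated in a few lines of NumPy: each row block is factored independently (the map step), the stacked R factors are factored once more (the reduce step), and Q is then recovered directly rather than via the normal equations. This is an illustrative single-process sketch of the TSQR idea, not the paper's MapReduce implementation; the block count and matrix sizes here are arbitrary choices.

```python
import numpy as np

def tsqr(A, num_blocks=4):
    """Communication-avoiding TSQR sketch for a tall-and-skinny matrix A.

    Assumes each row block still has at least as many rows as A has
    columns, so every local R factor is square.
    """
    m, n = A.shape
    blocks = np.array_split(A, num_blocks, axis=0)

    # Map step: independent local QR factorizations of each row block.
    local = [np.linalg.qr(B) for B in blocks]           # (Q_i, R_i) pairs

    # Reduce step: one QR factorization of the stacked n-by-n R factors.
    R_stack = np.vstack([R_i for _, R_i in local])      # (num_blocks*n, n)
    Q2, R = np.linalg.qr(R_stack)

    # Direct Q reconstruction: distribute the rows of Q2 back to the
    # blocks and multiply, avoiding the unstable normal-equations route.
    Q2_blocks = np.array_split(Q2, num_blocks, axis=0)
    Q = np.vstack([Q_i @ Q2_i for (Q_i, _), Q2_i in zip(local, Q2_blocks)])
    return Q, R

rng = np.random.default_rng(0)
A = rng.normal(size=(4000, 8))
Q, R = tsqr(A)
```

In a real MapReduce job the local factorizations run on separate workers and only the small R factors are communicated, which is where the pass-efficiency comes from.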

    Tall-and-skinny QR factorization with approximate Householder reflectors on graphics processors

    We present a novel method for the QR factorization of large tall-and-skinny matrices that introduces an approximation technique for computing the Householder vectors. This approach is very competitive on a hybrid platform equipped with a graphics processor, with a performance advantage over the conventional factorization due to the reduced amount of data transfers between the graphics accelerator and the main memory of the host. Our experiments show that, for tall-and-skinny matrices, the new approach outperforms the code in MAGMA by a large margin, while it is very competitive for square matrices when the memory transfers and CPU computations are the bottleneck of the Householder QR factorization. This research was supported by Project TIN2017-82972-R from MINECO (Spain) and EU H2020 Project 732631 "OPRECOMP. Open Transprecision Computing".
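For context, the conventional Householder QR factorization that such GPU methods accelerate eliminates one column per step with a reflector H = I - 2vvᵀ. The following is a minimal dense NumPy sketch of that baseline, not the paper's approximate-reflector or blocked GPU variant:

```python
import numpy as np

def householder_qr(A):
    """Unblocked Householder QR: returns reduced Q (m-by-n) and
    upper-triangular R (n-by-n) with A = Q @ R."""
    m, n = A.shape
    R = A.astype(float).copy()
    Q = np.eye(m)
    for k in range(n):
        # Build the reflector that zeros column k below the diagonal.
        v = R[k:, k].copy()
        # Sign choice avoids cancellation when forming v.
        v[0] += np.copysign(np.linalg.norm(v), v[0])
        norm_v = np.linalg.norm(v)
        if norm_v == 0.0:            # column already zero below the diagonal
            continue
        v /= norm_v
        # Apply H = I - 2 v v^T from the left to R, and accumulate it
        # into Q from the right.
        R[k:, :] -= 2.0 * np.outer(v, v @ R[k:, :])
        Q[:, k:] -= 2.0 * np.outer(Q[:, k:] @ v, v)
    return Q[:, :n], np.triu(R[:n, :])

rng = np.random.default_rng(1)
A = rng.normal(size=(60, 6))
Q, R = householder_qr(A)
```

The per-column dependency visible in the loop is exactly why high-performance implementations block the reflectors (compact WY) and why reducing the associated host-accelerator traffic matters on GPUs.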

    Fully Distributed Robust Singular Value Decomposition

    The Articulated Funiculator is a new and innovative concept developed by Tyréns for achieving more efficient vertical transportation with higher space utilization. Having a variety of merits in contrast to the rotary induction motor, i.e. simple construction, direct electromagnetic thrust propulsion, and high safety and reliability, the linear induction motor (LIM) is considered as the propulsion system for the Articulated Funiculator. This thesis is carried out with the purpose of determining the feasibility of this study case by designing LIMs that meet some specific requirements. The detailed requirements include: a set of identical LIMs is required to jointly produce the thrust sufficient to vertically accelerate the moving system at up to 2 m/s²; the size of the LIMs cannot exceed the specification of the funiculator; the maximum flux density in the air gap for each LIM is kept slightly below 0.6 T; no iron saturation of any part of the LIMs is allowed. In this thesis report, an introduction to the LIM is first presented. Following the introduction, relevant literature is reviewed to strengthen the theoretical fundamentals and give a better understanding of the LIM's history and applications. A general classification of LIMs is subsequently introduced. In addition, an analytical model of the single-sided linear induction motor (SLIM) is built based on an approximate equivalent circuit, and the preliminary geometry of the SLIM is thereby obtained. In order to acquire a more comprehensive understanding of the machine characteristics and a more precise SLIM design, a two-dimensional finite element method (2D-FEM) analysis is initially performed according to the preliminary geometry. The results, unfortunately, show that the iron is severely saturated in the teeth and yoke and that the maximum value of the air-gap flux density is excessive. To address these problems, different parameters of the SLIM are marginally adjusted and a series of design scenarios are run in Flux2D for 8-pole and 6-pole SLIMs. The results are compared and the final solution is chosen among them.

    Random projections for Bayesian regression

    This article deals with random projections applied as a data reduction technique for Bayesian regression analysis. We show sufficient conditions under which the entire d-dimensional distribution is approximately preserved under random projections by reducing the number of data points from n to k ∈ O(poly(d/ε)) in the case n ≫ d. Under mild assumptions, we prove that evaluating a Gaussian likelihood function based on the projected data instead of the original data yields a (1+O(ε))-approximation in terms of the ℓ₂ Wasserstein distance. Our main result shows that the posterior distribution of Bayesian linear regression is approximated up to a small error depending on only an ε-fraction of its defining parameters. This holds when using arbitrary Gaussian priors or the degenerate case of uniform distributions over ℝ^d for β. Our empirical evaluations involve different simulated settings of Bayesian linear regression. Our experiments underline that the proposed method is able to recover the regression model up to small error while considerably reducing the total running time.
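The sketch-and-infer idea in the abstract can be illustrated in a few lines: draw a random projection S with k ≪ n rows, replace (X, y) by (SX, Sy), and evaluate the usual Gaussian posterior on the projected data. This is only an illustrative sketch, assuming a standard conjugate Gaussian prior and an i.i.d. Gaussian projection; the paper's formal guarantees are stated for more general constructions and in Wasserstein distance.

```python
import numpy as np

def bayes_posterior(X, y, sigma2=1.0, tau2=10.0):
    """Posterior N(mu, Sigma) of beta for y ~ N(X beta, sigma2*I)
    under the Gaussian prior beta ~ N(0, tau2*I)."""
    d = X.shape[1]
    Sigma = np.linalg.inv(X.T @ X / sigma2 + np.eye(d) / tau2)
    mu = Sigma @ (X.T @ y) / sigma2
    return mu, Sigma

rng = np.random.default_rng(0)
n, d, k = 5000, 10, 1000                      # reduce n data points to k
X = rng.normal(size=(n, d))
beta = rng.normal(size=d)
y = X @ beta + 0.1 * rng.normal(size=n)

# Gaussian random projection with i.i.d. N(0, 1/k) entries, so that
# (S X)^T (S X) is an unbiased estimate of X^T X.
S = rng.normal(scale=1.0 / np.sqrt(k), size=(k, n))

mu_full, _ = bayes_posterior(X, y)            # posterior from all n points
mu_proj, _ = bayes_posterior(S @ X, S @ y)    # posterior from k projected rows
```

Because the likelihood depends on the data only through XᵀX and Xᵀy, a projection that approximately preserves these Gram-matrix quantities yields a posterior close to the one computed from the full data, at roughly k/n of the cost.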