67 research outputs found

    Performance of random sampling for computing low-rank approximations of a dense matrix on GPUs

    A low-rank approximation of a dense matrix plays an important role in many applications. To compute such an approximation, a common approach uses the QR factorization with column pivoting (QRCP). Though the reliability and efficiency of QRCP have been demonstrated, this deterministic approach requires costly communication at each step of the factorization. Since such communication is becoming increasingly expensive on modern computers, an alternative approach based on random sampling, which can be implemented using communication-optimal kernels, is becoming attractive. To study its potential, in this paper we compare the performance of random sampling with that of QRCP on an NVIDIA Kepler GPU. Our performance results demonstrate that random sampling can be up to 12.8× faster than the deterministic approach for computing an approximation of the same accuracy. We also present the parallel scaling of random sampling over multiple GPUs on a single compute node, showing a speedup of 3.8× over three Kepler GPUs. These results demonstrate the potential of random sampling as an excellent computational tool for many applications, and its potential is likely to grow on emerging computers with increasing communication costs.
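    The random-sampling idea in this abstract can be illustrated with a minimal NumPy sketch. It is not the paper's GPU implementation: the function name `randomized_low_rank`, the `oversample` parameter, and the synthetic test matrix are assumptions made for illustration. The sketch multiplies the matrix by a Gaussian test matrix, orthonormalizes the result with an unpivoted QR, and projects the matrix onto that basis, so the dominant costs are matrix-matrix products rather than pivoting steps.

```python
import numpy as np

def randomized_low_rank(A, rank, oversample=10, seed=0):
    """Return Q (m x k) and B (k x n) with A ~= Q @ B, k = rank + oversample.

    Hypothetical helper for illustration; parameter names are assumptions.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    k = min(rank + oversample, n)
    # Sample the range of A with a Gaussian test matrix.
    Omega = rng.standard_normal((n, k))
    Y = A @ Omega
    # Orthonormal basis for the sampled range (unpivoted, reduced QR).
    Q, _ = np.linalg.qr(Y)
    # Project A onto that basis.
    B = Q.T @ A
    return Q, B

# Synthetic matrix of exact rank 5 (illustrative data, not from the paper).
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 100))

Q, B = randomized_low_rank(A, rank=5)
rel_err = np.linalg.norm(A - Q @ B) / np.linalg.norm(A)
print(f"basis shape: {Q.shape}, relative error: {rel_err:.2e}")
```

    For a matrix whose singular values decay slowly, a few power iterations on the sample are a common refinement; that step is omitted here to keep the sketch short.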

    The Forward Physics Facility at the High-Luminosity LHC


    COVID-19 symptoms at hospital admission vary with age and sex: results from the ISARIC prospective multinational observational study

    Background: The ISARIC prospective multinational observational study is the largest cohort of hospitalized patients with COVID-19. We present relationships of age, sex, and nationality to presenting symptoms. Methods: International, prospective observational study of 60 109 hospitalized symptomatic patients with laboratory-confirmed COVID-19 recruited from 43 countries between 30 January and 3 August 2020. Logistic regression was performed to evaluate relationships of age and sex to published COVID-19 case definitions and the most commonly reported symptoms. Results: ‘Typical’ symptoms of fever (69%), cough (68%) and shortness of breath (66%) were the most commonly reported. 92% of patients experienced at least one of these. Prevalence of typical symptoms was greatest in 30- to 60-year-olds (respectively 80, 79, 69%; at least one 95%). They were reported less frequently in children (≤ 18 years: 69, 48, 23; 85%), older adults (≥ 70 years: 61, 62, 65; 90%), and women (66, 66, 64; 90%; vs. men 71, 70, 67; 93%, each P < 0.001). The most common atypical presentations under 60 years of age were nausea and vomiting and abdominal pain, and over 60 years was confusion. Regression models showed significant differences in symptoms with sex, age and country. Interpretation: This international collaboration has allowed us to report reliable symptom data from the largest cohort of patients admitted to hospital with COVID-19. Adults over 60 and children admitted to hospital with COVID-19 are less likely to present with typical symptoms. Nausea and vomiting are common atypical presentations under 30 years. Confusion is a frequent atypical presentation of COVID-19 in adults over 60 years. Women are less likely to experience typical symptoms than men.

    Solveurs multifrontaux exploitant des blocs de rang faible: complexité, performance et parallélisme

    We investigate the use of low-rank approximations to reduce the cost of sparse direct multifrontal solvers. Among the different matrix representations that have been proposed to exploit the low-rank property within multifrontal solvers, we focus on the Block Low-Rank (BLR) format, whose simplicity and flexibility make it easy to use in a general-purpose, algebraic multifrontal solver. We present different variants of the BLR factorization, depending on how the low-rank updates are performed and on the constraints to handle numerical pivoting. We first investigate the theoretical complexity of the BLR format which, unlike other formats such as hierarchical ones, was previously unknown. We prove that the theoretical complexity of the BLR multifrontal factorization is asymptotically lower than that of the full-rank solver. We then show how the BLR variants can further reduce that complexity. We provide an experimental study with numerical results to support our complexity bounds. After proving that BLR multifrontal solvers can achieve a low complexity, we turn to the problem of translating that low complexity into actual performance gains on modern architectures. We first present a multithreaded BLR factorization, and analyze its performance in shared-memory multicore environments on a large set of real-life problems. We put forward several algorithmic properties of the BLR variants necessary to efficiently exploit multicore systems by improving the arithmetic intensity and the scalability of the BLR factorization. We then move on to the distributed-memory BLR factorization, for which additional challenges are identified and addressed. The algorithms presented throughout this thesis have been implemented within the MUMPS solver. We illustrate the use of our approach in three industrial applications coming from geosciences and structural mechanics. We also compare our solver with the STRUMPACK package, based on Hierarchically Semi-Separable approximations. We conclude this thesis by reporting results on a very large problem (130 million unknowns) which illustrates future challenges posed by BLR multifrontal solvers at scale.
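    As a rough illustration of the Block Low-Rank format described in this abstract, the following NumPy sketch partitions a dense matrix into a flat grid of blocks, keeps the diagonal blocks dense, and stores each off-diagonal block as a product of two thin factors. It is not the MUMPS implementation: the truncated SVD is used here only as a stand-in compression kernel, and the helper names, block size, tolerance, and test matrix are all assumptions chosen for the example.

```python
import numpy as np

def compress_block(block, tol):
    """Truncated SVD: keep singular values above tol times the largest."""
    U, s, Vt = np.linalg.svd(block, full_matrices=False)
    r = int(np.sum(s > tol * s[0]))
    return U[:, :r] * s[:r], Vt[:r, :]

def blr_compress(A, block_size, tol):
    """Dense diagonal blocks, low-rank (X, Y) factors elsewhere."""
    n = A.shape[0]
    blocks = {}
    for i in range(0, n, block_size):
        for j in range(0, n, block_size):
            blk = A[i:i + block_size, j:j + block_size]
            blocks[(i, j)] = blk.copy() if i == j else compress_block(blk, tol)
    return blocks

def blr_to_dense(blocks, n):
    """Rebuild a dense matrix from the block representation."""
    A = np.zeros((n, n))
    for (i, j), data in blocks.items():
        if i == j:
            A[i:i + data.shape[0], j:j + data.shape[1]] = data
        else:
            X, Y = data
            A[i:i + X.shape[0], j:j + Y.shape[1]] = X @ Y
    return A

def stored_entries(blocks):
    """Count the floating-point entries kept by the representation."""
    total = 0
    for (i, j), data in blocks.items():
        total += data.size if i == j else data[0].size + data[1].size
    return total

# Illustrative test matrix: a smooth kernel on 1D points, whose
# off-diagonal blocks are numerically low-rank.
n, block_size = 256, 64
x = np.linspace(0.0, 1.0, n)
A = 1.0 / (1.0 + np.abs(x[:, None] - x[None, :]))

blocks = blr_compress(A, block_size, tol=1e-8)
rel_err = np.linalg.norm(A - blr_to_dense(blocks, n)) / np.linalg.norm(A)
ratio = stored_entries(blocks) / A.size
print(f"relative error: {rel_err:.2e}, storage ratio: {ratio:.2f}")
```

    The flat, non-hierarchical block grid is what distinguishes this layout from hierarchical formats; the sketch covers only storage and reconstruction, not the factorization variants studied in the thesis.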

    Block low-rank multifrontal solvers : complexity, performance, and scalability

    No full text
    We investigate the use of low-rank approximations to reduce the cost of sparse direct multifrontal solvers. Among the different matrix representations that have been proposed to exploit the low-rank property within multifrontal solvers, we focus on the Block Low-Rank (BLR) format, whose simplicity and flexibility make it easy to use in a general-purpose, algebraic multifrontal solver. We present different variants of the BLR factorization, depending on how the low-rank updates are performed and on the constraints to handle numerical pivoting. We first investigate the theoretical complexity of the BLR format which, unlike other formats such as hierarchical ones, was previously unknown. We prove that the theoretical complexity of the BLR multifrontal factorization is asymptotically lower than that of the full-rank solver. We then show how the BLR variants can further reduce that complexity. We provide an experimental study with numerical results to support our complexity bounds. After proving that BLR multifrontal solvers can achieve a low complexity, we turn to the problem of translating that low complexity into actual performance gains on modern architectures. We first present a multithreaded BLR factorization, and analyze its performance in shared-memory multicore environments on a large set of real-life problems. We put forward several algorithmic properties of the BLR variants necessary to efficiently exploit multicore systems by improving the arithmetic intensity and the scalability of the BLR factorization. We then move on to the distributed-memory BLR factorization, for which additional challenges are identified and addressed. The algorithms presented throughout this thesis have been implemented within the MUMPS solver. We illustrate the use of our approach in three industrial applications coming from geosciences and structural mechanics. We also compare our solver with the STRUMPACK package, based on Hierarchically Semi-Separable approximations. We conclude this thesis by reporting results on a very large problem (130 million unknowns) which illustrates future challenges posed by BLR multifrontal solvers at scale.

    Solveurs multifrontaux exploitant des blocs de rang faible : complexité, performance et parallélisme

    No full text
    We investigate the use of low-rank approximations to reduce the cost of sparse direct multifrontal solvers. Among the different matrix representations that have been proposed to exploit the low-rank property within multifrontal solvers, we focus on the Block Low-Rank (BLR) format, whose simplicity and flexibility make it easy to use in a general-purpose, algebraic multifrontal solver. We present different variants of the BLR factorization, depending on how the low-rank updates are performed and on the constraints to handle numerical pivoting. We first investigate the theoretical complexity of the BLR format which, unlike other formats such as hierarchical ones, was previously unknown. We prove that the theoretical complexity of the BLR multifrontal factorization is asymptotically lower than that of the full-rank solver. We then show how the BLR variants can further reduce that complexity. We provide an experimental study with numerical results to support our complexity bounds. After proving that BLR multifrontal solvers can achieve a low complexity, we turn to the problem of translating that low complexity into actual performance gains on modern architectures. We first present a multithreaded BLR factorization, and analyze its performance in shared-memory multicore environments on a large set of real-life problems. We put forward several algorithmic properties of the BLR variants necessary to efficiently exploit multicore systems by improving the arithmetic intensity and the scalability of the BLR factorization. We then move on to the distributed-memory BLR factorization, for which additional challenges are identified and addressed. The algorithms presented throughout this thesis have been implemented within the MUMPS solver. We illustrate the use of our approach in three industrial applications coming from geosciences and structural mechanics. We also compare our solver with the STRUMPACK package, based on Hierarchically Semi-Separable approximations. We conclude this thesis by reporting results on a very large problem (130 million unknowns) which illustrates future challenges posed by BLR multifrontal solvers at scale.

    Code for reproducible research for the article "OPTIMAL QUANTIZATION OF RANK-ONE MATRICES IN FLOATING-POINT ARITHMETIC—WITH APPLICATIONS TO BUTTERFLY FACTORIZATIONS"

    No full text
    In the interest of reproducible research, this is exactly the version of the code used for numerical experiments in the paper "Optimal quantization of rank-one matrices in floating-point arithmetic—with applications to butterfly factorizations" by the same authors. Any update to the code will be made available on https://gitlab.inria.fr/ericciet/rank-1-quantizatio
