13 research outputs found

    Hierarchical approach for deriving a reproducible unblocked LU factorization

    [EN] We propose a reproducible variant of the unblocked LU factorization for graphics processing units (GPUs). For this purpose, we build upon Level-1/2 BLAS kernels that deliver correctly rounded and reproducible results for the dot (inner) product, vector scaling, and the matrix-vector product. In addition, we outline a strategy to enhance the accuracy of the triangular solve via iterative refinement. Following a bottom-up approach, we finally construct a reproducible unblocked implementation of the LU factorization for GPUs, which accommodates partial pivoting for stability and can eventually be integrated into a high-performance and stable algorithm for the (blocked) LU factorization.
    The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The simulations were performed on resources provided by the Swedish National Infrastructure for Computing (SNIC) at PDC Centre for High Performance Computing (PDC-HPC). This work was also granted access to the HPC resources of The Institute for Scientific Computing and Simulation financed by Region Ile-de-France and the project Equip@Meso (reference ANR-10-EQPX-29-01) overseen by the French National Agency for Research (ANR) as part of the Investissements d'Avenir program. This work was also partly supported by the FastRelax (ANR-14-CE25-0018-01) project of ANR.
    Iakymchuk, R.; Graillat, S.; Defour, D.; Quintana-Orti, ES. (2019). Hierarchical approach for deriving a reproducible unblocked LU factorization. International Journal of High Performance Computing Applications. 33(5):791-803. https://doi.org/10.1177/1094342019832968
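    The bottom-up idea in the abstract above (reproducible dot products first, then an unblocked LU with partial pivoting built on top of them) can be illustrated with a small CPU-side sketch. This is not the authors' GPU implementation: exact rational arithmetic from Python's `fractions` module stands in for their correctly rounded kernels, and the names `exact_dot`, `fused_update` and `lu_unblocked` are hypothetical helpers introduced only for this illustration.

```python
from fractions import Fraction


def exact_dot(x, y):
    """Dot product accumulated exactly and rounded once at the end.

    Because no intermediate rounding occurs, the result does not depend
    on the order of the terms (stand-in for a reproducible kernel).
    """
    acc = sum((Fraction(a) * Fraction(b) for a, b in zip(x, y)), Fraction(0))
    return float(acc)


def fused_update(a, x, y):
    """Return a - dot(x, y), computed exactly and rounded once."""
    acc = Fraction(a) - sum(
        (Fraction(p) * Fraction(q) for p, q in zip(x, y)), Fraction(0)
    )
    return float(acc)


def lu_unblocked(matrix):
    """Unblocked, dot-product-based LU with partial pivoting.

    Returns (lu, perm): L (unit diagonal, strictly lower part) and U
    (upper part) packed into lu, with row i of P*A being matrix[perm[i]].
    """
    n = len(matrix)
    lu = [list(row) for row in matrix]
    perm = list(range(n))
    for k in range(n):
        # Update column k on and below the diagonal.
        u_col = [lu[j][k] for j in range(k)]
        for i in range(k, n):
            lu[i][k] = fused_update(lu[i][k], lu[i][:k], u_col)
        # Partial pivoting: pick the entry of largest magnitude.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ZeroDivisionError("matrix is singular to working precision")
        lu[k], lu[p] = lu[p], lu[k]
        perm[k], perm[p] = perm[p], perm[k]
        # Compute row k of U to the right of the diagonal.
        for j in range(k + 1, n):
            lu[k][j] = fused_update(
                lu[k][j], lu[k][:k], [lu[m][j] for m in range(k)]
            )
        # Scale the subdiagonal part of column k to obtain L.
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
    return lu, perm


a = [[2.0, 1.0, 1.0], [4.0, 3.0, 3.0], [8.0, 7.0, 9.0]]
lu, perm = lu_unblocked(a)
```

    Exact rationals are far too slow for production use; they are chosen here only because they make the order-independence of the dot product easy to see.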

    Nosocomial measles outbreak

    Since 2010, Spain and the rest of the European Union have recorded measles incidence levels not seen in the previous two decades. In the Comunidad Valenciana, the first confirmed measles case belonging to the epidemic outbreak that began in the city of Valencia was reported in June 2011. On 15 August, the infection was transmitted to healthcare personnel, starting a nosocomial outbreak. A longitudinal study, nested within the monitoring of the epidemic situation, was designed for incident cases linked to the hospital setting. From 26 June until the nosocomial outbreak was closed, 177 measles notifications were received, from which 151 cases were confirmed. Fifty cases belonged to the community outbreak that gave rise to the epidemic situation, and 21 cases to the nosocomial outbreak. The genotype obtained was Paramyxovirus D4. Immune status against measles was unknown in 52.38% of cases. The affected staff belonged to the emergency department. The end of the epidemic situation resulted more from the natural depletion of the susceptible population than from active protective measures. In our opinion, this outbreak shows that the spread to the general population occurred, to a large extent, as a consequence of the nosocomial outbreak.

    Psychometric properties and factor structure of the Spanish version of the HC-PAIRS questionnaire

    Objective: To develop a Spanish version of the Health Care Providers' Pain and Impairment Relationship Scale (HC-PAIRS) and to test its psychometric properties. Methods: A forward and backward translation methodology was used to translate the questionnaire, which was then applied to 206 participants (174 physiotherapy students and 32 family physicians). The intraclass correlation coefficient (ICC) was calculated to assess test-retest reliability. Internal consistency was evaluated using Cronbach's alpha and item analysis. Construct validity was measured using Pearson correlation coefficients between HC-PAIRS and FABQ, FABQ-Phys, FABQ-Work and the responses given by participants to three clinical case scenarios. An exploratory factor analysis was carried out following the Kaiser normalization criteria and principal axis factoring with an oblique rotation (quartimax). Sensitivity to change was assessed after a teaching module. Results: Test-retest reliability was ICC 0.50 (p < 0.01) and Cronbach's alpha was 0.825. The HC-PAIRS scores correlated significantly with the scores of the FABQ and also with the recommendations for work and activity given by the participants in the three clinical case scenarios. The sensitivity-to-change test showed an effect size of 1.5, which is considered a large change. Factor analysis suggests that the Spanish version of HC-PAIRS measures a unidimensional construct. Conclusion: The Spanish version of the HC-PAIRS has proven to be a reliable, valid and sensitive instrument to assess health care providers' attitudes and beliefs about low back pain (LBP). It can be used in evaluating clinical practice and in the undergraduate acquisition of skills and knowledge.
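    For readers unfamiliar with the internal-consistency statistic reported above, the following minimal sketch computes Cronbach's alpha from a respondents-by-items score table using the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The function name `cronbach_alpha` and the toy scores are hypothetical and are not data from the study.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of rows (one row per respondent,
    one column per questionnaire item), using sample variances."""
    k = len(scores[0])
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical toy data: four respondents, two items.
toy_scores = [[1, 2], [2, 1], [3, 4], [4, 3]]
alpha = cronbach_alpha(toy_scores)
```

    Items that move together across respondents push alpha toward 1, while unrelated items pull it toward 0.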

    Modeling power and energy of the task-parallel Cholesky factorization on multicore processors

    [EN] In this paper we introduce a model for the total energy consumption of the Cholesky factorization on a multicore processor. Our model assumes a task-parallel execution of the factorization process, with concurrency leveraged via a runtime such as those recently proposed in projects like SMPSs, PLASMA or libflame, and decomposes the power usage into its uncore, static and dynamic components. A few simple experiments provide experimental data (parameters) with enough accuracy to assemble the model, which can then be used to estimate the actual power dissipation and energy consumption of the global algorithm. Experimental results on an 8-core platform equipped with Intel Xeon processors reveal the precision of the model.
    The authors were supported by the CICYT project TIN2011-23283 of the Ministerio de Economía y Competitividad and FEDER.
    Alonso-Jordá, P.; Dolz Zaragozá, MF.; Mayo, R.; Quintana Ortí, ES. (2014). Modeling power and energy of the task-parallel Cholesky factorization on multicore processors. Computer Science - Research and Development. 29(2):105-112. https://doi.org/10.1007/s00450-012-0227-z