    Hierarchical approach for deriving a reproducible unblocked LU factorization

    We propose a reproducible variant of the unblocked LU factorization for graphics processing units (GPUs). For this purpose, we build upon Level-1/2 BLAS kernels that deliver correctly rounded and reproducible results for the dot (inner) product, vector scaling, and the matrix-vector product. In addition, we outline a strategy to enhance the accuracy of the triangular solve via iterative refinement. Following a bottom-up approach, we finally construct a reproducible unblocked implementation of the LU factorization for GPUs, which accommodates partial pivoting for stability and can eventually be integrated into a high-performance and stable algorithm for the (blocked) LU factorization. The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The simulations were performed on resources provided by the Swedish National Infrastructure for Computing (SNIC) at PDC Centre for High Performance Computing (PDC-HPC). This work was also granted access to the HPC resources of The Institute for Scientific Computing and Simulation financed by Region Ile-de-France and the project Equip@Meso (reference ANR-10-EQPX-29-01) overseen by the French National Agency for Research (ANR) as part of the Investissements d'Avenir program. This work was also partly supported by the FastRelax (ANR-14-CE25-0018-01) project of ANR.
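
    The bottom-up idea in the abstract above (an unblocked LU factorization with partial pivoting assembled from dot products that are rounded only once) can be illustrated with a small CPU sketch. This is not the authors' GPU implementation: the helper names `rounded_once` and `lu_unblocked` are hypothetical, and exact rational arithmetic from Python's `fractions` module is assumed here as a simple stand-in for their correctly rounded, reproducible BLAS kernels.

```python
from fractions import Fraction


def rounded_once(c, x, y):
    """Return c - dot(x, y), accumulated exactly and rounded to double once.

    Exact rational accumulation makes the result independent of the
    summation order. Illustrative stand-in, not the authors' kernel.
    """
    exact = Fraction(c) - sum(Fraction(a) * Fraction(b) for a, b in zip(x, y))
    return float(exact)


def lu_unblocked(a):
    """Crout-style unblocked LU with partial pivoting on a list of rows.

    Returns (lu, perm). lu stores the unit lower-triangular L strictly
    below the diagonal and U on and above it; perm[i] is the index of
    the original row that ends up as row i of the permuted matrix.
    """
    n = len(a)
    lu = [[float(v) for v in row] for row in a]
    perm = list(range(n))
    for k in range(n):
        # Update column k (rows k..n-1) with dot products against U[0:k, k].
        col_k = [lu[r][k] for r in range(k)]
        for i in range(k, n):
            lu[i][k] = rounded_once(lu[i][k], lu[i][:k], col_k)
        # Partial pivoting: pick the largest magnitude entry in the column.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("zero pivot: matrix is singular")
        lu[k], lu[p] = lu[p], lu[k]
        perm[k], perm[p] = perm[p], perm[k]
        # Update row k of U (columns k+1..n-1) with dot products.
        for j in range(k + 1, n):
            col_j = [lu[r][j] for r in range(k)]
            lu[k][j] = rounded_once(lu[k][j], lu[k][:k], col_j)
        # Scale the column below the pivot to obtain the multipliers of L.
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
    return lu, perm
```

    For the matrix with rows [2, 1, 1], [4, 3, 3], [8, 7, 9], this sketch returns the permutation [2, 0, 1]. Because every inner product is accumulated exactly before a single rounding, reordering the terms of a dot product cannot change the computed factors, which is the property the abstract calls reproducibility.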

    Nosocomial measles outbreak

    Since 2010, Spain and the rest of the European Union have shown measles incidence levels not observed in the previous two decades. In the Valencian Community, the first confirmed measles case belonging to the epidemic outbreak that began in the city of Valencia was reported in June 2011. On 15 August, transmission to healthcare staff occurred, starting a nosocomial outbreak. A longitudinal study, nested within the surveillance of the epidemic situation, was designed for incident cases linked to the hospital setting. From 26 June until the nosocomial outbreak was closed, 177 measles notifications were received, of which 151 cases were confirmed. Fifty cases belonged to the community outbreak that gave rise to the epidemic situation, and 21 cases to the nosocomial outbreak. The genotype identified was paramyxovirus D4. Immune status against measles was unknown in 52.38% of cases. The affected staff worked in the emergency department. The end of the epidemic situation owed more to the natural depletion of the susceptible population than to active protective measures. In our opinion, this outbreak shows that spread to the general population was largely a consequence of the nosocomial outbreak

    Psychometric properties and factor structure of the Spanish version of the HC-PAIRS questionnaire

    Objective: To develop a Spanish version of the Health Care Providers' Pain and Impairment Relationship Scale (HC-PAIRS) and to test its psychometric properties. Methods: A forward and backward translation methodology was used to translate the questionnaire, which was then applied to 206 participants (174 physiotherapy students and 32 family physicians). The intraclass correlation coefficient (ICC) was calculated to assess test-retest reliability. Internal consistency was evaluated using Cronbach's alpha and item analysis. Construct validity was measured using Pearson correlation coefficients between HC-PAIRS and FABQ, FABQ-Phys, FABQ-Work and the responses given by participants to three clinical case scenarios. An exploratory factor analysis was carried out following the Kaiser normalization criteria and principal axis factoring with an oblique rotation (quartimax). Sensitivity to change was assessed after a teaching module. Results: Test-retest reliability was ICC 0.50 (p < 0.01) and Cronbach's alpha was 0.825. The HC-PAIRS scores correlated significantly with the scores of the FABQ and also with the recommendations for work and activity given by the participants in the three clinical case scenarios. The sensitivity-to-change test showed an effect size of 1.5, which is considered a large change. Factor analysis suggests that the Spanish version of HC-PAIRS measures a unidimensional construct. Conclusion: The Spanish version of the HC-PAIRS has proven to be a reliable, valid and sensitive instrument to assess health care providers' attitudes and beliefs about low back pain (LBP). It can be used in evaluating clinical practice and in undergraduate acquisition of skills and knowledge
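
    The internal-consistency statistic reported above, Cronbach's alpha, follows the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores) for k items. A minimal sketch is given below; the function name `cronbach_alpha` and the score matrices in the usage note are hypothetical illustrations, not the study's data or analysis code.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items score matrix.

    scores: list of rows, one row per respondent, one column per item.
    Uses sample variances (n - 1 denominator) throughout.
    """
    k = len(scores[0])                       # number of items
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    items = list(zip(*scores))               # one tuple of scores per item
    item_var_sum = sum(variance(item) for item in items)
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_var_sum / total_var)
```

    As a check on the formula, three hypothetical respondents answering two items with rows [1, 1], [2, 1], [3, 2] give item variances 1 and 1/3 and a total-score variance of 7/3, so alpha = 2 * (1 - 4/7) = 6/7, or about 0.857.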

    Modeling power and energy of the task-parallel Cholesky factorization on multicore processors

    The authors were supported by the CICYT project TIN2011-23283 of the Ministerio de Economía y Competitividad and FEDER