
    Assessment of Two Task Frameworks with Dependencies for Matrix Factorizations on a Multicore Architecture

    In this study, we evaluate two task frameworks with dependencies on important application kernels from numerical linear algebra. Specifically, we consider two matrix factorization algorithms: the tiled LU and WZ factorizations, both without pivoting. In tiled algorithms, the operations are represented as a sequence of small tasks that operate on square blocks (tiles) of the data. The dependencies among tasks are expressed as a directed acyclic graph, and the runtime system executes the graph on a multicore architecture. The performance of applications based on task dependencies depends on the efficiency of the compilers and runtime systems. We report the performance and scalability of two task frameworks with dependencies, OpenMP and Intel Threading Building Blocks, on the multicore architecture for these matrix factorizations. Our results show that the number of tiles in both factorizations always has an impact on performance and speedup. Both frameworks prove suitable for efficiently parallelizing such applications, although each has its own merits and flaws.
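    The task-with-dependencies style described above can be sketched with OpenMP `depend` clauses. This is a minimal illustration, not the authors' code: it assumes a hypothetical tile-contiguous layout (an `nt` x `nt` grid of `nb` x `nb` column-major tiles) and hypothetical kernel names, and it covers only the LU case with OpenMP (no WZ, no TBB). Each kernel call becomes a task whose `in`/`inout` dependencies on tile addresses let the runtime build the DAG.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout: an nt x nt grid of nb x nb tiles. Each tile is stored
// contiguously in column-major order; tiles are ordered column by column.
double* tile(std::vector<double>& A, int nt, int nb, int i, int j) {
    assert(0 <= i && i < nt && 0 <= j && j < nt);
    return A.data() + (static_cast<std::size_t>(j) * nt + i) * nb * nb;
}

// In-place LU of one tile without pivoting: unit-lower L below, U on/above the diagonal.
void getrf_nopiv(double* T, int nb) {
    for (int k = 0; k < nb; ++k) {
        for (int i = k + 1; i < nb; ++i) T[i + k * nb] /= T[k + k * nb];
        for (int j = k + 1; j < nb; ++j)
            for (int i = k + 1; i < nb; ++i) T[i + j * nb] -= T[i + k * nb] * T[k + j * nb];
    }
}

// B := L^{-1} B, with L the unit-lower part of the factored diagonal tile.
void trsm_lower_unit(const double* L, double* B, int nb) {
    for (int c = 0; c < nb; ++c)
        for (int r = 0; r < nb; ++r)
            for (int s = r + 1; s < nb; ++s) B[s + c * nb] -= L[s + r * nb] * B[r + c * nb];
}

// B := B U^{-1}, with U the upper part of the factored diagonal tile.
void trsm_upper_right(const double* U, double* B, int nb) {
    for (int c = 0; c < nb; ++c) {
        for (int s = 0; s < c; ++s)
            for (int r = 0; r < nb; ++r) B[r + c * nb] -= B[r + s * nb] * U[s + c * nb];
        for (int r = 0; r < nb; ++r) B[r + c * nb] /= U[c + c * nb];
    }
}

// C := C - A * B for nb x nb tiles.
void gemm_update(const double* A, const double* B, double* C, int nb) {
    for (int j = 0; j < nb; ++j)
        for (int k = 0; k < nb; ++k)
            for (int i = 0; i < nb; ++i) C[i + j * nb] -= A[i + k * nb] * B[k + j * nb];
}

// Right-looking tiled LU without pivoting; one OpenMP task per kernel call.
void tiled_lu_nopiv(std::vector<double>& A, int nt, int nb) {
    assert(A.size() == static_cast<std::size_t>(nt) * nt * nb * nb);
    #pragma omp parallel
    #pragma omp single
    for (int k = 0; k < nt; ++k) {
        double* Akk = tile(A, nt, nb, k, k);
        #pragma omp task depend(inout: Akk[0])
        getrf_nopiv(Akk, nb);
        for (int j = k + 1; j < nt; ++j) {
            double* Akj = tile(A, nt, nb, k, j);
            #pragma omp task depend(in: Akk[0]) depend(inout: Akj[0])
            trsm_lower_unit(Akk, Akj, nb);
        }
        for (int i = k + 1; i < nt; ++i) {
            double* Aik = tile(A, nt, nb, i, k);
            #pragma omp task depend(in: Akk[0]) depend(inout: Aik[0])
            trsm_upper_right(Akk, Aik, nb);
        }
        for (int i = k + 1; i < nt; ++i)
            for (int j = k + 1; j < nt; ++j) {
                double* Aik = tile(A, nt, nb, i, k);
                double* Akj = tile(A, nt, nb, k, j);
                double* Aij = tile(A, nt, nb, i, j);
                #pragma omp task depend(in: Aik[0], Akj[0]) depend(inout: Aij[0])
                gemm_update(Aik, Akj, Aij, nb);
            }
    }  // the implicit barrier at the end of the parallel region waits for all tasks
}
```

    Compiled with `-fopenmp`, independent TRSM and GEMM tasks can run concurrently as soon as their input tiles are ready; without OpenMP the pragmas are ignored and the same code runs serially in program order. For a fixed matrix order, the tile count `nt` typically trades task granularity against available parallelism.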

    Using desktop computers to solve large-scale dense linear algebra problems

    We provide experimental evidence that current desktop computers have enough computational power to solve large-scale dense linear algebra problems. While the high computational cost of the numerical methods for these problems can be handled by the multiple cores of current processors, we propose using the disk to store the large data structures associated with these applications. Our results also show that the limited amount of RAM and the comparatively slow disk of the system pose no problem for solving very large dense linear systems and linear least-squares problems. Thus, current desktop computers emerge as an appealing, cost-effective platform for research groups that have to deal with large dense linear algebra problems but have no direct access to large computing facilities.
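    The idea of keeping the matrix on disk and bringing in only the tiles currently needed can be illustrated with standard C++ file I/O (`<cstdio>`). This sketch is a simplification under stated assumptions, not the authors' out-of-core runtime: it stores a hypothetical `nt` x `nt` grid of `nb` x `nb` tiles contiguously in a binary file and computes a matrix-vector product while holding a single tile in RAM. The function names are hypothetical. An out-of-core factorization would follow the same read-compute-write pattern per tile, typically with a tile cache and I/O overlapped with computation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>

// Byte offset of tile (i,j) in a file holding an nt x nt grid of nb x nb
// tiles, stored one after another in column-major tile order (assumed layout).
long tile_offset(int nt, int nb, int i, int j) {
    assert(0 <= i && i < nt && 0 <= j && j < nt);
    return (static_cast<long>(j) * nt + i) * nb * nb * static_cast<long>(sizeof(double));
}

// Writes one column-major nb x nb tile to its slot in the file.
bool write_tile(std::FILE* f, int nt, int nb, int i, int j, const double* t) {
    const std::size_t count = static_cast<std::size_t>(nb) * nb;
    return std::fseek(f, tile_offset(nt, nb, i, j), SEEK_SET) == 0 &&
           std::fwrite(t, sizeof(double), count, f) == count;
}

// Reads one tile back into the caller's buffer.
bool read_tile(std::FILE* f, int nt, int nb, int i, int j, double* t) {
    const std::size_t count = static_cast<std::size_t>(nb) * nb;
    return std::fseek(f, tile_offset(nt, nb, i, j), SEEK_SET) == 0 &&
           std::fread(t, sizeof(double), count, f) == count;
}

// y = A x with A streamed from disk: only one nb x nb tile is resident in RAM.
bool ooc_matvec(std::FILE* f, int nt, int nb, const std::vector<double>& x,
                std::vector<double>& y) {
    std::vector<double> t(static_cast<std::size_t>(nb) * nb);
    y.assign(static_cast<std::size_t>(nt) * nb, 0.0);
    for (int j = 0; j < nt; ++j)
        for (int i = 0; i < nt; ++i) {
            if (!read_tile(f, nt, nb, i, j, t.data())) return false;
            for (int c = 0; c < nb; ++c)
                for (int r = 0; r < nb; ++r)
                    y[i * nb + r] += t[r + c * nb] * x[j * nb + c];
        }
    return true;
}
```

    Memory use is bounded by one tile plus the vectors, independent of the matrix size; the disk only has to hold the full matrix.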

    Computed tomography medical image reconstruction on affordable equipment by using Out-Of-Core techniques

    Background and objective: As Computed Tomography scans are an essential medical test, many techniques have been proposed to reconstruct high-quality images using a lower radiation dose. One approach is to employ algebraic factorization methods to reconstruct the images, using fewer views than the traditional analytical methods. However, their main drawback is the high computational cost and hence the time needed to obtain the images, which is critical in daily clinical practice. For this reason, faster methods for solving this problem are required. Methods: In this paper, we propose a new reconstruction method based on the QR factorization that is very efficient on affordable equipment (standard multicore processors and standard Solid-State Drives) by using Out-Of-Core techniques. Results: Combining affordable hardware with the new software proposed in this work, the images can be reconstructed very quickly and with high quality. We analyze the reconstructions using real Computed Tomography images selected from a dataset, comparing the QR method to LSQR and filtered backprojection (FBP). We measure image quality using the Peak Signal-to-Noise Ratio and the Structural Similarity Index, obtaining very high values. We also compare the efficiency of spinning disks versus Solid-State Drives, showing that the latter perform the input/output operations in significantly less time.
    Conclusions: The results indicate that our proposed method and software are valid for efficiently solving large-scale systems and can be applied to the Computed Tomography reconstruction problem to obtain high-quality images.

    This research has been supported by "Universitat Politecnica de Valencia", "Generalitat Valenciana" under PROMETEO/2018/035 and ACIF/2017/075, co-financed by FEDER and FSE funds, and the "Spanish Ministry of Science, Innovation and Universities" under Grant RTI2018-098156-B-C54, co-financed by FEDER funds.

    Chillarón-Pérez, M.; Quintana Ortí, G.; Vidal-Gimeno, V.; Verdú Martín, GJ. (2020). Computed tomography medical image reconstruction on affordable equipment by using Out-Of-Core techniques. Computer Methods and Programs in Biomedicine. 193:1-11. https://doi.org/10.1016/j.cmpb.2020.105488
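    The core numerical step, solving an overdetermined system in the least-squares sense with a QR factorization, can be sketched in-core. This is an illustration under stated assumptions, not the paper's Out-Of-Core method: it uses a dense, column-major `m` x `n` matrix `A` with full column rank, Householder reflections, and a hypothetical function name. In algebraic CT reconstruction, `A` would typically model the projection geometry, `b` would hold the measured projections, and `x` would be the image.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Solves min_x ||A x - b||_2 for a dense m x n matrix A (m >= n, full column
// rank), stored column-major, using Householder QR. Q is never formed: each
// reflection is applied to the remaining columns and to b as it is computed.
std::vector<double> qr_least_squares(std::vector<double> A, std::vector<double> b, int m, int n) {
    assert(m >= n && A.size() == static_cast<std::size_t>(m) * n &&
           b.size() == static_cast<std::size_t>(m));
    auto a = [&](int i, int j) -> double& { return A[i + static_cast<std::size_t>(j) * m]; };
    for (int k = 0; k < n; ++k) {
        // Householder vector v = x - alpha e1 for x = A(k:m, k), stored in place.
        double norm = 0.0;
        for (int i = k; i < m; ++i) norm += a(i, k) * a(i, k);
        norm = std::sqrt(norm);
        const double alpha = (a(k, k) > 0) ? -norm : norm;  // sign avoids cancellation
        a(k, k) -= alpha;
        double vtv = 0.0;
        for (int i = k; i < m; ++i) vtv += a(i, k) * a(i, k);
        if (vtv > 0.0) {
            // Apply H = I - 2 v v^T / (v^T v) to the trailing columns ...
            for (int j = k + 1; j < n; ++j) {
                double s = 0.0;
                for (int i = k; i < m; ++i) s += a(i, k) * a(i, j);
                s *= 2.0 / vtv;
                for (int i = k; i < m; ++i) a(i, j) -= s * a(i, k);
            }
            // ... and to the right-hand side, accumulating Q^T b.
            double s = 0.0;
            for (int i = k; i < m; ++i) s += a(i, k) * b[i];
            s *= 2.0 / vtv;
            for (int i = k; i < m; ++i) b[i] -= s * a(i, k);
        }
        a(k, k) = alpha;  // diagonal entry of R; entries below still hold v
    }
    // Back substitution: R(0:n, 0:n) x = (Q^T b)(0:n).
    std::vector<double> x(n);
    for (int i = n - 1; i >= 0; --i) {
        double s = b[i];
        for (int j = i + 1; j < n; ++j) s -= a(i, j) * x[j];
        x[i] = s / a(i, i);
    }
    return x;
}
```

    For system matrices too large for RAM, the same factorization would have to be carried out by tiles read from disk, which is what Out-Of-Core techniques provide.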

    Maintaining High Performance Across All Problem Sizes and Parallel Scales Using Microkernel-based Linear Algebra

    Linear algebra underlies a large proportion of computational problems. As the scale of modern hardware continues to grow, the performance of small linear algebra problems has become increasingly important. To overcome the shortcomings of conventional approaches, we employ a new approach that uses a microkernel framework provided by ATLAS to improve the performance of a few linear algebra routines for all problem sizes. Our initial research consisted of improving the performance of the parallel LU factorization in ATLAS, for which we achieved speedups of up to 2.07x and 2.66x for small problems, and up to 91% and 87% of theoretical peak performance for asymptotic problems, on a 12-core Intel Xeon and a 32-core AMD Opteron machine, respectively, outperforming all state-of-the-art libraries at the time. This performance was achieved through an exhaustive search over all tuning parameters, which could take days. This motivated us to develop a computational model of our LU factorization that could predict those parameters by combining a few basic empirical timings with a theoretical model based on the amount of computation required. While our model provided good predictions for mid-sized to asymptotic problems, some factors affecting small problems remained unexplained; these could possibly be addressed by extending the ATLAS tuning framework. While this extension is underway, we decided to pursue the modeling research using a simpler, serial BLAS-based approach. We investigated and implemented two Level-3 BLAS routines, TRSM and TRMM, which are widely used, primarily by LAPACK operations such as the aforementioned LU factorization. With the microkernel-based approach, we improved the performance of both routines by up to 15% for square problems and 73% for fat problems over prior ATLAS implementations on modern hardware. Finally, in collaborative research with ARM Inc., we improved the performance of GEMM, the most important Level-3 BLAS operation, in ATLAS by up to 53% by implementing microkernels for two 64-bit ARM architectures. This automatically improves other BLAS and LAPACK routines that rely on GEMM for high performance.
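    The microkernel approach can be illustrated with a portable, register-blocked GEMM. This is a generic sketch, not ATLAS's microkernel framework or the ARM kernels from this work: the block sizes `MR` and `NR`, the packing formats, and the function names are hypothetical, there is no cache blocking over `k`, and production microkernels are written with SIMD intrinsics or assembly. The point is the structure: operands are packed into contiguous panels, and a small fixed-size kernel accumulates an `MR` x `NR` block of C intended to stay in registers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int MR = 4, NR = 4;  // hypothetical register-block sizes

// Microkernel: C(0:MR, 0:NR) += Ap * Bp, where Ap is an MR x kc panel packed
// with MR contiguous values per p, and Bp is a kc x NR panel packed with NR
// contiguous values per p. The acc array is the register-resident block.
void microkernel(int kc, const double* Ap, const double* Bp, double* C, int ldc) {
    double acc[MR][NR] = {};
    for (int p = 0; p < kc; ++p)
        for (int i = 0; i < MR; ++i)
            for (int j = 0; j < NR; ++j) acc[i][j] += Ap[p * MR + i] * Bp[p * NR + j];
    for (int j = 0; j < NR; ++j)
        for (int i = 0; i < MR; ++i) C[i + j * ldc] += acc[i][j];
}

// C += A * B for column-major A (m x k), B (k x n), C (m x n). Panels are
// zero-padded so the microkernel always sees full MR x NR blocks.
void gemm(int m, int n, int k, const double* A, int lda, const double* B, int ldb,
          double* C, int ldc) {
    assert(m >= 0 && n >= 0 && k >= 0);
    std::vector<double> Ap(static_cast<std::size_t>(MR) * k);
    std::vector<double> Bp(static_cast<std::size_t>(NR) * k);
    double Ct[MR * NR];
    for (int jc = 0; jc < n; jc += NR) {
        const int nr = std::min(NR, n - jc);
        for (int p = 0; p < k; ++p)  // pack a k x NR panel of B
            for (int j = 0; j < NR; ++j) Bp[p * NR + j] = (j < nr) ? B[p + (jc + j) * ldb] : 0.0;
        for (int ic = 0; ic < m; ic += MR) {
            const int mr = std::min(MR, m - ic);
            for (int p = 0; p < k; ++p)  // pack an MR x k panel of A
                for (int i = 0; i < MR; ++i) Ap[p * MR + i] = (i < mr) ? A[(ic + i) + p * lda] : 0.0;
            // Compute a full MR x NR block, then add only its valid part to C.
            std::fill(Ct, Ct + MR * NR, 0.0);
            microkernel(k, Ap.data(), Bp.data(), Ct, MR);
            for (int j = 0; j < nr; ++j)
                for (int i = 0; i < mr; ++i) C[(ic + i) + (jc + j) * ldc] += Ct[i + j * MR];
        }
    }
}
```

    Because the code above the kernel only packs data and loops over blocks, porting to a new architecture mostly means rewriting `microkernel`; this is one reason why a faster GEMM also benefits the BLAS and LAPACK routines built on it.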