2 research outputs found

    Load-balanced parallel banded-system solvers

    Get PDF
    AbstractSolving banded systems is important in the applications of science and engineering. This paper presents a load-balancing strategy for solving banded systems in parallel when the number of processors used is small. An optimization-based load-balancing analysis is given to determine how many loads should be assigned to each processor in order to minimize the time requirement. Some experimentations are carried out on the nCUBE 2E multiprocessor to demonstrate the speedup advantage of the proposed load-balancing strategy. The speedup improvement ratio ranges from 47% to 66% (from 12% to 24%) when using 4 (8) processors

    Implementation of parallel tridiagonal solvers for a heterogeneous computing environment

    Get PDF
    Tridiagonal diagonally dominant linear systems arise in many scientific and engineering applications. The standard Thomas algorithm for solving such systems is inherently serial, forming a bottleneck in computation. Algorithms such as cyclic reduction and SPIKE reduce a single large tridiagonal system into multiple small independent systems which are solved in parallel. We develop portable cyclic reduction and the SPIKE algorithm for Open Computing Language implementations on a range of co-processors in a heterogeneous computing environment, including field programmable gate arrays, graphics processing units and other multi-core processors. We evaluate these designs in the context of solver performance, resource efficiency and numerical accuracy. References Altera. Implementing FPGA design with the OpenCL standard. Technical report, Altera Inc., November 2013. http://www.altera.com/literature/wp/wp-01173-opencl.pdf Altera. Altera SDK for OpenCL: Best practices guide. Technical report, Altera Inc., May 2015. https://www.altera.com/content/dam/altera-www/global/en_US/pdfs/literature/hb/opencl-sdk/aocl_optimization_guide.pdf P. Arbenz, A. Cleary, J. Dongarra, and M. Hegland. A comparison of parallel solvers for diagonally dominant and general narrow-banded linear systems. Parallel and Distributed Computing Practices, 2:385–400, 1999. http://www.scpe.org/index.php/scpe/article/view/152 P. Arbenz, A. Cleary, J. Dongarra, and M. Hegland. A comparison of parallel solvers for diagonally dominant and general narrow-banded linear systems II. In Euro-Par'99 Parallel Processing, Berlin, vol. 1685 of Lecture Notes in Computer Science pp. 1078–1087. Springer, 1999. doi:10.1007/3-540-48311-X_151 L.-W. Chang, J. A. Stratton, H.-S. Kim, and W. W. Hwu. A scalable, numerically stable, high-performance tridiagonal solver using GPUs. In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, pp. 1–11. IEEE Computer Society Press, 2012. doi:10.1109/SC.2012.12 Y. Dou, Y. Lei, G. Wu, S. Guo, J. Zhuo, and L. Shen. FPGA accelerating double/quad-double high precision floating-point applications for exascale computing. In Proceedings of the 24th ACM International Conference on Supercomputing, pp. 325–336, 2010. doi:10.1145/1810085.1810129 C. R. Dun, M. Hegland, and M. R. Osborne. Parallel stable solution methods for tridiagonal linear systems of equations. In Computational Techniques and Applications: CTAC95 Proceedings of the Seventh Biennial Conference, pp. 267–274. World Scientific, 1996. doi:10.1142/9789814530651 W. Gander and G. H. Golub. Cyclic reduction–-history and applications. In Scientific Computing, Proceedings of the Workshop, 1997, pp. 73–111. Springer, 1998. http://www.springer.com/fr/book/9789813083608 Khronos OpenCL Working Group. The OpenCL specification. Technical report, Khronos Group, October 2009. http://www.khronos.org/registry/cl/specs/opencl-1.0.pdf M. Hegland. On the parallel solution of tridiagonal systems by wrap-around partitioning and incomplete LU factorization. Numer. Math., 59(1):453–472, 1991. doi:10.1007/BF01385791 D. Jensen and A. F. Rodrigues. Embedded systems and exascale computing. Comput. Sci. Eng., 12(6):20–29, 2010. doi:10.1109/MCSE.2010.95 S. Palmer. Accelerating implicit finite difference schemes using a hardware optimised implementation of the thomas algorithm for FPGAs. Technical Report, 2014. http://arxiv.org/abs/1402.5094 E. Polizzi and A. H. Sameh. A parallel hybrid banded system solver: the spike algorithm. Parallel Comput., 32:177–194, 2006. doi:10.1016/j.parco.2005.07.005 W. H. Press, B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling. Numerical Recipes in FORTRAN: The art of Scientific Computation. Cambridge University Press, Cambridge, England, 2nd edition, 1992. http://www.cambridge.org/au/academic/subjects/mathematics/numerical-recipes/numerical-recipes-fortran-77-art-scientific-computing-volume-1-2nd-edition J. Shalf, S. Dosanjh and J. Morrison. Exascale computing technology challenges. In High Performance Computing for Computational Science–-VECPAR 2010, vol. 6449 of Lecture Notes in Computer Science, pp 1–25. Springer, 2011. doi:10.1007/978-3-642-19328-6_1 S. Skalicky, S. Lopez, M. Lukowiak, J. Letendre and M. Ryan. Performance modeling of pipelined linear algebra architectures on FPGAs. In Reconfigurable Computing: Architectures, Tools and Applications, vol. 7806, Lecture Notes in Computer Science pp. 146-153. Springer, 2013. doi:10.1007/978-3-642-36812-7_14 D. J. Warne, R. F. Hayward, N. A. Kelson, J. E. Banks, and L. Mejias. Pulse-coupled neural network performance for real-time identification of vegetation during forced landing. In Engineering Mathematics and Applications Conference (EMAC2013), vol. 55 of ANZIAM J., pp. C1–C16, 2014. http://journal.austms.org.au/ojs/index.php/ANZIAMJ/article/view/7851 D. J. Warne, N. A. Kelson, and R. F. Hayward. Solving tri-diagonal linear systems using field programmable gate arrays. In Proceedings of the 4th International Conference on Computational Methods (ICCM2012), 2012. http://eprints.qut.edu.au/54894/ D. J. Warne, N. A. Kelson, and R. F. Hayward. Comparison of high level FPGA hardware design for solving tri-diagonal linear systems. Proc. Comput. Sci., 29:95–101, 2014. doi:10.1016/j.procs.2014.05.009 L. Wirbel. Xilinx SDAccel: A unified development environment for tomorrow's data center. Technical report, The Linley Group Inc., November 2014. http://www.xilinx.com/publications/prod_mktg/sdnet/sdaccel-wp.pdf G. Wu, Y. Dou, J. Sun and G. D. Peterson. A high performance and memory efficient LU decomposer on FPGAs. IEEE T. Comput., 61:366–378, 2012. doi:10.1109/TC.2010.27
    corecore