We discuss an approach for solving sparse or dense banded linear systems
${\bf A}{\bf x}={\bf b}$ on a Graphics Processing Unit (GPU) card. The
matrix ${\bf A} \in {\mathbb{R}}^{N \times N}$ is possibly nonsymmetric and
moderately large; i.e., $10000 \leq N \leq 500000$. The {\it split and
parallelize} ({\tt SaP}) approach seeks to partition the matrix ${\bf A}$
into diagonal sub-blocks ${\bf A}_i$, $i=1,\ldots,P$, which are
independently factored in parallel. The solution may choose to consider or
to ignore the matrices that couple the diagonal sub-blocks ${\bf A}_i$.
This approach, along with the Krylov subspace-based iterative method that
it preconditions, is implemented in a solver called {\tt SaP::GPU}, which
is compared in terms of efficiency with three commonly used sparse direct
solvers: {\tt PARDISO}, {\tt SuperLU}, and {\tt MUMPS}. {\tt SaP::GPU},
which runs entirely on the GPU except for several stages involved in
preliminary row-column permutations, is robust and compares well in terms
of efficiency with the aforementioned direct solvers. In a comparison
against Intel's {\tt MKL}, {\tt SaP::GPU} also fares well when used to
solve dense banded systems that are close to being diagonally dominant.
{\tt SaP::GPU} is publicly available and distributed as
open source under a permissive BSD3 license.

Comment: 38 pages
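To make the split-and-parallelize idea concrete, the following is a minimal CPU sketch in Python with SciPy. It is not the {\tt SaP::GPU} implementation or its API. It illustrates only the variant that ignores the coupling matrices: the diagonal sub-blocks ${\bf A}_i$ are factored independently, and the resulting block-diagonal solve is used to precondition a Krylov method. The function names (`make_banded`, `decoupled_sap_preconditioner`, `solve_demo`), the test matrix, the problem size, and the choice of BiCGStab as the Krylov method are all assumptions made for illustration.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, bicgstab, splu


def make_banded(n, half_bandwidth, seed=0):
    """Random nonsymmetric banded matrix, made strictly diagonally dominant.

    Hypothetical test matrix; not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    offsets = list(range(-half_bandwidth, half_bandwidth + 1))
    diagonals = [rng.uniform(-1.0, 1.0, n - abs(k)) for k in offsets]
    A = sp.diags(diagonals, offsets, format="csr")
    # Replace the main diagonal so every row is strictly diagonally dominant.
    row_sums = np.asarray(abs(A).sum(axis=1)).ravel()
    A = A - sp.diags(A.diagonal()) + sp.diags(row_sums + 1.0)
    return A.tocsr()


def decoupled_sap_preconditioner(A, num_partitions):
    """Factor P diagonal sub-blocks A_i independently; ignore the coupling.

    Returns a LinearOperator that applies the block-diagonal inverse.
    On a GPU the per-block work would run in parallel; here it is a loop.
    """
    n = A.shape[0]
    bounds = np.linspace(0, n, num_partitions + 1, dtype=int)
    A = A.tocsc()
    spans = list(zip(bounds[:-1], bounds[1:]))
    factors = [splu(A[lo:hi, lo:hi].tocsc()) for lo, hi in spans]

    def apply(r):
        r = np.asarray(r, dtype=float).ravel()
        z = np.empty_like(r)
        for lu, (lo, hi) in zip(factors, spans):
            z[lo:hi] = lu.solve(r[lo:hi])  # independent solve per sub-block
        return z

    return LinearOperator((n, n), matvec=apply, dtype=float)


def solve_demo(n=2000, half_bandwidth=10, num_partitions=8):
    """Solve A x = b with preconditioned BiCGStab; return (info, rel. residual)."""
    A = make_banded(n, half_bandwidth)
    b = A @ np.ones(n)
    M = decoupled_sap_preconditioner(A, num_partitions)
    x, info = bicgstab(A, b, M=M)  # default tolerances
    rel_res = np.linalg.norm(b - A @ x) / np.linalg.norm(b)
    return info, rel_res


if __name__ == "__main__":
    info, rel_res = solve_demo()
    print("info:", info, "relative residual:", rel_res)
```

Because the test matrix is strictly diagonally dominant, dropping the coupling blocks loses little and the decoupled preconditioner should be effective. For matrices far from diagonal dominance, the variant that accounts for the coupling matrices would likely be needed, and it is not sketched here.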