19,422 research outputs found

    Analysis of A Splitting Approach for the Parallel Solution of Linear Systems on GPU Cards

    Full text link
    We discuss an approach for solving sparse or dense banded linear systems Ax=b{\bf A} {\bf x} = {\bf b} on a Graphics Processing Unit (GPU) card. The matrix A∈RN×N{\bf A} \in {\mathbb{R}}^{N \times N} is possibly nonsymmetric and moderately large; i.e., 10000≤N≤50000010000 \leq N \leq 500000. The ${\it split\ and\ parallelize}( ({\tt SaP})approachseekstopartitionthematrix) approach seeks to partition the matrix {\bf A}intodiagonalsub−blocks into diagonal sub-blocks {\bf A}_i,, i=1,\ldots,P,whichareindependentlyfactoredinparallel.Thesolutionmaychoosetoconsiderortoignorethematricesthatcouplethediagonalsub−blocks, which are independently factored in parallel. The solution may choose to consider or to ignore the matrices that couple the diagonal sub-blocks {\bf A}_i.Thisapproach,alongwiththeKrylovsubspace−basediterativemethodthatitpreconditions,areimplementedinasolvercalled. This approach, along with the Krylov subspace-based iterative method that it preconditions, are implemented in a solver called {\tt SaP::GPU},whichiscomparedintermsofefficiencywiththreecommonlyusedsparsedirectsolvers:, which is compared in terms of efficiency with three commonly used sparse direct solvers: {\tt PARDISO},, {\tt SuperLU},and, and {\tt MUMPS}.. {\tt SaP::GPU},whichrunsentirelyontheGPUexceptseveralstagesinvolvedinpreliminaryrow−columnpermutations,isrobustandcompareswellintermsofefficiencywiththeaforementioneddirectsolvers.InacomparisonagainstIntel′s, which runs entirely on the GPU except several stages involved in preliminary row-column permutations, is robust and compares well in terms of efficiency with the aforementioned direct solvers. In a comparison against Intel's {\tt MKL},, {\tt SaP::GPU}alsofareswellwhenusedtosolvedensebandedsystemsthatareclosetobeingdiagonallydominant. also fares well when used to solve dense banded systems that are close to being diagonally dominant. {\tt SaP::GPU}$ is publicly available and distributed as open source under a permissive BSD3 license.Comment: 38 page

    Activity recognition from videos with parallel hypergraph matching on GPUs

    Full text link
    In this paper, we propose a method for activity recognition from videos based on sparse local features and hypergraph matching. We benefit from special properties of the temporal domain in the data to derive a sequential and fast graph matching algorithm for GPUs. Traditionally, graphs and hypergraphs are frequently used to recognize complex and often non-rigid patterns in computer vision, either through graph matching or point-set matching with graphs. Most formulations resort to the minimization of a difficult discrete energy function mixing geometric or structural terms with data attached terms involving appearance features. Traditional methods solve this minimization problem approximately, for instance with spectral techniques. In this work, instead of solving the problem approximatively, the exact solution for the optimal assignment is calculated in parallel on GPUs. The graphical structure is simplified and regularized, which allows to derive an efficient recursive minimization algorithm. The algorithm distributes subproblems over the calculation units of a GPU, which solves them in parallel, allowing the system to run faster than real-time on medium-end GPUs
    • …
    corecore