    A High-Throughput Solver for Marginalized Graph Kernels on GPU

    We present the design and optimization of a linear solver on general-purpose GPUs for the efficient, high-throughput evaluation of the marginalized graph kernel between pairs of labeled graphs. The solver implements a preconditioned conjugate gradient (PCG) method to solve a generalized Laplacian equation associated with the tensor product of two graphs. To cope with the gap between the instruction throughput and the memory bandwidth of current-generation GPUs, our solver forms the tensor product linear system on the fly, without storing it in memory, when performing the matrix-vector products in PCG. This on-the-fly computation is accomplished by having the threads of a warp cooperatively stream the adjacency and edge label matrices of the individual graphs in small square blocks called tiles, which are staged in registers and shared memory for later reuse. Warps across a thread block can further share tiles via shared memory to increase data reuse. We exploit the sparsity of the graphs hierarchically by storing only non-empty tiles in a coordinate format and encoding the nonzero elements within each tile as bitmaps. In addition, we propose a new partition-based reordering algorithm that aggregates the nonzero elements of the graphs into fewer but denser tiles to improve the efficiency of the sparse format. We carry out extensive theoretical analyses of the graph tensor product primitives for tiles of various densities and evaluate their performance on synthetic and real-world datasets. Our solver delivers a speedup of three to four orders of magnitude over existing CPU-based solvers such as GraKeL and GraphKernels. The capability of the solver enables kernel-based learning tasks at unprecedented scales.
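
    A minimal pure-Python sketch of two ideas from this abstract, under stated assumptions: the 4 x 4 tile size (so one tile's nonzero pattern fits in a 16-bit bitmap), the edge base kernel, the example graphs, and all function names are hypothetical, and nothing here models warps, registers, shared memory, PCG, or the reordering algorithm. `to_tiles` keeps only non-empty tiles in coordinate form with a per-tile bitmap, and `product_matvec` applies the product-graph matrix, with entries A[i][j] * B[k][l] * kernel(E[i][j], F[k][l]), tile pair by tile pair without ever materializing the nm x nm matrix. `dense_product_matvec` is the naive reference it can be checked against.

```python
from itertools import product

TILE = 4  # assumed tile edge; a 4 x 4 tile's nonzero pattern fits in a 16-bit bitmap


def to_tiles(adj, labels):
    """Keep only non-empty TILE x TILE tiles as (tile_row, tile_col, bitmap, entries)."""
    n = len(adj)
    tiles = []
    for ti in range(0, n, TILE):
        for tj in range(0, n, TILE):
            bitmap, entries = 0, []
            for r in range(ti, min(ti + TILE, n)):
                for c in range(tj, min(tj + TILE, n)):
                    if adj[r][c] != 0:
                        bitmap |= 1 << ((r - ti) * TILE + (c - tj))
                        entries.append((adj[r][c], labels[r][c]))  # row-major = bit order
            if bitmap:
                tiles.append((ti, tj, bitmap, entries))
    return tiles


def expand(tile):
    """Decode a tile into (row, col, weight, label) tuples by scanning its bitmap."""
    ti, tj, bitmap, entries = tile
    out, k = [], 0
    for bit in range(TILE * TILE):
        if (bitmap >> bit) & 1:
            weight, label = entries[k]
            out.append((ti + bit // TILE, tj + bit % TILE, weight, label))
            k += 1
    return out


def product_matvec(tiles_g, tiles_h, m, x, edge_kernel):
    """y = W x for the product graph, W[(i,k),(j,l)] = A_ij * B_kl * kernel(E_ij, F_kl).

    Product node (i, k) is flattened to i * m + k; W is formed on the fly per tile pair.
    """
    y = [0.0] * len(x)
    exp_g = [expand(t) for t in tiles_g]
    exp_h = [expand(t) for t in tiles_h]
    for eg in exp_g:
        for eh in exp_h:
            for i, j, wg, lg in eg:
                for k, l, wh, lh in eh:
                    y[i * m + k] += wg * wh * edge_kernel(lg, lh) * x[j * m + l]
    return y


def dense_product_matvec(A, E, B, F, x, edge_kernel):
    """Naive reference that loops over every (i, j, k, l) index of the product matrix."""
    n, m = len(A), len(B)
    y = [0.0] * (n * m)
    for i, j, k, l in product(range(n), range(n), range(m), range(m)):
        y[i * m + k] += A[i][j] * B[k][l] * edge_kernel(E[i][j], F[k][l]) * x[j * m + l]
    return y


def label_kernel(a, b):
    """Hypothetical edge base kernel: 1 for matching labels, 0.5 otherwise."""
    return 1.0 if a == b else 0.5


# Hypothetical example: a labeled 5-node path G and a labeled triangle H.
A = [[1 if abs(i - j) == 1 else 0 for j in range(5)] for i in range(5)]
E = [[min(i, j) % 2 for j in range(5)] for i in range(5)]
B = [[0 if i == j else 1 for j in range(3)] for i in range(3)]
F = [[(i + j) % 2 for j in range(3)] for i in range(3)]
x = [0.1 * (p + 1) for p in range(15)]

y_tiled = product_matvec(to_tiles(A, E), to_tiles(B, F), 3, x, label_kernel)
y_dense = dense_product_matvec(A, E, B, F, x, label_kernel)
print(max(abs(a - b) for a, b in zip(y_tiled, y_dense)))  # at most float rounding error
```

    The tiled and dense versions sum the same terms in a different order; the tiled one simply never visits empty tiles, which is the point of the sparse format.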

    Algorithmic counting of nonequivalent compact Huffman codes

    It is known that the following five counting problems lead to the same integer sequence $f_t(n)$: the number of nonequivalent compact Huffman codes of length $n$ over an alphabet of $t$ letters, the number of "nonequivalent" canonical rooted $t$-ary trees (level-greedy trees) with $n$ leaves, the number of "proper" words, the number of bounded degree sequences, and the number of ways of writing $1 = \frac{1}{t^{x_1}} + \dots + \frac{1}{t^{x_n}}$ with integers $0 \leq x_1 \leq x_2 \leq \dots \leq x_n$. In this work, we show that one can compute this sequence for \textbf{all} $n < N$ with essentially one power series division. In total we need at most $N^{1+\varepsilon}$ additions and multiplications of integers of $cN$ bits, $c < 1$, or $N^{2+\varepsilon}$ bit operations, respectively. This improves an earlier bound by Even and Lempel, who needed $O(N^3)$ operations in the integer ring or $O(N^4)$ bit operations, respectively.
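
    The Python sketch below is only a simple cross-check for small $n$, not the power-series-division method described above, and it makes no attempt to match those complexity bounds. It relies on the correspondence stated in the abstract: a solution of $1 = \sum_i t^{-x_i}$ is determined by how many leaves sit on each level of a canonical $t$-ary tree, so a memoized recursion over (nodes on the current level, leaves still to place) counts $f_t(n)$. The function names are hypothetical.

```python
from functools import lru_cache


def count_compact_codes(n, t=2):
    """f_t(n): ways to write 1 = sum_i t**(-x_i) with integers 0 <= x_1 <= ... <= x_n.

    A solution is fixed by the number of leaves on each level of a canonical t-ary
    tree, so the tree is built level by level.
    """

    @lru_cache(maxsize=None)
    def count(k, r):
        # k = nodes on the current level, r = leaves that still have to be placed.
        total = 1 if k == r else 0  # option: all k nodes on this level are leaves
        for j in range(1, k + 1):  # option: j nodes are internal, k - j are leaves
            r_next = r - (k - j)
            if j * t <= r_next:  # every child must still end in at least one leaf
                total += count(j * t, r_next)
        return total

    return count(1, n) if n >= 1 else 0


print([count_compact_codes(n) for n in range(1, 9)])  # [1, 1, 1, 2, 3, 5, 9, 16]
```

    For $t = 3$ the count is zero unless $n$ is odd, since each internal node adds $t - 1 = 2$ leaves; for example $f_3(7) = 2$, from $\{1,2,2,2,2,2,2\}$ and $\{1,1,2,2,3,3,3\}$.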

    On the Complexity of the Generalized MinRank Problem

    We study the complexity of solving the \emph{generalized MinRank problem}, i.e., computing the set of points where the evaluation of a polynomial matrix has rank at most $r$. A natural algebraic representation of this problem gives rise to a \emph{determinantal ideal}: the ideal generated by all minors of size $r+1$ of the matrix. We give new complexity bounds for solving this problem using Gröbner basis algorithms under genericity assumptions on the input matrix. In particular, these complexity bounds allow us to identify families of generalized MinRank problems for which the arithmetic complexity of the solving process is polynomial in the number of solutions. We also provide an algorithm to compute a rational parametrization of the variety of a 0-dimensional and radical system of bi-degree $(D,1)$, and we show that its complexity can be bounded using the complexity bounds for the generalized MinRank problem. Comment: 29 pages
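
    A small Python sketch of the defining condition only: a point lies in the solution set exactly when every $(r+1)$-minor of the evaluated matrix vanishes. It tests given integer or rational points with exact `fractions.Fraction` arithmetic; it does not compute the solution set and is not the Gröbner-basis approach analyzed above. Representing matrix entries as Python callables, and all function names, are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import combinations


def det(m):
    """Exact determinant of a square matrix via Gaussian elimination over the rationals."""
    a = [[Fraction(v) for v in row] for row in m]
    n = len(a)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for k in range(col, n):
                a[r][k] -= factor * a[col][k]
    return result


def minors(m, size):
    """Yield every size x size minor of the (possibly rectangular) matrix m."""
    for rows in combinations(range(len(m)), size):
        for cols in combinations(range(len(m[0])), size):
            yield det([[m[i][j] for j in cols] for i in rows])


def in_minrank_variety(poly_matrix, point, r):
    """True iff all (r+1)-minors of M(point) vanish, i.e. rank M(point) <= r."""
    m = [[entry(*point) for entry in row] for row in poly_matrix]
    if r + 1 > min(len(m), len(m[0])):
        return True  # no (r+1)-minors exist, so the rank is trivially <= r
    return all(d == 0 for d in minors(m, r + 1))


# Hypothetical example: M(x, y) = [[x, y], [y, x]] has rank <= 1 exactly where x**2 - y**2 = 0.
M = [[lambda x, y: x, lambda x, y: y],
     [lambda x, y: y, lambda x, y: x]]
print(in_minrank_variety(M, (2, 2), 1), in_minrank_variety(M, (2, 1), 1))  # True False
```

    Enumerating all minors grows combinatorially with the matrix size, which is one reason the algebraic (ideal-based) view studied in the paper matters for actually solving such systems.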

    Lanczos eigensolution method for high-performance computers

    This paper presents the theory, computational analysis, and applications of a Lanczos algorithm on high-performance computers. The computationally intensive steps of the algorithm are identified as the matrix factorization, the forward/backward equation solution, and the matrix-vector multiplications. These steps are optimized to exploit the vector and parallel capabilities of high-performance computers. The savings in computational time obtained from optimization techniques such as variable-band and sparse data storage and access, loop unrolling, use of local memory, and compiler directives are presented. Two large-scale structural analysis applications are described: the buckling of a composite blade-stiffened panel with a cutout, and the vibration analysis of a high-speed civil transport. On a CONVEX computer, the sequential computational time for the panel problem decreased from 181.6 seconds to 14.1 seconds with the optimized vector algorithm. The best computational time for the transport problem, which has 17,000 degrees of freedom, was 23 seconds on the Cray Y-MP using an average of 3.63 processors.
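
    For orientation, a minimal Python sketch of the basic (not shift-inverted) symmetric Lanczos process on a small dense matrix: `lanczos` builds the tridiagonal matrix $T$ from repeated matrix-vector products, and `tridiag_eigenvalues` extracts its eigenvalues by Sturm-sequence bisection. In a structural eigenproblem $K\phi = \lambda M\phi$ solved with a spectral shift $\sigma$, each product would typically be replaced by a solve with a factorized $K - \sigma M$, which is where a matrix factorization and forward/backward solutions would enter. That variant, the storage optimizations, and all names here are assumptions, not taken from the paper.

```python
import math


def matvec(A, v):
    """Dense matrix-vector product, one of the computationally intensive steps."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def lanczos(A, v0, k):
    """Run up to k Lanczos steps with full reorthogonalization on a symmetric matrix A.

    Returns the diagonal (alpha) and off-diagonal (beta) of the tridiagonal matrix T.
    """
    norm = math.sqrt(dot(v0, v0))
    Q = [[vi / norm for vi in v0]]
    alpha, beta = [], []
    for j in range(k):
        w = matvec(A, Q[j])
        alpha.append(dot(w, Q[j]))
        for q in Q:  # orthogonalize against every previous Lanczos vector
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        b = math.sqrt(dot(w, w))
        if j == k - 1 or b < 1e-12:  # done, or the Krylov space is exhausted
            break
        beta.append(b)
        Q.append([wi / b for wi in w])
    return alpha, beta


def sturm_count(alpha, beta, x):
    """Number of eigenvalues of T below x, from the signs of the LDL^T pivots of T - xI."""
    count, d = 0, 1.0
    for i, a in enumerate(alpha):
        b2 = beta[i - 1] ** 2 if i > 0 else 0.0
        d = (a - x) - b2 / d
        if d == 0.0:
            d = 1e-300  # nudge an exact zero pivot to avoid dividing by zero
        if d < 0:
            count += 1
    return count


def tridiag_eigenvalues(alpha, beta, tol=1e-12):
    """All eigenvalues of T in ascending order, by bisection inside the Gershgorin bounds."""
    n = len(alpha)
    radius = [(abs(beta[i - 1]) if i > 0 else 0.0) + (abs(beta[i]) if i < n - 1 else 0.0)
              for i in range(n)]
    lo = min(a - r for a, r in zip(alpha, radius))
    hi = max(a + r for a, r in zip(alpha, radius))
    eigs = []
    for k in range(n):
        a, b = lo, hi
        while b - a > tol * max(1.0, abs(a), abs(b)):
            mid = 0.5 * (a + b)
            if sturm_count(alpha, beta, mid) > k:
                b = mid  # more than k eigenvalues lie below mid
            else:
                a = mid
        eigs.append(0.5 * (a + b))
    return eigs


# Example: the 4 x 4 second-difference matrix, whose eigenvalues are 2 - 2 cos(k pi / 5).
LAPLACIAN = [[2, -1, 0, 0], [-1, 2, -1, 0], [0, -1, 2, -1], [0, 0, -1, 2]]
alpha, beta = lanczos(LAPLACIAN, [1.0, 0.0, 0.0, 0.0], 4)
print(tridiag_eigenvalues(alpha, beta))
```

    Full reorthogonalization keeps the sketch numerically simple at the cost of extra work per step; large structural codes usually take fewer steps than the matrix dimension and keep only the extreme eigenvalues of $T$.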

    Differentiable Genetic Programming

    We introduce the use of high-order automatic differentiation, implemented via the algebra of truncated Taylor polynomials, in genetic programming. Using the Cartesian Genetic Programming encoding, we obtain a high-order Taylor representation of the program output that is then used to back-propagate errors during learning. The resulting machine learning framework is called differentiable Cartesian Genetic Programming (dCGP). In the context of symbolic regression, dCGP offers a new approach to the long-unsolved problem of constant representation in GP expressions. On several problems of increasing complexity, we find that dCGP is able to find both the exact form of the symbolic expression and the values of its constants. We also demonstrate the use of dCGP to solve a large class of differential equations and to find prime integrals of dynamical systems, presenting, in both cases, results that confirm the efficacy of our approach.
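
    The core idea of truncated Taylor arithmetic can be sketched in a few lines of Python. The hypothetical `Taylor` class carries the coefficients of a polynomial in a perturbation $h$ up to a fixed order, so the $k$-th derivative is $k!$ times the $k$-th coefficient; `eval_cgp` evaluates a simplified, hypothetical CGP genome (each node applies an operator to any input or earlier node, and unused nodes are inactive) on such polynomials. As one illustration of the constant-representation point, `newton_step_constant` expands a squared-error loss in the ephemeral constant $c$ and takes a Newton step. This is forward-mode Taylor arithmetic chosen for brevity, not the dCGP library's API and not necessarily the paper's learning rule.

```python
import math


class Taylor:
    """Truncated Taylor polynomial c[0] + c[1]*h + ... + c[order]*h**order."""

    def __init__(self, coeffs, order):
        self.order = order
        self.c = (list(coeffs) + [0.0] * (order + 1))[: order + 1]

    @classmethod
    def variable(cls, value, order):
        return cls([value, 1.0], order)  # value + h: the quantity we differentiate by

    @classmethod
    def constant(cls, value, order):
        return cls([value], order)

    def _lift(self, other):
        return other if isinstance(other, Taylor) else Taylor.constant(other, self.order)

    def __add__(self, other):
        o = self._lift(other)
        return Taylor([a + b for a, b in zip(self.c, o.c)], self.order)

    __radd__ = __add__

    def __sub__(self, other):
        o = self._lift(other)
        return Taylor([a - b for a, b in zip(self.c, o.c)], self.order)

    def __mul__(self, other):
        o = self._lift(other)
        out = [0.0] * (self.order + 1)
        for i, a in enumerate(self.c):
            for j in range(self.order + 1 - i):  # drop terms above the truncation order
                out[i + j] += a * o.c[j]
        return Taylor(out, self.order)

    __rmul__ = __mul__

    def derivative(self, k):
        return math.factorial(k) * self.c[k]


OPS = {"add": lambda a, b: a + b, "sub": lambda a, b: a - b, "mul": lambda a, b: a * b}


def eval_cgp(genome, output, inputs):
    """Evaluate a simplified CGP genome; node k may read any input or any earlier node."""
    values = list(inputs)
    for op, a, b in genome:
        values.append(OPS[op](values[a], values[b]))
    return values[output]


def newton_step_constant(genome, output, xs, ys, c0, order=2):
    """One Newton step on the constant c using the loss's Taylor expansion in c."""
    c = Taylor.variable(c0, order)
    loss = Taylor.constant(0.0, order)
    for x, y in zip(xs, ys):
        err = eval_cgp(genome, output, [Taylor.constant(x, order), c]) - y
        loss = loss + err * err
    return c0 - loss.derivative(1) / loss.derivative(2)


# Hypothetical genome over inputs [x, c]: node 2 = c*x, node 3 = x*x,
# node 4 = node2 + node3 (the output), node 5 = x - c (inactive, as CGP allows).
GENOME = [("mul", 1, 0), ("mul", 0, 0), ("add", 2, 3), ("sub", 0, 1)]
OUTPUT = 4
xs = [1.0, 2.0, 3.0]
ys = [3 * x + x * x for x in xs]  # data generated with the "unknown" constant 3
print(newton_step_constant(GENOME, OUTPUT, xs, ys, c0=0.0))  # 3.0
```

    Because the output here is linear in $c$, the loss is exactly quadratic in $c$ and a single Newton step recovers the constant; for nonlinear expressions the step would be iterated.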