10,844 research outputs found
A High-Throughput Solver for Marginalized Graph Kernels on GPU
We present the design and optimization of a linear solver on General Purpose GPUs for the efficient and high-throughput evaluation of the marginalized graph kernel between pairs of labeled graphs. The solver implements a preconditioned conjugate gradient (PCG) method to compute the solution to a generalized Laplacian equation associated with the tensor product of two graphs. To cope with the gap between the instruction throughput and the memory bandwidth of current generation GPUs, our solver forms the tensor product linear system on-the-fly without storing it in memory when performing matrix-vector dot product operations in PCG. Such on-the-fly computation is accomplished by using threads in a warp to cooperatively stream the adjacency and edge label matrices of individual graphs by small square matrix blocks called tiles, which are then staged in registers and the shared memory for later reuse. Warps across a thread block can further share tiles via the shared memory to increase data reuse. We exploit the sparsity of the graphs hierarchically by storing only non-empty tiles using a coordinate format and nonzero elements within each tile using bitmaps. Besides, we propose a new partition-based reordering algorithm for aggregating nonzero elements of the graphs into fewer but denser tiles to improve the efficiency of the sparse format.We carry out extensive theoretical analyses on the graph tensor product primitives for tiles of various density and evaluate their performance on synthetic and real-world datasets. Our solver delivers three to four orders of magnitude speedup over existing CPU-based solvers such as GraKeL and GraphKernels. The capability of the solver enables kernel-based learning tasks at unprecedented scales
Algorithmic counting of nonequivalent compact Huffman codes
It is known that the following five counting problems lead to the same
integer sequence~: the number of nonequivalent compact Huffman codes of
length~ over an alphabet of letters, the number of `nonequivalent'
canonical rooted -ary trees (level-greedy trees) with ~leaves, the number
of `proper' words, the number of bounded degree sequences, and the number of
ways of writing with integers
. In this work, we show that one can
compute this sequence for \textbf{all} with essentially one power series
division. In total we need at most additions and
multiplications of integers of bits, , or bit
operations, respectively. This improves an earlier bound by Even and Lempel who
needed operations in the integer ring or bit operations,
respectively
On the Complexity of the Generalized MinRank Problem
We study the complexity of solving the \emph{generalized MinRank problem},
i.e. computing the set of points where the evaluation of a polynomial matrix
has rank at most . A natural algebraic representation of this problem gives
rise to a \emph{determinantal ideal}: the ideal generated by all minors of size
of the matrix. We give new complexity bounds for solving this problem
using Gr\"obner bases algorithms under genericity assumptions on the input
matrix. In particular, these complexity bounds allow us to identify families of
generalized MinRank problems for which the arithmetic complexity of the solving
process is polynomial in the number of solutions. We also provide an algorithm
to compute a rational parametrization of the variety of a 0-dimensional and
radical system of bi-degree . We show that its complexity can be bounded
by using the complexity bounds for the generalized MinRank problem.Comment: 29 page
Lanczos eigensolution method for high-performance computers
The theory, computational analysis, and applications are presented of a Lanczos algorithm on high performance computers. The computationally intensive steps of the algorithm are identified as: the matrix factorization, the forward/backward equation solution, and the matrix vector multiples. These computational steps are optimized to exploit the vector and parallel capabilities of high performance computers. The savings in computational time from applying optimization techniques such as: variable band and sparse data storage and access, loop unrolling, use of local memory, and compiler directives are presented. Two large scale structural analysis applications are described: the buckling of a composite blade stiffened panel with a cutout, and the vibration analysis of a high speed civil transport. The sequential computational time for the panel problem executed on a CONVEX computer of 181.6 seconds was decreased to 14.1 seconds with the optimized vector algorithm. The best computational time of 23 seconds for the transport problem with 17,000 degs of freedom was on the the Cray-YMP using an average of 3.63 processors
Differentiable Genetic Programming
We introduce the use of high order automatic differentiation, implemented via
the algebra of truncated Taylor polynomials, in genetic programming. Using the
Cartesian Genetic Programming encoding we obtain a high-order Taylor
representation of the program output that is then used to back-propagate errors
during learning. The resulting machine learning framework is called
differentiable Cartesian Genetic Programming (dCGP). In the context of symbolic
regression, dCGP offers a new approach to the long unsolved problem of constant
representation in GP expressions. On several problems of increasing complexity
we find that dCGP is able to find the exact form of the symbolic expression as
well as the constants values. We also demonstrate the use of dCGP to solve a
large class of differential equations and to find prime integrals of dynamical
systems, presenting, in both cases, results that confirm the efficacy of our
approach
- …