A High-Throughput Solver for Marginalized Graph Kernels on GPU
We present the design and optimization of a linear solver on general-purpose GPUs for the efficient, high-throughput evaluation of the marginalized graph kernel between pairs of labeled graphs. The solver implements a preconditioned conjugate gradient (PCG) method to compute the solution to a generalized Laplacian equation associated with the tensor product of two graphs. To cope with the gap between the instruction throughput and the memory bandwidth of current-generation GPUs, our solver forms the tensor product linear system on the fly, without storing it in memory, when performing matrix-vector product operations in PCG. This on-the-fly computation is accomplished by using the threads in a warp to cooperatively stream the adjacency and edge-label matrices of the individual graphs in small square matrix blocks called tiles, which are staged in registers and shared memory for later reuse. Warps across a thread block can further share tiles via shared memory to increase data reuse. We exploit the sparsity of the graphs hierarchically by storing only non-empty tiles in a coordinate format, and the nonzero elements within each tile using bitmaps. In addition, we propose a new partition-based reordering algorithm that aggregates the nonzero elements of the graphs into fewer but denser tiles, improving the efficiency of the sparse format. We carry out extensive theoretical analyses of the graph tensor product primitives for tiles of various densities and evaluate their performance on synthetic and real-world datasets. Our solver delivers three to four orders of magnitude speedup over existing CPU-based solvers such as GraKeL and GraphKernels. The capability of the solver enables kernel-based learning tasks at unprecedented scales.
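The key idea of forming the tensor-product system on the fly can be illustrated with a minimal CPU sketch in NumPy (function names here are illustrative, not the paper's API). The identity (A ⊗ B) vec(X) = vec(A X Bᵀ) lets a conjugate-gradient iteration on a product-graph system avoid ever materializing the Kronecker product, much as the GPU solver streams tiles instead of storing the product matrix; preconditioning and edge labels are omitted for brevity.

```python
import numpy as np

def kron_matvec(A, B, x):
    """Compute (A kron B) @ x without materializing the Kronecker product.
    With row-major vec (NumPy reshape), (A kron B) vec(X) = vec(A X B^T)."""
    X = x.reshape(A.shape[0], B.shape[0])
    return (A @ X @ B.T).reshape(-1)

def cg_graph_kernel(A1, A2, b, q=0.2, tol=1e-10, maxiter=500):
    """Plain conjugate gradient for (I - q * A1 kron A2) x = b, matrix-free.
    This toy system stands in for the generalized Laplacian equation of the
    product graph; Jacobi preconditioning is omitted for brevity."""
    x = np.zeros_like(b)
    r = b - (x - q * kron_matvec(A1, A2, x))   # initial residual (x = 0)
    p = r.copy()
    rs = r @ r
    for _ in range(maxiter):
        Ap = p - q * kron_matvec(A1, A2, p)    # matvec, Kronecker-free
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x
```

For graphs with n and m nodes, each iteration costs O(n·m·(n+m)) time and O(n·m) memory, versus the O(n²m²) storage a materialized Kronecker product would need.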
Simulating a Family of Tissue P Systems Solving SAT on the GPU
In order to provide efficient software tools to deal with large membrane
systems, high-throughput simulators are required. Parallel computing platforms are good
candidates, since they are capable of partially implementing the inherently parallel nature
of the model. In this regard, today's GPUs (Graphics Processing Units) are considered
highly parallel processors, and they are being consolidated as accelerators for scientific
applications. In fact, previous attempts to design P system simulators on GPUs have
shown that a parallel architecture is better suited in performance than traditional single
CPUs.
In 2010, a GPU-based simulator was introduced for a family of P systems with active
membranes solving SAT in linear time. This is the starting point of this paper, which
presents a new GPU simulator for another polynomial-time solution to SAT by means of
tissue P systems with cell division, trading space for time. The aim of this simulator is
to further study which ingredients of different P system models are well suited to be
managed by the GPU.
Junta de Andalucía P08-TIC04200; Ministerio de Economía y Competitividad TIN2012-3743
QUBO.jl: A Julia Ecosystem for Quadratic Unconstrained Binary Optimization
We present QUBO.jl, an end-to-end Julia package for working with QUBO
(Quadratic Unconstrained Binary Optimization) instances. This tool aims to
convert a broad range of JuMP problems for straightforward application in many
physics and physics-inspired solution methods whose standard optimization form
is equivalent to the QUBO. These methods include quantum annealing, quantum
gate-circuit optimization algorithms (Quantum Alternating Operator Ansatz,
Variational Quantum Eigensolver), other hardware-accelerated platforms such as
Coherent Ising Machines and Simulated Bifurcation Machines, and more
traditional methods such as simulated annealing. Besides working with
reformulations, QUBO.jl allows its users to interface with the aforementioned
hardware, sending QUBO models in various file formats and retrieving results
for subsequent analysis. QUBO.jl was written as a JuMP / MathOptInterface (MOI)
layer that automatically maps between the input and output frames, thus
providing a smooth modeling experience.
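Although QUBO.jl itself is a Julia/JuMP package, the QUBO form it targets is language-agnostic: minimize xᵀQx over binary x. A minimal Python sketch (with hypothetical helper names, not QUBO.jl's API) shows what such an instance looks like for max-cut on a triangle, solved by brute force as the classical baseline that annealers and QAOA-style methods approximate on larger instances.

```python
import itertools
import numpy as np

def maxcut_qubo(edges, n):
    """QUBO matrix Q for max-cut: minimize x^T Q x over x in {0,1}^n.
    Cutting edge (i, j) contributes -(x_i + x_j - 2 x_i x_j) to the energy."""
    Q = np.zeros((n, n))
    for i, j in edges:
        Q[i, i] -= 1
        Q[j, j] -= 1
        Q[i, j] += 1
        Q[j, i] += 1
    return Q

def brute_force(Q):
    """Exhaustively enumerate all 2^n binary vectors and return a minimizer."""
    best = min(itertools.product([0, 1], repeat=Q.shape[0]),
               key=lambda x: np.array(x) @ Q @ np.array(x))
    return np.array(best)
```

Each cut edge lowers the energy by one, so the optimal energy equals minus the maximum cut; for the triangle, any partition cutting two of the three edges attains energy -2.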
Generalize a Small Pre-trained Model to Arbitrarily Large TSP Instances
For the traveling salesman problem (TSP), existing supervised-learning-based
algorithms suffer seriously from a lack of generalization ability. To
overcome this drawback, this paper trains (in a supervised manner) a
small-scale model that can be used repetitively to build heat maps for TSP
instances of arbitrarily large size, based on a series of techniques including
graph sampling, graph converting, and heat map merging. The heat
maps are then fed into a reinforcement learning approach (Monte Carlo tree search)
to guide the search for high-quality solutions. Experimental results on a
large number of instances (with up to 10,000 vertices) show that this new
approach clearly outperforms existing machine-learning-based TSP
algorithms and significantly improves the generalization ability of the
trained model.
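A heat map here is an n×n matrix of edge scores estimating how likely each edge is to appear in an optimal tour. As a simplified stand-in for the paper's Monte Carlo tree search (all names below are illustrative), a greedy construction already shows how such a map guides the search: from the current city, always follow the hottest unvisited edge.

```python
import numpy as np

def greedy_tour_from_heatmap(heat):
    """Build a TSP tour by repeatedly following the highest-heat edge from
    the current city. A toy stand-in for MCTS, which uses the heat map to
    bias its rollouts rather than committing greedily."""
    n = heat.shape[0]
    tour = [0]
    visited = {0}
    while len(tour) < n:
        cur = tour[-1]
        # mask already-visited cities, then take the hottest remaining edge
        scores = [heat[cur, j] if j not in visited else -np.inf
                  for j in range(n)]
        nxt = int(np.argmax(scores))
        tour.append(nxt)
        visited.add(nxt)
    return tour
```

Because only the trained small-scale model is instance-size-dependent through sampling, the same construction applies unchanged whether the merged heat map covers 100 or 10,000 vertices.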
Learning to Search Feasible and Infeasible Regions of Routing Problems with Flexible Neural k-Opt
In this paper, we present Neural k-Opt (NeuOpt), a novel learning-to-search
(L2S) solver for routing problems. It learns to perform flexible k-opt
exchanges based on a tailored action factorization method and a customized
recurrent dual-stream decoder. As a pioneering work to circumvent the pure
feasibility masking scheme and enable the autonomous exploration of both
feasible and infeasible regions, we then propose the Guided Infeasible Region
Exploration (GIRE) scheme, which supplements the NeuOpt policy network with
feasibility-related features and leverages reward shaping to steer
reinforcement learning more effectively. Additionally, we equip NeuOpt with
Dynamic Data Augmentation (D2A) for more diverse searches during inference.
Extensive experiments on the Traveling Salesman Problem (TSP) and Capacitated
Vehicle Routing Problem (CVRP) demonstrate that our NeuOpt not only
significantly outstrips existing (masking-based) L2S solvers, but also
showcases superiority over the learning-to-construct (L2C) and
learning-to-predict (L2P) solvers. Notably, we offer fresh perspectives on how
neural solvers can handle VRP constraints. Our code is available at
https://github.com/yining043/NeuOpt.
Comment: Accepted at NeurIPS 202
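The simplest member of the k-opt family NeuOpt learns to select is the classical 2-opt exchange. The sketch below (not NeuOpt's implementation) shows the move itself: reversing a tour segment replaces two edges with two others, and an improving move strictly shortens the tour.

```python
import math

def two_opt_move(tour, i, j):
    """Apply a 2-opt exchange (the k = 2 case of k-opt): reverse the
    segment tour[i:j+1], replacing edges (i-1, i) and (j, j+1) with
    (i-1, j) and (i, j+1)."""
    return tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]

def tour_length(tour, dist):
    """Length of the cyclic tour under a distance matrix."""
    return sum(dist[tour[k]][tour[(k + 1) % len(tour)]]
               for k in range(len(tour)))
```

On four cities at the corners of a unit square, the crossing tour 0-2-1-3 is repaired by a single 2-opt move into the perimeter tour 0-1-2-3. NeuOpt's contribution is learning which (possibly infeasibility-crossing) moves to apply, not the move mechanics themselves.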
QuASeR -- Quantum Accelerated De Novo DNA Sequence Reconstruction
In this article, we present QuASeR, a reference-free DNA sequence
reconstruction implementation via de novo assembly on both gate-based and
quantum annealing platforms. Each one of the four steps of the implementation
(TSP, QUBO, Hamiltonians and QAOA) is explained with simple proof-of-concept
examples to target both the genomics research community and quantum application
developers in a self-contained manner. The details of the implementation are
discussed for the various layers of the quantum full-stack accelerator design.
We also highlight the limitations of current classical simulation and available
quantum hardware systems. The implementation is open-source and can be found at
https://github.com/prince-ph0en1x/QuASeR.
Comment: 24 page
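The TSP step of de novo assembly rests on suffix-prefix overlaps between reads: reads become cities and overlap lengths become (negated) edge costs. A classical greedy sketch (illustrative names, not QuASeR's code) makes the graph concrete; QuASeR's QUBO/QAOA pipeline searches the same overlap graph with quantum resources instead.

```python
def overlap(a, b):
    """Length of the longest suffix of read a that is a prefix of read b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def assemble_greedy(reads):
    """Greedy chain through the overlap graph: from the current read,
    always append the remaining read with the largest overlap, then
    merge the chain into one sequence."""
    order = [0]
    remaining = set(range(1, len(reads)))
    while remaining:
        cur = order[-1]
        nxt = max(remaining, key=lambda j: overlap(reads[cur], reads[j]))
        order.append(nxt)
        remaining.remove(nxt)
    seq = reads[order[0]]
    for a, b in zip(order, order[1:]):
        k = overlap(reads[a], reads[b])
        seq += reads[b][k:]  # append only the non-overlapping tail
    return seq
```

For reads ATGGC, GGCAT, CATTA the chain overlaps by three bases at each junction, reconstructing ATGGCATTA; on real data the greedy choice is exactly the tour-construction decision the TSP/QUBO formulation optimizes globally.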