31 research outputs found
Parallel approaches to shortest-path problems for multilevel heterogeneous computing
Many algorithms exist for solving shortest-path problems. These problems are central to combinatorial optimization because of their many real-world applications. Recently, the scientific community's interest in them has grown significantly, not only because their solutions are widely applicable but also because they can make efficient use of parallel computing. The emergence of new programming models, together with modern GPUs, has improved the performance of earlier parallel algorithms and has led to the creation of more efficient ones. Used together with CPUs, these devices form an ideal tool for tackling the most expensive shortest-path computations. This doctoral thesis addresses both contexts through the development of new GPU approaches to shortest-path problems, together with a study of optimal configurations, and the design of solutions that combine sequential and parallel algorithms in heterogeneous environments.
Departamento de Informática (Arquitectura y Tecnología de Computadores, Ciencias de la Computación e Inteligencia Artificial, Lenguajes y Sistemas Informáticos)
Algorithm Engineering for fundamental Sorting and Graph Problems
Fundamental algorithms form the basic knowledge of every computer science undergraduate and professional programmer. They are a set of basic techniques found in any (good) textbook on algorithms and data structures. In this thesis we try to close the gap between theoretically worst-case optimal classical algorithms and the real-world circumstances one faces under the constraints imposed by data size, limited main memory, or available parallelism
Tree-based Coarsening and Partitioning of Complex Networks
Many applications produce massive complex networks whose analysis would
benefit from parallel processing. Parallel algorithms, in turn, often require a
suitable network partition. For solving optimization tasks such as graph
partitioning on large networks, multilevel methods are preferred in practice.
Yet, complex networks pose challenges to established multilevel algorithms, in
particular to their coarsening phase.
One way to specify a (recursive) coarsening of a graph is to rate its edges
and then contract the edges as prioritized by the rating. In this paper we (i)
define weights for the edges of a network that express the edges' importance
for connectivity, (ii) compute a minimum weight spanning tree with
respect to these weights, and (iii) rate the network edges based on the
conductance values of this tree's fundamental cuts. To this end, we also (iv)
develop the first optimal linear-time algorithm to compute the conductance
values of all fundamental cuts of a given spanning tree. We integrate
the new edge rating into a leading multilevel graph partitioner and equip the
latter with a new greedy postprocessing for optimizing the maximum
communication volume (MCV). Experiments on bipartitioning frequently used
benchmark networks show that the postprocessing already reduces MCV by 11.3%.
Our new edge rating further reduces MCV by 10.3% compared to the previously
best rating with the postprocessing in place for both ratings. In total, with a
modest increase in running time, our new approach reduces the MCV of complex
network partitions by 20.4%
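To make the notion of a fundamental cut's conductance concrete, here is a minimal Python sketch. It is a naive baseline that recomputes each cut from scratch, not the linear-time algorithm the abstract announces. It assumes the common definition of conductance, cut weight divided by the smaller of the two side volumes (sums of weighted degrees); the function name and input layout are hypothetical.

```python
def fundamental_cut_conductances(n, edges, tree_edges):
    """Naive conductance of every fundamental cut of a spanning tree.

    edges:      list of (u, v, w) for the whole graph, vertices 0..n-1
    tree_edges: the subset of edges that forms a spanning tree
    Returns {(u, v): conductance} with one entry per tree edge.
    Assumption: conductance = cut weight / min(volume of either side).
    """
    degree = [0.0] * n
    for u, v, w in edges:
        degree[u] += w
        degree[v] += w
    total_volume = sum(degree)

    tree_adj = {i: [] for i in range(n)}
    for u, v, _ in tree_edges:
        tree_adj[u].append(v)
        tree_adj[v].append(u)

    result = {}
    for u, v, _ in tree_edges:
        # Removing tree edge (u, v) splits the tree; collect u's side.
        side = {u}
        stack = [u]
        while stack:
            x = stack.pop()
            for y in tree_adj[x]:
                if (x, y) in ((u, v), (v, u)) or y in side:
                    continue
                side.add(y)
                stack.append(y)
        # The fundamental cut holds every graph edge crossing the split.
        cut = sum(w for a, b, w in edges if (a in side) != (b in side))
        volume = sum(degree[x] for x in side)
        result[(u, v)] = cut / min(volume, total_volume - volume)
    return result


# Triangle 0-1-2 with a pendant vertex 3; the tree is the path 0-1-2-3.
graph_edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 0, 1.0), (2, 3, 1.0)]
spanning_tree = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0)]
conductances = fundamental_cut_conductances(4, graph_edges, spanning_tree)
```

This version costs time proportional to the number of tree edges times the number of graph edges, which is the overhead a linear-time method avoids.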
Efficiently Answering Quality Constrained Shortest Distance Queries in Large Graphs
The shortest-path distance is a fundamental concept in graph data analytics and has been extensively studied in the literature. In many real-world applications, quality constraints are naturally associated with edges in the graph, and finding the shortest distance between vertices along only valid edges (i.e., edges that satisfy a given quality constraint) is also critical. In this work, we investigate this novel and important problem of quality-constrained shortest distance queries. We propose an efficient index structure based on 2-hop labeling approaches. Supported by a path dominance relationship incorporating both quality and length information, we demonstrate the minimal property of the new index. An efficient query processing algorithm is also developed. Extensive experimental studies over real-life datasets demonstrate the efficiency and effectiveness of our techniques
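The query itself can be illustrated with an index-free baseline: run Dijkstra's algorithm and skip every edge that fails the constraint. The Python sketch below is not the 2-hop labeling index of the abstract. It assumes the constraint is a lower bound on a numeric edge quality and that edge lengths are nonnegative; the function name and graph layout are hypothetical.

```python
import heapq


def constrained_distance(adj, source, target, min_quality):
    """Shortest distance using only edges with quality >= min_quality.

    adj maps a vertex to a list of (neighbor, length, quality) tuples.
    Returns float('inf') when no valid path exists.
    Assumption: the quality constraint is a simple threshold.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, length, quality in adj.get(u, []):
            if quality < min_quality:
                continue  # edge is invalid for this query
            nd = d + length
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")


graph = {
    "a": [("b", 1.0, 2), ("c", 5.0, 9)],
    "b": [("d", 1.0, 2)],
    "c": [("d", 1.0, 9)],
    "d": [],
}
loose = constrained_distance(graph, "a", "d", 1)    # short path a-b-d is valid
strict = constrained_distance(graph, "a", "d", 5)   # forced onto a-c-d
blocked = constrained_distance(graph, "a", "d", 10)  # no edge qualifies
```

Each query here traverses the graph again, which is the cost a precomputed index is meant to remove.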
Scalable Graph Convolutional Network Training on Distributed-Memory Systems
Graph Convolutional Networks (GCNs) are extensively utilized for deep
learning on graphs. The large data sizes of graphs and their vertex features
make scalable training algorithms and distributed memory systems necessary.
Since the convolution operation on graphs induces irregular memory access
patterns, designing a memory- and communication-efficient parallel algorithm
for GCN training poses unique challenges. We propose a highly parallel training
algorithm that scales to large processor counts. In our solution, the large
adjacency and vertex-feature matrices are partitioned among processors. We
exploit the vertex-partitioning of the graph to use non-blocking point-to-point
communication operations between processors for better scalability. To further
minimize the parallelization overheads, we introduce a sparse matrix
partitioning scheme based on a hypergraph partitioning model for full-batch
training. We also propose a novel stochastic hypergraph model to encode the
expected communication volume in mini-batch training. We show the merits of the
hypergraph model, previously unexplored for GCN training, over the standard
graph partitioning model which does not accurately encode the communication
costs. Experiments performed on real-world graph datasets demonstrate that the
proposed algorithms achieve considerable speedups over alternative solutions.
The optimizations achieved on communication costs become even more pronounced
at high scalability with many processors. The performance benefits are
preserved in deeper GCNs having more layers as well as on billion-scale graphs.
Comment: To appear in PVLDB'2
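The claim that the graph model misstates communication cost can be checked on a toy example. The Python sketch below assumes a row-wise partition in which the owner of a vertex sends that vertex's feature row once to every other part containing one of its neighbors. This is the quantity a hypergraph model's connectivity metric counts, whereas the graph model counts cut edges. The function names are hypothetical and the code is not taken from the paper.

```python
def edge_cut(adj, part):
    """Number of undirected edges whose endpoints lie in different parts."""
    return sum(1 for u in adj for v in adj[u] if u < v and part[u] != part[v])


def comm_volume(adj, part):
    """Feature rows sent in one layer under the stated assumption:
    each vertex's row goes once to every other part holding a neighbor."""
    total = 0
    for u, neighbors in adj.items():
        remote_parts = {part[v] for v in neighbors} - {part[u]}
        total += len(remote_parts)
    return total


# Star graph: hub 0 sits alone in part 0, its three leaves share part 1.
adj = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
part = {0: 0, 1: 1, 2: 1, 3: 1}

cut = edge_cut(adj, part)        # 3 cut edges
volume = comm_volume(adj, part)  # 4 rows: the hub's row is sent only once
```

Charging one row per cut edge in each direction would give 6 here, while only 4 rows actually move, because the hub's row reaches part 1 in a single message.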
Elixir: synthesis of parallel irregular algorithms
Algorithms in new application areas like machine learning and data analytics usually operate on unstructured sparse graphs. Writing efficient parallel code to implement these algorithms is very challenging for a number of reasons.
First, there may be many algorithms to solve a problem and each algorithm may have many implementations. Second, synchronization, which is necessary for correct parallel execution, introduces potential problems such as data-races and deadlocks. These issues interact in subtle ways, making the best solution dependent both on the parallel platform and on properties of the input graph. Consequently, implementing and selecting the best parallel solution can be a daunting task for non-experts, since we have few performance models for predicting the performance of parallel sparse graph programs on parallel hardware.
This dissertation presents a synthesis methodology and a system, Elixir, that addresses these problems by (i) allowing programmers to specify solutions at a high level of abstraction, and (ii) generating many parallel implementations automatically and using search to find the best one. An Elixir specification consists of a set of operators capturing the main algorithm logic and a schedule specifying how to efficiently apply the operators. Elixir employs sophisticated automated reasoning to merge these two components, and uses techniques based on automated planning to insert synchronization and synthesize efficient parallel code.
Experimental evaluation of our approach demonstrates that the performance of the Elixir-generated code is competitive with, and can even outperform, hand-optimized code written by expert programmers for many interesting graph benchmarks.
Computer Science
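The split between operators and schedules can be illustrated in plain Python; this is not Elixir's specification language, and it involves no synthesis or parallelism. The sketch assumes single-source shortest paths with nonnegative edge weights as the example problem: `relax` plays the role of the operator, and the `schedule` argument picks the order in which pending work is processed. All names are hypothetical.

```python
import heapq
from collections import deque


def relax(dist, u, v, w):
    """Operator: lower dist[v] via edge (u, v); report whether it changed."""
    if dist[u] + w < dist[v]:
        dist[v] = dist[u] + w
        return True
    return False


def sssp(adj, source, schedule):
    """Apply relax until nothing changes; schedule picks the worklist order."""
    dist = {v: float("inf") for v in adj}
    dist[source] = 0
    if schedule == "fifo":
        work = deque([source])
        pop, push = work.popleft, work.append
    else:  # "priority": process the vertex with the smallest known distance
        work = [(0, source)]
        pop = lambda: heapq.heappop(work)[1]
        push = lambda v: heapq.heappush(work, (dist[v], v))
    while work:
        u = pop()
        for v, w in adj[u]:
            if relax(dist, u, v, w):
                push(v)
    return dist


adj = {"s": [("a", 4), ("b", 1)], "b": [("a", 1)], "a": [("t", 1)], "t": []}
by_fifo = sssp(adj, "s", "fifo")
by_priority = sssp(adj, "s", "priority")
```

Both schedules reach the same distances because the operator alone determines correctness; the schedule only affects how much work is done along the way.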
Advanced Route Planning in Transportation Networks
We present fast and efficient algorithms for routing in road and public transit networks. An algorithm for public transit can handle very large and poorly structured networks in a fully realistic scenario. Algorithms to answer flexible shortest path queries consider additional query parameters, such as edge weight or restrictions. Finally, specialized algorithms compute sets of related shortest path distances for time-dependent distance table computation, ride sharing and closest POI location
Many-core Algorithms for Combinatorial Optimization
Combinatorial optimization is becoming ever more crucial. From the natural sciences to economics, by way of urban administration and personnel management, there is growing demand for methodologies and algorithms that combine a strong theoretical background with proven real-world effectiveness, so that good solutions to complex strategic problems can be found quickly. Resource optimization is nowadays fundamental to laying the foundations of successful projects. From a theoretical point of view, combinatorial optimization rests on stable and strong foundations that allow researchers to face ever more challenging problems.
From an application point of view, however, the rate of theoretical development does not seem to keep pace with that of modern hardware technologies, especially in the processor industry. In this work we propose new parallel algorithms designed to exploit the parallel architectures now available on the market. We found that exposing the inherent parallelism of some solution techniques (such as dynamic programming) yields remarkable computational benefits, lowering execution times by more than an order of magnitude and making it possible to address instances of sizes that were previously out of reach. We address four notable combinatorial optimization problems: the Packing Problem, the Vehicle Routing Problem (VRP), the Single Source Shortest Path Problem (SSSPP), and a Network Design (ND) problem. For each of these problems we propose a collection of effective parallel solution algorithms, either for solving the full problem (Guillotine Cuts and SSSPP) or for enhancing a fundamental part of the solution method (VRP and ND).
We support our claim by presenting computational results for all problems, either on standard benchmarks from the literature or, when possible, on data from real-world applications, where speed-ups of one order of magnitude are usually attained and factors of up to 40x are not uncommon