Search CORE

1,205 research outputs found

High performance graph analysis on parallel architectures

Author: Grivas Athanasios K
Publication venue: Newcastle University
Publication date: 01/01/2016
Field of study

PhD ThesisOver the last decade pharmacology has been developing computational methods to enhance drug development and testing. A computational method called network pharmacology uses graph analysis tools to determine protein target sets that can lead on better targeted drugs for diseases as Cancer. One promising area of network-based pharmacology is the detection of protein groups that can produce better e ects if they are targeted together by drugs. However, the e cient prediction of such protein combinations is still a bottleneck in the area of computational biology. The computational burden of the algorithms used by such protein prediction strategies to characterise the importance of such proteins consists an additional challenge for the eld of network pharmacology. Such computationally expensive graph algorithms as the all pairs shortest path (APSP) computation can a ect the overall drug discovery process as needed network analysis results cannot be given on time. An ideal solution for these highly intensive computations could be the use of super-computing. However, graph algorithms have datadriven computation dictated by the structure of the graph and this can lead to low compute capacity utilisation with execution times dominated by memory latency. Therefore, this thesis seeks optimised solutions for the real-world graph problems of critical node detection and e ectiveness characterisation emerged from the collaboration with a pioneer company in the eld of network pharmacology as part of a Knowledge Transfer Partnership (KTP) / Secondment (KTS). In particular, we examine how genetic algorithms could bene t the prediction of protein complexes where their removal could produce a more e ective 'druggable' impact. Furthermore, we investigate how the problem of all pairs shortest path (APSP) computation can be bene ted by the use of emerging parallel hardware architectures as GPU- and FPGA- desktop-based accelerators. In particular, we address the problem of critical node detection with the development of a heuristic search method. It is based on a genetic algorithm that computes optimised node combinations where their removal causes greater impact than common impact analysis strategies. Furthermore, we design a general pattern for parallel network analysis on multi-core architectures that considers graph's embedded properties. It is a divide and conquer approach that decomposes a graph into smaller subgraphs based on its strongly connected components and computes the all pairs shortest paths concurrently on GPU. Furthermore, we use linear algebra to design an APSP approach based on the BFS algorithm. We use algebraic expressions to transform the problem of path computation to multiple independent matrix-vector multiplications that are executed concurrently on FPGA. Finally, we analyse how the optimised solutions of perturbation analysis and parallel graph processing provided in this thesis will impact the drug discovery process.This research was part of a Knowledge Transfer Partnership (KTP) and Knowledge Transfer Secondment (KTS) between e-therapeutics PLC and Newcastle University. It was supported as a collaborative project by e-therapeutics PLC and Technology Strategy boar

Newcastle University eTheses

Reconfigurable computing for large-scale graph traversal algorithms

Author: Betkaoui Brahim
Publication venue: Computing, Imperial College London
Publication date: 01/09/2014
Field of study

This thesis proposes a reconfigurable computing approach for supporting parallel processing in large-scale graph traversal algorithms. Our approach is based on a reconfigurable hardware architecture which exploits the capabilities of both FPGAs (Field-Programmable Gate Arrays) and a multi-bank parallel memory subsystem. The proposed methodology to accelerate graph traversal algorithms has been applied to three case studies, revealing that application-specific hardware customisations can benefit performance. A summary of our four contributions is as follows. First, a reconfigurable computing approach to accelerate large-scale graph traversal algorithms. We propose a reconfigurable hardware architecture which decouples computation and communication while keeping multiple memory requests in flight at any given time, taking advantage of the high bandwidth of multi-bank memory subsystems. Second, a demonstration of the effectiveness of our approach through two case studies: the breadth-first search algorithm, and a graphlet counting algorithm from bioinformatics. Both case studies involve graph traversal, but each of them adopts a different graph data representation. Third, a method for using on-chip memory resources in FPGAs to reduce off-chip memory accesses for accelerating graph traversal algorithms, through a case-study of the All-Pairs Shortest-Paths algorithm. This case study has been applied to process human brain network data. Fourth, an evaluation of an approach based on instruction-set extension for FPGA design against many-core GPUs (Graphics Processing Units), based on a set of benchmarks with different memory access characteristics. It is shown that while GPUs excel at streaming applications, the proposed approach can outperform GPUs in applications with poor locality characteristics, such as graph traversal problems.Open Acces

Spiral - Imperial College Digital Repository

Blocked All-Pairs Shortest Paths Algorithm on Intel Xeon Phi KNL Processor: A Case Study

Author: A Nakaya
C Rosales
DE Culler
G Venkataraman
J Reinders
MB Giles
RW Floyd
S Jalali
S Warshall
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 03/11/2018
Field of study

Manycores are consolidating in HPC community as a way of improving performance while keeping power efficiency. Knights Landing is the recently released second generation of Intel Xeon Phi architecture. While optimizing applications on CPUs, GPUs and first Xeon Phi's has been largely studied in the last years, the new features in Knights Landing processors require the revision of programming and optimization techniques for these devices. In this work, we selected the Floyd-Warshall algorithm as a representative case study of graph and memory-bound applications. Starting from the default serial version, we show how data, thread and compiler level optimizations help the parallel implementation to reach 338 GFLOPS.Comment: Computer Science - CACIC 2017. Springer Communications in Computer and Information Science, vol 79

arXiv.org e-Print Archive

Crossref

Cellular Automata Applications in Shortest Path Problem

Author: A Macwan
A Nash
AI Adamatzky
Anastasios Tsiftsis
C Nieto-Granda
C. Hochberger
C. Y. Lee
Christian Hochberger
D Ferguson
DB Johnson
EW Dijkstra
Fabio M. Marchese
G Ch Sirakoulis
G Ch Sirakoulis
George S. Fishman
H Beigy
Hong Qu
Ioakeim G. Georgoudas
J Li
J Neumann Von
J-H Liang
K Nagel
K. Ioannidis
Konstantinos Ioannidis
Lazaros Nalpantidis
M Defoort
M.A. Porta Garcia
M.G. Kechaidou
Michail-Antisthenis I Tsompanas
Michail-Antisthenis I. Tsompanas
Michail-Antisthenis I. Tsompanas
Michail-Antisthenis I. Tsompanas
Michail-Antisthenis I. Tsompanas
Michail-Antisthenis I. Tsompanas
Nikolaos Dourvas
Nikolaos I. Dourvas
Nikolaos I. Dourvas
O Montiel
PG Tzionas
Q Sun
RW Floyd
S Liu
S Warshall
SK Moghaddam
T Nakagaki
Tiago P. Nascimento
Usman Ahmed Syed
V.S. Kalogeiton
Vasileios G. Ntinas
Vasilis Evangelidis
Vassilios A. Mardiris
Vicky S. Kalogeiton
X Zhang
XGM Wang
XJ Wu
Y Wang
Publication venue
Publication date: 14/12/2017
Field of study

Cellular Automata (CAs) are computational models that can capture the essential features of systems in which global behavior emerges from the collective effect of simple components, which interact locally. During the last decades, CAs have been extensively used for mimicking several natural processes and systems to find fine solutions in many complex hard to solve computer science and engineering problems. Among them, the shortest path problem is one of the most pronounced and highly studied problems that scientists have been trying to tackle by using a plethora of methodologies and even unconventional approaches. The proposed solutions are mainly justified by their ability to provide a correct solution in a better time complexity than the renowned Dijkstra's algorithm. Although there is a wide variety regarding the algorithmic complexity of the algorithms suggested, spanning from simplistic graph traversal algorithms to complex nature inspired and bio-mimicking algorithms, in this chapter we focus on the successful application of CAs to shortest path problem as found in various diverse disciplines like computer science, swarm robotics, computer networks, decision science and biomimicking of biological organisms' behaviour. In particular, an introduction on the first CA-based algorithm tackling the shortest path problem is provided in detail. After the short presentation of shortest path algorithms arriving from the relaxization of the CAs principles, the application of the CA-based shortest path definition on the coordinated motion of swarm robotics is also introduced. Moreover, the CA based application of shortest path finding in computer networks is presented in brief. Finally, a CA that models exactly the behavior of a biological organism, namely the Physarum's behavior, finding the minimum-length path between two points in a labyrinth is given.Comment: To appear in the book: Adamatzky, A (Ed.) Shortest path solvers. From software to wetware. Springer, 201

arXiv.org e-Print Archive

Crossref

Floyd-Warshall Algorithm 1

Author: Ajay Somkuwar
Sachin Awasthi
Vijayshri Chaurasia
Publication venue
Publication date
Field of study

Abstract: There are several applications in VLSI technology that require high-speed shortest-path computations. The shortest path is a path between two nodes (or points) in a graph such that the sum of the weights of its constituent edges is minimum. Floyd-Warshall algorithm provides fastest computation of shortest path between all pair of nodes present in the graph. With rapid advances in VLSI technology, Field Programmable Gate Arrays (FPGAs) are receiving the attention of the Parallel and High Performance Computing community. This paper gives implementation outcome of Floyd-Warshall algorithm to solve the all pairs shortest-paths problem for directed graph in Verilog

CiteSeerX

Генерация потоковых сетей акторов поиска кратчайших путей для параллельной многоядерной реализации

Author: A. A. Prihozhy
А. А. Прихожий
Publication venue: UIIP NASB
Publication date: 29/06/2023
Field of study

Objectives. The problem of parallelizing computations on multicore systems is considered. On the Floyd – Warshall blocked algorithm of shortest paths search in dense graphs of large size, two types of parallelism are compared: fork-join and network dataflow. Using the CAL programming language, a method of developing actors and an algorithm of generating parallel dataflow networks are proposed. The objective is to improve performance of parallel implementations of algorithms which have the property of partial order of computations on multicore processors.Methods. Methods of graph theory, algorithm theory, parallelization theory and formal language theory are used.Results. Claims about the possibility of reordering calculations in the blocked Floyd – Warshall algorithm are proved, which make it possible to achieve a greater load of cores during algorithm execution. Based on the claims, a method of constructing actors in the CAL language is developed and an algorithm for automatic generation of dataflow CAL networks for various configurations of block matrices describing the lengths of the shortest paths is proposed. It is proved that the networks have the properties of rate consistency, boundedness, and liveness. In actors running in parallel, the order of execution of actions with asynchronous behavior can change dynamically, resulting in efficient use of caches and increased core load. To implement the new features of actors, networks and the method of their generation, a tunable multi-threaded CAL engine has been developed that implements a static dataflow model of computation with bounded sizes of buffers. From the experimental results obtained on four types of multi-core processors it follows that there is an optimal size of the network matrix of actors for which the performance is maximum, and the size depends on the number of cores and the size of graph.Conclusion. It has been shown that dataflow networks of actors are an effective means to parallelize computationally intensive algorithms that describe a partial order of computations over decomposed data. The results obtained on the blocked algorithm of shortest paths search prove that the parallelism of dataflow networks gives higher performance of software implementations on multicore processors in comparison with the fork-join parallelism of OpenMP.Цели. Рассматривается задача распараллеливания вычислений на многоядерных системах. Посредством блочного алгоритма Флойда – Уоршалла поиска кратчайших путей на плотных графах большого размера сравниваются два вида параллелизма: разветвление/слияние и сетевой потоковый. С использованием языка программирования CAL разрабатываются метод построения акторов потока данных и алгоритм генерации параллельных сетей акторов. Целью работы является повышение производительности параллельных сетевых реализаций алгоритмов, обладающих свойством частичного порядка вычислений, на многоядерных процессорах.Методы. Используются методы теории графов, теории алгоритмов, теории распараллеливания, теории формальных языков.Результаты. Доказаны утверждения о возможности переупорядочивания вычислений в блочном алгоритме Флойда – Уоршалла, способствующие повышению загрузки ядер при реализации алгоритма. На основе утверждений разработан метод построения акторов на языке CAL и предложен алгоритм автоматической генерации CAL-сетей потока данных для различных конфигураций матриц блоков, описывающих длины кратчайших путей. Доказано, что сети обладают свойствами согласованности, ограниченности и живучести. В акторах, работающих параллельно, порядок выполнения действий с асинхронным поведением может динамически меняться, что приводит к эффективному использованию кэшей и увеличению загрузки ядер. Для реализации новых возможностей акторов, сетей и метода их генерации разработан настраиваемый многопоточный CAL-движок, реализующий статическую модель потоковых вычислений с ограниченными размерами буферов. Из экспериментальных результатов, полученных на четырех типах многоядерных процессоров, следует, что существует оптимальный размер сетевой матрицы акторов, для которого производительность максимальна, и этот размер зависит от размера графа и количества ядер.Заключение. Показано, что сети акторов потока данных являются эффективным средством распарал-леливания алгоритмов с высокой вычислительной нагрузкой, описывающих частичный порядок вычислений над данными, декомпозированными на части. Результаты, полученные на блочном алгоритме поиска кратчайших путей, показали, что параллелизм сетей потока данных дает более высокую производительность программных реализаций на многоядерных процессорах по сравнению с параллелизмом разветвления/слияния стандарта OpenMP

Informatics (E-Journal) / Информатика