1,205 research outputs found

    High performance graph analysis on parallel architectures

    Get PDF
    PhD ThesisOver the last decade pharmacology has been developing computational methods to enhance drug development and testing. A computational method called network pharmacology uses graph analysis tools to determine protein target sets that can lead on better targeted drugs for diseases as Cancer. One promising area of network-based pharmacology is the detection of protein groups that can produce better e ects if they are targeted together by drugs. However, the e cient prediction of such protein combinations is still a bottleneck in the area of computational biology. The computational burden of the algorithms used by such protein prediction strategies to characterise the importance of such proteins consists an additional challenge for the eld of network pharmacology. Such computationally expensive graph algorithms as the all pairs shortest path (APSP) computation can a ect the overall drug discovery process as needed network analysis results cannot be given on time. An ideal solution for these highly intensive computations could be the use of super-computing. However, graph algorithms have datadriven computation dictated by the structure of the graph and this can lead to low compute capacity utilisation with execution times dominated by memory latency. Therefore, this thesis seeks optimised solutions for the real-world graph problems of critical node detection and e ectiveness characterisation emerged from the collaboration with a pioneer company in the eld of network pharmacology as part of a Knowledge Transfer Partnership (KTP) / Secondment (KTS). In particular, we examine how genetic algorithms could bene t the prediction of protein complexes where their removal could produce a more e ective 'druggable' impact. Furthermore, we investigate how the problem of all pairs shortest path (APSP) computation can be bene ted by the use of emerging parallel hardware architectures as GPU- and FPGA- desktop-based accelerators. In particular, we address the problem of critical node detection with the development of a heuristic search method. It is based on a genetic algorithm that computes optimised node combinations where their removal causes greater impact than common impact analysis strategies. Furthermore, we design a general pattern for parallel network analysis on multi-core architectures that considers graph's embedded properties. It is a divide and conquer approach that decomposes a graph into smaller subgraphs based on its strongly connected components and computes the all pairs shortest paths concurrently on GPU. Furthermore, we use linear algebra to design an APSP approach based on the BFS algorithm. We use algebraic expressions to transform the problem of path computation to multiple independent matrix-vector multiplications that are executed concurrently on FPGA. Finally, we analyse how the optimised solutions of perturbation analysis and parallel graph processing provided in this thesis will impact the drug discovery process.This research was part of a Knowledge Transfer Partnership (KTP) and Knowledge Transfer Secondment (KTS) between e-therapeutics PLC and Newcastle University. It was supported as a collaborative project by e-therapeutics PLC and Technology Strategy boar

    Reconfigurable computing for large-scale graph traversal algorithms

    Get PDF
    This thesis proposes a reconfigurable computing approach for supporting parallel processing in large-scale graph traversal algorithms. Our approach is based on a reconfigurable hardware architecture which exploits the capabilities of both FPGAs (Field-Programmable Gate Arrays) and a multi-bank parallel memory subsystem. The proposed methodology to accelerate graph traversal algorithms has been applied to three case studies, revealing that application-specific hardware customisations can benefit performance. A summary of our four contributions is as follows. First, a reconfigurable computing approach to accelerate large-scale graph traversal algorithms. We propose a reconfigurable hardware architecture which decouples computation and communication while keeping multiple memory requests in flight at any given time, taking advantage of the high bandwidth of multi-bank memory subsystems. Second, a demonstration of the effectiveness of our approach through two case studies: the breadth-first search algorithm, and a graphlet counting algorithm from bioinformatics. Both case studies involve graph traversal, but each of them adopts a different graph data representation. Third, a method for using on-chip memory resources in FPGAs to reduce off-chip memory accesses for accelerating graph traversal algorithms, through a case-study of the All-Pairs Shortest-Paths algorithm. This case study has been applied to process human brain network data. Fourth, an evaluation of an approach based on instruction-set extension for FPGA design against many-core GPUs (Graphics Processing Units), based on a set of benchmarks with different memory access characteristics. It is shown that while GPUs excel at streaming applications, the proposed approach can outperform GPUs in applications with poor locality characteristics, such as graph traversal problems.Open Acces

    Blocked All-Pairs Shortest Paths Algorithm on Intel Xeon Phi KNL Processor: A Case Study

    Full text link
    Manycores are consolidating in HPC community as a way of improving performance while keeping power efficiency. Knights Landing is the recently released second generation of Intel Xeon Phi architecture. While optimizing applications on CPUs, GPUs and first Xeon Phi's has been largely studied in the last years, the new features in Knights Landing processors require the revision of programming and optimization techniques for these devices. In this work, we selected the Floyd-Warshall algorithm as a representative case study of graph and memory-bound applications. Starting from the default serial version, we show how data, thread and compiler level optimizations help the parallel implementation to reach 338 GFLOPS.Comment: Computer Science - CACIC 2017. Springer Communications in Computer and Information Science, vol 79

    Cellular Automata Applications in Shortest Path Problem

    Full text link
    Cellular Automata (CAs) are computational models that can capture the essential features of systems in which global behavior emerges from the collective effect of simple components, which interact locally. During the last decades, CAs have been extensively used for mimicking several natural processes and systems to find fine solutions in many complex hard to solve computer science and engineering problems. Among them, the shortest path problem is one of the most pronounced and highly studied problems that scientists have been trying to tackle by using a plethora of methodologies and even unconventional approaches. The proposed solutions are mainly justified by their ability to provide a correct solution in a better time complexity than the renowned Dijkstra's algorithm. Although there is a wide variety regarding the algorithmic complexity of the algorithms suggested, spanning from simplistic graph traversal algorithms to complex nature inspired and bio-mimicking algorithms, in this chapter we focus on the successful application of CAs to shortest path problem as found in various diverse disciplines like computer science, swarm robotics, computer networks, decision science and biomimicking of biological organisms' behaviour. In particular, an introduction on the first CA-based algorithm tackling the shortest path problem is provided in detail. After the short presentation of shortest path algorithms arriving from the relaxization of the CAs principles, the application of the CA-based shortest path definition on the coordinated motion of swarm robotics is also introduced. Moreover, the CA based application of shortest path finding in computer networks is presented in brief. Finally, a CA that models exactly the behavior of a biological organism, namely the Physarum's behavior, finding the minimum-length path between two points in a labyrinth is given.Comment: To appear in the book: Adamatzky, A (Ed.) Shortest path solvers. From software to wetware. Springer, 201

    Floyd-Warshall Algorithm 1

    Get PDF
    Abstract: There are several applications in VLSI technology that require high-speed shortest-path computations. The shortest path is a path between two nodes (or points) in a graph such that the sum of the weights of its constituent edges is minimum. Floyd-Warshall algorithm provides fastest computation of shortest path between all pair of nodes present in the graph. With rapid advances in VLSI technology, Field Programmable Gate Arrays (FPGAs) are receiving the attention of the Parallel and High Performance Computing community. This paper gives implementation outcome of Floyd-Warshall algorithm to solve the all pairs shortest-paths problem for directed graph in Verilog

    ГСнСрация ΠΏΠΎΡ‚ΠΎΠΊΠΎΠ²Ρ‹Ρ… сСтСй Π°ΠΊΡ‚ΠΎΡ€ΠΎΠ² поиска ΠΊΡ€Π°Ρ‚Ρ‡Π°ΠΉΡˆΠΈΡ… ΠΏΡƒΡ‚Π΅ΠΉ для ΠΏΠ°Ρ€Π°Π»Π»Π΅Π»ΡŒΠ½ΠΎΠΉ многоядСрной Ρ€Π΅Π°Π»ΠΈΠ·Π°Ρ†ΠΈΠΈ

    Get PDF
    Objectives. The problem of parallelizing computations on multicore systems is considered. On the Floyd – Warshall blocked algorithm of shortest paths search in dense graphs of large size, two types of parallelism are compared: fork-join and network dataflow. Using the CAL programming language, a method of developing actors and an algorithm of generating parallel dataflow networks are proposed. The objective is to improve performance of parallel implementations of algorithms which have the property of partial order of computations on multicore processors.Methods. Methods of graph theory, algorithm theory, parallelization theory and formal language theory are used.Results. Claims about the possibility of reordering calculations in the blocked Floyd – Warshall algorithm are proved, which make it possible to achieve a greater load of cores during algorithm execution. Based on the claims, a method of constructing actors in the CAL language is developed and an algorithm for automatic generation of dataflow CAL networks for various configurations of block matrices describing the lengths of the shortest paths is proposed. It is proved that the networks have the properties of rate consistency, boundedness, and liveness. In actors running in parallel, the order of execution of actions with asynchronous behavior can change dynamically, resulting in efficient use of caches and increased core load. To implement the new features of actors, networks and the method of their generation, a tunable multi-threaded CAL engine has been developed that implements a static dataflow model of computation with bounded sizes of buffers. From the experimental results obtained on four types of multi-core processors it follows that there is an optimal size of the network matrix of actors for which the performance is maximum, and the size depends on the number of cores and the size of graph.Conclusion. It has been shown that dataflow networks of actors are an effective means to parallelize computationally intensive algorithms that describe a partial order of computations over decomposed data. The results obtained on the blocked algorithm of shortest paths search prove that the parallelism of dataflow networks gives higher performance of software implementations on multicore processors in comparison with the fork-join parallelism of OpenMP.Π¦Π΅Π»ΠΈ. РассматриваСтся Π·Π°Π΄Π°Ρ‡Π° распараллСливания вычислСний Π½Π° многоядСрных систСмах. ΠŸΠΎΡΡ€Π΅Π΄ΡΡ‚Π²ΠΎΠΌ Π±Π»ΠΎΡ‡Π½ΠΎΠ³ΠΎ Π°Π»Π³ΠΎΡ€ΠΈΡ‚ΠΌΠ° Π€Π»ΠΎΠΉΠ΄Π° – Π£ΠΎΡ€ΡˆΠ°Π»Π»Π° поиска ΠΊΡ€Π°Ρ‚Ρ‡Π°ΠΉΡˆΠΈΡ… ΠΏΡƒΡ‚Π΅ΠΉ Π½Π° ΠΏΠ»ΠΎΡ‚Π½Ρ‹Ρ… Π³Ρ€Π°Ρ„Π°Ρ… большого Ρ€Π°Π·ΠΌΠ΅Ρ€Π° ΡΡ€Π°Π²Π½ΠΈΠ²Π°ΡŽΡ‚ΡΡ Π΄Π²Π° Π²ΠΈΠ΄Π° ΠΏΠ°Ρ€Π°Π»Π»Π΅Π»ΠΈΠ·ΠΌΠ°: Ρ€Π°Π·Π²Π΅Ρ‚Π²Π»Π΅Π½ΠΈΠ΅/слияниС ΠΈ сСтСвой ΠΏΠΎΡ‚ΠΎΠΊΠΎΠ²Ρ‹ΠΉ. Π‘ использованиСм языка программирования CAL Ρ€Π°Π·Ρ€Π°Π±Π°Ρ‚Ρ‹Π²Π°ΡŽΡ‚ΡΡ ΠΌΠ΅Ρ‚ΠΎΠ΄ построСния Π°ΠΊΡ‚ΠΎΡ€ΠΎΠ² ΠΏΠΎΡ‚ΠΎΠΊΠ° Π΄Π°Π½Π½Ρ‹Ρ… ΠΈ Π°Π»Π³ΠΎΡ€ΠΈΡ‚ΠΌ Π³Π΅Π½Π΅Ρ€Π°Ρ†ΠΈΠΈ ΠΏΠ°Ρ€Π°Π»Π»Π΅Π»ΡŒΠ½Ρ‹Ρ… сСтСй Π°ΠΊΡ‚ΠΎΡ€ΠΎΠ². ЦСлью Ρ€Π°Π±ΠΎΡ‚Ρ‹ являСтся ΠΏΠΎΠ²Ρ‹ΡˆΠ΅Π½ΠΈΠ΅ ΠΏΡ€ΠΎΠΈΠ·Π²ΠΎΠ΄ΠΈΡ‚Π΅Π»ΡŒΠ½ΠΎΡΡ‚ΠΈ ΠΏΠ°Ρ€Π°Π»Π»Π΅Π»ΡŒΠ½Ρ‹Ρ… сСтСвых Ρ€Π΅Π°Π»ΠΈΠ·Π°Ρ†ΠΈΠΉ Π°Π»Π³ΠΎΡ€ΠΈΡ‚ΠΌΠΎΠ², ΠΎΠ±Π»Π°Π΄Π°ΡŽΡ‰ΠΈΡ… свойством частичного порядка вычислСний, Π½Π° многоядСрных процСссорах.ΠœΠ΅Ρ‚ΠΎΠ΄Ρ‹. Π˜ΡΠΏΠΎΠ»ΡŒΠ·ΡƒΡŽΡ‚ΡΡ ΠΌΠ΅Ρ‚ΠΎΠ΄Ρ‹ Ρ‚Π΅ΠΎΡ€ΠΈΠΈ Π³Ρ€Π°Ρ„ΠΎΠ², Ρ‚Π΅ΠΎΡ€ΠΈΠΈ Π°Π»Π³ΠΎΡ€ΠΈΡ‚ΠΌΠΎΠ², Ρ‚Π΅ΠΎΡ€ΠΈΠΈ распараллСливания, Ρ‚Π΅ΠΎΡ€ΠΈΠΈ Ρ„ΠΎΡ€ΠΌΠ°Π»ΡŒΠ½Ρ‹Ρ… языков.Π Π΅Π·ΡƒΠ»ΡŒΡ‚Π°Ρ‚Ρ‹. Π”ΠΎΠΊΠ°Π·Π°Π½Ρ‹ утвСрТдСния ΠΎ возмоТности пСрСупорядочивания вычислСний Π² Π±Π»ΠΎΡ‡Π½ΠΎΠΌ Π°Π»Π³ΠΎΡ€ΠΈΡ‚ΠΌΠ΅ Π€Π»ΠΎΠΉΠ΄Π° – Π£ΠΎΡ€ΡˆΠ°Π»Π»Π°, ΡΠΏΠΎΡΠΎΠ±ΡΡ‚Π²ΡƒΡŽΡ‰ΠΈΠ΅ ΠΏΠΎΠ²Ρ‹ΡˆΠ΅Π½ΠΈΡŽ Π·Π°Π³Ρ€ΡƒΠ·ΠΊΠΈ ядСр ΠΏΡ€ΠΈ Ρ€Π΅Π°Π»ΠΈΠ·Π°Ρ†ΠΈΠΈ Π°Π»Π³ΠΎΡ€ΠΈΡ‚ΠΌΠ°. На основС ΡƒΡ‚Π²Π΅Ρ€ΠΆΠ΄Π΅Π½ΠΈΠΉ Ρ€Π°Π·Ρ€Π°Π±ΠΎΡ‚Π°Π½ ΠΌΠ΅Ρ‚ΠΎΠ΄ построСния Π°ΠΊΡ‚ΠΎΡ€ΠΎΠ² Π½Π° языкС CAL ΠΈ ΠΏΡ€Π΅Π΄Π»ΠΎΠΆΠ΅Π½ Π°Π»Π³ΠΎΡ€ΠΈΡ‚ΠΌ автоматичСской Π³Π΅Π½Π΅Ρ€Π°Ρ†ΠΈΠΈ CAL-сСтСй ΠΏΠΎΡ‚ΠΎΠΊΠ° Π΄Π°Π½Π½Ρ‹Ρ… для Ρ€Π°Π·Π»ΠΈΡ‡Π½Ρ‹Ρ… ΠΊΠΎΠ½Ρ„ΠΈΠ³ΡƒΡ€Π°Ρ†ΠΈΠΉ ΠΌΠ°Ρ‚Ρ€ΠΈΡ† Π±Π»ΠΎΠΊΠΎΠ², ΠΎΠΏΠΈΡΡ‹Π²Π°ΡŽΡ‰ΠΈΡ… Π΄Π»ΠΈΠ½Ρ‹ ΠΊΡ€Π°Ρ‚Ρ‡Π°ΠΉΡˆΠΈΡ… ΠΏΡƒΡ‚Π΅ΠΉ. Π”ΠΎΠΊΠ°Π·Π°Π½ΠΎ, Ρ‡Ρ‚ΠΎ сСти ΠΎΠ±Π»Π°Π΄Π°ΡŽΡ‚ свойствами согласованности, ограничСнности ΠΈ ТивучСсти. Π’ Π°ΠΊΡ‚ΠΎΡ€Π°Ρ…, Ρ€Π°Π±ΠΎΡ‚Π°ΡŽΡ‰ΠΈΡ… ΠΏΠ°Ρ€Π°Π»Π»Π΅Π»ΡŒΠ½ΠΎ, порядок выполнСния дСйствий с асинхронным ΠΏΠΎΠ²Π΅Π΄Π΅Π½ΠΈΠ΅ΠΌ ΠΌΠΎΠΆΠ΅Ρ‚ динамичСски ΠΌΠ΅Π½ΡΡ‚ΡŒΡΡ, Ρ‡Ρ‚ΠΎ ΠΏΡ€ΠΈΠ²ΠΎΠ΄ΠΈΡ‚ ΠΊ эффСктивному использованию кэшСй ΠΈ ΡƒΠ²Π΅Π»ΠΈΡ‡Π΅Π½ΠΈΡŽ Π·Π°Π³Ρ€ΡƒΠ·ΠΊΠΈ ядСр. Для Ρ€Π΅Π°Π»ΠΈΠ·Π°Ρ†ΠΈΠΈ Π½ΠΎΠ²Ρ‹Ρ… возмоТностСй Π°ΠΊΡ‚ΠΎΡ€ΠΎΠ², сСтСй ΠΈ ΠΌΠ΅Ρ‚ΠΎΠ΄Π° ΠΈΡ… Π³Π΅Π½Π΅Ρ€Π°Ρ†ΠΈΠΈ Ρ€Π°Π·Ρ€Π°Π±ΠΎΡ‚Π°Π½ настраиваСмый ΠΌΠ½ΠΎΠ³ΠΎΠΏΠΎΡ‚ΠΎΡ‡Π½Ρ‹ΠΉ CAL-Π΄Π²ΠΈΠΆΠΎΠΊ, Ρ€Π΅Π°Π»ΠΈΠ·ΡƒΡŽΡ‰ΠΈΠΉ ΡΡ‚Π°Ρ‚ΠΈΡ‡Π΅ΡΠΊΡƒΡŽ модСль ΠΏΠΎΡ‚ΠΎΠΊΠΎΠ²Ρ‹Ρ… вычислСний с ΠΎΠ³Ρ€Π°Π½ΠΈΡ‡Π΅Π½Π½Ρ‹ΠΌΠΈ Ρ€Π°Π·ΠΌΠ΅Ρ€Π°ΠΌΠΈ Π±ΡƒΡ„Π΅Ρ€ΠΎΠ². Из ΡΠΊΡΠΏΠ΅Ρ€ΠΈΠΌΠ΅Π½Ρ‚Π°Π»ΡŒΠ½Ρ‹Ρ… Ρ€Π΅Π·ΡƒΠ»ΡŒΡ‚Π°Ρ‚ΠΎΠ², ΠΏΠΎΠ»ΡƒΡ‡Π΅Π½Π½Ρ‹Ρ… Π½Π° Ρ‡Π΅Ρ‚Ρ‹Ρ€Π΅Ρ… Ρ‚ΠΈΠΏΠ°Ρ… многоядСрных процСссоров, слСдуСт, Ρ‡Ρ‚ΠΎ сущСствуСт ΠΎΠΏΡ‚ΠΈΠΌΠ°Π»ΡŒΠ½Ρ‹ΠΉ Ρ€Π°Π·ΠΌΠ΅Ρ€ сСтСвой ΠΌΠ°Ρ‚Ρ€ΠΈΡ†Ρ‹ Π°ΠΊΡ‚ΠΎΡ€ΠΎΠ², для ΠΊΠΎΡ‚ΠΎΡ€ΠΎΠ³ΠΎ ΠΏΡ€ΠΎΠΈΠ·Π²ΠΎΠ΄ΠΈΡ‚Π΅Π»ΡŒΠ½ΠΎΡΡ‚ΡŒ максимальна, ΠΈ этот Ρ€Π°Π·ΠΌΠ΅Ρ€ зависит ΠΎΡ‚ Ρ€Π°Π·ΠΌΠ΅Ρ€Π° Π³Ρ€Π°Ρ„Π° ΠΈ количСства ядСр.Π—Π°ΠΊΠ»ΡŽΡ‡Π΅Π½ΠΈΠ΅. Показано, Ρ‡Ρ‚ΠΎ сСти Π°ΠΊΡ‚ΠΎΡ€ΠΎΠ² ΠΏΠΎΡ‚ΠΎΠΊΠ° Π΄Π°Π½Π½Ρ‹Ρ… ΡΠ²Π»ΡΡŽΡ‚ΡΡ эффСктивным срСдством распарал-лСливания Π°Π»Π³ΠΎΡ€ΠΈΡ‚ΠΌΠΎΠ² с высокой Π²Ρ‹Ρ‡ΠΈΡΠ»ΠΈΡ‚Π΅Π»ΡŒΠ½ΠΎΠΉ Π½Π°Π³Ρ€ΡƒΠ·ΠΊΠΎΠΉ, ΠΎΠΏΠΈΡΡ‹Π²Π°ΡŽΡ‰ΠΈΡ… частичный порядок вычислСний Π½Π°Π΄ Π΄Π°Π½Π½Ρ‹ΠΌΠΈ, Π΄Π΅ΠΊΠΎΠΌΠΏΠΎΠ·ΠΈΡ€ΠΎΠ²Π°Π½Π½Ρ‹ΠΌΠΈ Π½Π° части. Π Π΅Π·ΡƒΠ»ΡŒΡ‚Π°Ρ‚Ρ‹, ΠΏΠΎΠ»ΡƒΡ‡Π΅Π½Π½Ρ‹Π΅ Π½Π° Π±Π»ΠΎΡ‡Π½ΠΎΠΌ Π°Π»Π³ΠΎΡ€ΠΈΡ‚ΠΌΠ΅ поиска ΠΊΡ€Π°Ρ‚Ρ‡Π°ΠΉΡˆΠΈΡ… ΠΏΡƒΡ‚Π΅ΠΉ, ΠΏΠΎΠΊΠ°Π·Π°Π»ΠΈ, Ρ‡Ρ‚ΠΎ ΠΏΠ°Ρ€Π°Π»Π»Π΅Π»ΠΈΠ·ΠΌ сСтСй ΠΏΠΎΡ‚ΠΎΠΊΠ° Π΄Π°Π½Π½Ρ‹Ρ… Π΄Π°Π΅Ρ‚ Π±ΠΎΠ»Π΅Π΅ Π²Ρ‹ΡΠΎΠΊΡƒΡŽ ΠΏΡ€ΠΎΠΈΠ·Π²ΠΎΠ΄ΠΈΡ‚Π΅Π»ΡŒΠ½ΠΎΡΡ‚ΡŒ ΠΏΡ€ΠΎΠ³Ρ€Π°ΠΌΠΌΠ½Ρ‹Ρ… Ρ€Π΅Π°Π»ΠΈΠ·Π°Ρ†ΠΈΠΉ Π½Π° многоядСрных процСссорах ΠΏΠΎ ΡΡ€Π°Π²Π½Π΅Π½ΠΈΡŽ с ΠΏΠ°Ρ€Π°Π»Π»Π΅Π»ΠΈΠ·ΠΌΠΎΠΌ развСтвлСния/слияния стандарта OpenMP
    • …
    corecore