235 research outputs found

    JGraphT -- A Java library for graph data structures and algorithms

    Full text link
    Mathematical software and graph-theoretical algorithmic packages to efficiently model, analyze and query graphs are crucial in an era where large-scale spatial, societal and economic network data are abundantly available. One such package is JGraphT, a programming library which contains very efficient and generic graph data-structures along with a large collection of state-of-the-art algorithms. The library is written in Java with stability, interoperability and performance in mind. A distinctive feature of this library is the ability to model vertices and edges as arbitrary objects, thereby permitting natural representations of many common networks including transportation, social and biological networks. Besides classic graph algorithms such as shortest-paths and spanning-tree algorithms, the library contains numerous advanced algorithms: graph and subgraph isomorphism; matching and flow problems; approximation algorithms for NP-hard problems such as independent set and TSP; and several more exotic algorithms such as Berge graph detection. Due to its versatility and generic design, JGraphT is currently used in large-scale commercial, non-commercial and academic research projects. In this work we describe in detail the design and underlying structure of the library, and discuss its most important features and algorithms. A computational study is conducted to evaluate the performance of JGraphT versus a number of similar libraries. Experiments on a large number of graphs over a variety of popular algorithms show that JGraphT is highly competitive with other established libraries such as NetworkX or the BGL.Comment: Major Revisio

    High performance graph analysis on parallel architectures

    Get PDF
    PhD ThesisOver the last decade pharmacology has been developing computational methods to enhance drug development and testing. A computational method called network pharmacology uses graph analysis tools to determine protein target sets that can lead on better targeted drugs for diseases as Cancer. One promising area of network-based pharmacology is the detection of protein groups that can produce better e ects if they are targeted together by drugs. However, the e cient prediction of such protein combinations is still a bottleneck in the area of computational biology. The computational burden of the algorithms used by such protein prediction strategies to characterise the importance of such proteins consists an additional challenge for the eld of network pharmacology. Such computationally expensive graph algorithms as the all pairs shortest path (APSP) computation can a ect the overall drug discovery process as needed network analysis results cannot be given on time. An ideal solution for these highly intensive computations could be the use of super-computing. However, graph algorithms have datadriven computation dictated by the structure of the graph and this can lead to low compute capacity utilisation with execution times dominated by memory latency. Therefore, this thesis seeks optimised solutions for the real-world graph problems of critical node detection and e ectiveness characterisation emerged from the collaboration with a pioneer company in the eld of network pharmacology as part of a Knowledge Transfer Partnership (KTP) / Secondment (KTS). In particular, we examine how genetic algorithms could bene t the prediction of protein complexes where their removal could produce a more e ective 'druggable' impact. Furthermore, we investigate how the problem of all pairs shortest path (APSP) computation can be bene ted by the use of emerging parallel hardware architectures as GPU- and FPGA- desktop-based accelerators. In particular, we address the problem of critical node detection with the development of a heuristic search method. It is based on a genetic algorithm that computes optimised node combinations where their removal causes greater impact than common impact analysis strategies. Furthermore, we design a general pattern for parallel network analysis on multi-core architectures that considers graph's embedded properties. It is a divide and conquer approach that decomposes a graph into smaller subgraphs based on its strongly connected components and computes the all pairs shortest paths concurrently on GPU. Furthermore, we use linear algebra to design an APSP approach based on the BFS algorithm. We use algebraic expressions to transform the problem of path computation to multiple independent matrix-vector multiplications that are executed concurrently on FPGA. Finally, we analyse how the optimised solutions of perturbation analysis and parallel graph processing provided in this thesis will impact the drug discovery process.This research was part of a Knowledge Transfer Partnership (KTP) and Knowledge Transfer Secondment (KTS) between e-therapeutics PLC and Newcastle University. It was supported as a collaborative project by e-therapeutics PLC and Technology Strategy boar

    Privaatsust säilitavad paralleelarvutused graafiülesannete jaoks

    Get PDF
    Turvalisel mitmeosalisel arvutusel põhinevate reaalsete privaatsusrakenduste loomine on SMC-protokolli arvutusosaliste ümmarguse keerukuse tõttu keeruline. Privaatsust säilitavate tehnoloogiate uudsuse ja nende probleemidega kaasnevate suurte arvutuskulude tõttu ei ole paralleelseid privaatsust säilitavaid graafikualgoritme veel uuritud. Graafikalgoritmid on paljude arvutiteaduse rakenduste selgroog, nagu navigatsioonisüsteemid, kogukonna tuvastamine, tarneahela võrk, hüperspektraalne kujutis ja hõredad lineaarsed lahendajad. Graafikalgoritmide suurte privaatsete andmekogumite töötlemise kiirendamiseks ja kõrgetasemeliste arvutusnõuete täitmiseks on vaja privaatsust säilitavaid paralleelseid algoritme. Seetõttu esitleb käesolev lõputöö tipptasemel protokolle privaatsuse säilitamise paralleelarvutustes erinevate graafikuprobleemide jaoks, ühe allika lühima tee, kõigi paaride lühima tee, minimaalse ulatuva puu ja metsa ning algebralise tee arvutamise. Need uued protokollid on üles ehitatud kombinatoorsete ja algebraliste graafikualgoritmide põhjal lisaks SMC protokollidele. Nende protokollide koostamiseks kasutatakse ka ühe käsuga mitut andmeoperatsiooni, et vooru keerukust tõhusalt vähendada. Oleme väljapakutud protokollid juurutanud Sharemind SMC platvormil, kasutades erinevaid graafikuid ja võrgukeskkondi. Selles lõputöös kirjeldatakse uudseid paralleelprotokolle koos nendega seotud algoritmide, tulemuste, kiirendamise, hindamiste ja ulatusliku võrdlusuuringuga. Privaatsust säilitavate ühe allika lühimate teede ja minimaalse ulatusega puuprotokollide tegelike juurutuste tulemused näitavad tõhusat meetodit, mis vähendas tööaega võrreldes varasemate töödega sadu kordi. Lisaks ei ole privaatsust säilitavate kõigi paaride lühima tee protokollide hindamine ja ulatuslik võrdlusuuringud sarnased ühegi varasema tööga. Lisaks pole kunagi varem käsitletud privaatsust säilitavaid metsa ja algebralise tee arvutamise protokolle.Constructing real-world privacy applications based on secure multiparty computation is challenging due to the round complexity of the computation parties of SMC protocol. Due to the novelty of privacy-preserving technologies and the high computational costs associated with these problems, parallel privacy-preserving graph algorithms have not yet been studied. Graph algorithms are the backbone of many applications in computer science, such as navigation systems, community detection, supply chain network, hyperspectral image, and sparse linear solvers. In order to expedite the processing of large private data sets for graphs algorithms and meet high-end computational demands, privacy-preserving parallel algorithms are needed. Therefore, this Thesis presents the state-of-the-art protocols in privacy-preserving parallel computations for different graphs problems, single-source shortest path (SSSP), All-pairs shortest path (APSP), minimum spanning tree (MST) and forest (MSF), and algebraic path computation. These new protocols have been constructed based on combinatorial and algebraic graph algorithms on top of the SMC protocols. Single-instruction-multiple-data (SIMD) operations are also used to build those protocols to reduce the round complexities efficiently. We have implemented the proposed protocols on the Sharemind SMC platform using various graphs and network environments. This Thesis outlines novel parallel protocols with their related algorithms, the results, speed-up, evaluations, and extensive benchmarking. The results of the real implementations of the privacy-preserving single-source shortest paths and minimum spanning tree protocols show an efficient method that reduced the running time hundreds of times compared with previous works. Furthermore, the evaluation and extensive benchmarking of privacy-preserving All-pairs shortest path protocols are not similar to any previous work. Moreover, the privacy-preserving minimum spanning forest and algebraic path computation protocols have never been addressed before.https://www.ester.ee/record=b555865

    LIPIcs, Volume 274, ESA 2023, Complete Volume

    Get PDF
    LIPIcs, Volume 274, ESA 2023, Complete Volum

    Big networks : a survey

    Get PDF
    A network is a typical expressive form of representing complex systems in terms of vertices and links, in which the pattern of interactions amongst components of the network is intricate. The network can be static that does not change over time or dynamic that evolves through time. The complication of network analysis is different under the new circumstance of network size explosive increasing. In this paper, we introduce a new network science concept called a big network. A big networks is generally in large-scale with a complicated and higher-order inner structure. This paper proposes a guideline framework that gives an insight into the major topics in the area of network science from the viewpoint of a big network. We first introduce the structural characteristics of big networks from three levels, which are micro-level, meso-level, and macro-level. We then discuss some state-of-the-art advanced topics of big network analysis. Big network models and related approaches, including ranking methods, partition approaches, as well as network embedding algorithms are systematically introduced. Some typical applications in big networks are then reviewed, such as community detection, link prediction, recommendation, etc. Moreover, we also pinpoint some critical open issues that need to be investigated further. © 2020 Elsevier Inc

    Complex networks and data mining: toward a new perspective for the understanding of complex systems

    Get PDF
    Complex systems, i.e. systems composed of a large set of elements interacting in a non-linear way, are constantly found all around us. In the last decades, different approaches have been proposed toward their understanding, one of the most interesting being the Complex Network perspective. This legacy of the 18th century mathematical concepts proposed by Leonhard Euler is still current, and more and more relevant in real-world problems. In recent years, it has been demonstrated that network-based representations can yield relevant knowledge about complex systems. In spite of that, several problems have been detected, mainly related to the degree of subjectivity involved in the creation and evaluation of such network structures. In this Thesis, we propose addressing these problems by means of different data mining techniques, thus obtaining a novel hybrid approximation intermingling complex networks and data mining. Results indicate that such techniques can be effectively used to i) enable the creation of novel network representations, ii) reduce the dimensionality of analyzed systems by pre-selecting the most important elements, iii) describe complex networks, and iv) assist in the analysis of different network topologies. The soundness of such approach is validated through different validation cases drawn from actual biomedical problems, e.g. the diagnosis of cancer from tissue analysis, or the study of the dynamics of the brain under different neurological disorders

    Subcellular Protein Localization in Fluorescense Images Using Convolutional Neural Networks

    Get PDF
    Gene expression is manifested through the synthesis of proteins within the cell. The Cell Atlas, within the Human Protein Atlas project, provides immunofluorescence microscopy images of the cell structures aligned with images showing the stained protein of interest. The images reveal the localization patterns of the majority of proteins found in human cells. These patterns in turn can be used when studying the cellular functions related to gene expression and mutations. In the advent of deep learning methods applied to computer vision problems, machine learning algorithms can be used to categorize the localization patterns into subcellular structures. In this thesis, two types of neural network algorithms were applied into the classification of the Cell Atlas samples from the dataset used in 33rd Congress of the International Society for Advancement of Cytometry imaging challenge, where the task was to do multi-class multi-label classification of the images into 13 subcellular structures. The algorithms tested, namely, were Convolutional Neural Networks (CNN) and Fully Convolutional Networks (FCN). Model performance was evaluated with class-wise F1 score. The results were promising, with CNN and FCN implementations yielding weighted averages of class-wise F1 scores of 0.822 and 0.810 respectively. Another interesting remark is that the FCN, which outputs probability maps showing where in the image the certain class is present instead of a single probability for the whole image, learns significantly faster than the CNN, suggesting that it efficiently utilizes the spatial information in the training samples. FCN also provides more information in its outputs compared to CNN, which loses the spatial information in its outputs. Considering the relatively small size of the dataset (20 000 samples, divided to 16 000 training samples and 4 000 testing samples), and the fact that the data is heavily imbalanced, the results are promising. More complex deep learning architectures can take advantage of millions of images, so in the future research the size of labeled dataset should be increased. Also, the quality of the labels can be questioned as they are derived from consensus between individuals without professional training

    Exploring Multi-Level Parallelism For Graph-Based Applications Via Algorithm And System Co-Design

    Get PDF
    Graph processing is at the heart of many modern applications where graphs are used as the basic data structure to represent the entities of interest and the relationships between them. Improving the performance of graph-based applications, especially using parallelism techniques, has drawn significant interest both in academia and industry. On the one hand, modern CPU architectures are able to provide massive computational power by using sophisticated memory hierarchy and multi-level parallelism, including thread-level parallelism, data-level parallelism, etc. On the other hand, graph processing workloads are notoriously challenging for achieving high performance due to their irregular computation pattern and unpredictable control flow. Therefore, how to accelerate the performance of graph-based applications using parallelism is still an open question. This dissertation focuses on providing high performance for graph-based applications. To take full advantage of multi-level parallelism resources provided by CPUs, this dissertation studies the characteristics of graph-based applications and matches their parallel solutions with the underlying hardware via algorithm and system co-design. This dissertation divides graph-based applications into three categories: typical graph algorithms, sequential graph-based applications, and applications with graph-based solutions. The first category comprises typical graph algorithms with available parallel solutions. This dissertation proposes GraphPhi as a new approach to graph processing on emerging Intel Xeon Phi-like architectures. The second category includes specialized graph applications without nontrivial parallel solutions. This dissertation studies a state-of-the-art 2-hop labeling approach named Pruned Landmark Labeling (PLL). This dissertation proposes Batched Vertex-Centric PLL (BVC-PLL), which breaks PLL\u27s inherent dependencies and parallelizes it in a scalable way. The third category includes applications that rely on graph-based solutions. This dissertation studies the sequential search algorithm for the graph-based indexing methods used for the Approximate Nearest Neighbor Search (ANNS) problem. This dissertation proposes Speed-ANN, a parallel similarity search algorithm that reveals hidden intra-query parallelism to accelerate the search speed while fulfilling the high accuracy requirement. Moreover, this dissertation further explores the optimization opportunities for computational graph-based deep neural network inference running on tiny devices, specifically microcontrollers (MCUs). Altogether, this dissertation studies graph-based applications and improves their performance by providing solutions of multi-level parallelism via algorithm and system co-design to match them with the underlying multi-core CPU architectures
    corecore