5 research outputs found

    A Comparative Study Of Large-Scale Network Data Visualization Tools

    One of the most important parts of Data Analysis is Data Visualization [15]. The easy thing about Data Visualization is that there are hundreds of ways to do it, some better than others. Ironically, however, this makes it difficult to choose the right tool for the job, and it is important to know which tool is best given the resources available. This thesis tries to answer that question, to an extent. In this thesis, I compare three Data Visualization tools: Gephi, Pajek and NodeXL. I mainly discuss what each tool can do, what each tool is best at, and when to use and when not to use each tool. Using the right tool can not only save a lot of time by making the task easier and getting the work done with minimal resources, but also help to get the best results. The comparison is based on what visualization features each tool has, how each tool computes different graph features, and how compatible and scalable each tool is. In the process, I used different network datasets, calculated certain features of the graphs, and recorded the findings. The final report discusses which tool is best to use given the size of the dataset, the problem we are trying to solve, the resources we have and the time we can spend.

    Using High-Performance Computing Profilers to Understand the Performance of Graph Algorithms

    An algorithm designer working with parallel computing systems should know how the characteristics of their implemented algorithm affect various performance aspects of their parallel program. It would be beneficial to these designers if each algorithm came with a specific set of standards that identified which algorithms work better on a specified system. Therefore, the goal of this paper is to take implementations of four graph algorithms, extract features such as memory consumption and scalability using profilers (VTune/TAU), and determine which algorithms work to their fullest potential on one of three systems: GPU, shared memory system, or distributed memory system. The features extracted in this study were scalability, speedup, and parallel efficiency. We find that the parallel algorithms examined (Community Detection, Communities through Directed Affiliations (Coda), BigClam, and Breadth First Search) all achieved noticeable speedup with an increasing number of cores.
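
    As a rough illustration of two of the features named above, speedup and parallel efficiency are commonly computed from wall-clock timings as sketched below. This uses the standard textbook definitions, not necessarily the exact procedure followed in the paper; the function names and the timing values are hypothetical and chosen only for illustration.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Speedup S = T_1 / T_p (standard definition)."""
    return t_serial / t_parallel


def parallel_efficiency(t_serial: float, t_parallel: float, cores: int) -> float:
    """Parallel efficiency E = S / p, where p is the number of cores."""
    return speedup(t_serial, t_parallel) / cores


if __name__ == "__main__":
    # Hypothetical wall-clock times in seconds, keyed by core count.
    # These are made-up numbers, not measurements from the paper.
    timings = {1: 120.0, 2: 64.0, 4: 36.0, 8: 24.0}
    t1 = timings[1]
    for cores, t in timings.items():
        s = speedup(t1, t)
        e = parallel_efficiency(t1, t, cores)
        print(f"cores={cores:2d}  speedup={s:5.2f}  efficiency={e:5.2f}")
```

    An efficiency close to 1 means the added cores are being used well; a value that drops as the core count grows indicates diminishing returns from communication or load imbalance.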

    Analysis of Protein-Protein Interaction Networks Using High Performance Scalable Tools

    Protein-Protein Interaction (PPI) research currently generates an extraordinary number of publications and considerable interest among computer scientists and biologists alike because of the underlying potential of the source material that researchers can work with. PPI networks are the networks of protein complexes formed by biochemical events or electrostatic forces serving a biological function [1]. As the analysis of protein networks grows, we have more information regarding proteins, genomes and their influence on life. Today, PPI networks are used to study diseases, improve drugs and understand other processes in medicine and health that will eventually help mankind. Though PPI network research is considered extremely important in the field, there is an issue: we do not have enough people with interdisciplinary knowledge in both biology and computer science, and this limits our rate of progress. Most biologists who are not expert coders need a way of calculating graph values and information that will help them analyze the graphs better without having to manipulate the data themselves. In this research, I test a few ways of achieving results through the use of available frameworks and algorithms, present the results and compare each method’s efficacy. My analysis takes place on very large datasets, where I calculate several centralities and other data from the graph using different metrics, and I also visualize them in order to gain further insight. I also note the significant effect of MPI and multithreading on the results obtained, which suggests that building scalable tools will help improve the analysis immensely.

    Scalable Community Detection using Distributed Louvain Algorithm

    Community detection (or clustering) in large-scale graphs is an important problem in graph mining. Communities reveal interesting characteristics of a network. Louvain is an efficient sequential algorithm but fails to scale to emerging large-scale data. Developing distributed-memory parallel algorithms is challenging because of inter-process communication and load-balancing issues. In this work, we design a shared memory-based algorithm using OpenMP, which shows a 4-fold speedup but is limited to the available physical cores. Our second algorithm is an MPI-based parallel algorithm that scales to a moderate number of processors. We also implement a hybrid algorithm combining both. Finally, we incorporate dynamic load-balancing in our final algorithm DPLAL (Distributed Parallel Louvain Algorithm with Load-balancing). DPLAL overcomes the performance bottleneck of the previous algorithms and shows around 12-fold speedup while scaling to a larger number of processors. Overall, we present the challenges, our solutions, and the empirical performance of our algorithms for several large real-world networks.