    High Performance Large Graph Analytics by Enhancing Locality

    Graphs are widely used in a variety of domains for representing entities and their relationships to each other. Graph analytics helps to understand, detect, extract and visualize insightful relationships between different entities. Graph analytics has a wide range of applications in domains including computational biology, commerce, intelligence, health care and transportation. The breadth of problems that require large graph analytics is growing rapidly, resulting in a need for fast and efficient graph processing. One of the major challenges in graph processing is poor locality of reference. Locality of reference refers to the phenomenon of frequently accessing the same memory location or adjacent memory locations. Applications with poor data locality reduce the effectiveness of the cache memory: they incur a large number of cache misses, each requiring an access to high-latency main memory. Good locality is therefore essential for good performance. Most graph processing applications have highly random memory access patterns, which, coupled with the large size of current graphs, result in poor cache utilization. Additionally, the ratio of computation to data access in many graph processing applications is very low, making it difficult to hide memory latency behind computation. It is also challenging to parallelize most graph applications efficiently. Many real-world graphs have unbalanced degree distributions, which makes it difficult to achieve a balanced workload. The parallelism in graph applications is generally fine-grained, which calls for efficient synchronization and communication between the processing units. Techniques for enhancing locality have been well studied in the context of regular applications such as linear algebra, but in most cases those techniques are not applicable to graph problems.
In this dissertation, we propose two techniques for enhancing locality in graph algorithms: access transformation and task-set reduction. Access transformation can be applied to improve spatial locality by changing a random access pattern into a sequential one. It is applicable to iterative algorithms that process random vertices or edges in each iteration. Task-set reduction can be applied to enhance temporal locality. It is applicable to algorithms that repeatedly access the same data to perform a certain task. Using the two techniques, we propose novel algorithms for three graph problems: k-core decomposition, maximal clique enumeration and triangle listing. We have implemented the algorithms, and the results show that they provide significant performance improvements and also scale well.
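The abstract names k-core decomposition as one of the target problems. As background, the sketch below computes core numbers by level-by-level peeling: at level k it scans the vertex array in index order to collect every remaining vertex of degree at most k, then removes those vertices and any neighbors whose degree drops to k or below, assigning them core number k. The in-order scan is loosely in the spirit of replacing random accesses with sequential ones, but this is an illustrative assumption, not the dissertation's algorithm; the function name `core_numbers` and the adjacency-list input format are hypothetical.

```python
def core_numbers(adj):
    """Return the core number of every vertex (illustrative sketch).

    adj: list of neighbor lists for an undirected simple graph on
    vertices 0..n-1 (a hypothetical input format chosen for this sketch).
    """
    n = len(adj)
    deg = [len(nbrs) for nbrs in adj]  # degree within the not-yet-removed subgraph
    core = [0] * n
    removed = [False] * n
    remaining = n
    k = 0
    while remaining > 0:
        # Sequential pass over the vertex array: collect every remaining
        # vertex whose current degree is at most k.
        frontier = [v for v in range(n) if not removed[v] and deg[v] <= k]
        for v in frontier:
            removed[v] = True
        # Peel the frontier; neighbors whose degree drops to <= k join this level.
        while frontier:
            next_frontier = []
            for v in frontier:
                core[v] = k
                remaining -= 1
                for u in adj[v]:
                    if not removed[u]:
                        deg[u] -= 1
                        if deg[u] <= k:
                            removed[u] = True
                            next_frontier.append(u)
            frontier = next_frontier
        k += 1
    return core


# Triangle {0, 1, 2} with a pendant vertex 3 attached to vertex 0.
print(core_numbers([[1, 2, 3], [0, 2], [0, 1], [0]]))  # [2, 2, 2, 1]
```

Each level costs one full scan of the vertex array, so this version is simple rather than asymptotically optimal; bucket-based peeling avoids the repeated scans.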

    Maximum Common Subgraph Isomorphism Algorithms

    Maximum common subgraph (MCS) isomorphism algorithms play an important role in chemoinformatics by providing an effective mechanism for the alignment of pairs of chemical structures. This article discusses the various types of MCS that can be identified when two graphs are compared and reviews some of the algorithms that are available for this purpose, focusing on those that are, or may be, applicable to the matching of chemical graphs.
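As one concrete, well-known formulation among the MCS types the article surveys, a maximum common induced subgraph (MCIS) of two graphs corresponds to a maximum clique in their modular product graph: pair every vertex of the first graph with every vertex of the second, and join two pairs when they map distinct vertices and agree on edge versus non-edge. The minimal sketch below assumes unlabeled graphs and finds the maximum clique with a Bron-Kerbosch search plus a size bound. Chemical graph matching would additionally require atom and bond labels to match, and other variants, such as the maximum common edge subgraph, are not handled here. The function name `mcis_mapping` is hypothetical, and the search is exponential, so it suits only small graphs.

```python
def mcis_mapping(adj1, adj2):
    """Maximum common induced subgraph of two unlabeled graphs (sketch).

    adj1, adj2: lists of neighbor sets for graphs on vertices 0..n-1.
    Returns a dict mapping vertices of graph 1 to vertices of graph 2.
    """
    n1, n2 = len(adj1), len(adj2)
    # Vertices of the modular product: every pairing (i in G1, j in G2).
    nodes = [(i, j) for i in range(n1) for j in range(n2)]

    def compatible(p, q):
        (i, j), (k, l) = p, q
        # Distinct on both sides, and edge/non-edge status must agree.
        return i != k and j != l and ((k in adj1[i]) == (l in adj2[j]))

    nbrs = {p: {q for q in nodes if q != p and compatible(p, q)} for p in nodes}
    best = []

    def expand(clique, cand, excl):
        # Bron-Kerbosch with pivoting, recording the largest maximal clique.
        nonlocal best
        if not cand and not excl:
            if len(clique) > len(best):
                best = list(clique)
            return
        if len(clique) + len(cand) <= len(best):
            return  # cannot beat the best clique found so far
        pivot = max(cand | excl, key=lambda u: len(nbrs[u] & cand))
        for v in list(cand - nbrs[pivot]):
            expand(clique + [v], cand & nbrs[v], excl & nbrs[v])
            cand = cand - {v}
            excl = excl | {v}

    expand([], set(nodes), set())
    return dict(best)


triangle = [{1, 2}, {0, 2}, {0, 1}]
path = [{1}, {0, 2}, {1}]
print(len(mcis_mapping(triangle, path)))  # 2 (a single shared edge)
```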

    Multipartite Graph Algorithms for the Analysis of Heterogeneous Data

    The explosive growth in the rate of data generation in recent years threatens to outpace the growth in computer power, motivating the need for new, scalable algorithms and big data analytic techniques. No field may be more emblematic of this data deluge than the life sciences, where technologies such as high-throughput mRNA arrays and next-generation genome sequencing are routinely used to generate datasets of extreme scale. Data from experiments in genomics, transcriptomics, metabolomics and proteomics are continuously being added to existing repositories. A goal of exploratory analysis of such omics data is to illuminate the functions and relationships of biomolecules within an organism. This dissertation describes the design, implementation and application of graph algorithms, with the goal of seeking dense structure in data derived from omics experiments in order to detect latent associations between often heterogeneous entities, such as genes, diseases and phenotypes. Exact combinatorial solutions are developed and implemented instead of approximations or heuristics, even when problems are exceedingly large and/or difficult. Datasets on which the algorithms are applied include time-series transcriptomic data from an experiment on the developing mouse cerebellum, gene expression data measuring acute ethanol response in the prefrontal cortex, and a predicted protein-protein interaction network. A bipartite graph model is used to integrate heterogeneous data types, such as genes with phenotypes and microbes with mouse strains. The techniques are then extended to a multipartite algorithm that enumerates dense substructures in multipartite graphs constructed from three or more heterogeneous data sources, with applications to functional genomics. Several new theoretical results are given regarding multipartite graphs and the multipartite enumeration algorithm.
In all cases, practical implementations are demonstrated to expand the frontier of computational feasibility.
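The abstract does not name the exact dense structure sought in the bipartite model; maximal bicliques (a set of genes that all connect to the same set of phenotypes, say) are one standard choice and are used here as an assumption. The minimal sketch below enumerates all maximal bicliques of a small bipartite graph with a closure technique borrowed from formal concept analysis: every right-hand side of a maximal biclique is an intersection of left-vertex neighborhoods, and its left-hand side is every vertex whose neighborhood contains that intersection. The function name `maximal_bicliques` and the dict-of-sets input are hypothetical, and this is not the dissertation's algorithm.

```python
def maximal_bicliques(adj):
    """Enumerate maximal bicliques (A, B) with A and B nonempty (sketch).

    adj: dict mapping each left vertex to the set of its right-side neighbors.
    Returns a list of (frozenset_of_left, frozenset_of_right) pairs.
    """
    # Close the family of nonempty neighborhoods under intersection.
    intents = set()
    for u in adj:
        nu = frozenset(adj[u])
        if not nu:
            continue
        new = {nu}
        for b in intents:
            inter = b & nu
            if inter:
                new.add(inter)
        intents |= new
    # For each right-hand side B, the left-hand side is every vertex adjacent to all of B.
    result = []
    for b in intents:
        a = frozenset(u for u in adj if b <= adj[u])
        result.append((a, b))
    return result


genes = {"g1": {"p1", "p2"}, "g2": {"p1", "p2", "p3"}, "g3": {"p3"}}
for left, right in maximal_bicliques(genes):
    print(sorted(left), sorted(right))
```

The number of maximal bicliques can grow exponentially with graph size, so this closure approach is only meant to make the notion concrete on toy data.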

    Combinatorial algorithms for the seriation problem

    In this thesis we study the seriation problem, a combinatorial problem arising in data analysis, which asks to sequence a set of objects in such a way that similar objects are ordered close to each other. We focus on the combinatorial structure and properties of Robinsonian matrices, a special class of structured matrices which best achieve the seriation goal. Our contribution is both theoretical and practical, with a particular emphasis on algorithms. In Chapter 2 we introduce basic concepts on graphs, permutations and proximity matrices used throughout the thesis. In Chapter 3 we present Robinsonian matrices, discussing their characterizations and the recognition algorithms existing in the literature. In Chapter 4 we discuss Lexicographic Breadth-First Search (Lex-BFS), a special graph traversal algorithm used in multisweep algorithms for the recognition of several classes of graphs. In Chapter 5 we introduce a new Lex-BFS based algorithm to recognize Robinsonian matrices, which is derived from a new characterization of Robinsonian matrices in terms of straight enumerations of unit interval graphs. In Chapter 6 we introduce the novel Similarity-First Search algorithm (SFS), a weighted version of Lex-BFS which we use in a multisweep algorithm for the recognition of Robinsonian matrices. In Chapter 7 we model the seriation problem as an instance of the Quadratic Assignment Problem (QAP) and show that if the data has a Robinsonian structure, then an optimal solution for QAP can be found using a Robinsonian recognition algorithm. In Chapter 8 we discuss how to solve the seriation problem when the data does not have a Robinsonian structure, by finding a Robinsonian approximation of the original data. Finally, in Chapter 9 we discuss experiments carried out to compare the performance of the algorithms introduced in the thesis.
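To make the central definition concrete: a symmetric similarity matrix A is Robinson when its entries never increase as one moves away from the main diagonal, which is equivalent to requiring A[i][k] <= min(A[i][j], A[j][k]) for all i < j < k; A is Robinsonian when some simultaneous row/column permutation makes it Robinson, and that permutation solves the seriation problem. The sketch below checks the definition directly and, for tiny matrices only, tries all n! orderings. It is a brute-force illustration of the definition, not the Lex-BFS or SFS recognition algorithms described in the thesis; the function names are hypothetical.

```python
from itertools import permutations


def is_robinson(a):
    """True if a[i][k] <= min(a[i][j], a[j][k]) for all i < j < k."""
    n = len(a)
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(j + 1, n):
                if a[i][k] > min(a[i][j], a[j][k]):
                    return False
    return True


def find_robinson_order(a):
    """Brute force: return an ordering that makes `a` Robinson, or None."""
    n = len(a)
    for perm in permutations(range(n)):
        # Permute rows and columns simultaneously.
        b = [[a[p][q] for q in perm] for p in perm]
        if is_robinson(b):
            return list(perm)
    return None


S = [[3, 2, 2], [2, 3, 1], [2, 1, 3]]
print(is_robinson(S))          # False
print(find_robinson_order(S))  # [1, 0, 2]
```

A similarity pattern shaped like a 4-cycle (high similarity only between cyclic neighbors) has no such ordering, so `find_robinson_order` returns `None` for it; the thesis's Chapter 8 addresses exactly this non-Robinsonian case via approximation.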

    Efficient k-Clique Listing: An Edge-Oriented Branching Strategy

    $k$-clique listing is a vital graph mining operator with diverse applications in various networks. The state-of-the-art algorithms all adopt a branch-and-bound (BB) framework with a vertex-oriented branching strategy (called VBBkC), which forms a sub-branch by expanding a partial $k$-clique with a vertex. These algorithms have a time complexity of $O(km(\delta/2)^{k-2})$, where $m$ is the number of edges in the graph and $\delta$ is the degeneracy of the graph. In this paper, we propose a BB framework with a new edge-oriented branching strategy (called EBBkC), which forms a sub-branch by expanding a partial $k$-clique with two vertices that are connected to each other (i.e., that correspond to an edge). We explore various edge orderings for EBBkC such that it achieves a time complexity of $O(\delta m + km(\tau/2)^{k-2})$, where $\tau$ is an integer related to the maximum truss number of the graph and $\tau < \delta$. The time complexity of EBBkC is better than that of VBBkC algorithms for $k > 3$, since both $O(\delta m)$ and $O(km(\tau/2)^{k-2})$ are bounded by $O(km(\delta/2)^{k-2})$. Furthermore, we develop specialized algorithms for sub-branches on dense graphs, so that such sub-branches can be terminated early and handed to the specialized algorithms. We conduct extensive experiments on 19 real graphs, and the results show that our EBBkC-based algorithms with the early-termination technique consistently and substantially outperform the state-of-the-art (VBBkC-based) algorithms.
    Comment: This paper has been accepted by SIGMOD 202
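The difference between the two branching strategies can be sketched in a few lines. In the minimal example below, each top-level branch starts from an edge (u, v) rather than a single vertex, and its candidate set is the common out-neighborhood of both endpoints; deeper levels then extend the partial clique one vertex at a time. As simplifying assumptions, edges are oriented by vertex index instead of by the truss-based edge orderings the paper explores, and there is no early termination for dense sub-branches, so this illustrates edge-oriented branching in general and is not the authors' EBBkC implementation. The function name `list_k_cliques` is hypothetical.

```python
def list_k_cliques(n, edges, k):
    """List every k-clique (k >= 1) once, as an increasing tuple of vertices.

    n: number of vertices (0..n-1); edges: iterable of undirected (u, v) pairs.
    """
    if k == 1:
        return [(v,) for v in range(n)]
    # Orient each edge from the lower to the higher index, giving a DAG.
    out = [set() for _ in range(n)]
    for u, v in edges:
        if u != v:
            out[min(u, v)].add(max(u, v))
    result = []

    def extend(clique, cand, r):
        # cand: vertices adjacent to every clique member and later in the order.
        if r == 0:
            result.append(tuple(clique))
            return
        if len(cand) < r:
            return  # not enough candidates left to complete a k-clique
        for w in sorted(cand):
            extend(clique + [w], cand & out[w], r - 1)

    # Edge-oriented branching: each top-level branch is seeded by one edge.
    for u in range(n):
        for v in out[u]:
            extend([u, v], out[u] & out[v], k - 2)
    return result


k4_plus_tail = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)]
print(sorted(list_k_cliques(5, k4_plus_tail, 3)))
```

Seeding a branch with an edge shrinks its candidate set to a common neighborhood immediately, which is the intuition behind the paper's tighter bound; the specific orderings that yield the $\tau$-based complexity are beyond this sketch.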