Efficient Subgraph Matching on Billion Node Graphs

Author: Li Jianzhong
Shao Bin
Sun Zhao
Wang Haixun
Wang Hongzhi
Publication date: 01/01/2012

The ability to handle large-scale graph data is crucial to an increasing number of applications. Much work has been dedicated to supporting basic graph operations such as subgraph matching, reachability, regular expression matching, etc. In many cases, graph indices are employed to speed up query processing. Typically, most indices require either super-linear indexing time or super-linear indexing space. Unfortunately, for very large graphs, super-linear approaches are almost always infeasible. In this paper, we study the problem of subgraph matching on billion-node graphs. We present a novel algorithm that supports efficient subgraph matching for graphs deployed on a distributed memory store. Instead of relying on super-linear indices, we use efficient graph exploration and massively parallel computing for query processing. Our experimental results demonstrate the feasibility of performing subgraph matching on web-scale graph data. Comment: VLDB201

arXiv.org e-Print Archive

CiteSeerX
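
As a rough illustration of what "subgraph matching by graph exploration" means in the abstract above, here is a minimal single-machine sketch in Python. It is not the paper's distributed algorithm: it is a plain backtracking search, and the function name `match_subgraph`, the adjacency-dict input format, and the label dictionaries are all assumptions made for this example.

```python
def match_subgraph(data_adj, data_label, query_adj, query_label):
    """Enumerate label-preserving embeddings of a small query graph.

    data_adj / query_adj: dict vertex -> set of neighbours (assumed format).
    data_label / query_label: dict vertex -> label (assumed format).
    Returns a list of dicts mapping query vertex -> data vertex.
    """
    order = list(query_adj)  # fixed matching order over the query vertices
    results = []

    def extend(mapping):
        if len(mapping) == len(order):
            results.append(dict(mapping))
            return
        q = order[len(mapping)]
        # Data vertices already chosen for the query neighbours of q.
        matched_nbrs = [mapping[n] for n in query_adj[q] if n in mapping]
        if matched_nbrs:
            # Explore only vertices adjacent to all of those choices.
            candidates = set.intersection(*(data_adj[d] for d in matched_nbrs))
        else:
            candidates = set(data_adj)
        used = set(mapping.values())
        for d in candidates:
            if d in used or data_label[d] != query_label[q]:
                continue
            mapping[q] = d
            extend(mapping)
            del mapping[q]

    extend({})
    return results
```

The search needs no precomputed index: candidates for each query vertex are taken from the neighbourhoods of vertices matched so far, which is the exploration idea in its simplest form. A distributed version would additionally have to partition the data graph and combine partial matches across machines, which this sketch does not attempt.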

Building Large k-Cores from Sparse Graphs

Author: Fomin Fedor V.
Sagunov Danil
Simonov Kirill
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 45th International Symposium on Mathematical Foundations of Computer Science (MFCS 2020)
Publication date: 01/01/2020

A popular model to measure network stability is the k-core, that is, the maximal induced subgraph in which every vertex has degree at least k. For example, k-cores are commonly used to model the unraveling phenomena in social networks. In this model, users having fewer than k connections within the network leave it, so the remaining users form exactly the k-core. In this paper we study the question of whether it is possible to make the network more robust by spending only a limited amount of resources on new connections. A mathematical model for the k-core construction problem is the following Edge k-Core optimization problem. We are given a graph G and integers k, b and p. The task is to ensure that the k-core of G has at least p vertices by adding at most b edges. The previous studies on Edge k-Core demonstrate that the problem is computationally challenging. In particular, it is NP-hard when k = 3, W[1]-hard when parameterized by k+b+p (Chitnis and Talmon, 2018), and APX-hard (Zhou et al., 2019). Nevertheless, we show that there are efficient algorithms with provable guarantees when the k-core has to be constructed from a sparse graph with some additional structural properties. Our results are:

- When the input graph is a forest, Edge k-Core is solvable in polynomial time;
- Edge k-Core is fixed-parameter tractable (FPT) when parameterized by the minimum size of a vertex cover in the input graph. On the other hand, with such parameterization, the problem does not admit a polynomial kernel subject to a widely-believed assumption from complexity theory;
- Edge k-Core is FPT parameterized by the treewidth of the graph plus k. This improves upon a result of Chitnis and Talmon by not requiring b to be small.

Each of our algorithms is built upon a new graph-theoretical result interesting in its own

arXiv.org e-Print Archive

University of Bergen

Dagstuhl Research Online Publication Server

NORA - Norwegian Open Research Archives
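
The unraveling process described in the abstract above (vertices with fewer than k neighbours leave, possibly causing others to leave) can be sketched as a standard peeling procedure. This computes the k-core only; it does not solve the Edge k-Core problem studied in the paper. The function name `k_core` and the adjacency-dict input format are assumptions made for this example.

```python
from collections import deque


def k_core(adj, k):
    """Return the vertex set of the k-core of an undirected graph.

    adj: dict vertex -> set of neighbours (assumed input format).
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    removed = set()
    # Start with every vertex that already has fewer than k neighbours.
    queue = deque(v for v, d in degree.items() if d < k)
    while queue:
        v = queue.popleft()
        if v in removed:
            continue
        removed.add(v)
        # Removing v lowers the degree of each remaining neighbour.
        for u in adj[v]:
            if u not in removed:
                degree[u] -= 1
                if degree[u] < k:
                    queue.append(u)
    return set(adj) - removed
```

For a path on three vertices the 2-core is empty, while adding the single edge between the two endpoints turns the graph into a triangle whose 2-core is all three vertices. That is a tiny instance of the question the paper asks, with k = 2, b = 1 and p = 3.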

GPU accelerating distributed succinct de Bruijn graph construction

Author: Laanti Topi
Publication venue: Helsingfors universitet
Publication date: 01/01/2022

The research and methods in the field of computational biology have grown in the last decades, thanks to the availability of biological data. One of the applications in computational biology is genome sequencing or sequence alignment, a method to arrange sequences of, for example, DNA or RNA, to determine regions of similarity between these sequences. Sequence alignment applications include public health purposes, such as monitoring antimicrobial resistance. Demand for fast sequence alignment has led to the usage of data structures, such as the de Bruijn graph, to store a large amount of information efficiently. De Bruijn graphs are currently one of the top data structures used in indexing genome sequences, and different methods to represent them have been explored. One of these methods is the BOSS data structure, a special case of the Wheeler graph index, which uses succinct data structures to represent a de Bruijn graph. As genomes can take a large amount of space, the construction of succinct de Bruijn graphs is slow. This has led to experimental research on using large-scale cluster engines such as Apache Spark and Graphics Processing Units (GPUs) in genome data processing. This thesis explores the use of Apache Spark and Spark RAPIDS, a GPU computing library for Apache Spark, in the construction of a succinct de Bruijn graph index from genome sequences. The experimental results indicate that Spark RAPIDS can provide up to 8 times speedups to specific operations, but for some other operations it has severe limitations that restrict its usefulness for succinct de Bruijn graph index construction.

Helsingin yliopiston digitaalinen arkisto
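
For readers unfamiliar with the de Bruijn graph mentioned in the abstract above, the following minimal Python sketch builds one from a set of reads. It uses a plain dictionary of sets, not the succinct BOSS representation, and involves neither Apache Spark nor GPUs; the function name `de_bruijn_graph` and the input format are assumptions made for this example.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Build a de Bruijn graph from sequences.

    Nodes are (k-1)-mers. Every distinct k-mer found in a read contributes
    an edge from its (k-1)-length prefix to its (k-1)-length suffix.
    Returns a dict mapping node -> set of successor nodes.
    """
    graph = defaultdict(set)
    for read in reads:
        # Slide a window of length k over the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)
```

Even this toy version shows why construction is costly at genome scale: every position of every read yields a k-mer that has to be deduplicated and stored, which is the step that succinct indexes and cluster or GPU processing aim to make tractable.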

Easier Parallel Programming with Provably-Efficient Runtime Schedulers

Author: Utterback Robert
Publication venue: Washington University Open Scholarship
Publication date: 15/08/2017

Over the past decade, processor manufacturers have pivoted from increasing uniprocessor performance to multicore architectures. However, utilizing this computational power has proved challenging for software developers. Many concurrency platforms and languages have emerged to address parallel programming challenges, yet writing correct and performant parallel code retains a reputation of being one of the hardest tasks a programmer can undertake. This dissertation will study how runtime scheduling systems can be used to make parallel programming easier. We address the difficulty in writing parallel data structures, automatically finding shared memory bugs, and reproducing non-deterministic synchronization bugs. Each of the systems presented depends on a novel runtime system which provides strong theoretical performance guarantees and performs well in practice.

Washington University St. Louis: Open Scholarship

On the Design and Analysis of Parallel and Distributed Algorithms

Author: Chowdhary K R
Purohit Rajendra
Purohit S D
Publication date: 09/11/2023

The arrival of multicore systems has created a new scenario in computing: parallel and distributed algorithms are fast replacing the older sequential algorithms, and these techniques bring many challenges. Distributed algorithms provide distributed processing using distributed file systems and processing units, while the network is modeled as a minimum-cost spanning tree. Parallel processing, on the other hand, involves choices among different language platforms, data-parallel versus parallel programming, and GPUs. Processing units, memory elements and storage are connected through dynamic distributed networks in the form of spanning trees. The article presents foundational algorithms, analysis, and efficiency considerations. Comment: 9 page

arXiv.org e-Print Archive
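
The abstract above models the network as a minimum-cost spanning tree. As a reminder of what that object is, here is a short sequential sketch using Kruskal's algorithm with a union-find structure. It is not taken from the article and is not a distributed algorithm; the function name `kruskal_mst` and the `(cost, u, v)` edge format are assumptions made for this example.

```python
def kruskal_mst(num_nodes, edges):
    """Return the edges of a minimum-cost spanning forest.

    edges: iterable of (cost, u, v) tuples with nodes numbered
    0 .. num_nodes - 1 (assumed input format).
    """
    parent = list(range(num_nodes))

    def find(x):
        # Find the set representative, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    # Consider edges from cheapest to most expensive.
    for cost, u, v in sorted(edges):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # the edge joins two separate components
            parent[root_u] = root_v
            tree.append((cost, u, v))
    return tree
```

On a connected graph the result has exactly `num_nodes - 1` edges, which is the minimal set of links that still connects every processing unit.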