Search CORE

260,959 research outputs found

Graph-based Analysis of Dynamic Systems

Author: Schiller Benjamin
Publication venue
Publication date: 15/12/2016
Field of study

The analysis of dynamic systems provides insights into their time-dependent characteristics. This enables us to monitor, evaluate, and improve systems from various areas. They are often represented as graphs that model the system's components and their relations. The analysis of the resulting dynamic graphs yields great insights into the system's underlying structure, its characteristics, as well as properties of single components. The interpretation of these results can help us understand how a system works and how parameters influence its performance. This knowledge supports the design of new systems and the improvement of existing ones. The main issue in this scenario is the performance of analyzing the dynamic graph to obtain relevant properties. While various approaches have been developed to analyze dynamic graphs, it is not always clear which one performs best for the analysis of a specific graph. The runtime also depends on many other factors, including the size and topology of the graph, the frequency of changes, and the data structures used to represent the graph in memory. While the benefits and drawbacks of many data structures are well-known, their runtime is hard to predict when used for the representation of dynamic graphs. Hence, tools are required to benchmark and compare different algorithms for the computation of graph properties and data structures for the representation of dynamic graphs in memory. Based on deeper insights into their performance, new algorithms can be developed and efficient data structures can be selected. In this thesis, we present four contributions to tackle these problems: A benchmarking framework for dynamic graph analysis, novel algorithms for the efficient analysis of dynamic graphs, an approach for the parallelization of dynamic graph analysis, and a novel paradigm to select and adapt graph data structures. In addition, we present three use cases from the areas of social, computer, and biological networks to illustrate the great insights provided by their graph-based analysis. We present a new benchmarking framework for the analysis of dynamic graphs, the Dynamic Network Analyzer (DNA). It provides tools to benchmark and compare different algorithms for the analysis of dynamic graphs as well as the data structures used to represent them in memory. DNA supports the development of new algorithms and the automatic verification of their results. Its visualization component provides different ways to represent dynamic graphs and the results of their analysis. We introduce three new stream-based algorithms for the analysis of dynamic graphs. We evaluate their performance on synthetic as well as real-world dynamic graphs and compare their runtimes to snapshot-based algorithms. Our results show great performance gains for all three algorithms. The new stream-based algorithm StreaM_k, which counts the frequencies of k-vertex motifs, achieves speedups up to 19,043 x for synthetic and 2882 x for real-world datasets. We present a novel approach for the distributed processing of dynamic graphs, called parallel Dynamic Graph Analysis (pDNA). To analyze a dynamic graph, the work is distributed by a partitioner that creates subgraphs and assigns them to workers. They compute the properties of their respective subgraph using standard algorithms. Their results are used by the collator component to merge them to the properties of the original graph. We evaluate the performance of pDNA for the computation of five graph properties on two real-world dynamic graphs with up to 32 workers. Our approach achieves great speedups, especially for the analysis of complex graph measures. We introduce two novel approaches for the selection of efficient graph data structures. The compile-time approach estimates the workload of an analysis after an initial profiling phase and recommends efficient data structures based on benchmarking results. It achieves speedups of up to 5.4 x over baseline data structure configurations for the analysis of real-word dynamic graphs. The run-time approach monitors the workload during analysis and exchanges the graph representation if it finds a configuration that promises to be more efficient for the current workload. Compared to baseline configurations, it achieves speedups up to 7.3 x for the analysis of a synthetic workload. Our contributions provide novel approaches for the efficient analysis of dynamic graphs and tools to further investigate the trade-offs between different factors that influence the performance.:1 Introduction 2 Notation and Terminology 3 Related Work 4 DNA - Dynamic Network Analyzer 5 Algorithms 6 Parallel Dynamic Network Analysis 7 Selection of Efficient Graph Data Structures 8 Use Cases 9 Conclusion A DNA - Dynamic Network Analyzer B Algorithms C Selection of Efficient Graph Data Structures D Parallel Dynamic Network Analysis E Graph-based Intrusion Detection System F Molecular Dynamic

Qucosa

Technische Universität Dresden: Qucosa

Graph-based Analysis of Dynamic Systems

Author: Schiller Benjamin
Publication venue
Publication date: 15/12/2016
Field of study

Technische Universität Dresden: Qucosa

Succinct dynamic de Bruijn graphs

Author: Alipanahi Bahar
Boucher Christina
Kuhnle Alan
Puglisi Simon J.
Salmela Leena
Publication venue
Publication date: 15/07/2021
Field of study

Motivation: The de Bruijn graph is one of the fundamental data structures for analysis of high throughput sequencing data. In order to be applicable to population-scale studies, it is essential to build and store the graph in a space- and time-efficient manner. In addition, due to the ever-changing nature of population studies, it has become essential to update the graph after construction, e.g. add and remove nodes and edges. Although there has been substantial effort on making the construction and storage of the graph efficient, there is a limited amount of work in building the graph in an efficient and mutable manner. Hence, most space efficient data structures require complete reconstruction of the graph in order to add or remove edges or nodes. Results: In this article, we present DynamicBOSS, a succinct representation of the de Bruijn graph that allows for an unlimited number of additions and deletions of nodes and edges. We compare our method with other competing methods and demonstrate that DynamicBOSS is the only method that supports both addition and deletion and is applicable to very large samples (e.g. greater than 15 billion k-mers). Competing dynamic methods, e.g. FDBG cannot be constructed on large scale datasets, or cannot support both addition and deletion, e.g. BiFrost.Peer reviewe

Helsingin yliopiston digitaalinen arkisto

Complex graph stream mining

Author: Pan S
Publication venue
Publication date: 01/01/2015
Field of study

University of Technology Sydney. Faculty of Engineering and Information Technology.Recent years have witnessed a dramatic increase of information due to the ever development of modern technologies. The large scale of information makes data analysis, particularly data mining and knowledge discovery tasks, unprecedentedly challenging. First, data is becoming more and more interconnected. In a variety of domains such as social networks, chemical compounds, and XML documents, data is no longer represented by a flat table with instance-feature format, but exhibits complex structures indicating dependency relationships. Second, data is evolving more and more dynamically. Emerging applications such as social networks continuously generate information over time. Third, the learning tasks in many real-life applications become more and more complicated in that there are various constraints on the number of labelled data, class distributions, misclassification costs, or the number of learning tasks etc. Considering the above challenges, this research aims to investigate theoretical foundations, study new algorithm designs and system frameworks to enable the mining of complex graph streams from three aspects, including (1) Correlated Graph Stream Mining, (2) Graph Stream Classifications, and (3) Complex Task Graph Classification. In particular, correlated graph stream mining intends to carry out structured pattern search and support the query of similar graphs from a graph stream. Due to the dynamic changing nature of the streaming data and the inherent complexity of the graph query process, treating graph streams as static datasets is computationally infeasible or ineffective. Therefore, we proposed a novel algorithm, CGStream, to identify correlated graphs from a data stream, by using a sliding window, which covers a number of consecutive batches of stream data records. Experimental results demonstrate that the proposed algorithm is several times, or even an order of magnitude, more efficient than the straightforward algorithms. Graph stream classification aims to build effective and efficient classification models for graph streams with continuous growing volumes and dynamic changes. We proposed two methods for complex graph stream classification. Due to the inherent complexity of graph structure, labelling graph data is very expensive. To solve this problem, we proposed a gLSU algorithm, which aims to select discriminative subgraph features with minimum redundancy by using both labelled and unlabelled graphs for graph streams. The second approach handles graph streams with imbalanced class distributions and noise. Both frameworks use an instance weighting scheme to capture the underlying concept drifts of graph streams and achieve significant performance gain on benchmark graph streams. Complex task graph classification aims to address the graph classification problems with complex constraints. We studied two complex task graph classification problems, cost-sensitive graph classification of large-scale graphs and multi-task graph classification. As in medical diagnosis the misclassification cost/risk for different classes is inherently different and large scale graph classification is highly demanded in real-life applications, we proposed a CogBoost algorithm for cost-sensitive classification of large scale graphs. To overcome the limitation of insufficient labelled graphs for a specific learning task, we further proposed effective algorithms to leverage multiple graph learning tasks to select subgraph features and regularize multiple tasks to achieve better generalization performance for all learning tasks

OPUS - University of Technology Sydney

Inductive Graph Neural Networks for Spatiotemporal Kriging

Author: Labbe Aurelie
Sun Lijun
Wu Yuankai
Zhuang Dingyi
Publication venue
Publication date: 19/12/2020
Field of study

Time series forecasting and spatiotemporal kriging are the two most important tasks in spatiotemporal data analysis. Recent research on graph neural networks has made substantial progress in time series forecasting, while little attention has been paid to the kriging problem -- recovering signals for unsampled locations/sensors. Most existing scalable kriging methods (e.g., matrix/tensor completion) are transductive, and thus full retraining is required when we have a new sensor to interpolate. In this paper, we develop an Inductive Graph Neural Network Kriging (IGNNK) model to recover data for unsampled sensors on a network/graph structure. To generalize the effect of distance and reachability, we generate random subgraphs as samples and reconstruct the corresponding adjacency matrix for each sample. By reconstructing all signals on each sample subgraph, IGNNK can effectively learn the spatial message passing mechanism. Empirical results on several real-world spatiotemporal datasets demonstrate the effectiveness of our model. In addition, we also find that the learned model can be successfully transferred to the same type of kriging tasks on an unseen dataset. Our results show that: 1) GNN is an efficient and effective tool for spatial kriging; 2) inductive GNNs can be trained using dynamic adjacency matrices; 3) a trained model can be transferred to new graph structures and 4) IGNNK can be used to generate virtual sensors.Comment: AAAI 202

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Investigation into Indexing XML Data Techniques

Author: Joan Lu
Klaib Alhadi
Publication venue
Publication date: 21/07/2014
Field of study

The rapid development of XML technology improves the WWW, since the XML data has many advantages and has become a common technology for transferring data cross the internet. Therefore, the objective of this research is to investigate and study the XML indexing techniques in terms of their structures. The main goal of this investigation is to identify the main limitations of these techniques and any other open issues. Furthermore, this research considers most common XML indexing techniques and performs a comparison between them. Subsequently, this work makes an argument to find out these limitations. To conclude, the main problem of all the XML indexing techniques is the trade-off between the size and the efficiency of the indexes. So, all the indexes become large in order to perform well, and none of them is suitable for all users’ requirements. However, each one of these techniques has some advantages in somehow

University of Huddersfield Repository

Dynamic Graphs on the GPU

Author: Ashkiani Saman
Awad Muhammad A.
Owens John D.
Porumbescu Serban D.
Publication venue: eScholarship, University of California
Publication date: 01/05/2020
Field of study

We present a fast dynamic graph data structure for the GPU. Our dynamic graph structure uses one hash table per vertex to store adjacency lists and achieves 3.4–14.8x faster insertion rates over the state of the art across a diverse set of large datasets, as well as deletion speedups up to 7.8x. The data structure supports queries and dynamic updates through both edge and vertex insertion and deletion. In addition, we define a comprehensive evaluation strategy based on operations, workloads, and applications that we believe better characterize and evaluate dynamic graph data structures

eScholarship - University of California