1,282 research outputs found

    On Large-Scale Graph Generation with Validation of Diverse Triangle Statistics at Edges and Vertices

    Full text link
    Researchers developing implementations of distributed graph analytic algorithms require graph generators that yield graphs sharing the challenging characteristics of real-world graphs (small-world, scale-free, heavy-tailed degree distribution) with efficiently calculable ground-truth solutions to the desired output. Reproducibility for current generators used in benchmarking are somewhat lacking in this respect due to their randomness: the output of a desired graph analytic can only be compared to expected values and not exact ground truth. Nonstochastic Kronecker product graphs meet these design criteria for several graph analytics. Here we show that many flavors of triangle participation can be cheaply calculated while generating a Kronecker product graph. Given two medium-sized scale-free graphs with adjacency matrices AA and BB, their Kronecker product graph has adjacency matrix C=A⊗BC = A \otimes B. Such graphs are highly compressible: ∣E∣|{\cal E}| edges are represented in O(∣E∣1/2){\cal O}(|{\cal E}|^{1/2}) memory and can be built in a distributed setting from small data structures, making them easy to share in compressed form. Many interesting graph calculations have worst-case complexity bounds O(∣E∣p){\cal O}(|{\cal E}|^p) and often these are reduced to O(∣E∣p/2){\cal O}(|{\cal E}|^{p/2}) for Kronecker product graphs, when a Kronecker formula can be derived yielding the sought calculation on CC in terms of related calculations on AA and BB. We focus on deriving formulas for triangle participation at vertices, tC{\bf t}_C, a vector storing the number of triangles that every vertex is involved in, and triangle participation at edges, ΔC\Delta_C, a sparse matrix storing the number of triangles at every edge.Comment: 10 pages, 7 figures, IEEE IPDPS Graph Algorithms Building Block

    SubGraph2Vec: Highly-Vectorized Tree-likeSubgraph Counting

    Full text link
    Subgraph counting aims to count occurrences of a template T in a given network G(V, E). It is a powerful graph analysis tool and has found real-world applications in diverse domains. Scaling subgraph counting problems is known to be memory bounded and computationally challenging with exponential complexity. Although scalable parallel algorithms are known for several graph problems such as Triangle Counting and PageRank, this is not common for counting complex subgraphs. Here we address this challenge and study connected acyclic graphs or trees. We propose a novel vectorized subgraph counting algorithm, named Subgraph2Vec, as well as both shared memory and distributed implementations: 1) reducing algorithmic complexity by minimizing neighbor traversal; 2) achieving a highly-vectorized implementation upon linear algebra kernels to significantly improve performance and hardware utilization. 3) Subgraph2Vec improves the overall performance over the state-of-the-art work by orders of magnitude and up to 660x on a single node. 4) Subgraph2Vec in distributed mode can scale up the template size to 20 and maintain good strong scalability. 5) enabling portability to both CPU and GPU.Comment: arXiv admin note: text overlap with arXiv:1903.0439

    GraphMineSuite: Enabling High-Performance and Programmable Graph Mining Algorithms with Set Algebra

    Full text link
    We propose GraphMineSuite (GMS): the first benchmarking suite for graph mining that facilitates evaluating and constructing high-performance graph mining algorithms. First, GMS comes with a benchmark specification based on extensive literature review, prescribing representative problems, algorithms, and datasets. Second, GMS offers a carefully designed software platform for seamless testing of different fine-grained elements of graph mining algorithms, such as graph representations or algorithm subroutines. The platform includes parallel implementations of more than 40 considered baselines, and it facilitates developing complex and fast mining algorithms. High modularity is possible by harnessing set algebra operations such as set intersection and difference, which enables breaking complex graph mining algorithms into simple building blocks that can be separately experimented with. GMS is supported with a broad concurrency analysis for portability in performance insights, and a novel performance metric to assess the throughput of graph mining algorithms, enabling more insightful evaluation. As use cases, we harness GMS to rapidly redesign and accelerate state-of-the-art baselines of core graph mining problems: degeneracy reordering (by up to >2x), maximal clique listing (by up to >9x), k-clique listing (by 1.1x), and subgraph isomorphism (by up to 2.5x), also obtaining better theoretical performance bounds

    Intelligent Malware Detection Using File-to-file Relations and Enhancing its Security against Adversarial Attacks

    Get PDF
    With computing devices and the Internet being indispensable in people\u27s everyday life, malware has posed serious threats to their security, making its detection of utmost concern. To protect legitimate users from the evolving malware attacks, machine learning-based systems have been successfully deployed and offer unparalleled flexibility in automatic malware detection. In most of these systems, resting on the analysis of different content-based features either statically or dynamically extracted from the file samples, various kinds of classifiers are constructed to detect malware. However, besides content-based features, file-to-file relations, such as file co-existence, can provide valuable information in malware detection and make evasion harder. To better understand the properties of file-to-file relations, we construct the file co-existence graph. Resting on the constructed graph, we investigate the semantic relatedness among files, and leverage graph inference, active learning and graph representation learning for malware detection. Comprehensive experimental results on the real sample collections from Comodo Cloud Security Center demonstrate the effectiveness of our proposed learning paradigms. As machine learning-based detection systems become more widely deployed, the incentive for defeating them increases. Therefore, we go further insight into the arms race between adversarial malware attack and defense, and aim to enhance the security of machine learning-based malware detection systems. In particular, we first explore the adversarial attacks under different scenarios (i.e., different levels of knowledge the attackers might have about the targeted learning system), and define a general attack strategy to thoroughly assess the adversarial behaviors. Then, considering different skills and capabilities of the attackers, we propose the corresponding secure-learning paradigms to counter the adversarial attacks and enhance the security of the learning systems while not compromising the detection accuracy. We conduct a series of comprehensive experimental studies based on the real sample collections from Comodo Cloud Security Center and the promising results demonstrate the effectiveness of our proposed secure-learning models, which can be readily applied to other detection tasks

    APPLICATIONS OF GRAPH THEORY FOR REUSE OF MODEL BASED SYSTEMS ENGINEERING DESIGN DATA

    Get PDF
    This dissertation contributes to systems engineering (SE) by introducing and demonstrating a novel graph-based design repository (GBDR) tool. GBDR enables engineers to leverage system design information from a heterogenous set of system models created using multiple model based systems engineering (MBSE) software tools as an integrated body of knowledge. Specifically, the research provides a set of approaches that allow the use of system models described in Systems Modeling Language and Lifecycle Modeling Language as an integrated body of design information. The coalesced body of system design information serves to support concept ideation and analysis within SE. The research accomplishes this by using a graph database to store system model information imported from digital artifacts created by MBSE tools and applying principles from graph theory and semantic web technologies to identify likely connections and equivalent concepts across system models, modeling languages, and metamodels. The research demonstrates that the presented tool can import, store, synthesize, search, display, distribute, and export information from multiple MBSE tools. As a practical demonstration, feasible subsystem design alternatives for a small unmanned aircraft system government reference architecture are identified from within a set of existing system models.OSD CAPECivilian, Office of the Secretary of DefenseApproved for public release. Distribution is unlimited
    • …
    corecore