1,282 research outputs found
On Large-Scale Graph Generation with Validation of Diverse Triangle Statistics at Edges and Vertices
Researchers developing implementations of distributed graph analytic
algorithms require graph generators that yield graphs sharing the challenging
characteristics of real-world graphs (small-world, scale-free, heavy-tailed
degree distribution) with efficiently calculable ground-truth solutions to the
desired output. Reproducibility for current generators used in benchmarking is
somewhat lacking in this respect due to their randomness: the output of a
desired graph analytic can only be compared to expected values and not exact
ground truth. Nonstochastic Kronecker product graphs meet these design criteria
for several graph analytics. Here we show that many flavors of triangle
participation can be cheaply calculated while generating a Kronecker product
graph. Given two medium-sized scale-free graphs with adjacency matrices $A$ and
$B$, their Kronecker product graph has adjacency matrix $C = A \otimes B$. Such
graphs are highly compressible: $|\mathcal{E}|$ edges are represented in $\mathcal{O}(|\mathcal{E}|^{1/2})$ memory and can be built in a distributed setting from
small data structures, making them easy to share in compressed form. Many
interesting graph calculations have worst-case complexity bounds $\mathcal{O}(|\mathcal{E}|^{p})$, and often these are reduced to
$\mathcal{O}(|\mathcal{E}|^{p/2})$ for Kronecker product graphs when a Kronecker formula can be derived yielding
the sought calculation on $C$ in terms of related calculations on $A$ and $B$.
We focus on deriving formulas for triangle participation at vertices, $\mathbf{t}_C$, a vector storing the number of triangles that every vertex is involved
in, and triangle participation at edges, $\Delta_C$, a sparse matrix storing
the number of triangles at every edge.
Comment: 10 pages, 7 figures, IEEE IPDPS Graph Algorithms Building Block
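The idea of obtaining triangle statistics of a Kronecker product graph from its factors can be illustrated with a minimal sketch. It assumes simple undirected factor graphs (symmetric 0/1 adjacency matrices, no self-loops), writes C for the Kronecker product of adjacency matrices A and B, and uses the identities t_C = 2 (t_A ⊗ t_B) and Δ_C = Δ_A ⊗ Δ_B, which follow from the mixed-product property under that assumption. The helper names and the two small factor graphs are hypothetical, and the formulas in the paper may be stated more generally than this.

```python
# Sketch: triangle participation of a Kronecker product graph from its factors.
# Assumption: simple undirected graphs (symmetric 0/1 adjacency, no self-loops).

def matmul(X, Y):
    """Dense product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def kron(X, Y):
    """Kronecker product of two list-of-lists matrices."""
    return [[x * y for x in xrow for y in yrow] for xrow in X for yrow in Y]

def vertex_triangles(M):
    """t[v] = diag(M^3)[v] / 2, the number of triangles containing vertex v."""
    M2 = matmul(M, M)
    n = len(M)
    return [sum(M2[v][u] * M[u][v] for u in range(n)) // 2 for v in range(n)]

def edge_triangles(M):
    """Delta[u][v] = M[u][v] * (M^2)[u][v], the number of triangles on edge (u, v)."""
    M2 = matmul(M, M)
    return [[m * m2 for m, m2 in zip(row, row2)] for row, row2 in zip(M, M2)]

# Hypothetical small factors: a triangle (K3) and a 4-clique (K4).
A = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
B = [[int(i != j) for j in range(4)] for i in range(4)]
C = kron(A, B)  # 12-vertex product graph, built only to check the formulas

t_A, t_B = vertex_triangles(A), vertex_triangles(B)

# Kronecker formulas: computed from the factors alone.
t_C_formula = [2 * a * b for a in t_A for b in t_B]
delta_C_formula = kron(edge_triangles(A), edge_triangles(B))

# Direct computation on the full product graph, used as ground truth here.
assert t_C_formula == vertex_triangles(C)
assert delta_C_formula == edge_triangles(C)
```

The point of the sketch is the cost asymmetry: the formula side touches only the small factors, while the direct side multiplies matrices of the full product size.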
SubGraph2Vec: Highly-Vectorized Tree-like Subgraph Counting
Subgraph counting aims to count occurrences of a template T in a given
network G(V, E). It is a powerful graph analysis tool and has found real-world
applications in diverse domains. Scaling subgraph counting problems is known to
be memory bounded and computationally challenging with exponential complexity.
Although scalable parallel algorithms are known for several graph problems such
as Triangle Counting and PageRank, this is not common for counting complex
subgraphs. Here we address this challenge and study connected acyclic graphs or
trees. We propose a novel vectorized subgraph counting algorithm, named
Subgraph2Vec, with both shared-memory and distributed implementations.
Subgraph2Vec 1) reduces algorithmic complexity by minimizing neighbor traversal;
2) achieves a highly vectorized implementation built on linear algebra kernels,
significantly improving performance and hardware utilization; 3) improves
overall performance over state-of-the-art work by orders of magnitude, up to
660x on a single node; 4) scales the template size up to 20 in distributed mode
while maintaining good strong scalability; and 5) is portable to both CPU and
GPU.
Comment: arXiv admin note: text overlap with arXiv:1903.0439
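To make concrete how tree-template counting can be expressed through linear algebra kernels, here is a minimal sketch that counts homomorphisms of a rooted tree template into a graph using only matrix-vector products and elementwise multiplication. This is not the Subgraph2Vec algorithm: homomorphisms may map distinct template vertices to the same graph vertex, so exact subgraph counts need an additional technique (color coding is one common choice). The function names and the example graph are hypothetical.

```python
# Sketch: counting tree-template homomorphisms with matrix-vector products.
# Assumption: the graph is given as a dense 0/1 adjacency matrix (list of lists)
# and the template as a rooted tree {node: [children]}.

def matvec(A, x):
    """Dense matrix-vector product."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def count_tree_homomorphisms(A, children, root=0):
    """Number of maps from the template into the graph that send every
    template edge to a graph edge (not necessarily injective)."""
    n = len(A)

    def table(u):
        # h[v] = number of ways to map the subtree rooted at u with u -> v.
        h = [1] * n
        for c in children.get(u, []):
            reach = matvec(A, table(c))           # sum over neighbors of v
            h = [hv * rv for hv, rv in zip(h, reach)]  # combine child subtrees
        return h

    return sum(table(root))

# Hypothetical graph: the path 0 - 1 - 2.
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]

edge_template = {0: [1]}              # a single edge
path_from_end = {0: [1], 1: [2]}      # 3-vertex path rooted at an endpoint
path_from_mid = {0: [1, 2]}           # same path rooted at the middle vertex

edge_count = count_tree_homomorphisms(A, edge_template)
end_count = count_tree_homomorphisms(A, path_from_end)
mid_count = count_tree_homomorphisms(A, path_from_mid)
```

The count does not depend on which template vertex is chosen as the root, which is why the two rootings of the 3-vertex path agree.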
GraphMineSuite: Enabling High-Performance and Programmable Graph Mining Algorithms with Set Algebra
We propose GraphMineSuite (GMS): the first benchmarking suite for graph
mining that facilitates evaluating and constructing high-performance graph
mining algorithms. First, GMS comes with a benchmark specification based on
extensive literature review, prescribing representative problems, algorithms,
and datasets. Second, GMS offers a carefully designed software platform for
seamless testing of different fine-grained elements of graph mining algorithms,
such as graph representations or algorithm subroutines. The platform includes
parallel implementations of more than 40 considered baselines, and it
facilitates developing complex and fast mining algorithms. High modularity is
possible by harnessing set algebra operations such as set intersection and
difference, which enables breaking complex graph mining algorithms into simple
building blocks that can be separately experimented with. GMS is supported with
a broad concurrency analysis for portability in performance insights, and a
novel performance metric to assess the throughput of graph mining algorithms,
enabling more insightful evaluation. As use cases, we harness GMS to rapidly
redesign and accelerate state-of-the-art baselines of core graph mining
problems: degeneracy reordering (by up to >2x), maximal clique listing (by up
to >9x), k-clique listing (by 1.1x), and subgraph isomorphism (by up to 2.5x),
also obtaining better theoretical performance bounds.
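As an illustration of how set intersection and difference can serve as the building blocks of a graph mining algorithm, the following minimal sketch lists maximal cliques with the classic Bron-Kerbosch recursion with pivoting, written entirely in terms of Python set operations. It is not GMS code or its API; the function name and the example graph are hypothetical.

```python
# Sketch: maximal clique listing expressed with set algebra (Bron-Kerbosch
# with pivoting). Assumption: the graph is {vertex: set_of_neighbors}.

def maximal_cliques(adj):
    """Yield every maximal clique of an undirected graph as a frozenset."""

    def expand(R, P, X):
        # R: current clique, P: candidates that extend it, X: already explored.
        if not P and not X:
            yield frozenset(R)
            return
        # Pivot on the vertex with the most neighbors among the candidates.
        pivot = max(P | X, key=lambda v: len(adj[v] & P))
        for v in list(P - adj[pivot]):            # set difference
            yield from expand(R | {v},            # set union
                              P & adj[v],         # set intersection
                              X & adj[v])
            P = P - {v}
            X = X | {v}

    yield from expand(set(), set(adj), set())

# Hypothetical graph: triangle 0-1-2 with a pendant vertex 3 attached to 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
cliques = set(maximal_cliques(adj))
```

Because every step is a union, intersection, or difference, the set representation (hash set, sorted array, bitmap) can be swapped without touching the recursion.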
Intelligent Malware Detection Using File-to-file Relations and Enhancing its Security against Adversarial Attacks
With computing devices and the Internet being indispensable in people's everyday life, malware has posed serious threats to their security, making its detection of utmost concern. To protect legitimate users from the evolving malware attacks, machine learning-based systems have been successfully deployed and offer unparalleled flexibility in automatic malware detection. In most of these systems, resting on the analysis of different content-based features either statically or dynamically extracted from the file samples, various kinds of classifiers are constructed to detect malware. However, besides content-based features, file-to-file relations, such as file co-existence, can provide valuable information in malware detection and make evasion harder. To better understand the properties of file-to-file relations, we construct the file co-existence graph. Resting on the constructed graph, we investigate the semantic relatedness among files, and leverage graph inference, active learning and graph representation learning for malware detection. Comprehensive experimental results on the real sample collections from Comodo Cloud Security Center demonstrate the effectiveness of our proposed learning paradigms.
As machine learning-based detection systems become more widely deployed, the incentive for defeating them increases. Therefore, we seek further insight into the arms race between adversarial malware attack and defense, and aim to enhance the security of machine learning-based malware detection systems. In particular, we first explore adversarial attacks under different scenarios (i.e., different levels of knowledge the attackers might have about the targeted learning system), and define a general attack strategy to thoroughly assess the adversarial behaviors. Then, considering the different skills and capabilities of the attackers, we propose corresponding secure-learning paradigms to counter the adversarial attacks and enhance the security of the learning systems without compromising detection accuracy. We conduct a series of comprehensive experimental studies based on the real sample collections from Comodo Cloud Security Center, and the promising results demonstrate the effectiveness of our proposed secure-learning models, which can be readily applied to other detection tasks.
APPLICATIONS OF GRAPH THEORY FOR REUSE OF MODEL BASED SYSTEMS ENGINEERING DESIGN DATA
This dissertation contributes to systems engineering (SE) by introducing and demonstrating a novel graph-based design repository (GBDR) tool. GBDR enables engineers to leverage system design information from a heterogeneous set of system models created using multiple model-based systems engineering (MBSE) software tools as an integrated body of knowledge. Specifically, the research provides a set of approaches that allow the use of system models described in Systems Modeling Language and Lifecycle Modeling Language as an integrated body of design information. The coalesced body of system design information serves to support concept ideation and analysis within SE. The research accomplishes this by using a graph database to store system model information imported from digital artifacts created by MBSE tools and applying principles from graph theory and semantic web technologies to identify likely connections and equivalent concepts across system models, modeling languages, and metamodels. The research demonstrates that the presented tool can import, store, synthesize, search, display, distribute, and export information from multiple MBSE tools. As a practical demonstration, feasible subsystem design alternatives for a small unmanned aircraft system government reference architecture are identified from within a set of existing system models.
OSD CAPE
Civilian, Office of the Secretary of Defense
Approved for public release. Distribution is unlimited.