Search CORE

6 research outputs found

RedisGraph GraphBLAS Enabled Graph Database

Author: Cailliau Pieter
Davis Tim
Gadepally Vijay
Kepner Jeremy
Lipman Roi
Lovitz Jeffrey
Ouaknine Keren
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/05/2019
Field of study

RedisGraph is a Redis module developed by Redis Labs to add graph database functionality to the Redis database. RedisGraph represents connected data as adjacency matrices. By representing the data as sparse matrices and employing the power of GraphBLAS (a highly optimized library for sparse matrix operations), RedisGraph delivers a fast and efficient way to store, manage and process graphs. Initial benchmarks indicate that RedisGraph is significantly faster than comparable graph databases.Comment: Accepted to IEEE IPDPS 2019 GrAPL worksho

arXiv.org e-Print Archive

Crossref

Fast Mapping onto Census Blocks

Author: Arcand William
Bergeron William
Bestor David
Byun Chansup
Engwirda Darren
Gadepally Vijay
Hill Chris
Houle Michael
Hubbell Matthew
Jones Michael
Kepner Jeremy
Kipf Andreas
Kirby Andrew
Klein Anna
Kraska Tim
Michaleas Peter
Milechin Lauren
Mullen Julie
Prout Andrew
Reuther Albert
Rosa Antonio
Samsi Sid
Vembar Navin
Yee Charles
Publication venue
Publication date: 01/08/2020
Field of study

Pandemic measures such as social distancing and contact tracing can be enhanced by rapidly integrating dynamic location data and demographic data. Projecting billions of longitude and latitude locations onto hundreds of thousands of highly irregular demographic census block polygons is computationally challenging in both research and deployment contexts. This paper describes two approaches labeled "simple" and "fast". The simple approach can be implemented in any scripting language (Matlab/Octave, Python, Julia, R) and is easily integrated and customized to a variety of research goals. This simple approach uses a novel combination of hierarchy, sparse bounding boxes, polygon crossing-number, vectorization, and parallel processing to achieve 100,000,000+ projections per second on 100 servers. The simple approach is compact, does not increase data storage requirements, and is applicable to any country or region. The fast approach exploits the thread, vector, and memory optimizations that are possible using a low-level language (C++) and achieves similar performance on a single server. This paper details these approaches with the goal of enabling the broader community to quickly integrate location and demographic data.Comment: 8 pages, 7 figures, 55 references; accepted to IEEE HPEC 202

arXiv.org e-Print Archive

Crossref

Focusing and Calibration of Large Scale Network Sensors using GraphBLAS Anonymized Hypersparse Matrices

Author: Arcand William
Bergeron William
Bestor David
Byun Chansup
Davis Timothy
Dykstra Phil
Gadepally Vijay
Houle Micheal
Hubbell Matthew
Jananthan Hayden
Jones Michael
Kepner Jeremy
Klein Anna
Michaleas Peter
Milechin Lauren
Morales Guillermo
Mullen Julie
Patel Ritesh
Pentland Alex
Pisharody Sandeep
Prout Andrew
Reuther Albert
Rosa Antonio
Samsi Siddharth
Trigg Tyler
Yee Charles
Publication venue
Publication date: 04/09/2023
Field of study

Defending community-owned cyber space requires community-based efforts. Large-scale network observations that uphold the highest regard for privacy are key to protecting our shared cyberspace. Deployment of the necessary network sensors requires careful sensor placement, focusing, and calibration with significant volumes of network observations. This paper demonstrates novel focusing and calibration procedures on a multi-billion packet dataset using high-performance GraphBLAS anonymized hypersparse matrices. The run-time performance on a real-world data set confirms previously observed real-time processing rates for high-bandwidth links while achieving significant data compression. The output of the analysis demonstrates the effectiveness of these procedures at focusing the traffic matrix and revealing the underlying stable heavy-tail statistical distributions that are necessary for anomaly detection. A simple model of the corresponding probability of detection (

p_{\rm d}

) and probability of false alarm (

p_{\rm fa}

) for these distributions highlights the criticality of network sensor focusing and calibration. Once a sensor is properly focused and calibrated it is then in a position to carry out two of the central tenets of good cybersecurity: (1) continuous observation of the network and (2) minimizing unbrokered network connections.Comment: Accepted to IEEE HPEC, 9 pages, 12 figures, 1 table, 63 references, 2 appendice

arXiv.org e-Print Archive

Demystifying Graph Databases: Analysis and Taxonomy of Data Organization, System Designs, and Graph Queries

Author: Alonso Gustavo
Barthels Claude
Besta Maciej
Fischer Marc
Gerstenberger Robert
Hoefler Torsten
Peter Emanuel
Podstawski Michał
Publication venue
Publication date: 04/11/2022
Field of study

Graph processing has become an important part of multiple areas of computer science, such as machine learning, computational sciences, medical applications, social network analysis, and many others. Numerous graphs such as web or social networks may contain up to trillions of edges. Often, these graphs are also dynamic (their structure changes over time) and have domain-specific rich data associated with vertices and edges. Graph database systems such as Neo4j enable storing, processing, and analyzing such large, evolving, and rich datasets. Due to the sheer size of such datasets, combined with the irregular nature of graph processing, these systems face unique design challenges. To facilitate the understanding of this emerging domain, we present the first survey and taxonomy of graph database systems. We focus on identifying and analyzing fundamental categories of these systems (e.g., triple stores, tuple stores, native graph database systems, or object-oriented systems), the associated graph models (e.g., RDF or Labeled Property Graph), data organization techniques (e.g., storing graph data in indexing structures or dividing data into records), and different aspects of data distribution and query execution (e.g., support for sharding and ACID). 51 graph database systems are presented and compared, including Neo4j, OrientDB, or Virtuoso. We outline graph database queries and relationships with associated domains (NoSQL stores, graph streaming, and dynamic graph algorithms). Finally, we describe research and engineering challenges to outline the future of graph databases

arXiv.org e-Print Archive

Neural Graph Databases

Author: Besta Maciej
Chen Tiancheng
Dryden Nikoli
Hoefler Torsten
Iff Patrick
Osawa Kazuki
Podstawski Michal
Scheidl Florian
Publication venue
Publication date: 20/09/2022
Field of study

Graph databases (GDBs) enable processing and analysis of unstructured, complex, rich, and usually vast graph datasets. Despite the large significance of GDBs in both academia and industry, little effort has been made into integrating them with the predictive power of graph neural networks (GNNs). In this work, we show how to seamlessly combine nearly any GNN model with the computational capabilities of GDBs. For this, we observe that the majority of these systems are based on, or support, a graph data model called the Labeled Property Graph (LPG), where vertices and edges can have arbitrarily complex sets of labels and properties. We then develop LPG2vec, an encoder that transforms an arbitrary LPG dataset into a representation that can be directly used with a broad class of GNNs, including convolutional, attentional, message-passing, and even higher-order or spectral models. In our evaluation, we show that the rich information represented as LPG labels and properties is properly preserved by LPG2vec, and it increases the accuracy of predictions regardless of the targeted learning task or the used GNN model, by up to 34% compared to graphs with no LPG labels/properties. In general, LPG2vec enables combining predictive power of the most powerful GNNs with the full scope of information encoded in the LPG model, paving the way for neural graph databases, a class of systems where the vast complexity of maintained data will benefit from modern and future graph machine learning methods

arXiv.org e-Print Archive