RedisGraph GraphBLAS Enabled Graph Database
RedisGraph is a Redis module developed by Redis Labs to add graph database
functionality to the Redis database. RedisGraph represents connected data as
adjacency matrices. By representing the data as sparse matrices and employing
the power of GraphBLAS (a highly optimized library for sparse matrix
operations), RedisGraph delivers a fast and efficient way to store, manage and
process graphs. Initial benchmarks indicate that RedisGraph is significantly
faster than comparable graph databases.
Comment: Accepted to IEEE IPDPS 2019 GrAPL workshop
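The idea of storing a graph as a sparse adjacency matrix and traversing it with matrix operations can be sketched in a few lines. The following is a minimal illustration in plain Python, not RedisGraph's or GraphBLAS's actual API: the names `build_adjacency`, `vxm`, and `bfs_levels` are hypothetical, and the sparse matrix is approximated by a dictionary that keeps only the nonzero entries. Each breadth-first step is a Boolean vector-matrix product of the current frontier with the adjacency matrix.

```python
def build_adjacency(edges):
    """Sparse adjacency matrix as {row: set(columns)}; only nonzeros are stored."""
    adj = {}
    for src, dst in edges:
        adj.setdefault(src, set()).add(dst)
    return adj


def vxm(frontier, adj):
    """Boolean vector-matrix product: all nodes one hop away from the frontier."""
    out = set()
    for node in frontier:
        out |= adj.get(node, set())
    return out


def bfs_levels(adj, source):
    """Breadth-first search expressed as repeated vector-matrix products."""
    levels = {source: 0}
    frontier = {source}
    depth = 0
    while frontier:
        depth += 1
        # Mask out nodes that were already visited in an earlier step.
        frontier = {n for n in vxm(frontier, adj) if n not in levels}
        for node in frontier:
            levels[node] = depth
    return levels


# Hypothetical example graph (directed edges).
edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]
adj = build_adjacency(edges)
levels = bfs_levels(adj, 0)
print(levels)
```

A real GraphBLAS implementation would use compressed sparse formats and optimized kernels; the dictionary here only mirrors the structure of the computation.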
Fast Mapping onto Census Blocks
Pandemic measures such as social distancing and contact tracing can be
enhanced by rapidly integrating dynamic location data and demographic data.
Projecting billions of longitude and latitude locations onto hundreds of
thousands of highly irregular demographic census block polygons is
computationally challenging in both research and deployment contexts. This
paper describes two approaches labeled "simple" and "fast". The simple approach
can be implemented in any scripting language (Matlab/Octave, Python, Julia, R)
and is easily integrated and customized to a variety of research goals. This
simple approach uses a novel combination of hierarchy, sparse bounding boxes,
polygon crossing-number, vectorization, and parallel processing to achieve
100,000,000+ projections per second on 100 servers. The simple approach is
compact, does not increase data storage requirements, and is applicable to any
country or region. The fast approach exploits the thread, vector, and memory
optimizations that are possible using a low-level language (C++) and achieves
similar performance on a single server. This paper details these approaches
with the goal of enabling the broader community to quickly integrate location
and demographic data.
Comment: 8 pages, 7 figures, 55 references; accepted to IEEE HPEC 202
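Two of the ingredients named in the abstract, bounding boxes and the polygon crossing-number test, can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's implementation: it omits the hierarchy, vectorization, and parallel processing, and the function names and the toy polygons are hypothetical. The bounding box acts as a cheap filter so that the more expensive crossing-number test runs only for candidate polygons.

```python
def bounding_box(polygon):
    """Axis-aligned bounding box (xmin, ymin, xmax, ymax) of a polygon."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)


def crossing_number_inside(point, polygon):
    """Crossing-number test: cast a ray to the right and count edge crossings."""
    px, py = point
    crossings = 0
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line y = py can be crossed.
        if (y1 > py) != (y2 > py):
            x_at_py = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if x_at_py > px:
                crossings += 1
    # An odd number of crossings means the point is inside.
    return crossings % 2 == 1


def locate(point, polygons):
    """Return the name of the first polygon containing the point, else None."""
    px, py = point
    for name, polygon in polygons.items():
        xmin, ymin, xmax, ymax = bounding_box(polygon)
        # Cheap bounding-box rejection before the exact test.
        if xmin <= px <= xmax and ymin <= py <= ymax:
            if crossing_number_inside(point, polygon):
                return name
    return None


# Hypothetical "census blocks": an L-shaped block and a square block.
blocks = {
    "A": [(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)],
    "B": [(5, 0), (7, 0), (7, 2), (5, 2)],
}
print(locate((1, 3), blocks), locate((3, 3), blocks), locate((6, 1), blocks))
```

The point (3, 3) falls inside the bounding box of block A but in its notch, which shows why the bounding box alone is not sufficient for irregular polygons. In practice the bounding boxes would be computed once and cached rather than recomputed per query.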
Focusing and Calibration of Large Scale Network Sensors using GraphBLAS Anonymized Hypersparse Matrices
Defending community-owned cyber space requires community-based efforts.
Large-scale network observations that uphold the highest regard for privacy are
key to protecting our shared cyberspace. Deployment of the necessary network
sensors requires careful sensor placement, focusing, and calibration with
significant volumes of network observations. This paper demonstrates novel
focusing and calibration procedures on a multi-billion packet dataset using
high-performance GraphBLAS anonymized hypersparse matrices. The run-time
performance on a real-world data set confirms previously observed real-time
processing rates for high-bandwidth links while achieving significant data
compression. The output of the analysis demonstrates the effectiveness of these
procedures at focusing the traffic matrix and revealing the underlying stable
heavy-tail statistical distributions that are necessary for anomaly detection.
A simple model of the corresponding probability of detection ($P_{d}$) and
probability of false alarm ($P_{fa}$) for these distributions highlights
the criticality of network sensor focusing and calibration. Once a sensor is
properly focused and calibrated it is then in a position to carry out two of
the central tenets of good cybersecurity: (1) continuous observation of the
network and (2) minimizing unbrokered network connections.
Comment: Accepted to IEEE HPEC, 9 pages, 12 figures, 1 table, 63 references, 2
appendices
Demystifying Graph Databases: Analysis and Taxonomy of Data Organization, System Designs, and Graph Queries
Graph processing has become an important part of multiple areas of computer
science, such as machine learning, computational sciences, medical
applications, social network analysis, and many others. Numerous graphs such as
web or social networks may contain up to trillions of edges. Often, these
graphs are also dynamic (their structure changes over time) and have
domain-specific rich data associated with vertices and edges. Graph database
systems such as Neo4j enable storing, processing, and analyzing such large,
evolving, and rich datasets. Due to the sheer size of such datasets, combined
with the irregular nature of graph processing, these systems face unique design
challenges. To facilitate the understanding of this emerging domain, we present
the first survey and taxonomy of graph database systems. We focus on
identifying and analyzing fundamental categories of these systems (e.g., triple
stores, tuple stores, native graph database systems, or object-oriented
systems), the associated graph models (e.g., RDF or Labeled Property Graph),
data organization techniques (e.g., storing graph data in indexing structures
or dividing data into records), and different aspects of data distribution and
query execution (e.g., support for sharding and ACID). 51 graph database
systems are presented and compared, including Neo4j, OrientDB, and Virtuoso. We
outline graph database queries and relationships with associated domains (NoSQL
stores, graph streaming, and dynamic graph algorithms). Finally, we describe
research and engineering challenges to outline the future of graph databases.
Neural Graph Databases
Graph databases (GDBs) enable processing and analysis of unstructured,
complex, rich, and usually vast graph datasets. Despite the great significance
of GDBs in both academia and industry, little effort has been made to
integrate them with the predictive power of graph neural networks (GNNs). In
this work, we show how to seamlessly combine nearly any GNN model with the
computational capabilities of GDBs. For this, we observe that the majority of
these systems are based on, or support, a graph data model called the Labeled
Property Graph (LPG), where vertices and edges can have arbitrarily complex
sets of labels and properties. We then develop LPG2vec, an encoder that
transforms an arbitrary LPG dataset into a representation that can be directly
used with a broad class of GNNs, including convolutional, attentional,
message-passing, and even higher-order or spectral models. In our evaluation,
we show that the rich information represented as LPG labels and properties is
properly preserved by LPG2vec, and it increases the accuracy of predictions
regardless of the targeted learning task or the GNN model used, by up to 34%
compared to graphs with no LPG labels/properties. In general, LPG2vec enables
combining the predictive power of the most powerful GNNs with the full scope of
information encoded in the LPG model, paving the way for neural graph
databases, a class of systems where the vast complexity of maintained data will
benefit from modern and future graph machine learning methods.
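To make the encoding idea concrete, the sketch below turns the labels and numeric properties of LPG vertices into fixed-length feature vectors of the kind a GNN expects as input. It is an assumption-laden illustration, not the LPG2vec encoder itself: the function `encode_vertices`, the vertex dictionary layout, and the choice of one-hot labels plus raw numeric properties (with 0.0 for missing values) are all hypothetical simplifications.

```python
def encode_vertices(vertices, label_vocab, property_keys):
    """Encode each vertex as [one-hot labels..., numeric property values...]."""
    features = []
    for v in vertices:
        # One slot per known label: 1.0 if the vertex carries it, else 0.0.
        one_hot = [1.0 if label in v["labels"] else 0.0 for label in label_vocab]
        # One slot per known property; missing properties default to 0.0.
        props = [float(v["properties"].get(key, 0.0)) for key in property_keys]
        features.append(one_hot + props)
    return features


# Hypothetical LPG vertices with label sets and key-value properties.
vertices = [
    {"labels": {"Person"}, "properties": {"age": 30}},
    {"labels": {"Person", "Author"}, "properties": {}},
]
label_vocab = ["Author", "Person"]  # fixed ordering of label slots
property_keys = ["age"]             # fixed ordering of property slots

features = encode_vertices(vertices, label_vocab, property_keys)
print(features)
```

Fixing the order of `label_vocab` and `property_keys` keeps every vertex vector the same length, which is what allows the result to be stacked into a feature matrix for a GNN.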