A Brief Study of Open Source Graph Databases
With the proliferation of large irregular sparse relational datasets, new
storage and analysis platforms have arisen to fill gaps in performance and
capability left by conventional approaches built on traditional database
technologies and query languages. Many of these platforms apply graph
structures and analysis techniques to enable users to ingest, update, query and
compute on the topological structure of these relationships represented as
set(s) of edges between set(s) of vertices. To store and process Facebook-scale
datasets, they must be able to support data sources with billions of edges,
update rates of millions of updates per second, and complex analysis kernels.
These platforms must provide intuitive interfaces that enable graph experts and
novice programmers to write implementations of common graph algorithms. In this
paper, we explore a variety of graph analysis and storage platforms. We compare
their capabilities, interfaces, and performance by implementing and computing
a set of real-world graph algorithms on synthetic graphs with up to 256 million
edges. In the spirit of full disclosure, several authors are affiliated with
the development of STINGER.
Comment: WSSSPE13, 4 Pages, 18 Pages with Appendix, 25 figures
Laplacian Mixture Modeling for Network Analysis and Unsupervised Learning on Graphs
Laplacian mixture models identify overlapping regions of influence in
unlabeled graph and network data in a scalable and computationally efficient
way, yielding useful low-dimensional representations. By combining Laplacian
eigenspace and finite mixture modeling methods, they provide probabilistic or
fuzzy dimensionality reductions or domain decompositions for a variety of input
data types, including mixture distributions, feature vectors, and graphs or
networks. Provably optimal recovery by the algorithm is shown analytically
for a nontrivial class of cluster graphs. Heuristic approximations for scalable
high-performance implementations are described and empirically tested.
Connections to PageRank and community detection in network analysis demonstrate
the wide applicability of this approach. The origins of fuzzy spectral methods,
beginning with generalized heat or diffusion equations in physics, are reviewed
and summarized. Comparisons to other dimensionality reduction and clustering
methods for challenging unsupervised machine learning problems are also
discussed.
Comment: 13 figures, 35 references