The Dynamic Distributed Dimensional Data Model (D4M) library implements
associative arrays in a variety of languages (Python, Julia, and Matlab/Octave)
and provides a lightweight in-memory database implementation of hypersparse
arrays that are ideal for analyzing many types of network data. D4M relies on
associative arrays which combine properties of spreadsheets, databases,
matrices, graphs, and networks, while providing rigorous mathematical
guarantees, such as linearity. Streaming updates of D4M associative arrays put
enormous pressure on the memory hierarchy. This work describes the design and
performance optimization of an implementation of hierarchical associative
arrays that reduces memory pressure and dramatically increases the update rate
into an associative array. The parameters of hierarchical associative arrays
rely on controlling the number of entries in each level in the hierarchy before
an update is cascaded. The parameters are easily tunable to achieve optimal
performance for a variety of applications. Hierarchical arrays achieve over
40,000 updates per second in a single instance. Scaling to 34,000 instances of
hierarchical D4M associative arrays on 1,100 server nodes on the MIT SuperCloud
achieved a sustained update rate of 1,900,000,000 updates per second. This
capability allows the MIT SuperCloud to analyze extremely large streaming
network data sets.Comment: 6 pages; 6 figures; accepted to IEEE High Performance Extreme
Computing (HPEC) Conference 2019. arXiv admin note: text overlap with
arXiv:1807.05308, arXiv:1902.0084