Streaming 1.9 Billion Hypersparse Network Updates per Second with D4M

Author: Arcand William
Bergeron William
Bestor David
Byun Chansup
Gadepally Vijay
Houle Michael
Hubbell Matthew
Jones Michael
Kepner Jeremy
Klein Anne
Michaleas Peter
Milechin Lauren
Mullen Julie
Prout Andrew
Reuther Albert
Rosa Antonio
Samsi Siddharth
Yee Charles
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 06/07/2019

The Dynamic Distributed Dimensional Data Model (D4M) library implements associative arrays in a variety of languages (Python, Julia, and Matlab/Octave) and provides a lightweight in-memory database implementation of hypersparse arrays that are ideal for analyzing many types of network data. D4M relies on associative arrays, which combine properties of spreadsheets, databases, matrices, graphs, and networks while providing rigorous mathematical guarantees such as linearity. Streaming updates of D4M associative arrays put enormous pressure on the memory hierarchy. This work describes the design and performance optimization of an implementation of hierarchical associative arrays that reduces memory pressure and dramatically increases the update rate into an associative array. The parameters of hierarchical associative arrays control the number of entries in each level of the hierarchy before an update is cascaded. The parameters are easily tunable to achieve optimal performance for a variety of applications. Hierarchical arrays achieve over 40,000 updates per second in a single instance. Scaling to 34,000 instances of hierarchical D4M associative arrays on 1,100 server nodes on the MIT SuperCloud achieved a sustained update rate of 1,900,000,000 updates per second. This capability allows the MIT SuperCloud to analyze extremely large streaming network data sets. Comment: 6 pages; 6 figures; accepted to IEEE High Performance Extreme Computing (HPEC) Conference 2019. arXiv admin note: text overlap with arXiv:1807.05308, arXiv:1902.0084

arXiv.org e-Print Archive

Crossref
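The D4M abstract above describes hierarchical associative arrays whose levels each hold a bounded number of entries before an update is cascaded to the next level. A minimal sketch of that idea follows, assuming plain Python dictionaries keyed by (row, col) as the sparse levels and addition as the combining operation; the class `HierAssoc`, its `cuts` parameter, and the method names are hypothetical and are not the D4M API.

```python
class HierAssoc:
    """Hypothetical hierarchical associative array: a stack of sparse levels keyed by (row, col)."""

    def __init__(self, cuts):
        # cuts[i] = largest number of entries level i may hold before it is
        # added into level i + 1; the final level is unbounded.
        self.cuts = list(cuts)
        self.levels = [{} for _ in range(len(self.cuts) + 1)]

    def update(self, row, col, val=1):
        # Every new update goes into level 0, the smallest level.
        lvl = self.levels[0]
        lvl[(row, col)] = lvl.get((row, col), 0) + val
        self._cascade()

    def _cascade(self):
        # When a level exceeds its cut, add it into the next (larger) level
        # and empty it, so most updates touch only a small array.
        for i, cut in enumerate(self.cuts):
            if len(self.levels[i]) > cut:
                nxt = self.levels[i + 1]
                for key, v in self.levels[i].items():
                    nxt[key] = nxt.get(key, 0) + v
                self.levels[i].clear()

    def total(self):
        # The full associative array is the sum of all levels.
        out = {}
        for lvl in self.levels:
            for key, v in lvl.items():
                out[key] = out.get(key, 0) + v
        return out


# Usage: stream six (source, destination) updates through a 3-level hierarchy.
A = HierAssoc(cuts=[2, 4])
for src, dst in [("a", "b"), ("a", "c"), ("b", "c"), ("a", "b"), ("d", "e"), ("e", "f")]:
    A.update(src, dst)
print(A.total())
print([len(lvl) for lvl in A.levels])
```

Because addition is associative and commutative, summing the levels gives the same result as accumulating every update into one large array; the cut values trade how often the small level is flushed against how large it grows.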

Multi-Temporal Analysis and Scaling Relations of 100,000,000,000 Network Packets

Author: Arcand William
Bergeron William
Bernays Jonathan
Bestor David
Byun Chansup
Davis Timothy
Gadepally Vijay
Harnasch Raul
Houle Michael
Hubbell Matthew
Jones Michael
Kepner Jeremy
Kirby Andrew
Klein Anna
McGuire Sarah
Meiners Chad
Michaleas Peter
Milechin Lauren
Mullen Julie
Prout Andrew
Reuther Albert
Rosa Antonio
Samsi Siddharth
Stetson Doug
Tse Adam
Yee Charles
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/08/2020

Our society has never been more dependent on computer networks. Effective utilization of networks requires a detailed understanding of the normal background behaviors of network traffic. Large-scale measurements of networks are computationally challenging. Building on prior work in interactive supercomputing and GraphBLAS hypersparse hierarchical traffic matrices, we have developed an efficient method for computing a wide variety of streaming network quantities on diverse time scales. Applying these methods to 100,000,000,000 anonymized source-destination pairs collected at a network gateway reveals many previously unobserved scaling relationships. These observations provide new insights into normal network background traffic that could be used for anomaly detection, AI feature engineering, and testing theoretical models of streaming networks. Comment: 6 pages, 6 figures, 3 tables, 49 references, accepted to IEEE HPEC 202

arXiv.org e-Print Archive

Crossref
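The network-packets abstract above computes streaming network quantities from source-destination pairs aggregated into traffic matrices, evaluated on different time scales. The sketch below illustrates the general idea under stated assumptions: a Python `Counter` stands in for a GraphBLAS hypersparse traffic matrix, time scales are modeled as consecutive non-overlapping windows of a fixed number of packets, and the particular quantities and the names `traffic_stats` and `windowed_stats` are illustrative choices, not the paper's exact set or code.

```python
from collections import Counter


def traffic_stats(pairs):
    """Aggregate (source, destination) packets into a sparse traffic matrix and summarize it."""
    A = Counter(pairs)            # A[(s, d)] = number of packets from s to d
    source_packets = Counter()    # row sums: packets sent by each source
    fan_out = Counter()           # distinct destinations reached by each source
    fan_in = Counter()            # distinct sources reaching each destination
    for (s, d), n in A.items():
        source_packets[s] += n
        fan_out[s] += 1
        fan_in[d] += 1
    return {
        "valid_packets": sum(A.values()),
        "unique_links": len(A),
        "unique_sources": len(source_packets),
        "unique_destinations": len(fan_in),
        "max_source_packets": max(source_packets.values(), default=0),
        "max_source_fan_out": max(fan_out.values(), default=0),
        "max_destination_fan_in": max(fan_in.values(), default=0),
    }


def windowed_stats(pairs, window):
    # Evaluate the same quantities on consecutive windows of `window` packets;
    # varying `window` (assumed here to be a packet count) changes the time scale.
    return [traffic_stats(pairs[i:i + window]) for i in range(0, len(pairs), window)]


# Usage with made-up anonymized labels.
pairs = [("s1", "d1"), ("s1", "d2"), ("s1", "d1"), ("s2", "d1")]
print(traffic_stats(pairs))
print(windowed_stats(pairs, window=2))
```

Keying the counter by (source, destination) stores only links that actually occur, which is the property that makes hypersparse representations practical when the address space is enormous.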

Fast Mapping onto Census Blocks

Author: Arcand William
Bergeron William
Bestor David
Byun Chansup
Engwirda Darren
Gadepally Vijay
Hill Chris
Houle Michael
Hubbell Matthew
Jones Michael
Kepner Jeremy
Kipf Andreas
Kirby Andrew
Klein Anna
Kraska Tim
Michaleas Peter
Milechin Lauren
Mullen Julie
Prout Andrew
Reuther Albert
Rosa Antonio
Samsi Sid
Vembar Navin
Yee Charles
Publication date: 01/08/2020

Pandemic measures such as social distancing and contact tracing can be enhanced by rapidly integrating dynamic location data and demographic data. Projecting billions of longitude and latitude locations onto hundreds of thousands of highly irregular demographic census block polygons is computationally challenging in both research and deployment contexts. This paper describes two approaches labeled "simple" and "fast". The simple approach can be implemented in any scripting language (Matlab/Octave, Python, Julia, R) and is easily integrated and customized for a variety of research goals. This simple approach uses a novel combination of hierarchy, sparse bounding boxes, polygon crossing-number, vectorization, and parallel processing to achieve 100,000,000+ projections per second on 100 servers. The simple approach is compact, does not increase data storage requirements, and is applicable to any country or region. The fast approach exploits the thread, vector, and memory optimizations that are possible using a low-level language (C++) and achieves similar performance on a single server. This paper details these approaches with the goal of enabling the broader community to quickly integrate location and demographic data. Comment: 8 pages, 7 figures, 55 references; accepted to IEEE HPEC 202

arXiv.org e-Print Archive

Crossref
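The "simple" approach in the census-block abstract above combines bounding boxes with the polygon crossing-number test. The sketch below shows only those two ingredients in plain Python, treating longitude and latitude as planar x and y. A coarse grid whose occupied cells are stored in a dictionary stands in for the paper's sparse bounding boxes (an assumption about how they might be organized), while the hierarchy, vectorization, and parallel processing are omitted. The names `crossing_number`, `build_grid`, `locate`, and the `cell` size are hypothetical.

```python
import math
from collections import defaultdict


def crossing_number(x, y, poly):
    """Even-odd rule: True if (x, y) lies inside polygon `poly` (list of (x, y) vertices)."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:          # count crossings of a ray pointing in +x
                inside = not inside
    return inside


def bbox(poly):
    xs = [p[0] for p in poly]
    ys = [p[1] for p in poly]
    return min(xs), min(ys), max(xs), max(ys)


def build_grid(blocks, cell):
    # Sparse index: only grid cells touched by some polygon's bounding box are stored.
    grid = defaultdict(list)
    for bid, poly in blocks.items():
        xmin, ymin, xmax, ymax = bbox(poly)
        for i in range(math.floor(xmin / cell), math.floor(xmax / cell) + 1):
            for j in range(math.floor(ymin / cell), math.floor(ymax / cell) + 1):
                grid[(i, j)].append(bid)
    return grid


def locate(points, blocks, cell=1.0):
    """Map each point to the id of the first block containing it, or None."""
    grid = build_grid(blocks, cell)
    out = []
    for x, y in points:
        hit = None
        # Run the exact crossing-number test only on candidates from the point's cell.
        for bid in grid.get((math.floor(x / cell), math.floor(y / cell)), ()):
            if crossing_number(x, y, blocks[bid]):
                hit = bid
                break
        out.append(hit)
    return out


# Usage with two made-up blocks: a square and a triangle.
blocks = {"A": [(0, 0), (2, 0), (2, 2), (0, 2)], "B": [(2, 0), (4, 0), (3, 2)]}
print(locate([(1, 1), (3, 0.5), (5, 5), (2.1, 1.9)], blocks))
```

The grid lookup rejects most polygons before any edge arithmetic, so the per-point cost depends on how many bounding boxes overlap a cell rather than on the total number of blocks. Points exactly on a shared edge are assigned to whichever candidate block is tested first.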