Assessing Internet-wide Cyber Situational Awareness of Critical Sectors
In this short paper, we take a first step towards empirically assessing Internet-wide malicious activities generated from and targeted towards Internet-scale business sectors (e.g., financial, health, education) and critical infrastructure (e.g., utilities, manufacturing, government). Facilitated by an innovative, collaborative large-scale effort, we have conducted discussions with numerous Internet entities to obtain rare and private information related to allocated IP blocks pertaining to the aforementioned sectors and critical infrastructure. We employ this information to attribute Internet-scale maliciousness to such sectors and realms, in an attempt to provide an in-depth analysis of the global cyber situational posture. We draw upon close to 16.8 TB of darknet data to infer probing activities (typically generated by malicious/infected hosts) and DDoS backscatter, from which we distill IP addresses of victims. By executing week-long measurements, we observed an alarming number of more than 11,000 probing machines and 300 DDoS attack victims hosted by critical sectors. We also generate rare insights related to the maliciousness of various business sectors, including the financial sector, which typically do not report their hosted and targeted illicit activities for reputation-preservation purposes. While we treat the obtained results with strict confidence due to obvious sensitivity reasons, we postulate that such generated cyber threat intelligence could be shared with sector/critical infrastructure operators, backbone networks and Internet service providers to contribute to the overall threat remediation objective.
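The abstract does not say how probing and backscatter are told apart, so the following is only a minimal sketch of one commonly used darknet heuristic, not the authors' method. It assumes TCP traffic only: a bare SYN arriving at unused address space is treated as a probe, while a SYN+ACK or RST is treated as a reply to spoofed traffic, so its source address is taken as a likely DDoS victim. The `Packet` type, the `classify` and `summarize` helpers, and the sample addresses are hypothetical and exist only for illustration.

```python
from collections import namedtuple

# Hypothetical minimal packet record; a real pipeline would parse pcap data.
Packet = namedtuple("Packet", ["src_ip", "proto", "tcp_flags"])

# Standard TCP flag bit values.
SYN, RST, ACK = 0x02, 0x04, 0x10


def classify(pkt):
    """Label a darknet packet as 'probe', 'backscatter', or 'other'.

    Assumed heuristic: replies (SYN+ACK or RST) are backscatter,
    a SYN without ACK is a probe. Non-TCP traffic is not handled here.
    """
    if pkt.proto != "tcp":
        return "other"
    flags = pkt.tcp_flags
    if (flags & RST) or ((flags & SYN) and (flags & ACK)):
        return "backscatter"
    if flags & SYN:
        return "probe"
    return "other"


def summarize(packets):
    """Return (probing source IPs, inferred victim IPs) as two sets."""
    probers, victims = set(), set()
    for pkt in packets:
        label = classify(pkt)
        if label == "probe":
            probers.add(pkt.src_ip)
        elif label == "backscatter":
            # The sender of a reply is the host that received spoofed traffic.
            victims.add(pkt.src_ip)
    return probers, victims


sample = [
    Packet("192.0.2.1", "tcp", SYN),          # scanner
    Packet("198.51.100.7", "tcp", SYN | ACK),  # victim answering a spoofed SYN
    Packet("198.51.100.7", "tcp", RST),        # same victim, reset reply
    Packet("203.0.113.9", "udp", 0),           # ignored by this sketch
]
probers, victims = summarize(sample)
```

Attributing the resulting addresses to a sector would additionally require the IP-block allocations the abstract describes, which are not public.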
Hypersparse Neural Network Analysis of Large-Scale Internet Traffic
The Internet is transforming our society, necessitating a quantitative
understanding of Internet traffic. Our team collects and curates the largest
publicly available Internet traffic data containing 50 billion packets.
Utilizing a novel hypersparse neural network analysis of "video" streams of
this traffic using 10,000 processors in the MIT SuperCloud reveals a new
phenomenon: the importance of otherwise unseen leaf nodes and isolated links in
Internet traffic. Our neural network approach further shows that a
two-parameter modified Zipf-Mandelbrot distribution accurately describes a wide
variety of source/destination statistics on moving sample windows ranging from
100,000 to 100,000,000 packets over collections that span years and continents.
The inferred model parameters distinguish different network streams and the
model leaf parameter strongly correlates with the fraction of the traffic in
different underlying network topologies. The hypersparse neural network
pipeline is highly adaptable and different network statistics and training
models can be incorporated with simple changes to the image filter functions.
Comment: 11 pages, 10 figures, 3 tables, 60 citations; to appear in IEEE High
Performance Extreme Computing (HPEC) 201
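To make the two-parameter model concrete, here is a minimal sketch of a Zipf-Mandelbrot probability mass function. It assumes the commonly used form p(d) proportional to 1/(d + delta)^alpha over d = 1..d_max; the abstract does not spell out the exact modification the authors use, and the names `alpha`, `delta`, and `d_max` are illustrative choices rather than the paper's notation.

```python
def zipf_mandelbrot_pmf(alpha, delta, d_max):
    """Return [p(1), ..., p(d_max)] with p(d) proportional to 1 / (d + delta) ** alpha.

    Assumed form only; delta must exceed -1 so every term is positive.
    """
    if delta <= -1:
        raise ValueError("delta must be greater than -1")
    weights = [1.0 / (d + delta) ** alpha for d in range(1, d_max + 1)]
    total = sum(weights)  # normalization constant
    return [w / total for w in weights]


# Example: probabilities over degrees 1..1000 for one window of packets.
pmf = zipf_mandelbrot_pmf(alpha=1.5, delta=0.5, d_max=1000)
```

With `delta = 0` this reduces to an ordinary truncated Zipf (power-law) distribution, so the offset mainly reshapes the probability of the smallest values of d.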
Data-driven curation, learning and analysis for inferring evolving IoT botnets in the wild
The insecurity of the Internet-of-Things (IoT) paradigm continues to wreak havoc in consumer and critical infrastructure realms. Several challenges impede addressing IoT security at large, including the lack of IoT-centric data that can be collected, analyzed and correlated, due to the highly heterogeneous nature of such devices and their widespread deployments in Internet-wide environments. To this end, this paper explores macroscopic, passive empirical data to shed light on this evolving threat phenomenon. This not only aims at classifying and inferring Internet-scale compromised IoT devices by solely observing such one-way network traffic, but also endeavors to uncover, track and report on orchestrated "in the wild" IoT botnets. Initially, to prepare such data for effective use, a novel probabilistic model is designed and developed to cleanse the traffic of noise samples (i.e., misconfiguration traffic). Subsequently, several shallow and deep learning models are evaluated to ultimately design and develop a multi-window convolutional neural network trained on active and passive measurements to accurately identify compromised IoT devices. Consequently, to infer orchestrated and unsolicited activities that have been generated by well-coordinated IoT botnets, hierarchical agglomerative clustering is deployed by scrutinizing a set of innovative and efficient network feature sets. By analyzing 3.6 TB of recent darknet traffic, the proposed approach uncovers a momentous 440,000 compromised IoT devices and generates evidence-based artifacts related to 350 IoT botnets. While some of these detected botnets refer to previously documented campaigns such as Hide and Seek, Hajime and Fbot, other events illustrate evolving threats such as those with cryptojacking capabilities and those that are targeting industrial control system communication and control services.
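The clustering step can be illustrated with a small sketch of hierarchical agglomerative clustering. The abstract does not name the linkage criterion, the distance metric, or the features, so this example assumes single linkage, Euclidean distance, and a stopping threshold; the two-dimensional feature vectors are hypothetical placeholders for per-device traffic features.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerative(points, threshold):
    """Single-linkage agglomerative clustering.

    Starts with one cluster per point and repeatedly merges the two closest
    clusters until the closest pair is farther apart than `threshold`.
    Returns clusters as lists of indices into `points`.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > 1:
        best = None  # (distance, index of cluster i, index of cluster j)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest members.
                d = min(
                    euclidean(points[a], points[b])
                    for a in clusters[i]
                    for b in clusters[j]
                )
                if best is None or d < best[0]:
                    best = (d, i, j)
        if best[0] > threshold:
            break  # remaining clusters are too far apart to merge
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical feature vectors for four devices (two obvious groups).
features = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
groups = agglomerative(features, threshold=2.0)
```

Here the devices at indices 0 and 1 end up in one group and those at indices 2 and 3 in another. This naive version recomputes all pairwise distances on every merge, which is fine for a handful of points but would not scale to hundreds of thousands of devices without an optimized library implementation.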