3 research outputs found

    Hypersparse Neural Network Analysis of Large-Scale Internet Traffic

    Full text link
    The Internet is transforming our society, necessitating a quantitative understanding of Internet traffic. Our team collects and curates the largest publicly available Internet traffic data containing 50 billion packets. Utilizing a novel hypersparse neural network analysis of "video" streams of this traffic using 10,000 processors in the MIT SuperCloud reveals a new phenomena: the importance of otherwise unseen leaf nodes and isolated links in Internet traffic. Our neural network approach further shows that a two-parameter modified Zipf-Mandelbrot distribution accurately describes a wide variety of source/destination statistics on moving sample windows ranging from 100,000 to 100,000,000 packets over collections that span years and continents. The inferred model parameters distinguish different network streams and the model leaf parameter strongly correlates with the fraction of the traffic in different underlying network topologies. The hypersparse neural network pipeline is highly adaptable and different network statistics and training models can be incorporated with simple changes to the image filter functions.Comment: 11 pages, 10 figures, 3 tables, 60 citations; to appear in IEEE High Performance Extreme Computing (HPEC) 201

    Big privacy: challenges and opportunities of privacy study in the age of big data

    Full text link
    One of the biggest concerns of big data is privacy. However, the study on big data privacy is still at a very early stage. We believe the forthcoming solutions and theories of big data privacy root from the in place research output of the privacy discipline. Motivated by these factors, we extensively survey the existing research outputs and achievements of the privacy field in both application and theoretical angles, aiming to pave a solid starting ground for interested readers to address the challenges in the big data case. We first present an overview of the battle ground by defining the roles and operations of privacy systems. Second, we review the milestones of the current two major research categories of privacy: data clustering and privacy frameworks. Third, we discuss the effort of privacy study from the perspectives of different disciplines, respectively. Fourth, the mathematical description, measurement, and modeling on privacy are presented. We summarize the challenges and opportunities of this promising topic at the end of this paper, hoping to shed light on the exciting and almost uncharted land

    Predicted packet padding for anonymous web browsing against traffic analysis attacks

    Full text link
    Anonymous communication has become a hot research topic in order to meet the increasing demand for web privacy protection. However, there are few such systems which can provide high level anonymity for web browsing. The reason is the current dominant dummy packet padding method for anonymization against traffic analysis attacks. This method inherits huge delay and bandwidth waste, which inhibits its use for web browsing. In this paper, we propose a predicted packet padding strategy to replace the dummy packet padding method for anonymous web browsing systems. The proposed strategy mitigates delay and bandwidth waste significantly on average. We formulated the traffic analysis attack and defense problem, and defined a metric, cost coefficient of anonymization (CCA), to measure the performance of anonymization. We thoroughly analyzed the problem with the characteristics of web browsing and concluded that the proposed strategy is better than the current dummy packet padding strategy in theory. We have conducted extensive experiments on two real world data sets, and the results confirmed the advantage of the proposed method
    corecore