    Automated Website Fingerprinting through Deep Learning

    Several studies have shown that the network traffic generated by a visit to a website over Tor reveals information specific to that website through the timing and sizes of network packets. By capturing traffic traces between users and their Tor entry guard, a network eavesdropper can leverage this metadata to reveal which website Tor users are visiting. The success of such attacks depends heavily on the particular set of traffic features used to construct the fingerprint. Typically, these features are manually engineered and, as such, any change introduced to the Tor network can render these carefully constructed features ineffective. In this paper, we show that an adversary can automate the feature engineering process, and thus automatically deanonymize Tor traffic by applying our novel method based on deep learning. We collect a dataset comprised of more than three million network traces, which is the largest dataset of web traffic ever used for website fingerprinting, and find that the performance achieved by our deep learning approaches is comparable to that of known methods, which build on research efforts spanning multiple years. The obtained success rate exceeds 96% for a closed world of 100 websites and 94% for our biggest closed world of 900 classes. In our open world evaluation, the most performant deep learning model is 2% more accurate than the state-of-the-art attack. Furthermore, we show that the implicit features automatically learned by our approach are far more resilient to dynamic changes of web content over time. We conclude that the ability to automatically construct the most relevant traffic features and perform accurate traffic recognition makes our deep learning based approach an efficient, flexible and robust technique for website fingerprinting. Comment: To appear in the 25th Symposium on Network and Distributed System Security (NDSS 2018).

    Principal Patterns on Graphs: Discovering Coherent Structures in Datasets

    Graphs are now ubiquitous in almost every field of research. Recently, new research areas devoted to the analysis of graphs and the data associated with their vertices have emerged. Focusing on dynamical processes, we propose a fast, robust and scalable framework for retrieving and analyzing recurring patterns of activity on graphs. Our method relies on a novel type of multilayer graph that encodes the spreading or propagation of events between successive time steps. We demonstrate the versatility of our method by applying it to three different real-world examples. First, we study how rumors spread on a social network. Second, we reveal congestion patterns of pedestrians in a train station. Finally, we show how patterns of audio playlists can be used in a recommender system. In each example, relevant information previously hidden in the data is extracted in a very efficient manner, emphasizing the scalability of our method. With a parallel implementation scaling linearly with the size of the dataset, our framework easily handles millions of nodes on a single commodity server.

    Distributed Correlation-Based Feature Selection in Spark

    CFS (Correlation-Based Feature Selection) is a feature selection (FS) algorithm that has been successfully applied to classification problems in many domains. We describe Distributed CFS (DiCFS) as a completely redesigned, scalable, parallel and distributed version of the CFS algorithm, capable of dealing with the large volumes of data typical of big data applications. Two versions of the algorithm were implemented and compared using the Apache Spark cluster computing model, currently gaining popularity due to its much faster processing times than Hadoop's MapReduce model. We tested our algorithms on four publicly available datasets, each consisting of a large number of instances and two also consisting of a large number of features. The results show that our algorithms were superior in terms of both time-efficiency and scalability. In leveraging a computer cluster, they were able to handle larger datasets than the non-distributed WEKA version while maintaining the quality of the results, i.e., exactly the same features were returned by our algorithms when compared to the original algorithm available in WEKA. Comment: 25 pages, 5 figures.

    Anomaly Intrusion Detection based on Concept Drift

    Nowadays, security on the internet is a vital issue, and intrusion detection is therefore one of the major research problems for networks that must defend against external attacks. Intrusion detection is a new approach for providing security in existing computers and data networks. An Intrusion Detection System is a software application that monitors a system for malicious activities and unauthorized access. Easy accessibility leaves computer networks vulnerable to attacks and other threats from attackers. An Intrusion Detection System is used to analyze a network of interconnected systems in order to prevent unusual intrusions or disruption. Intrusion detection is becoming more challenging as computer networks grow, since the increased connectivity of computer systems gives access to all and makes it easier for hackers to hide their traces and avoid identification. The goal of intrusion detection is to identify unauthorized use, misuse and abuse of computer systems. This project focuses on two algorithms: (i) a Concept Drift based ensemble Incremental Learning approach for anomaly intrusion detection, and (ii) Diversity and Transfer-based Ensemble Learning. These are highly ranked anomaly detection models. We study and compare both learning models. The Network Security Laboratory-Knowledge Discovery and Data Mining (NSL-KDD99) dataset has been used for training and for detecting misuse activities.

    Stylizing Map Based on Examples of Representative Styling

    Generally, the present disclosure is directed to stylizing a map based on one or more examples of representative styling. In particular, in some implementations, the systems and methods of the present disclosure can include or otherwise leverage one or more machine-learned models to predict map styling rules based on one or more examples of representative styling.