    Your Smart Home Can't Keep a Secret: Towards Automated Fingerprinting of IoT Traffic with Neural Networks

    The IoT (Internet of Things) technology has been widely adopted in recent years and has profoundly changed people's daily lives. At the same time, however, such a fast-growing technology has also introduced new privacy issues, which need to be better understood and measured. In this work, we look into how private information can be leaked from the network traffic generated in a smart home network. Although researchers have proposed techniques to infer IoT device types or user behaviors under clean experimental setups, the effectiveness of such approaches becomes questionable in complex but realistic network environments, where common techniques like Network Address and Port Translation (NAPT) and Virtual Private Network (VPN) are enabled. Traffic analysis using traditional methods (e.g., classical machine-learning models) is much less effective under those settings, as manually selected features are no longer distinctive. In this work, we propose a traffic analysis framework based on sequence-learning techniques like LSTM, which leverages the temporal relations between packets for device-identification attacks. We evaluated it under different environment settings (e.g., a pure-IoT network and a noisy environment with multiple non-IoT devices). The results show that our framework differentiates device types with high accuracy. This suggests that IoT network communications pose prominent challenges to users' privacy, even when they are protected by encryption and morphed by the network gateway. As such, new privacy protection methods for IoT traffic need to be developed to mitigate this new issue.
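    As a rough illustration of the sequence-learning approach this abstract describes, the following is a minimal PyTorch sketch of an LSTM that classifies a flow's device type from a sequence of per-packet features. The feature set (packet size, inter-arrival time, direction), sequence length, and number of device types are illustrative assumptions, not the authors' configuration.

    import torch
    import torch.nn as nn

    class DeviceFingerprinter(nn.Module):
        """LSTM over per-packet features -> device-type logits (sketch)."""
        def __init__(self, n_features=3, hidden=128, n_device_types=10):
            super().__init__()
            self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
            self.head = nn.Linear(hidden, n_device_types)

        def forward(self, x):           # x: (batch, seq_len, n_features)
            _, (h_n, _) = self.lstm(x)  # h_n: (1, batch, hidden)
            return self.head(h_n[-1])   # logits over device types

    # Example: a batch of 32 flows, each padded/truncated to 100 packets.
    model = DeviceFingerprinter()
    logits = model(torch.randn(32, 100, 3))
    print(logits.shape)  # torch.Size([32, 10])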

    Multitask Learning for Network Traffic Classification

    Traffic classification has various applications in today's Internet, from resource allocation, billing, and QoS provisioning in ISPs to firewall and malware detection on clients. Classical machine-learning algorithms and deep-learning models have been widely used to solve the traffic classification task. However, training such models requires a large amount of labeled data, and labeling data is often the most difficult and time-consuming part of building a classifier. To address this challenge, we reformulate traffic classification as a multi-task learning problem in which the bandwidth requirement and duration of a flow are predicted along with the traffic class. The motivation for this approach is twofold. First, bandwidth requirement and duration are useful in many applications, including routing, resource allocation, and QoS provisioning. Second, these two values can be obtained from each flow easily, without human labeling or capturing flows in a controlled and isolated environment. We show that with a large amount of easily obtainable data for the bandwidth and duration prediction tasks, and only a few labeled samples for the traffic classification task, one can achieve high accuracy. We conduct two experiments with the public ISCX and QUIC datasets and show the efficacy of our approach.
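    The multi-task formulation can be pictured as a shared encoder with three heads: traffic class (scarce labels) plus bandwidth and duration (labels obtainable from any flow). The sketch below is a minimal PyTorch illustration under assumed input and label shapes, not the paper's model.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MultiTaskTrafficModel(nn.Module):
        def __init__(self, n_features=64, n_classes=5):
            super().__init__()
            self.shared = nn.Sequential(nn.Linear(n_features, 128), nn.ReLU())
            self.class_head = nn.Linear(128, n_classes)  # traffic class (few labels)
            self.bw_head = nn.Linear(128, 1)             # bandwidth (easy to obtain)
            self.dur_head = nn.Linear(128, 1)            # duration (easy to obtain)

        def forward(self, x):
            h = self.shared(x)
            return self.class_head(h), self.bw_head(h), self.dur_head(h)

    model = MultiTaskTrafficModel()
    x = torch.randn(16, 64)                      # toy per-flow feature vectors
    class_logits, bw, dur = model(x)
    # Joint loss: cross-entropy on the scarce class labels plus regression
    # losses on the two cheaply obtained auxiliary targets.
    loss = (F.cross_entropy(class_logits, torch.randint(0, 5, (16,)))
            + F.mse_loss(bw, torch.randn(16, 1))
            + F.mse_loss(dur, torch.randn(16, 1)))
    loss.backward()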

    An Experimental Evaluation of the Computational Cost of a DPI Traffic Classifier

    A common belief in the scientific community is that traffic classifiers based on deep packet inspection (DPI) are far more expensive in terms of computational complexity than statistical classifiers. In this paper we counter this notion by defining accurate models for a deep packet inspection classifier and for a statistical one based on support vector machines, and by evaluating their actual processing costs through experimental analysis. The results suggest that, contrary to the common belief, a DPI classifier and an SVM-based one can have comparable computational costs. Although much work remains to show that our results hold in more general cases, this preliminary analysis is a first indication that DPI classifiers may not be as computationally complex, compared to other approaches, as previously thought.
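    To make this kind of comparison concrete, here is a toy Python benchmark contrasting signature matching on payloads (a stand-in for DPI) with SVM inference on per-flow statistics. The signatures, feature dimensions, and sample counts are illustrative assumptions, not the paper's experimental setup.

    import re
    import time
    import numpy as np
    from sklearn.svm import SVC

    # Toy DPI signatures: HTTP GET, SSH banner, TLS handshake prefix.
    signatures = [re.compile(p) for p in (rb"^GET ", rb"^SSH-", rb"^\x16\x03")]
    payloads = [np.random.bytes(512) for _ in range(5000)]

    X = np.random.rand(5000, 10)                 # per-flow statistical features
    svm = SVC(kernel="rbf").fit(X[:500], np.random.randint(0, 3, 500))

    t0 = time.perf_counter()
    for p in payloads:
        any(sig.match(p) for sig in signatures)  # DPI: match payload prefix
    t_dpi = time.perf_counter() - t0

    t0 = time.perf_counter()
    svm.predict(X)                               # SVM: classify all flows
    t_svm = time.perf_counter() - t0
    print(f"DPI: {t_dpi:.3f}s  SVM: {t_svm:.3f}s")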

    No NAT'd User Left Behind: Fingerprinting Users behind NAT from NetFlow Records Alone

    It is generally recognized that the traffic generated by an individual connected to a network acts as a biometric signature, and several tools exploit this fact to fingerprint and monitor users. Often, though, these tools assume access to the entire traffic, including IP addresses and payloads. This is not feasible, on the grounds that both performance and privacy would be negatively affected. In reality, most ISPs convert user traffic into NetFlow records, a concise representation that does not include, for instance, any payloads. More importantly, large and distributed networks are usually NAT'd, so a few IP addresses may be associated with thousands of users. We devised a new fingerprinting framework that overcomes these hurdles. Our system is able to analyze a huge amount of network traffic represented as NetFlows, with the intent of tracking people. It does so by accurately inferring when users are connected to the network and which IP addresses they are using, even though thousands of users are hidden behind NAT. Our prototype implementation was deployed and tested within an existing large metropolitan WiFi network serving about 200,000 users, with an average load of more than 1,000 users simultaneously connected behind only two NAT'd IP addresses. Our solution turned out to be very effective, with an accuracy greater than 90%. We also devised new tools, and refined existing ones, that may be applied to other contexts related to NetFlow analysis.
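    One way to picture the attribution step is as matching behavioural signatures. The sketch below is an assumption-laden simplification, not the authors' system: it treats the set of (destination IP, destination port) pairs a user habitually contacts as a signature and attributes a window of NAT'd NetFlow records to the known user whose signature overlaps most, by Jaccard similarity.

    from collections import namedtuple

    Flow = namedtuple("Flow", "src_ip dst_ip dst_port")

    def signature(flows):
        """Behavioural signature: the set of contacted (dst_ip, dst_port) pairs."""
        return {(f.dst_ip, f.dst_port) for f in flows}

    def attribute(window_flows, user_signatures):
        """Return the known user whose signature best matches a NAT'd window."""
        w = signature(window_flows)
        def jaccard(s):
            return len(w & s) / len(w | s) if w | s else 0.0
        return max(user_signatures, key=lambda u: jaccard(user_signatures[u]))

    users = {  # toy signatures learned from periods with known attribution
        "alice": {("1.1.1.1", 443), ("2.2.2.2", 993), ("3.3.3.3", 443)},
        "bob":   {("4.4.4.4", 80), ("5.5.5.5", 443)},
    }
    window = [Flow("10.0.0.1", "1.1.1.1", 443), Flow("10.0.0.1", "2.2.2.2", 993)]
    print(attribute(window, users))  # -> alice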

    A user-oriented network forensic analyser: the design of a high-level protocol analyser

    Network forensics is becoming an increasingly important tool in the investigation of cyber and computer-assisted crimes. Unfortunately, whilst much effort has been devoted to developing computer forensic file-system analysers (e.g. EnCase and FTK), similar focus has not been given to Network Forensic Analysis Tools (NFATs). The single biggest barrier to effective NFATs is handling large volumes of low-level traffic and being able to extract and interpret forensic artefacts and their context – for example, being able to extract and render application-level objects (such as emails, web pages and documents) from the low-level TCP/IP traffic, but also to understand how these applications and artefacts are being used. Whilst some studies and tools are beginning to achieve object extraction, results to date are limited to basic objects. No research has focused upon analysing network traffic to understand the nature of its use – not simply observing that a person requested a webpage, but how long they spent on the application and what interactions they had whilst using the service (e.g. posting an image, or engaging in an instant-message chat). This additional layer of information can provide an investigator with a far richer and more complete understanding of a suspect's activities. To this end, this paper presents an investigation into the ability to derive high-level application-usage characteristics from low-level network traffic metadata. The paper presents three application scenarios – web surfing, communications and social networking – and demonstrates that it is possible to derive the user interactions (e.g. page loading, chatting and file sharing) within these systems. The paper then presents a framework that builds upon this capability to provide a robust, flexible and user-friendly NFAT offering access to a greater range of forensic information in a far more usable format.
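    As an illustration of deriving high-level interactions from low-level metadata, the sketch below applies simple threshold rules to per-flow statistics. The thresholds and interaction labels are assumptions chosen for demonstration, not the analyser described in the paper.

    from dataclasses import dataclass

    @dataclass
    class FlowMeta:
        bytes_up: int      # bytes client -> server
        bytes_down: int    # bytes server -> client
        duration_s: float

    def classify_interaction(f: FlowMeta) -> str:
        """Map flow metadata to a coarse application-level interaction."""
        if f.bytes_up > 100_000 and f.bytes_up > 5 * f.bytes_down:
            return "file upload (e.g. posting an image)"
        if f.bytes_down > 50_000 and f.duration_s < 10:
            return "page load"
        if max(f.bytes_up, f.bytes_down) < 5_000 and f.duration_s > 30:
            return "interactive chat"
        return "unknown"

    print(classify_interaction(FlowMeta(800, 250_000, 3.2)))    # page load
    print(classify_interaction(FlowMeta(2_000, 1_500, 120.0)))  # interactive chat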