24,697 research outputs found

    Efficient Learning of Communication Profiles from IP Flow Records

    Get PDF
    The task of network traffic monitoring has evolved drastically with the ever-increasing amount of data flowing in large scale networks. The automated analysis of this tremendous source of information often comes with using simpler models on aggregated data (e.g. IP flow records) due to time and space constraints. A step towards utilizing IP flow records more effectively are stream learning techniques. We propose a method to collect a limited yet relevant amount of data in order to learn a class of complex models, finite state machines, in real-time. These machines are used as communication profiles to fingerprint, identify or classify hosts and services and offer high detection rates while requiring less training data and thus being faster to compute than simple models

    No NAT'd User left Behind: Fingerprinting Users behind NAT from NetFlow Records alone

    Full text link
    It is generally recognized that the traffic generated by an individual connected to a network acts as his biometric signature. Several tools exploit this fact to fingerprint and monitor users. Often, though, these tools assume to access the entire traffic, including IP addresses and payloads. This is not feasible on the grounds that both performance and privacy would be negatively affected. In reality, most ISPs convert user traffic into NetFlow records for a concise representation that does not include, for instance, any payloads. More importantly, large and distributed networks are usually NAT'd, thus a few IP addresses may be associated to thousands of users. We devised a new fingerprinting framework that overcomes these hurdles. Our system is able to analyze a huge amount of network traffic represented as NetFlows, with the intent to track people. It does so by accurately inferring when users are connected to the network and which IP addresses they are using, even though thousands of users are hidden behind NAT. Our prototype implementation was deployed and tested within an existing large metropolitan WiFi network serving about 200,000 users, with an average load of more than 1,000 users simultaneously connected behind 2 NAT'd IP addresses only. Our solution turned out to be very effective, with an accuracy greater than 90%. We also devised new tools and refined existing ones that may be applied to other contexts related to NetFlow analysis

    On the Relation Between Mobile Encounters and Web Traffic Patterns: A Data-driven Study

    Full text link
    Mobility and network traffic have been traditionally studied separately. Their interaction is vital for generations of future mobile services and effective caching, but has not been studied in depth with real-world big data. In this paper, we characterize mobility encounters and study the correlation between encounters and web traffic profiles using large-scale datasets (30TB in size) of WiFi and NetFlow traces. The analysis quantifies these correlations for the first time, across spatio-temporal dimensions, for device types grouped into on-the-go Flutes and sit-to-use Cellos. The results consistently show a clear relation between mobility encounters and traffic across different buildings over multiple days, with encountered pairs showing higher traffic similarity than non-encountered pairs, and long encounters being associated with the highest similarity. We also investigate the feasibility of learning encounters through web traffic profiles, with implications for dissemination protocols, and contact tracing. This provides a compelling case to integrate both mobility and web traffic dimensions in future models, not only at an individual level, but also at pairwise and collective levels. We have released samples of code and data used in this study on GitHub, to support reproducibility and encourage further research (https://github.com/BabakAp/encounter-traffic).Comment: Technical report with details for conference paper at MSWiM 2018, v3 adds GitHub lin

    Profiling Users by Modeling Web Transactions

    Full text link
    Users of electronic devices, e.g., laptop, smartphone, etc. have characteristic behaviors while surfing the Web. Profiling this behavior can help identify the person using a given device. In this paper, we introduce a technique to profile users based on their web transactions. We compute several features extracted from a sequence of web transactions and use them with one-class classification techniques to profile a user. We assess the efficacy and speed of our method at differentiating 25 users on a dataset representing 6 months of web traffic monitoring from a small company network.Comment: Extended technical report of an IEEE ICDCS 2017 publicatio

    Role based behavior analysis

    Get PDF
    Tese de mestrado, Segurança Informática, Universidade de Lisboa, Faculdade de Ciências, 2009Nos nossos dias, o sucesso de uma empresa depende da sua agilidade e capacidade de se adaptar a condições que se alteram rapidamente. Dois requisitos para esse sucesso são trabalhadores proactivos e uma infra-estrutura ágil de Tecnologias de Informacão/Sistemas de Informação (TI/SI) que os consiga suportar. No entanto, isto nem sempre sucede. Os requisitos dos utilizadores ao nível da rede podem nao ser completamente conhecidos, o que causa atrasos nas mudanças de local e reorganizações. Além disso, se não houver um conhecimento preciso dos requisitos, a infraestrutura de TI/SI poderá ser utilizada de forma ineficiente, com excessos em algumas áreas e deficiências noutras. Finalmente, incentivar a proactividade não implica acesso completo e sem restrições, uma vez que pode deixar os sistemas vulneráveis a ameaças externas e internas. O objectivo do trabalho descrito nesta tese é desenvolver um sistema que consiga caracterizar o comportamento dos utilizadores do ponto de vista da rede. Propomos uma arquitectura de sistema modular para extrair informação de fluxos de rede etiquetados. O processo é iniciado com a criação de perfis de utilizador a partir da sua informação de fluxos de rede. Depois, perfis com características semelhantes são agrupados automaticamente, originando perfis de grupo. Finalmente, os perfis individuais são comprados com os perfis de grupo, e os que diferem significativamente são marcados como anomalias para análise detalhada posterior. Considerando esta arquitectura, propomos um modelo para descrever o comportamento de rede dos utilizadores e dos grupos. Propomos ainda métodos de visualização que permitem inspeccionar rapidamente toda a informação contida no modelo. O sistema e modelo foram avaliados utilizando um conjunto de dados reais obtidos de um operador de telecomunicações. Os resultados confirmam que os grupos projectam com precisão comportamento semelhante. Além disso, as anomalias foram as esperadas, considerando a população subjacente. Com a informação que este sistema consegue extrair dos dados em bruto, as necessidades de rede dos utilizadores podem sem supridas mais eficazmente, os utilizadores suspeitos são assinalados para posterior análise, conferindo uma vantagem competitiva a qualquer empresa que use este sistema.In our days, the success of a corporation hinges on its agility and ability to adapt to fast changing conditions. Proactive workers and an agile IT/IS infrastructure that can support them is a requirement for this success. Unfortunately, this is not always the case. The user’s network requirements may not be fully understood, which slows down relocation and reorganization. Also, if there is no grasp on the real requirements, the IT/IS infrastructure may not be efficiently used, with waste in some areas and deficiencies in others. Finally, enabling proactivity does not mean full unrestricted access, since this may leave the systems vulnerable to outsider and insider threats. The purpose of the work described on this thesis is to develop a system that can characterize user network behavior. We propose a modular system architecture to extract information from tagged network flows. The system process begins by creating user profiles from their network flows’ information. Then, similar profiles are automatically grouped into clusters, creating role profiles. Finally, the individual profiles are compared against the roles, and the ones that differ significantly are flagged as anomalies for further inspection. Considering this architecture, we propose a model to describe user and role network behavior. We also propose visualization methods to quickly inspect all the information contained in the model. The system and model were evaluated using a real dataset from a large telecommunications operator. The results confirm that the roles accurately map similar behavior. The anomaly results were also expected, considering the underlying population. With the knowledge that the system can extract from the raw data, the users network needs can be better fulfilled, the anomalous users flagged for inspection, giving an edge in agility for any company that uses it

    Anomaly-Based Intrusion Detection by Modeling Probability Distributions of Flow Characteristics

    Get PDF
    In recent years, with the increased use of network communication, the risk of compromising the information has grown immensely. Intrusions have evolved and become more sophisticated. Hence, classical detection systems show poor performance in detecting novel attacks. Although much research has been devoted to improving the performance of intrusion detection systems, few methods can achieve consistently efficient results with the constant changes in network communications. This thesis proposes an intrusion detection system based on modeling distributions of network flow statistics in order to achieve a high detection rate for known and stealthy attacks. The proposed model aggregates the traffic at the IP subnetwork level using a hierarchical heavy hitters algorithm. This aggregated traffic is used to build the distribution of network statistics for the most frequent IPv4 addresses encountered as destination. The obtained probability density functions are learned by the Extreme Learning Machine method which is a single-hidden layer feedforward neural network. In this thesis, different sequential and batch learning strategies are proposed in order to analyze the efficiency of this proposed approach. The performance of the model is evaluated on the ISCX-IDS 2012 dataset consisting of injection attacks, HTTP flooding, DDoS and brute force intrusions. The experimental results of the thesis indicate that the presented method achieves an average detection rate of 91% while having a low misclassification rate of 9%, which is on par with the state-of-the-art approaches using this dataset. In addition, the proposed method can be utilized as a network behavior analysis tool specifically for DDoS mitigation, since it can isolate aggregated IPv4 addresses from the rest of the network traffic, thus supporting filtering out DDoS attacks
    • …
    corecore