
    No NAT'd User left Behind: Fingerprinting Users behind NAT from NetFlow Records alone

    It is generally recognized that the traffic generated by an individual connected to a network acts as a biometric signature. Several tools exploit this fact to fingerprint and monitor users. Often, though, these tools assume access to the entire traffic, including IP addresses and payloads. This is not feasible, since both performance and privacy would be negatively affected. In reality, most ISPs convert user traffic into NetFlow records for a concise representation that does not include, for instance, any payloads. More importantly, large and distributed networks are usually NAT'd, so a few IP addresses may be associated with thousands of users. We devised a new fingerprinting framework that overcomes these hurdles. Our system is able to analyze a huge amount of network traffic represented as NetFlows, with the intent of tracking people. It does so by accurately inferring when users are connected to the network and which IP addresses they are using, even though thousands of users are hidden behind NAT. Our prototype implementation was deployed and tested within an existing large metropolitan WiFi network serving about 200,000 users, with an average load of more than 1,000 users simultaneously connected behind only 2 NAT'd IP addresses. Our solution turned out to be very effective, with an accuracy greater than 90%. We also devised new tools, and refined existing ones, that may be applied to other contexts related to NetFlow analysis.
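
    To make the idea concrete, the hypothetical Python sketch below scores which known users are likely active behind a NAT'd address by comparing the destination services seen in a window of NetFlow-like records against per-user fingerprints. The record fields and the Jaccard scoring are illustrative assumptions, not the paper's actual method.

```python
# Hypothetical sketch: rank which known users are active behind a NAT'd IP by
# comparing the services seen in a time window against per-user fingerprints.
# Record layout and Jaccard scoring are assumptions, not the paper's method.
from collections import defaultdict
from typing import NamedTuple

class Flow(NamedTuple):
    ts: float        # flow start time (epoch seconds)
    src_ip: str      # NAT'd source address
    dst_ip: str      # contacted server
    dst_port: int
    nbytes: int

def build_fingerprint(training_flows):
    """Set of (dst_ip, dst_port) services a user contacts regularly."""
    counts = defaultdict(int)
    for f in training_flows:
        counts[(f.dst_ip, f.dst_port)] += 1
    # keep services seen more than once to reduce noise
    return {svc for svc, c in counts.items() if c > 1}

def score_users(window_flows, fingerprints):
    """Jaccard similarity between the window's services and each user's fingerprint."""
    observed = {(f.dst_ip, f.dst_port) for f in window_flows}
    scores = {}
    for user, fp in fingerprints.items():
        union = observed | fp
        scores[user] = len(observed & fp) / len(union) if union else 0.0
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```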

    GT: Picking up the Truth from the Ground for Internet Traffic

    Much of Internet traffic modeling, firewall, and intrusion detection research requires traces where some ground truth regarding application and protocol is associated with each packet or flow. This paper presents the design, development and experimental evaluation of gt, an open source software toolset for associating ground truth information with Internet traffic traces. By probing the monitored host's kernel to obtain information on active Internet sessions, gt gathers ground truth at the application level. Preliminary experimental results show that gt's effectiveness comes at little cost in terms of overhead on the hosting machines. Furthermore, when coupled with other packet inspection mechanisms, gt can derive ground truth not only in terms of applications (e.g., e-mail), but also in terms of protocols (e.g., SMTP vs. POP3).
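
    The sketch below illustrates the general idea of kernel-assisted ground truth: ask the operating system which process owns each active TCP connection, so that captured flows can later be labeled with the application that generated them. It uses the psutil library for portability and is only an approximation of the approach, not gt's actual implementation.

```python
# A minimal sketch of kernel-assisted ground truth: map each active TCP
# connection to the name of the process that owns it. Uses psutil for
# portability; gt itself probes the kernel directly and differs in detail.
import psutil

def snapshot_ground_truth():
    """Map (src_ip, src_port, dst_ip, dst_port) -> application name."""
    mapping = {}
    for conn in psutil.net_connections(kind="tcp"):
        if conn.raddr and conn.pid:          # skip listening sockets and unknown owners
            try:
                app = psutil.Process(conn.pid).name()
            except psutil.NoSuchProcess:
                continue                     # process exited between calls
            key = (conn.laddr.ip, conn.laddr.port, conn.raddr.ip, conn.raddr.port)
            mapping[key] = app
    return mapping

if __name__ == "__main__":
    for four_tuple, app in snapshot_ground_truth().items():
        print(four_tuple, "->", app)
```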

    Exploring EDNS-client-subnet adopters in your free time


    Link homophily in the application layer and its usage in traffic classification

    Abstract-This paper addresses the following questions. Is there link homophily in the application layer traffic? If so, can it be used to accurately classify traffic in network trace data without relying on payloads or properties at the flow level? Our research shows that the answers to both of these questions are affirmative in real network trace data. Specifically, we define link homophily to be the tendency for flows with common IP hosts to have the same application (P2P, Web, etc.) compared to randomly selected flows. The presence of link homophily in trace data provides us with statistical dependencies between flows that share common IP hosts. We utilize these dependencies to classify application layer traffic without relying on payloads or properties at the flow level. In particular, we introduce a new statistical relational learning algorithm, called Neighboring Link Classifier with Relaxation Labeling (NLC+RL). Our algorithm has no training phase and does not require features to be constructed. All that it needs to start the classification process is traffic information on a small portion of the initial flows, which we refer to as seeds. In all our traces, NLC+RL achieves above 90% accuracy with less than 5% seed size; it is robust to errors in the seeds and various seed-selection biases; and it is able to accurately classify challenging traffic such as P2P with over 90% Precision and Recall
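
    A generic relaxation-labeling sketch over a shared-host flow graph is shown below to illustrate the kind of label propagation NLC+RL performs: flows that share an IP host become neighbors, seed flows keep their labels, and the remaining flows repeatedly average their neighbors' label beliefs. The neighbor-averaging update rule and data layout are simplifications assumed here, not the authors' exact algorithm.

```python
# Generic relaxation labeling over a "shared-host" flow graph (illustrative,
# not the exact NLC+RL algorithm). Seeds stay clamped; other flows average
# their neighbors' label beliefs until the process stabilizes.
from collections import defaultdict

def relax_labels(flows, seeds, classes, iters=10):
    """
    flows : dict flow_id -> (src_ip, dst_ip)
    seeds : dict flow_id -> class label (small labeled subset)
    Returns dict flow_id -> most likely class.
    """
    # link flows that share an IP host
    by_host = defaultdict(list)
    for fid, (src, dst) in flows.items():
        by_host[src].append(fid)
        by_host[dst].append(fid)
    neighbors = defaultdict(set)
    for fids in by_host.values():
        for a in fids:
            neighbors[a].update(f for f in fids if f != a)

    uniform = {c: 1.0 / len(classes) for c in classes}
    belief = {fid: ({seeds[fid]: 1.0} if fid in seeds else dict(uniform))
              for fid in flows}

    for _ in range(iters):
        new_belief = {}
        for fid in flows:
            if fid in seeds:                      # seeds stay clamped
                new_belief[fid] = belief[fid]
                continue
            acc = {c: 0.0 for c in classes}
            for nb in neighbors[fid]:
                for c in classes:
                    acc[c] += belief[nb].get(c, 0.0)
            total = sum(acc.values()) or 1.0
            new_belief[fid] = {c: v / total for c, v in acc.items()}
        belief = new_belief

    return {fid: max(b, key=b.get) for fid, b in belief.items()}
```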

    Unsupervised host behavior classification from connection patterns

    A novel host behavior classification approach is proposed as a preliminary step toward traffic classification and anomaly detection in network communication. Though many attempts described in the literature have been devoted to flow or application classification, these approaches are not always adaptable to the operational constraints of traffic monitoring (expected to work even without packet payload, without bidirectionality, on high-speed networks, or from flow reports only). Instead, the classification proposed here relies on the leading idea that traffic is relevantly analyzed in terms of typical host behaviors: typical connection patterns of both legitimate applications (data sharing, downloading, ...) and anomalous (possibly aggressive) behaviors are obtained by profiling traffic at the host level using unsupervised statistical classification. Classification at the host level is not reducible to flow or application classification, nor is the converse: they are different operations which might have complementary roles in network management. The proposed host classification is based on a nine-dimensional feature space evaluating host Internet connectivity, dispersion and exchanged traffic content. A Minimum Spanning Tree (MST) clustering technique is developed that does not require any supervised learning step to produce a set of statistically established typical host behaviors. Not relying on a priori defined classes of known behaviors enables the procedure to discover new host behaviors that were potentially never observed before. This procedure is applied to traffic collected over the entire year 2008 on a transpacific (Japan/USA) link. A cross-validation of this unsupervised classification against a classical port-based inspection and a state-of-the-art method provides an assessment of the meaningfulness and the relevance of the obtained classes of host behaviors.
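
    The following sketch shows one plausible form of the MST clustering step: build a minimum spanning tree over the host feature vectors, cut unusually long edges, and treat the resulting connected components as behavior classes. The edge-cut rule and the use of SciPy here are assumptions, not necessarily the paper's criterion or implementation.

```python
# MST-based clustering sketch: hosts whose feature vectors stay connected after
# cutting long MST edges fall into the same behavior class. The cut threshold
# (a multiple of the mean edge length) is an assumed, illustrative rule.
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree, connected_components
from scipy.spatial.distance import pdist, squareform

def mst_clusters(features, cut_factor=2.0):
    """features: (n_hosts, n_features) array -> array of cluster labels."""
    dists = squareform(pdist(features))            # dense pairwise distances
    mst = minimum_spanning_tree(dists).toarray()
    edges = mst[mst > 0]
    threshold = cut_factor * edges.mean()          # cut unusually long edges
    mst[mst > threshold] = 0
    _, labels = connected_components(mst, directed=False)
    return labels

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 0.3, (20, 9)),    # nine-dimensional features
                   rng.normal(3, 0.3, (20, 9))])
    print(mst_clusters(X))                         # two well-separated groups
```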

    Detecting Networks Employing Algorithmically Generated Domain Names

    Recent botnets such as Conficker, Kraken and Torpig have used DNS-based "domain fluxing" for command-and-control, where each bot queries for the existence of a series of domain names and the owner has to register only one such domain name. In this report, we develop a methodology to detect such "domain fluxes" in DNS traffic by looking for patterns inherent to domain names that are generated algorithmically, in contrast to those generated by humans. In particular, we look at the distribution of alphanumeric characters as well as bigrams in all domains that are mapped to the same set of IP addresses. We present and compare the performance of several distance metrics, including KL-distance and Edit distance. We train using a good data set of domains obtained via a crawl of domains mapped to the entire IPv4 address space, and we model bad data sets based on behaviors seen so far and expected. We also apply our methodology to packet traces collected at two Tier-1 ISPs and show that we can automatically detect domain fluxing as used by the Conficker botnet with minimal false positives. We are also able to detect new botnets and other malicious networks using our method.
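
    As an illustration of the character-distribution test, the sketch below compares the unigram distribution of a group of domain names against a benign baseline using KL divergence; a large divergence suggests algorithmically generated names. The smoothing constant, example domains and any threshold are assumptions, not values from the report.

```python
# Compare the alphanumeric character distribution of a suspect domain group
# against a benign baseline with KL divergence. Smoothing (eps) and the sample
# domains below are illustrative assumptions.
import math
from collections import Counter

ALPHANUM = "abcdefghijklmnopqrstuvwxyz0123456789"

def char_distribution(domains, eps=1e-6):
    """Smoothed unigram distribution over alphanumeric characters."""
    counts = Counter(c for d in domains for c in d.lower() if c in ALPHANUM)
    total = sum(counts.values())
    return {c: (counts[c] + eps) / (total + eps * len(ALPHANUM)) for c in ALPHANUM}

def kl_divergence(p, q):
    """KL(p || q) over the alphanumeric alphabet."""
    return sum(p[c] * math.log(p[c] / q[c]) for c in ALPHANUM)

if __name__ == "__main__":
    benign = char_distribution(["google", "wikipedia", "youtube", "amazon", "reddit"])
    suspect = char_distribution(["xj9qz0v", "kq7w2pz", "zz4qx8j"])   # DGA-like strings
    print("KL distance:", kl_divergence(suspect, benign))            # large => suspicious
```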

    Caracterização multi-escalar de tráfego em redes protegidas

    Nowadays, the Internet can be seen as a mix of services and applications that run over common protocols. The emergence of several web-based applications has changed the users' interaction paradigm by placing them in a more active role, allowing users to share photos, videos and much more. The analysis of each user's profile, both in wired and wireless networks, can become very interesting for tasks such as network resource optimization, service customization and security. This thesis aims to collect a systematic set of traffic captures corresponding to the use of several web-based applications in protected networks and to perform a statistical traffic characterization for each application. The captured traffic (and the corresponding statistics) will subsequently be used to validate the methodologies developed to identify applications and characterize the traffic associated with each user. Several statistical methodologies allow the identification of user profiles (on both wireless and wired networks) based on statistical information collected from the traffic generated while using the different network services. In this sense, it is very important to have real traffic captures that are representative of a common use of several web-based applications. On-line services, such as news, e-mail, social networking, photo sharing and video, can be studied and characterized through the statistical analysis of the traffic captured while using applications such as on-line newspapers, Youtube, Flickr, GMail, Facebook, among others. By extracting layer-2 traffic metrics, performing a wavelet decomposition and analyzing the obtained scalograms, it is possible to evaluate the time and frequency components of the analyzed traffic. A communication profile can then be defined in order to describe the frequency spectrum that is characteristic of each web-based application. By doing that, it will be possible to identify the different applications used by the connected clients and build accurate user profiles.
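
    The sketch below illustrates the scalogram step: a per-interval byte-count series is passed through a continuous wavelet transform so its time and frequency content can be inspected and compared across applications. The wavelet choice ('morl' via the PyWavelets library) and the scale range are assumptions, not necessarily those used in the thesis.

```python
# Continuous wavelet transform of a traffic time series: the squared magnitude
# of the coefficients is the scalogram (power per scale and time). Wavelet and
# scale range below are illustrative assumptions.
import numpy as np
import pywt

def traffic_scalogram(byte_counts, scales=None, wavelet="morl"):
    """byte_counts: 1-D array of bytes per sampling interval."""
    series = np.asarray(byte_counts, dtype=float)
    series -= series.mean()                       # remove the DC component
    if scales is None:
        scales = np.arange(1, 64)
    coeffs, freqs = pywt.cwt(series, scales, wavelet)
    return np.abs(coeffs) ** 2, freqs             # power per (scale, time)

if __name__ == "__main__":
    t = np.arange(600)
    demo = 1000 + 300 * np.sin(2 * np.pi * t / 30)   # synthetic periodic traffic
    power, freqs = traffic_scalogram(demo)
    print(power.shape, freqs[:5])
```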

    Analysis and Defense of Emerging Malware Attacks

    The persistent evolution of malware intrusion brings great challenges to the current anti-malware industry. First, traditional signature-based detection and prevention schemes produce ever-growing signature databases, and each end-host user has to install an AV tool and tolerate the huge amount of resources consumed by pairwise matching. On the analysis side, emerging malware can detect its running environment and decide whether or not it should infect the host. Hence, traditional dynamic malware analysis can no longer find the desired malicious logic if the targeted environment cannot be extracted in advance. Both of these problems show that current malware defense schemes are too passive and reactive to fulfill the task. The goal of this research is to develop new analysis and protection schemes for the emerging malware threats. First, this dissertation performs a detailed study of recent targeted malware attacks. Based on the study, we develop a new technique to perform targeted malware analysis effectively and efficiently. Second, this dissertation studies a new trend of massive malware intrusion and proposes a new protection scheme to proactively defend against malware attacks. Lastly, our focus is new P2P malware. We propose a new scheme, named informed active probing, for large-scale P2P malware analysis and detection. Furthermore, our Internet-wide evaluation shows that our active probing scheme can successfully detect malicious P2P malware and its corresponding malicious servers.