
    BGP Anomaly Detection with Balanced Datasets

    We use machine learning techniques to build predictive models for anomaly detection in the Border Gateway Protocol (BGP). Imbalanced datasets of network anomalies limit the quality of such predictive models. To achieve better classification performance measures, we use resampling methods to balance the classes in the datasets: undersampling, oversampling, and combination techniques that change the class distributions. In this paper, we build predictive models based on preprocessed datasets of known Internet anomalies and observe improved classifier performance measures compared to those reported in our previous work. We propose using resampling combination techniques together with Decision Tree and Naïve Bayes classifiers to achieve the best trade-off between (1) the F-measure and the length of model training time, and (2) avoiding overfitting and loss of information.
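
    As a concrete illustration of the approach above, the following minimal sketch (not the paper's exact pipeline) pairs one combination resampling technique, SMOTEENN from the imbalanced-learn library, with a Decision Tree classifier and reports the F-measure; X and y stand in for a preprocessed BGP anomaly dataset.

        # Hedged sketch: balance the training set with a combination
        # resampling technique, then train and score a classifier.
        from imblearn.combine import SMOTEENN
        from sklearn.model_selection import train_test_split
        from sklearn.tree import DecisionTreeClassifier
        from sklearn.metrics import f1_score

        def train_balanced(X, y):
            X_tr, X_te, y_tr, y_te = train_test_split(
                X, y, stratify=y, random_state=0)
            # SMOTE oversampling followed by Edited Nearest Neighbours
            # cleaning; resampling is applied to the training split only.
            X_bal, y_bal = SMOTEENN(random_state=0).fit_resample(X_tr, y_tr)
            clf = DecisionTreeClassifier(random_state=0).fit(X_bal, y_bal)
            return f1_score(y_te, clf.predict(X_te))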

    BGP Hijacking Classification

    Recent reports show that BGP hijacking has increased substantially. BGP hijacking allows malicious ASes to obtain IP prefixes for spamming as well as for intercepting or blackholing traffic. While systems to prevent hijacks are hard to deploy and require the cooperation of many other organizations, techniques to detect hijacks have been a popular area of study. In this paper, we classify detected hijack events in order to document the output of BGP detectors and understand the nature of reported events. We introduce four categories of BGP hijack: typos, prepending mistakes, origin changes, and forged AS paths. We leverage AS hegemony, a measure of dependency in AS relationships, to identify forged AS paths in a fast and efficient way. In addition, we use heuristic approaches to find common operators' mistakes such as typos and AS-prepending mistakes. The proposed approach classifies our collected ground truth into four categories with 95.71% accuracy. We characterize publicly reported alarms (e.g., from BGPMon) with our trained classifier and find that 4%, 1%, and 2% of them are typos, prepending mistakes, and BGP hijacks with a forged AS path, respectively.
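
    The paper's heuristics are not reproduced in this listing; purely as a hypothetical sketch of the typo category, an event could be marked a likely typo when the announced origin ASN is one single-digit edit away from the expected origin ASN.

        # Hypothetical typo check; looks_like_typo and the distance-1 rule
        # are illustrative assumptions, not the paper's actual heuristic.
        def edit_distance(a: str, b: str) -> int:
            # Classic dynamic-programming Levenshtein distance.
            d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
            for i in range(len(a) + 1):
                d[i][0] = i
            for j in range(len(b) + 1):
                d[0][j] = j
            for i in range(1, len(a) + 1):
                for j in range(1, len(b) + 1):
                    d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                                  d[i - 1][j - 1] + (a[i - 1] != b[j - 1]))
            return d[len(a)][len(b)]

        def looks_like_typo(expected_asn: int, announced_asn: int) -> bool:
            return edit_distance(str(expected_asn), str(announced_asn)) == 1

        assert looks_like_typo(3356, 3456)        # single-digit slip
        assert not looks_like_typo(3356, 15169)   # unrelated origin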

    The BGP Visibility Toolkit: detecting anomalous Internet routing behavior

    In this paper, we propose the BGP Visibility Toolkit, a system for detecting and analyzing anomalous routing behavior in the Internet. We show that interdomain prefix visibility can be used to single out cases of erroneous behavior resulting from misconfigurations or bogus routing policies. The implementation of routing policies with BGP is a complicated process, involving fine-tuning operations and interactions with the policies of the other active ASes. Network operators might end up with faulty configurations or unintended routing policies that prevent the success of their strategies and impact their revenues. As part of the Visibility Toolkit, we propose the BGP Visibility Scanner, a tool which identifies limited-visibility prefixes in the Internet. The tool enables operators to provide feedback on the expected visibility status of prefixes. We build a unique set of ground-truth prefixes qualified by their ASes as intended or unintended to have limited visibility. Using a machine learning algorithm trained on this unique dataset, we build an alarm system that separates the prefixes with unintended limited visibility with 95% accuracy. Hence, we find that visibility features are powerful for detecting prefixes suffering from inadvertent effects of routing policies. Limited visibility can render a whole prefix globally unreachable. This points towards a serious problem, as limited reachability of a non-negligible set of prefixes undermines the global connectivity of the Internet. We thus verify the correlation between global visibility and global connectivity of prefixes. This work was supported in part by the European Community's Seventh Framework Programme (FP7/2007-2013) under Grant 317647 (Leone).
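
    As a rough sketch of what such a scanner computes (the toolkit's actual definition and threshold may differ), a prefix's visibility can be taken as the fraction of full-feed route collector peers carrying a route for it, with prefixes below a cutoff flagged as having limited visibility.

        # Minimal sketch: routes is an iterable of (peer, prefix) pairs
        # extracted from RIB dumps; all_peers is the set of full-feed peers.
        from collections import defaultdict

        def limited_visibility_prefixes(routes, all_peers, threshold=0.95):
            seen_by = defaultdict(set)
            for peer, prefix in routes:
                seen_by[prefix].add(peer)
            # Flag prefixes seen by fewer than `threshold` of the peers.
            return {p for p, peers in seen_by.items()
                    if len(peers) / len(all_peers) < threshold}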

    Towards a Reliable Comparison and Evaluation of Network Intrusion Detection Systems Based on Machine Learning Approaches

    Presently, we live in a hyper-connected world where millions of heterogeneous devices continuously share information in different application contexts for wellness, improved communications, digital business, and more. However, the greater the number of devices and connections, the higher the risk of security threats. To counteract malicious behaviours and preserve essential security services, Network Intrusion Detection Systems (NIDSs) are the most widely used line of defence in communication networks. Nevertheless, there is no standard methodology to evaluate and fairly compare NIDSs. Most proposals omit crucial steps of NIDS validation, which makes their comparison hard or even impossible. This work first presents a comprehensive study of recent NIDSs based on machine learning approaches, concluding that almost none of them carry out what the authors of this paper consider mandatory steps for a reliable comparison and evaluation of NIDSs. Second, a structured methodology is proposed and assessed on the UGR'16 dataset to test its suitability for addressing network attack detection problems. The recommended guidelines and steps will help the research community to fairly assess NIDSs, although defining a definitive framework is not a trivial task and some extra effort is still needed to further improve its understandability and usability.
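
    The following minimal example illustrates the kind of explicit, repeatable validation the methodology argues for; the Random Forest model and the five stratified folds are illustrative choices, not the paper's prescription. Preprocessing is fitted inside each training fold to avoid data leakage, and several metrics are reported instead of accuracy alone.

        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import StratifiedKFold, cross_validate

        def evaluate_nids(X, y):
            # Assumes binary attack/benign labels. The scaler is fitted on
            # each training fold only, never on the held-out fold.
            pipe = make_pipeline(StandardScaler(),
                                 RandomForestClassifier(random_state=0))
            cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
            return cross_validate(pipe, X, y, cv=cv,
                                  scoring=("accuracy", "precision",
                                           "recall", "f1"))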

    Chasing the Unknown: A Predictive Model to Demystify BGP Community Semantics

    The Border Gateway Protocol (BGP) specifies an optional communities attribute for traffic engineering, route manipulation, remotely triggered blackholing, and other services. However, communities have neither unifying semantics nor cryptographic protections and often propagate much farther than intended. Consequently, Autonomous System (AS) operators are free to define their own community values. This research is a proof of concept for a machine learning approach to predicting community semantics; it attempts a quantitative measurement of semantic predictability between different AS semantic schemata. Ground-truth community semantics data were collated and manually labeled according to a unified taxonomy of community services. Various classification algorithms, including a feed-forward Multi-Layer Perceptron and a Random Forest, were used as the estimator for a One-vs-All multi-class model and trained on a feature set engineered from this data. The best model's performance on the test set indicates that as much as 89.15% of these semantics can be accurately predicted according to a proposed standard taxonomy of community services. This model was additionally applied to historical BGP data from various route collectors to estimate the taxonomic distribution of communities transiting the control plane.
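
    A minimal sketch of the One-vs-All setup described above, with a Random Forest as the base estimator; the featurization shown (splitting a community string into its two numeric halves) is an illustrative assumption, not the thesis's engineered feature set.

        from sklearn.multiclass import OneVsRestClassifier
        from sklearn.ensemble import RandomForestClassifier

        def featurize(community: str):
            # Hypothetical features: the two halves of an "ASN:value" community.
            asn, value = community.split(":")
            return [int(asn), int(value)]

        def train_semantics_model(communities, taxonomy_labels):
            # One binary classifier per taxonomy class (One-vs-All).
            X = [featurize(c) for c in communities]
            model = OneVsRestClassifier(RandomForestClassifier(random_state=0))
            return model.fit(X, taxonomy_labels)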

    Addressing practical challenges for anomaly detection in backbone networks

    Network monitoring has always been a topic of foremost importance for both network operators and researchers, for multiple reasons ranging from anomaly detection to traffic classification or capacity planning. Nowadays, as networks become more and more complex, traffic increases and security threats multiply, achieving a deeper understanding of what is happening in the network has become an essential necessity. In particular, due to the considerable growth of cybercrime, research in the field of anomaly detection has drawn significant attention in recent years and many proposals have been made. All the same, when it comes to deploying solutions in real environments, some of them fail to meet crucial requirements. Taking this into account, this thesis focuses on filling the gap between the research and the non-research world. Prior to the start of this work, we identified several problems. First, there is a clear lack of detailed and updated information on the most common anomalies and their characteristics. Second, unawareness of sampled data is still common, although the performance of anomaly detection algorithms is severely affected by it. Third, operators currently need to invest many work-hours to manually inspect and classify detected anomalies in order to act accordingly and take the appropriate mitigation measures. This is further exacerbated by the high number of false positives and false negatives, and because anomaly detection systems are often perceived as extremely complex black boxes. Analysing an issue is essential to fully comprehend the problem space and to be able to tackle it properly. Accordingly, the first block of this thesis seeks to obtain detailed and updated real-world information on the most frequent anomalies occurring in backbone networks. It first reports on the performance of different commercial systems for anomaly detection and analyses the types of network anomalies detected. Afterwards, it further investigates the characteristics of the anomalies found in a backbone network using one of the tools for more than half a year. Among other results, this block confirms the need for applying sampling in an operational environment, as well as the unacceptably high number of false positives and false negatives still reported by current commercial tools. On the whole, the presence of sampling in large networks for monitoring purposes has become almost mandatory and, therefore, all anomaly detection algorithms that do not take it into account might report incorrect results. In the second block of this thesis, the dramatic impact of sampling on the performance of well-known anomaly detection techniques is analysed and confirmed. However, we show that the results change significantly depending on the sampling technique used and also on the metric selected to perform the comparison. In particular, we show that, contrary to previous reports, Packet Sampling outperforms Flow Sampling. Furthermore, we observe that Selective Sampling (SES), a sampling technique that focuses on small flows, obtains much better results than traditional sampling techniques for scan detection. Consequently, we propose Online Selective Sampling, a sampling technique that obtains the same good performance for scan detection as SES but works on a per-packet basis instead of keeping all flows in memory. We validate and evaluate our proposal and show that it can operate online and uses far fewer resources than SES.
    Although the literature is full of techniques for detecting anomalous events, research on anomaly classification and extraction (e.g., to further investigate what happened or to share evidence with involved third parties) is rather marginal. This makes it harder for network operators to analyse reported anomalies, because they depend solely on their experience to do the job. Furthermore, this task is an extremely time-consuming and error-prone process. The third block of this thesis targets this issue and brings it together with the knowledge acquired in the previous blocks. In particular, it presents a system for automatic anomaly detection, extraction and classification with high accuracy and very low false positives. We deploy the system in an operational environment and show its usefulness in practice. The fourth and last block of this thesis presents a generalisation of our system that focuses on analysing all the traffic, not only network anomalies. This new system seeks to further help network operators by summarising the most significant traffic patterns in their network. In particular, we generalise our system to deal with big network traffic data: it handles src/dst IPs, src/dst ports, protocol, src/dst Autonomous Systems, layer-7 application and src/dst geolocation. We first deploy a prototype in the European backbone network of GÉANT and show that it can quickly process large amounts of data and build highly informative and compact reports that are very useful for comprehending what is happening in the network. Second, we deploy it in a completely different scenario and show how it can also be successfully used in a real-world use case, where we analyse the behaviour of highly distributed devices related to a critical infrastructure sector.
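
    The thesis's Online Selective Sampling algorithm is not reproduced here; the sketch below only illustrates the underlying idea under stated assumptions: always keep the first z packets of each flow (so the small flows typical of scans survive) and keep later packets with a low probability, deciding packet by packet. The parameters z and p_tail are illustrative, and a production version would bound the per-flow counter table.

        import random
        from collections import Counter

        class SelectiveSampler:
            def __init__(self, z=5, p_tail=0.01):
                self.z, self.p_tail = z, p_tail
                self.counts = Counter()   # per-flow packet counters (unbounded here)

            def sample(self, flow_key) -> bool:
                # Decide per packet, without keeping whole flows in memory.
                self.counts[flow_key] += 1
                if self.counts[flow_key] <= self.z:
                    return True           # keep packets of small flows (scan evidence)
                return random.random() < self.p_tail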

    Tiresias: Online Anomaly Detection for Hierarchical Operational Network Data

    Operational network data (management data such as customer care call logs and equipment system logs) is a very important source of information for network operators to detect problems in their networks. Unfortunately, there is a lack of efficient tools to automatically track and detect anomalous events in operational data, forcing ISP operators to rely on manual inspection. While anomaly detection has been widely studied in the context of network data, operational data presents several new challenges, including the volatility and sparseness of the data, and the need to perform fast detection (which complicates the application of schemes that require offline processing or large/stable data sets to converge). To address these challenges, we propose Tiresias, an automated approach to locating anomalous events in hierarchical operational data. Tiresias leverages the hierarchical structure of operational data to identify high-impact aggregates (e.g., locations in the network, failure modes) likely to be associated with anomalous events. To accommodate different kinds of operational network data, Tiresias consists of an online detection algorithm with low time and space complexity that preserves high detection accuracy. We present results from two case studies using operational data collected at a large commercial IP network operated by a Tier-1 ISP: customer care call logs and set-top box crash logs. By comparing with a reference set verified by the ISP's operational group, we validate that Tiresias can achieve >94% accuracy in locating anomalies. Tiresias also discovered several previously unknown anomalies in the ISP's customer care cases, demonstrating its effectiveness.
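
    Tiresias itself is not reproduced here; the sketch below only illustrates an online, constant-space detector that could sit at each node of the data hierarchy (e.g., a location or failure mode), flagging a count that deviates strongly from an exponentially weighted running estimate. The parameters alpha and k are illustrative.

        class OnlineNode:
            # O(1) time and space per update, as online detection requires.
            def __init__(self, alpha=0.1, k=3.0):
                self.alpha, self.k = alpha, k
                self.mean, self.var = 0.0, 1.0

            def update(self, count: float) -> bool:
                # Flag before updating, so the anomaly does not mask itself.
                anomalous = count > self.mean + self.k * self.var ** 0.5
                diff = count - self.mean
                self.mean += self.alpha * diff
                self.var = (1 - self.alpha) * (self.var + self.alpha * diff * diff)
                return anomalous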

    A system for the detection of limited visibility in BGP

    The performance of the global routing system is vital to the thousands of entities operating the Autonomous Systems (ASes) which make up the Internet. The Border Gateway Protocol (BGP) is currently responsible for the exchange of reachability information and the selection of paths according to specified routing policies. BGP thus enables traffic to flow from any point connected to the Internet to another. The manner in which traffic flows is often influenced by entities in the Internet according to their preferences, which are implemented in the form of routing policies by tweaking BGP configurations. Routing policies are usually complex and aim to achieve a myriad of goals, including technical, economic and political purposes. Additionally, individual network managers need to permanently adapt to interdomain routing changes and, by engineering the Internet traffic, optimize the use of their network. Despite the flexibility offered, the implementation of routing policies is a complicated process in itself, involving fine-tuning operations. It is thus an error-prone task, and operators might end up with faulty configurations that impact the efficacy of their strategies or, more importantly, their revenues. Moreover, even when legitimate routing policies are correctly defined, unforeseen interactions between ASes have been observed to cause important disruptions that affect the global routing system. The main reason for this is that actual interdomain routing is the result of the interplay of many routing policies from ASes across the Internet, possibly bringing about a different outcome than the one expected. In this thesis, we perform an extensive analysis of the intricacies emerging from the complex netting of routing policies at the interdomain level, in the context of the current operational status of the Internet. Abundant implications for the way traffic flows in the Internet arise from the convolution of routing policies at a global scale, at times resulting in ASes using suboptimal, ill-favored paths or in the undetected propagation of configuration errors in the routing system. We argue here that monitoring prefix visibility at the interdomain level can be used to detect cases of faulty configurations or backfired routing policies which disrupt the functionality of the routing system. We show that the lack of global prefix visibility can offer early warning signs for anomalous events which, despite their impact, often remain hidden from state-of-the-art tools. Additionally, we show that such unintended Internet behavior not only degrades the efficacy of the routing policies implemented by operators, causing their traffic to follow ill-favored paths, but can also point out problems in the global connectivity of prefixes. We further observe that the majority of prefixes suffering from limited visibility at the interdomain level are more-specific prefixes, often used by network operators to fulfill binding traffic engineering needs. One important task achieved through the use of routing policies for traffic engineering is the control and optimization of the routing function in order to allow ASes to engineer their incoming traffic.
    The advertisement of more-specific prefixes, also known as prefix deaggregation, provides network operators with a fine-grained method to control interdomain ingress traffic, given that the longest-prefix-match rule overrides any other routing policy applied to the covering less-specific prefixes. Nevertheless, however efficient, this traffic engineering tool comes with a cost, which is usually externalized to the entire Internet community. Prefix deaggregation is a known cause of artificial inflation of the BGP routing table, which can further affect the scalability of the global routing system. Looking past the main motivation for deploying deaggregation in the first place, we identify and analyze here the economic impact of this type of strategy. We propose a general Internet model to analyze the effect that advertising more-specific prefixes has on the burstiness of incoming transit traffic. We show that deaggregation combined with selective advertisements (further defined as strategic deaggregation) has a traffic-stabilization side effect, which translates into a decrease in the transit traffic bill. Next, we develop a methodology for Internet Service Providers (ISPs) to monitor general occurrences of deaggregation within their customer base. Furthermore, ISPs can detect selective advertisements of deaggregated prefixes, and thus identify customers which may impact the business of their providers. We apply the proposed methodology on a complete set of data, including routing, traffic, topological and billing information provided by an operational ISP, and we discuss the obtained results.
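
    As a minimal sketch of the first step of such a monitoring methodology (an assumption for illustration, not the thesis's algorithm), an ISP could list a customer's announced prefixes that are covered by a shorter prefix announced by the same customer:

        # Assumes all prefixes belong to a single address family.
        from ipaddress import ip_network

        def deaggregated_prefixes(announced):
            nets = sorted({ip_network(p) for p in announced},
                          key=lambda n: n.prefixlen)
            # A prefix is deaggregated if some shorter announced prefix covers it.
            return [str(n) for i, n in enumerate(nets)
                    if any(n.subnet_of(s) for s in nets[:i])]

        # deaggregated_prefixes(["10.0.0.0/16", "10.0.1.0/24", "192.0.2.0/24"])
        # -> ["10.0.1.0/24"]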

    Detecting IP prefix hijack events using BGP activity and AS connectivity analysis

    The Border Gateway Protocol (BGP), the main component of core Internet connectivity, suffers from vulnerabilities related to the impersonation of IP prefix ownership by Autonomous Systems (ASes). In this context, a number of studies have focused on securing BGP through several techniques, such as monitoring-based, history-based and statistics-based behavioural models. In spite of the significant research undertaken, the proposed solutions cannot detect IP prefix hijacks accurately or even differentiate them from other types of attacks that could threaten the performance of BGP. This research proposes three novel detection methods aimed at tracking the behaviour of BGP edge routers and detecting IP prefix hijacks based on statistical analysis of variance, an attack-signature approach and a classification-based technique. The first detection method uses statistical analysis of variance to identify hijacking behaviour by comparing the normal operation of routing information exchanged among routers with their behaviour during the occurrence of IP prefix hijacking. However, this method failed to find any indication of IP prefix hijacking because of the difficulty of obtaining raw BGP data that is guaranteed to be hijacking-free. The research therefore proposes a second detection method that parses BGP advertisements (announcements) and checks whether IP prefixes are announced by more than one AS. If so, events are selected for further validation using Regional Internet Registry (RIR) databases to determine whether the ASes announcing the prefixes are owned by the same organisation or by different organisations. Advertisements for the same IP prefix made by ASes owned by different organisations are subsequently identified as hijacking events. The proposed algorithm was validated using the 2008 YouTube Pakistan hijack event; the analysis demonstrates that the algorithm qualitatively increases the accuracy of detecting IP prefix hijacks. The algorithm is very accurate as long as the RIRs are updated concurrently with hijacking detection, and the detection method can be integrated with BGP routers or work separately. A third detection method is proposed to detect IP prefix hijacking using a combination of signature-based (parsing-based) and classification-based techniques. The parsing technique is used as a pre-processing phase before the classification-based method. Features are extracted based on the connectivity behaviour of the suspicious ASes given by the parsing technique. In other words, this detection method tracks the behaviour of suspicious ASes and follows up with an analysis of their interaction with directly and indirectly connected neighbours, based on a set of features extracted from the AS_PATH information about the suspicious ASes. Before sending the extracted feature values to the five classifiers best suited to the specifications of the implemented classification dataset, the detection method computes the similarity between benign and malicious behaviours to determine to what extent the classifiers can distinguish suspicious behaviour from benign behaviour and then detect the hijacking. Evaluation tests demonstrated that the detection method was able to detect hijacks with 96% accuracy and can be integrated with BGP routers or work separately.
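
    A minimal sketch of the parsing step of the second method: collect the set of origin ASes observed for each prefix and flag multi-origin (MOAS) prefixes as candidate hijack events; the subsequent RIR ownership lookup is omitted here.

        from collections import defaultdict

        def moas_candidates(announcements):
            # announcements: iterable of (prefix, as_path), as_path a list of ASNs.
            origins = defaultdict(set)
            for prefix, as_path in announcements:
                origins[prefix].add(as_path[-1])   # last hop is the origin AS
            # Prefixes announced by more than one origin AS need validation.
            return {p: asns for p, asns in origins.items() if len(asns) > 1}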

    Distributed Internet security and measurement

    The Internet has developed into an important economic, military, academic, and social resource. It is a complex network, comprised of tens of thousands of independently operated networks, called Autonomous Systems (ASes). A significant strength of the Internet's design, one which enabled its rapid growth in terms of users and bandwidth, is that its underlying protocols (such as IP, TCP, and BGP) are distributed. Users and networks alike can attach and detach from the Internet at will, without causing major disruptions to global Internet connectivity. This dissertation shows that the Internet's distributed, and often redundant, structure can be exploited to increase the security of its protocols, particularly BGP (the Internet's interdomain routing protocol). It introduces Pretty Good BGP, an anomaly detection protocol coupled with an automated response that can protect individual networks from BGP attacks. It also presents statistical measurements of the Internet's structure and uses them to create a model of Internet growth. This work could be used, for instance, to test upcoming routing protocols on ensembles of large, Internet-like graphs. Finally, this dissertation shows that while the Internet is designed to be agnostic to political influence, it is actually quite centralized at the country level. With the recent rise of country-level Internet policies, such as nation-wide censorship and warrantless wiretaps, this centralized control could have significant impact on international reachability.
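
    Pretty Good BGP's full rule set is not reproduced here; the sketch below only illustrates the core idea under stated assumptions: a route whose (prefix, origin AS) pair has no prior history is treated as suspicious and depreferenced for a quarantine period before being trusted. The 24-hour window is illustrative.

        class PrettyGoodMonitor:
            def __init__(self, quarantine=24 * 3600):
                self.quarantine = quarantine
                self.first_seen = {}   # (prefix, origin AS) -> first timestamp

            def classify(self, prefix, origin, now):
                key = (prefix, origin)
                self.first_seen.setdefault(key, now)
                # A pair stays suspicious until it survives the quarantine
                # period; the automated response depreferences it meanwhile.
                if now - self.first_seen[key] < self.quarantine:
                    return "suspicious"
                return "trusted"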