Addressing practical challenges for anomaly detection in backbone networks
Network monitoring has always been a topic of foremost importance for both network operators and researchers, for multiple reasons ranging from anomaly detection to traffic classification or capacity planning. Nowadays, as networks become more and more complex, traffic increases and security threats multiply, achieving a deeper understanding of what is happening in the network has become essential. In particular, due to the considerable growth of cybercrime, research in the field of anomaly detection has drawn significant attention in recent years and numerous proposals have been made. All the same, when it comes to deploying solutions in real environments, some of them fail to meet crucial requirements. Taking this into account, this thesis focuses on filling this gap between the research and the non-research world. Prior to the start of this work, we identified several problems. First, there is a clear lack of detailed and updated information on the most common anomalies and their characteristics. Second, unawareness of sampled data is still common, although the performance of anomaly detection algorithms is severely affected by it. Third, operators currently need to invest many work-hours to manually inspect and classify detected anomalies in order to act accordingly and take the appropriate mitigation measures. This is further exacerbated by the high number of false positives and false negatives, and because anomaly detection systems are often perceived as extremely complex black boxes. Analysing an issue is essential to fully comprehend the problem space and to be able to tackle it properly. Accordingly, the first block of this thesis seeks to obtain detailed and updated real-world information on the most frequent anomalies occurring in backbone networks. It first reports on the performance of different commercial systems for anomaly detection and analyses the types of network anomalies detected. Afterwards, it focuses on further investigating the characteristics of the anomalies found in a backbone network, using one of the tools for more than half a year. Among other results, this block confirms the need to apply sampling in an operational environment, as well as the unacceptably high number of false positives and false negatives still reported by current commercial tools. On the whole, the presence of sampling in large networks for monitoring purposes has become almost mandatory and, therefore, any anomaly detection algorithm that does not take it into account might report incorrect results.
In the second block of this thesis, the dramatic impact of sampling on the performance of well-known anomaly detection techniques is analysed and confirmed. However, we show that the results change significantly depending on the sampling technique used and also on the metric selected to perform the comparison. In particular, we show that, contrary to what was previously reported, Packet Sampling outperforms Flow Sampling. Furthermore, we observe that Selective Sampling (SES), a sampling technique that focuses on small flows, obtains much better results than traditional sampling techniques for scan detection. Consequently, we propose Online Selective Sampling, a sampling technique that obtains the same good performance for scan detection as SES but works on a per-packet basis instead of keeping all flows in memory. We validate and evaluate our proposal and show that it can operate online and uses far fewer resources than SES.
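To make the idea concrete, a minimal sketch of a per-packet, small-flow-biased sampler follows; the threshold, thinning probability and flow-key format are illustrative assumptions, not the thesis's actual Online Selective Sampling parameters:

```python
import random

# Illustrative parameters (assumptions, not the thesis's values).
FULL_RATE_PACKETS = 4    # the first packets of every flow are always kept
LARGE_FLOW_PROB = 0.01   # thinning probability for the remaining packets

class OnlineSelectiveSampler:
    """Per-packet sampler biased towards small flows, in the spirit of
    Selective Sampling, keeping only a packet counter per flow key
    rather than full flow records."""

    def __init__(self):
        self.packet_counts = {}  # in practice, aged out or a fixed-size sketch

    def keep(self, flow_key):
        count = self.packet_counts.get(flow_key, 0) + 1
        self.packet_counts[flow_key] = count
        if count <= FULL_RATE_PACKETS:
            return True                      # small flows survive intact
        return random.random() < LARGE_FLOW_PROB  # large flows are thinned

sampler = OnlineSelectiveSampler()
if sampler.keep(("10.0.0.1", "10.0.0.2", 1234, 80, "tcp")):
    pass  # export or analyse the sampled packet
```

Because the decision is made per packet from a single counter, the sampler never has to buffer complete flows, which is what allows it to run online.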
Although the literature offers plenty of techniques for detecting anomalous events, research on anomaly classification and extraction (e.g., to further investigate what happened or to share evidence with involved third parties) is rather marginal. This makes it harder for network operators to analyse reported anomalies, because they depend solely on their experience to do the job. Furthermore, this task is an extremely time-consuming and error-prone process.
The third block of this thesis targets this issue and brings it together with the knowledge acquired in the previous blocks. In particular, it presents a system for automatic anomaly detection, extraction and classification with high accuracy and very few false positives. We deploy the system in an operational environment and show its usefulness in practice.
The fourth and last block of this thesis presents a generalisation of our system that focuses on analysing all the traffic, not only network anomalies. This new system seeks to further help network operators by summarising the most significant traffic patterns in their network. In particular, we generalise our system to deal with big network traffic data, covering src/dst IPs, src/dst ports, protocol, src/dst Autonomous Systems, layer-7 application and src/dst geolocation. We first deploy a prototype in the European backbone network of GÉANT and show that it can process large amounts of data quickly and build highly informative and compact reports that are very useful for comprehending what is happening in the network. Second, we deploy it in a completely different scenario and show how it can also be successfully used in a real-world use case, where we analyse the behaviour of highly distributed devices related to a critical infrastructure sector.
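As a rough illustration of this kind of traffic summarisation (not the thesis system itself), the following sketch builds a top-k report per dimension from flow records; the record format and the byte weighting are assumptions for the example:

```python
from collections import Counter

# Dimensions taken from the abstract; geolocation fields omitted for brevity.
DIMENSIONS = ("src_ip", "dst_ip", "src_port", "dst_port",
              "protocol", "src_as", "dst_as", "l7_app")

def summarise(flows, k=5):
    """Return the k heaviest values per dimension, weighted by bytes.
    Each flow is assumed to be a dict with the dimensions above plus 'bytes'."""
    counters = {d: Counter() for d in DIMENSIONS}
    for flow in flows:
        for d in DIMENSIONS:
            counters[d][flow[d]] += flow["bytes"]
    return {d: c.most_common(k) for d, c in counters.items()}
```

A real system would of course compress correlated dimensions into combined patterns rather than report each dimension independently; this sketch only shows the shape of the per-dimension report.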
Detection of unsolicited web browsing with clustering and statistical analysis
Unsolicited web browsing denotes illegitimately accessing or processing web content. The harmful activity varies from extracting e-mail information to downloading an entire website for duplication. In addition, computer criminals prevent legitimate users from gaining access to websites by implementing denial-of-service attacks with high volumes of legitimate traffic. These offences are accomplished by preprogrammed machines that avoid rate-dependent intrusion detection systems. Therefore, it is assumed in this thesis that the only difference between a legitimate and a malicious web session lies in the intention rather than in physical characteristics or network-layer information. As a result, the main aim of this research has been to provide a method of malicious-intention detection. This has been accomplished by a two-fold process. Initially, to discover the most recent and popular transitions of lawful users, a clustering method based on entropy minimisation has been introduced. In principle, by following popular transitions among web objects, legitimate users are placed in low-entropy clusters, as opposed to undesired hosts, whose transitions are uncommon and lead to placement in high-entropy clusters. In addition, by comparing the distributions of sequences of requests generated by actual and malicious users across the clusters, it is possible to discover whether or not a website is under attack. Secondly, a set of statistical measurements has been tested to detect the actual intention of browsing hosts. Intention classification based on Bayes factors and likelihood analysis has provided the best results. The combined approach has been validated against actual web traces (i.e. datasets), and generated promising results
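The thesis does not spell out the algorithm here, but a minimal sketch of entropy-minimising clustering of web sessions, under assumed data structures (sessions as lists of page-to-page transitions), could look like this:

```python
import math
from collections import Counter

def entropy(counter):
    """Shannon entropy of an empirical transition distribution."""
    total = sum(counter.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counter.values())

def assign_sessions(sessions, n_clusters=3):
    """Greedily place each session in the cluster whose entropy it
    increases the least; popular paths concentrate in low-entropy clusters."""
    clusters = [Counter() for _ in range(n_clusters)]
    labels = []
    for transitions in sessions:
        costs = [entropy(c + Counter(transitions)) - entropy(c) for c in clusters]
        best = costs.index(min(costs))
        clusters[best].update(transitions)
        labels.append(best)
    return labels, clusters

# Sessions following common paths share a low-entropy cluster; an unusual,
# crawler-like path inflates the entropy of whichever cluster it joins.
labels, clusters = assign_sessions([
    [("home", "news"), ("news", "article")],
    [("home", "news"), ("news", "article")],
    [("a", "z"), ("z", "q"), ("q", "m")],
])
```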
New Methods for Network Traffic Anomaly Detection
In this thesis we examine the efficacy of applying outlier detection techniques to understand the behaviour of anomalies in communication network traffic. We have identified several shortcomings. Our most significant finding is that known techniques focus on characterising either the spatial or the temporal behaviour of traffic, but rarely both. For example, DoS attacks are anomalies that violate temporal patterns, while port scans violate the spatial equilibrium of network traffic. To address this observed weakness we have designed a new method for outlier detection based on spectral decomposition of the Hankel matrix. The Hankel matrix is a spatio-temporal correlation matrix and has been used in many other domains, including climate data analysis and econometrics. Using our approach we can seamlessly integrate the discovery of both spatial and temporal anomalies. Comparison with other state-of-the-art methods in the networks community confirms that our approach can discover both DoS and port scan attacks. The spectral decomposition of the Hankel matrix is closely tied to the problem of inference in Linear Dynamical Systems (LDS). We introduce a new problem, the Online Selective Anomaly Detection (OSAD) problem, to model the situation where the objective is to report new anomalies in the system and suppress known faults. For example, in the network setting an operator may be interested in triggering an alarm for malicious attacks but not for faults caused by equipment failure. In order to solve OSAD we combine techniques from machine learning and control theory in a unique fashion. Machine learning ideas are used to learn the parameters of an underlying data-generating system. Control theory techniques are used to model the feedback and modify the residual generated by the data-generating state model. Experiments on synthetic and real data sets confirm that the OSAD problem captures a general scenario and tightly integrates machine learning and control theory to solve a practical problem
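As a generic illustration of the approach (the window size, rank and scoring rule are assumptions, not the thesis's exact method), the following sketch builds a Hankel matrix from a multivariate traffic series and scores each window by its energy outside the dominant spectral subspace:

```python
import numpy as np

def hankel(series, window):
    """Stack lagged windows of a (time x links) traffic matrix, embedding
    temporal structure into the rows and spatial structure into the columns."""
    t = series.shape[0]
    return np.array([series[i:i + window].ravel() for i in range(t - window + 1)])

def anomaly_scores(series, window=10, rank=3):
    """Residual energy outside the dominant rank-`rank` subspace of the
    Hankel matrix; windows with large residuals are flagged as anomalous."""
    H = hankel(series, window)
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    low_rank = U[:, :rank] @ np.diag(s[:rank]) @ Vt[:rank]  # "normal" traffic
    return np.linalg.norm(H - low_rank, axis=1)

# Example: 200 time steps of traffic on 5 links with one injected burst.
traffic = np.random.default_rng(0).normal(size=(200, 5))
traffic[120] += 10.0                     # a DoS-like spike on all links
print(anomaly_scores(traffic).argmax())  # windows near t=120 score highest
```

Because each Hankel row mixes several consecutive time steps across all links, a purely temporal burst (DoS) and a purely spatial imbalance (port scan) both perturb the same low-rank structure, which is what lets one decomposition catch both.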
Machine Learning Aided Static Malware Analysis: A Survey and Tutorial
Malware analysis and detection techniques have been evolving over the last decade in response to the development of different malware techniques designed to evade network-based and host-based security protections. The fast growth in the variety and number of malware species has made it very difficult for forensic investigators to provide a timely response. Therefore, Machine Learning (ML) aided malware analysis has become a necessity to automate different aspects of static and dynamic malware investigation. We believe that machine learning aided static analysis can be used as a methodological approach in technical Cyber Threat Intelligence (CTI), rather than the resource-consuming dynamic malware analysis that has been thoroughly studied before. In this paper, we address this research gap by conducting an in-depth survey of different machine learning methods for classification of the static characteristics of 32-bit malicious Portable Executable (PE32) Windows files, and develop a taxonomy for better understanding of these techniques. Afterwards, we offer a tutorial on how different machine learning techniques can be utilised in the extraction and analysis of a variety of static characteristics of PE binaries, and evaluate the accuracy and practical generalisation of these techniques. Finally, the results of an experimental study of all the methods on a common dataset are given to demonstrate their accuracy and complexity. This paper may serve as a stepping stone for future researchers in the cross-disciplinary field of machine learning aided malware forensics.
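As a minimal illustration of the surveyed approach, the sketch below extracts a few static PE-header features with the pefile library and fits a scikit-learn classifier; the feature set, sample paths and model choice are assumptions for the example, not the paper's pipeline:

```python
import pefile
from sklearn.ensemble import RandomForestClassifier

def static_features(path):
    """A handful of illustrative static PE32 header features."""
    pe = pefile.PE(path, fast_load=True)
    return [
        pe.FILE_HEADER.NumberOfSections,
        pe.FILE_HEADER.TimeDateStamp,
        pe.OPTIONAL_HEADER.SizeOfCode,
        pe.OPTIONAL_HEADER.AddressOfEntryPoint,
        pe.OPTIONAL_HEADER.DllCharacteristics,
    ]

# Hypothetical labelled corpus; 0 = benign, 1 = malicious.
benign_paths = ["samples/benign/calc.exe"]
malicious_paths = ["samples/malware/sample1.exe"]

X = [static_features(p) for p in benign_paths + malicious_paths]
y = [0] * len(benign_paths) + [1] * len(malicious_paths)

clf = RandomForestClassifier(n_estimators=200).fit(X, y)
```

Real pipelines draw on far richer static characteristics (imports, section entropy, strings, byte n-grams); the point here is only the extract-then-classify shape of the approach.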
Reliable Navigational Scene Perception for Autonomous Ships in Maritime Environment
Due to significant advances in robotics and transportation, research on autonomous ships has attracted considerable attention. The most critical task is to make the ships capable of accurately, reliably, and intelligently detecting their surroundings to achieve high levels of autonomy. Three deep learning-based models are constructed in this thesis to perform complex perceptual tasks such as identifying ships, analysing encounter situations, and recognising water surface objects. In this thesis, sensors including the Automatic Identification System (AIS) and cameras provide critical information for scene perception. Specifically, the AIS enables mid-range and long-range detection, assisting the decision-making system to take suitable and decisive action. A Convolutional Neural Network-Ship Movement Modes Classification (CNN-SMMC) network is used to detect ships or objects. Following that, a Semi-Supervised Convolutional Encoder-Decoder Network (SCEDN) is developed to classify ship encounter situations and make a collision avoidance plan for the moving ships or objects. Additionally, cameras are used to detect short-range objects, a supplementary solution for ships or objects not equipped with an AIS. A Water Obstacle Detection Network based on Image Segmentation (WODIS) is developed to find potential threat targets. A series of quantifiable experiments have demonstrated that these models can provide reliable scene perception for autonomous ships
Vehicle Keypoint Detection and Fine-Grained Classification using Deep Learning
Vehicle keypoint detection and fine-grained classification systems have seen their capabilities evolve at an unprecedented rate, from poor performance to incredible results in a matter of a few years. The advent of convolutional neural networks and the availability of large amounts of data, together with progress in computational capabilities, have allowed these and many other problems to be tackled and solved with very different approaches using increasingly complex models.
This thesis focuses on the problems of keypoint detection and fine-grained classification of vehicles with a deep learning approach. After analysis of the existing datasets for both tasks, three new datasets have been built. The first, oriented to the detection of keypoints in vehicles, is an improvement and extension of the famous PASCAL3D+ dataset, re-labelling part of it and adding new keypoints and images to provide more variability. The second is a vehicle make and model classification test set based on the PREVENTION dataset, a real-world driving scenario vehicle trajectory prediction dataset. Finally, a cross-dataset has been composed of the common makes and models from three major vehicle classification databases: CompCars, VMMR-db and Frontal-103.
The keypoint detection system is based on a human pose detection method which, using convolutional neural networks and deconvolutional layers, generates a heat map for each keypoint from an input image. The network has been modified to fit the problem of keypoint detection in vehicles, obtaining results that improve the state of the art without using complex architectures or methodologies. Additionally, the suitability of the PASCAL3D+ keypoints has been analysed, validating the proposal of new keypoints as a better alternative.
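A toy version of such a heat-map head, with an assumed backbone and keypoint count rather than the thesis's actual architecture, might look like this in PyTorch:

```python
import torch
import torch.nn as nn

class KeypointHeatmapNet(nn.Module):
    """Convolutional backbone + deconvolutional upsampling, emitting one
    heat map per vehicle keypoint (a simple pose-style baseline)."""

    def __init__(self, n_keypoints=12):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 7, stride=2, padding=3), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
        )
        self.head = nn.Conv2d(64, n_keypoints, 1)  # one heat map per keypoint

    def forward(self, x):
        return self.head(self.deconv(self.backbone(x)))

heatmaps = KeypointHeatmapNet()(torch.randn(1, 3, 256, 256))
# Each keypoint's predicted location is the argmax of its heat map.
```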
The vehicle make and model classification system is based on the use of networks pre-trained on the famous ImageNet dataset and fine-tuned for the vehicle classification problem. One of the problems detected in the state of the art is the saturation of the results on the existing datasets, which, moreover, are biased, limiting the generalisation capacity of the models trained with them. Multiple data learning and weighting techniques have been used to try to alleviate the impact of dataset bias. In order to assess the generalisation capabilities of the trained models in real situations, the PREVENTION test set has been used. Additionally, the cross-dataset has been used to evaluate the complexity of the existing datasets and the generalisation capabilities of the models trained with them. It is shown that competitive results can be achieved without the use of complex architectures, and that a high-quality dataset that adequately reflects the real world is needed in order to properly address the vehicle classification problem
Denial-of-service attack modelling and detection for HTTP/2 services
Businesses and society alike have been heavily dependent on Internet-based services, albeit with experiences of constant and annoying disruptions caused by the adversary class. A malicious attack that can prevent establishment of Internet connections to web servers, initiated from legitimate client machines, is termed a Denial of Service (DoS) attack; the volume and intensity of such attacks are rapidly growing thanks to readily available attack tools and ever-increasing network bandwidths. A majority of contemporary web servers are built on the HTTP/1.1 communication protocol. As a consequence, all literature found on DoS attack modelling and appertaining detection techniques addresses only HTTP/1.x network traffic. This thesis presents a model of DoS attack traffic against servers employing the new communication protocol, namely HTTP/2.
The HTTP/2 protocol significantly differs from its predecessor and introduces new messaging formats and data exchange mechanisms. This creates an urgent need to understand how malicious attacks, including Denial of Service, can be launched against HTTP/2 services. Moreover, the ability of attackers to vary network traffic models to stealthily affect web services requires extensive research and modelling.
This research work not only provides a novel model for DoS attacks against HTTP/2 services, but also a model of stealthy variants of such attacks that can disrupt routine web services. Specifically, HTTP/2 traffic patterns that consume computing resources of a server, such as CPU utilisation and memory consumption, were thoroughly explored and examined. The study presents four HTTP/2 attack models: the first is a flooding-based attack model, the second a distributed model, and the third and fourth are variant DoS attack models. The attack traffic analysis conducted in this study employed four machine learning techniques, namely Naïve Bayes, Decision Tree, JRip and Support Vector Machines.
The HTTP/2 normal traffic model portrays the online activities of human users. The model thus formulated was also employed to generate flash-crowd traffic, i.e. a large volume of normal traffic that incapacitates a web server in a fashion similar to a DoS attack, albeit with non-malicious intent. Flash-crowd traffic generated from the defined model was used to populate the dataset of legitimate network traffic, to fuzz the machine learning-based attack detection process. The two variants of DoS attack traffic differed in traffic intensity and in the inter-packet arrival delays introduced, to better analyse the type and quality of DoS attacks that can be launched against HTTP/2 services.
A detailed analysis of HTTP/2 features is also presented, ranking relevant network traffic features for all four traffic models. These features were ranked based on the legitimate as well as attack traffic observations conducted in this study. The study shows that machine learning-based analysis yields better classification performance, i.e. a lower percentage of incorrectly classified instances, when the proposed HTTP/2 features are employed than when HTTP/1.1 features alone are used.
The study shows how an HTTP/2 DoS attack can be modelled, and how future work can extend the proposed model to create variant attack traffic models that can bypass intrusion detection systems. Likewise, as Internet traffic and the heterogeneity of Internet-connected devices are projected to increase significantly, legitimate traffic can yield varying traffic patterns, demanding further analysis. The significance of having current legitimate traffic datasets, together with the scope to extend the DoS attack models presented herewith, suggests that research in the DoS attack analysis and detection area will benefit from the work presented in this thesis
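As an illustration of the classification step (with placeholder features and data; JRip is a Weka rule learner with no direct scikit-learn equivalent and is omitted here), the remaining three classifier families named above can be compared as follows:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

# Hypothetical per-connection feature matrix, e.g. HTTP/2 frame counts by
# type, stream resets, header sizes and inter-arrival statistics; 1 = attack.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = rng.integers(0, 2, size=200)

for clf in (GaussianNB(), DecisionTreeClassifier(), SVC()):
    scores = cross_val_score(clf, X, y, cv=5)  # 5-fold cross-validation
    print(type(clf).__name__, round(scores.mean(), 3))
```

With real labelled HTTP/2 traces in place of the random placeholders, the same loop reproduces the shape of the study's comparison: the fraction of correctly classified instances per classifier family.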