2,119 research outputs found

    Neural Network Models for TCP - SYN Flood Denial of Service Attack Detection With Source and Destination Anonymization

    Get PDF
    The Internet has become a necessity in today\u27s digital world, and making it secure is a pressing concern. Hackers are investing ever-increasing efforts to compromise Internet nodes with novel techniques. According to Forbes, every minute, $ 2,900,000 is lost to cybercrime. A common cyber-attack is Denial of Service (DoS) or Distributed Denial of Service (DDoS) attacks, which may bring a network to a standstill, and unless mitigated, network services could be halted for an extended period. The attack can occur at any layer of the OSI model. This thesis focuses on SYN Flood DoS/DDoS attacks, also known as TCP Flood attacks, and studies the use of artificial neural networks to detect the attacks. Specific neural network models used in this thesis are the Gated Recurrent Units (GRU), the Long Short-Term Memory (LSTM), and a semi-supervised model on label propagation. All neural network models detect attacks by analyzing the individual hexadecimal values in the packet header. A novelty of the approach followed in this thesis is that the neural networks do not consider the lexical values of the network packet (MAC addresses, IP addresses, and port numbers) as input features in their traffic analysis. Instead, the neural network models are designed and trained to detect malicious traffic based on the time pattern of TCP flags. The neural networks base their analysis of traffic on time-sequenced patterns. An important hyperparameter discussed in this paper is the size of the lookup window, that is, the number of past packets the model can access to predict the next packet. Evaluation results based on datasets presented in this thesis show that the accuracies of the GRU, CNN/LSTM, and label propagation models are 81%, 93%, and 96%, respectively

    Addressing practical challenges for anomaly detection in backbone networks

    Get PDF
    Network monitoring has always been a topic of foremost importance for both network operators and researchers for multiple reasons ranging from anomaly detection to tra c classi cation or capacity planning. Nowadays, as networks become more and more complex, tra c increases and security threats reproduce, achieving a deeper understanding of what is happening in the network has become an essential necessity. In particular, due to the considerable growth of cybercrime, research on the eld of anomaly detection has drawn signi cant attention in recent years and tons of proposals have been made. All the same, when it comes to deploying solutions in real environments, some of them fail to meet some crucial requirements. Taking this into account, this thesis focuses on lling this gap between the research and the non-research world. Prior to the start of this work, we identify several problems. First, there is a clear lack of detailed and updated information on the most common anomalies and their characteristics. Second, unawareness of sampled data is still common although the performance of anomaly detection algorithms is severely a ected. Third, operators currently need to invest many work-hours to manually inspect and also classify detected anomalies to act accordingly and take the appropriate mitigation measures. This is further exacerbated due to the high number of false positives and false negatives and because anomaly detection systems are often perceived as extremely complex black boxes. Analysing an issue is essential to fully comprehend the problem space and to be able to tackle it properly. Accordingly, the rst block of this thesis seeks to obtain detailed and updated real-world information on the most frequent anomalies occurring in backbone networks. It rst reports on the performance of di erent commercial systems for anomaly detection and analyses the types of network nomalies detected. Afterwards, it focuses on further investigating the characteristics of the anomalies found in a backbone network using one of the tools for more than half a year. Among other results, this block con rms the need of applying sampling in an operational environment as well as the unacceptably high number of false positives and false negatives still reported by current commercial tools. On the whole, the presence of ampling in large networks for monitoring purposes has become almost mandatory and, therefore, all anomaly detection algorithms that do not take that into account might report incorrect results. In the second block of this thesis, the dramatic impact of sampling on the performance of well-known anomaly detection techniques is analysed and con rmed. However, we show that the results change signi cantly depending on the sampling technique used and also on the common metric selected to perform the comparison. In particular, we show that, Packet Sampling outperforms Flow Sampling unlike previously reported. Furthermore, we observe that Selective Sampling (SES), a sampling technique that focuses on small ows, obtains much better results than traditional sampling techniques for scan detection. Consequently, we propose Online Selective Sampling, a sampling technique that obtains the same good performance for scan detection than SES but works on a per-packet basis instead of keeping all ows in memory. We validate and evaluate our proposal and show that it can operate online and uses much less resources than SES. Although the literature is plenty of techniques for detecting anomalous events, research on anomaly classi cation and extraction (e.g., to further investigate what happened or to share evidence with third parties involved) is rather marginal. This makes it harder for network operators to analise reported anomalies because they depend solely on their experience to do the job. Furthermore, this task is an extremely time-consuming and error-prone process. The third block of this thesis targets this issue and brings it together with the knowledge acquired in the previous blocks. In particular, it presents a system for automatic anomaly detection, extraction and classi cation with high accuracy and very low false positives. We deploy the system in an operational environment and show its usefulness in practice. The fourth and last block of this thesis presents a generalisation of our system that focuses on analysing all the tra c, not only network anomalies. This new system seeks to further help network operators by summarising the most signi cant tra c patterns in their network. In particular, we generalise our system to deal with big network tra c data. In particular, it deals with src/dst IPs, src/dst ports, protocol, src/dst Autonomous Systems, layer 7 application and src/dst geolocation. We rst deploy a prototype in the European backbone network of G EANT and show that it can process large amounts of data quickly and build highly informative and compact reports that are very useful to help comprehending what is happening in the network. Second, we deploy it in a completely di erent scenario and show how it can also be successfully used in a real-world use case where we analyse the behaviour of highly distributed devices related with a critical infrastructure sector.La monitoritzaci o de xarxa sempre ha estat un tema de gran import ancia per operadors de xarxa i investigadors per m ultiples raons que van des de la detecci o d'anomalies fins a la classi caci o d'aplicacions. Avui en dia, a mesura que les xarxes es tornen m es i m es complexes, augmenta el tr ansit de dades i les amenaces de seguretat segueixen creixent, aconseguir una comprensi o m es profunda del que passa a la xarxa s'ha convertit en una necessitat essencial. Concretament, degut al considerable increment del ciberactivisme, la investigaci o en el camp de la detecci o d'anomalies ha crescut i en els darrers anys s'han fet moltes i diverses propostes. Tot i aix o, quan s'intenten desplegar aquestes solucions en entorns reals, algunes d'elles no compleixen alguns requisits fonamentals. Tenint aix o en compte, aquesta tesi se centra a omplir aquest buit entre la recerca i el m on real. Abans d'iniciar aquest treball es van identi car diversos problemes. En primer lloc, hi ha una clara manca d'informaci o detallada i actualitzada sobre les anomalies m es comuns i les seves caracter stiques. En segona inst ancia, no tenir en compte la possibilitat de treballar amb nom es part de les dades (mostreig de tr ansit) continua sent bastant est es tot i el sever efecte en el rendiment dels algorismes de detecci o d'anomalies. En tercer lloc, els operadors de xarxa actualment han d'invertir moltes hores de feina per classi car i inspeccionar manualment les anomalies detectades per actuar en conseqüencia i prendre les mesures apropiades de mitigaci o. Aquesta situaci o es veu agreujada per l'alt nombre de falsos positius i falsos negatius i perqu e els sistemes de detecci o d'anomalies s on sovint percebuts com caixes negres extremadament complexes. Analitzar un tema es essencial per comprendre plenament l'espai del problema i per poder-hi fer front de forma adequada. Per tant, el primer bloc d'aquesta tesi pret en proporcionar informaci o detallada i actualitzada del m on real sobre les anomalies m es freqüents en una xarxa troncal. Primer es comparen tres eines comercials per a la detecci o d'anomalies i se n'estudien els seus punts forts i febles, aix com els tipus d'anomalies de xarxa detectats. Posteriorment, s'investiguen les caracter stiques de les anomalies que es troben en la mateixa xarxa troncal utilitzant una de les eines durant m es de mig any. Entre d'altres resultats, aquest bloc con rma la necessitat de l'aplicaci o de mostreig de tr ansit en un entorn operacional, aix com el nombre inacceptablement elevat de falsos positius i falsos negatius en eines comercials actuals. En general, el mostreig de tr ansit de dades de xarxa ( es a dir, treballar nom es amb una part de les dades) en grans xarxes troncals s'ha convertit en gaireb e obligatori i, per tant, tots els algorismes de detecci o d'anomalies que no ho tenen en compte poden veure seriosament afectats els seus resultats. El segon bloc d'aquesta tesi analitza i confi rma el dram atic impacte de mostreig en el rendiment de t ecniques de detecci o d'anomalies plenament acceptades a l'estat de l'art. No obstant, es mostra que els resultats canvien signi cativament depenent de la t ecnica de mostreig utilitzada i tamb e en funci o de la m etrica usada per a fer la comparativa. Contr ariament als resultats reportats en estudis previs, es mostra que Packet Sampling supera Flow Sampling. A m es, a m es, s'observa que Selective Sampling (SES), una t ecnica de mostreig que se centra en mostrejar fluxes petits, obt e resultats molt millors per a la detecci o d'escanejos que no pas les t ecniques tradicionals de mostreig. En conseqü encia, proposem Online Selective Sampling, una t ecnica de mostreig que obt e el mateix bon rendiment per a la detecci o d'escanejos que SES, per o treballa paquet per paquet enlloc de mantenir tots els fluxes a mem oria. Despr es de validar i evaluar la nostra proposta, demostrem que es capa c de treballar online i utilitza molts menys recursos que SES. Tot i la gran quantitat de tècniques proposades a la literatura per a la detecci o d'esdeveniments an omals, la investigaci o per a la seva posterior classi caci o i extracci o (p.ex., per investigar m es a fons el que va passar o per compartir l'evid encia amb tercers involucrats) es m es aviat marginal. Aix o fa que sigui m es dif cil per als operadors de xarxa analalitzar les anomalies reportades, ja que depenen unicament de la seva experi encia per fer la feina. A m es a m es, aquesta tasca es un proc es extremadament lent i propens a errors. El tercer bloc d'aquesta tesi se centra en aquest tema tenint tamb e en compte els coneixements adquirits en els blocs anteriors. Concretament, presentem un sistema per a la detecci o extracci o i classi caci o autom atica d'anomalies amb una alta precisi o i molt pocs falsos positius. Adicionalment, despleguem el sistema en un entorn operatiu i demostrem la seva utilitat pr actica. El quart i ultim bloc d'aquesta tesi presenta una generalitzaci o del nostre sistema que se centra en l'an alisi de tot el tr ansit, no nom es en les anomalies. Aquest nou sistema pret en ajudar m es als operadors ja que resumeix els patrons de tr ansit m es importants de la seva xarxa. En particular, es generalitza el sistema per fer front al "big data" (una gran quantitat de dades). En particular, el sistema tracta IPs origen i dest i, ports origen i destí , protocol, Sistemes Aut onoms origen i dest , aplicaci o que ha generat el tr ansit i fi nalment, dades de geolocalitzaci o (tamb e per origen i dest ). Primer, despleguem un prototip a la xarxa europea per a la recerca i la investigaci o (G EANT) i demostrem que el sistema pot processar grans quantitats de dades r apidament aix com crear informes altament informatius i compactes que s on de gran utilitat per ajudar a comprendre el que est a succeint a la xarxa. En segon lloc, despleguem la nostra eina en un escenari completament diferent i mostrem com tamb e pot ser utilitzat amb exit en un cas d' us en el m on real en el qual s'analitza el comportament de dispositius altament distribuïts

    Community-Based Behavioral Understanding of Mobility Trends and Public Attitude through Transportation User and Agency Interactions on Social Media in the Emergence of Covid-19

    Get PDF
    The increased availability of technology-enabled transportation options and modern communication devices (smartphones, in particular) is transforming travel-related decision-making in the population differently at different places, points in time, modes of transportation, and socio-economic groups. The emergence of COVID-19 made the dynamics of passenger travel behavior more complex, forcing a worldwide, unparalleled change in human travel behavior and introducing a new normal into their existence. This dissertation explores the potential of social media platforms (SMPs) as a viable alternative to traditional approaches (e.g., travel surveys) to understand the complex dynamics of people’s mobility patterns in the emergence of COVID-19. In this dissertation, we focus on three objectives. First, a novel approach to developing comparative infographics of emerging transportation trends is introduced by natural language processing and data-driven techniques using large-scale social media data. Second, a methodology has been developed to model community-based travel behavior under different socioeconomic and demographic factors at the community level in the emergence of COVID-19 on Twitter, inferring users’ demographics to overcome sampling bias. Third, the communication patterns of different transportation agencies on Twitter regarding message kinds, communication sufficiency, consistency, and coordination were examined by applying text mining techniques and dynamic network analysis. The methodologies and findings of the dissertation will allow real-time monitoring of transportation trends by agencies, researchers, and professionals. Potential applications of the work may include: (1) identifying spatial diversity of public mobility needs and concerns through social media platforms; (2) developing new policies that would satisfy the diverse needs at different locations; (3) introducing new plans to support and celebrate equity, diversity, and inclusion in the transportation sector that would improve the efficient flow of goods and services; (4) designing new methods to model community-based travel behavior at different scales (e.g., census block, zip code, etc.) using social media data inferring users’ socio-economic and demographic properties; and (5) implementing efficient policies to improve existing communication plans, critical information dissemination efficacy, and coordination of different transportation actors to raise awareness among passengers in general and during unprecedented health crises in the fragmented communication world

    Deteção de propagação de ameaças e exfiltração de dados em redes empresariais

    Get PDF
    Modern corporations face nowadays multiple threats within their networks. In an era where companies are tightly dependent on information, these threats can seriously compromise the safety and integrity of sensitive data. Unauthorized access and illicit programs comprise a way of penetrating the corporate networks, able to traversing and propagating to other terminals across the private network, in search of confidential data and business secrets. The efficiency of traditional security defenses are being questioned with the number of data breaches occurred nowadays, being essential the development of new active monitoring systems with artificial intelligence capable to achieve almost perfect detection in very short time frames. However, network monitoring and storage of network activity records are restricted and limited by legal laws and privacy strategies, like encryption, aiming to protect the confidentiality of private parties. This dissertation proposes methodologies to infer behavior patterns and disclose anomalies from network traffic analysis, detecting slight variations compared with the normal profile. Bounded by network OSI layers 1 to 4, raw data are modeled in features, representing network observations, and posteriorly, processed by machine learning algorithms to classify network activity. Assuming the inevitability of a network terminal to be compromised, this work comprises two scenarios: a self-spreading force that propagates over internal network and a data exfiltration charge which dispatch confidential info to the public network. Although features and modeling processes have been tested for these two cases, it is a generic operation that can be used in more complex scenarios as well as in different domains. The last chapter describes the proof of concept scenario and how data was generated, along with some evaluation metrics to perceive the model’s performance. The tests manifested promising results, ranging from 96% to 99% for the propagation case and 86% to 97% regarding data exfiltration.Nos dias de hoje, várias organizações enfrentam múltiplas ameaças no interior da sua rede. Numa época onde as empresas dependem cada vez mais da informação, estas ameaças podem compremeter seriamente a segurança e a integridade de dados confidenciais. O acesso não autorizado e o uso de programas ilícitos constituem uma forma de penetrar e ultrapassar as barreiras organizacionais, sendo capazes de propagarem-se para outros terminais presentes no interior da rede privada com o intuito de atingir dados confidenciais e segredos comerciais. A eficiência da segurança oferecida pelos sistemas de defesa tradicionais está a ser posta em causa devido ao elevado número de ataques de divulgação de dados sofridos pelas empresas. Desta forma, o desenvolvimento de novos sistemas de monitorização ativos usando inteligência artificial é crucial na medida de atingir uma deteção mais precisa em curtos períodos de tempo. No entanto, a monitorização e o armazenamento dos registos da atividade da rede são restritos e limitados por questões legais e estratégias de privacidade, como a cifra dos dados, visando proteger a confidencialidade das entidades. Esta dissertação propõe metodologias para inferir padrões de comportamento e revelar anomalias através da análise de tráfego que passa na rede, detetando pequenas variações em comparação com o perfil normal de atividade. Delimitado pelas camadas de rede OSI 1 a 4, os dados em bruto são modelados em features, representando observações de rede e, posteriormente, processados por algoritmos de machine learning para classificar a atividade de rede. Assumindo a inevitabilidade de um terminal ser comprometido, este trabalho compreende dois cenários: um ataque que se auto-propaga sobre a rede interna e uma tentativa de exfiltração de dados que envia informações para a rede pública. Embora os processos de criação de features e de modelação tenham sido testados para estes dois casos, é uma operação genérica que pode ser utilizada em cenários mais complexos, bem como em domínios diferentes. O último capítulo inclui uma prova de conceito e descreve o método de criação dos dados, com a utilização de algumas métricas de avaliação de forma a espelhar a performance do modelo. Os testes mostraram resultados promissores, variando entre 96% e 99% para o caso da propagação e entre 86% e 97% relativamente ao roubo de dados.Mestrado em Engenharia de Computadores e Telemátic

    Knowledge discovery from trajectories

    Get PDF
    Dissertation submitted in partial fulfilment of the requirements for the Degree of Master of Science in Geospatial TechnologiesAs a newly proliferating study area, knowledge discovery from trajectories has attracted more and more researchers from different background. However, there is, until now, no theoretical framework for researchers gaining a systematic view of the researches going on. The complexity of spatial and temporal information along with their combination is producing numerous spatio-temporal patterns. In addition, it is very probable that a pattern may have different definition and mining methodology for researchers from different background, such as Geographic Information Science, Data Mining, Database, and Computational Geometry. How to systematically define these patterns, so that the whole community can make better use of previous research? This paper is trying to tackle with this challenge by three steps. First, the input trajectory data is classified; second, taxonomy of spatio-temporal patterns is developed from data mining point of view; lastly, the spatio-temporal patterns appeared on the previous publications are discussed and put into the theoretical framework. In this way, researchers can easily find needed methodology to mining specific pattern in this framework; also the algorithms needing to be developed can be identified for further research. Under the guidance of this framework, an application to a real data set from Starkey Project is performed. Two questions are answers by applying data mining algorithms. First is where the elks would like to stay in the whole range, and the second is whether there are corridors among these regions of interest
    • …
    corecore