A new spatial model for predicting multivariate counts: anticipating pedestrian crashes across neighborhoods and firm births across counties
Transportation research regularly relies on data exhibiting both space and time dimensions. Thanks to the rise of smartphones, Bluetooth, and other devices, geo-referenced data collection enables application of more behaviorally realistic -- but complex -- models that account for spatial autocorrelation, temporal correlation, and possible time-space interactions (e.g., time-lagged effects from a neighboring unit's response). One promising area is crash count prediction, where crash frequencies (and severities) at zones, intersections, and along roadways will generally exhibit some spatial relationships, due to missing variables, causal mechanisms, and other ties. This dissertation proposes and estimates a spatial multivariate count model and provides two case studies implementing it. One case study is in the context of pedestrian-vehicle crash counts across zones in Austin, Texas, while accounting for network features (e.g., lane-miles and intersection density), land use factors (such as land use entropy and residential accessibility to commercial activities), population and job densities, and school access. The other case study pertains to new firm births by industry across U.S. counties, while controlling for population density, agglomeration economies (e.g., percentage of firms with more than 100 people), wealth, and median age. The new model specification captures region-wide heterogeneity (thanks to extra variation introduced by the lognormal component in the mean crash-rate specification), correlations across two (or more) count types (in the same zone), and spatial autocorrelation among unobserved components. This new approach and associated applications allow analysts to distinguish covariates' effects on multivariate crash and other counts from spatial spillover effects and cross-response correlations.
This work adds to the literature by providing guidance on what types of specifications best reflect spatial count data while facilitating estimation (using large data sets) and illuminating the level and nature of spatial autocorrelation, multivariate correlation, and region-wide (latent) heterogeneity that exists in crash data after controlling for a host of observable factors.
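The role of the lognormal component in the mean-rate specification can be illustrated with a small simulation, a sketch under assumed parameter values (not the dissertation's estimates): mixing a Poisson rate with multiplicative lognormal noise produces the extra-Poisson variation the model attributes to region-wide heterogeneity.

```python
import math
import random
import statistics

random.seed(0)

def poisson(lam, rng=random):
    # Knuth's method; adequate for the small rates simulated here
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

beta, sigma = 0.5, 0.8   # hypothetical covariate effect and heterogeneity scale
x = [random.gauss(0, 1) for _ in range(5000)]

counts = []
for xi in x:
    eps = random.gauss(0, sigma)      # lognormal component of the mean rate
    lam = math.exp(beta * xi + eps)   # zone-specific expected count
    counts.append(poisson(lam))

mean = statistics.mean(counts)
var = statistics.pvariance(counts)
print(mean, var)   # variance exceeds the mean: extra-Poisson heterogeneity
```

Setting the heterogeneity scale to zero would collapse the draws back toward equidispersed Poisson counts, which is why the lognormal term is what lets the model absorb region-wide unobserved variation.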
Neural Network Models for TCP-SYN Flood Denial of Service Attack Detection With Source and Destination Anonymization
The Internet has become a necessity in today's digital world, and making it secure is a pressing concern. Hackers are investing ever-increasing efforts to compromise Internet nodes with novel techniques. According to Forbes, every minute, $2,900,000 is lost to cybercrime. A common cyber-attack is the Denial of Service (DoS) or Distributed Denial of Service (DDoS) attack, which may bring a network to a standstill; unless mitigated, network services could be halted for an extended period. The attack can occur at any layer of the OSI model. This thesis focuses on SYN flood DoS/DDoS attacks, also known as TCP flood attacks, and studies the use of artificial neural networks to detect them. The specific neural network models used in this thesis are Gated Recurrent Units (GRU), Long Short-Term Memory (LSTM), and a semi-supervised model based on label propagation. All neural network models detect attacks by analyzing the individual hexadecimal values in the packet header. A novelty of the approach followed in this thesis is that the neural networks do not consider the lexical values of the network packet (MAC addresses, IP addresses, and port numbers) as input features in their traffic analysis. Instead, the neural network models are designed and trained to detect malicious traffic based on the time pattern of TCP flags, basing their analysis of traffic on time-sequenced patterns. An important hyperparameter discussed in this thesis is the size of the lookup window, that is, the number of past packets the model can access to predict the next packet. Evaluation results based on the datasets presented in this thesis show that the accuracies of the GRU, CNN/LSTM, and label propagation models are 81%, 93%, and 96%, respectively.
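The lookup-window idea can be sketched independently of any particular network library; the flag values below are hypothetical encodings, used only to illustrate how past-packet windows become (input, target) pairs for a sequence model such as a GRU or LSTM:

```python
def make_windows(flags, window):
    """Slide a fixed lookup window over a packet-flag sequence.

    Each sample pairs the past `window` flags with the next flag --
    the form a GRU/LSTM would be trained on to predict the next
    packet's TCP flags.
    """
    samples = []
    for i in range(len(flags) - window):
        samples.append((flags[i:i + window], flags[i + window]))
    return samples

# Hypothetical flag bytes: 0x02=SYN, 0x12=SYN-ACK, 0x10=ACK, 0x11=FIN-ACK
normal = [0x02, 0x12, 0x10, 0x10, 0x11]   # a completed handshake and teardown
flood  = [0x02, 0x02, 0x02, 0x02, 0x02]   # SYN flood: SYNs, no handshake completes

print(make_windows(normal, 3))
```

A larger window gives the model more handshake context per prediction at the cost of more input per sample, which is why the thesis treats the window size as a tunable hyperparameter.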
Addressing practical challenges for anomaly detection in backbone networks
Network monitoring has always been a topic of foremost importance for both network operators and researchers, for multiple reasons ranging from anomaly detection to traffic classification or capacity planning. Nowadays, as networks become more and more complex, traffic increases and security threats multiply, so achieving a deeper understanding of what is happening in the network has become an essential necessity. In particular, due to the considerable growth of cybercrime, research in the field of anomaly detection has drawn significant attention in recent years, and many proposals have been made. All the same, when it comes to deploying solutions in real environments, some of them fail to meet crucial requirements. Taking this into account, this thesis focuses on filling this gap between the research and the non-research world. Prior to the start of this work, we identified several problems. First, there is a clear lack of detailed and updated information on the most common anomalies and their characteristics. Second, unawareness of sampled data is still common, although the performance of anomaly detection algorithms is severely affected. Third, operators currently need to invest many work-hours to manually inspect and classify detected anomalies in order to act accordingly and take the appropriate mitigation measures. This is further exacerbated by the high number of false positives and false negatives, and because anomaly detection systems are often perceived as extremely complex black boxes. Analysing an issue is essential to fully comprehend the problem space and to be able to tackle it properly. Accordingly, the first block of this thesis seeks to obtain detailed and updated real-world information on the most frequent anomalies occurring in backbone networks. It first reports on the performance of different commercial systems for anomaly detection and analyses the types of network anomalies detected. Afterwards, it focuses on further investigating the characteristics of the anomalies found in a backbone network, using one of the tools for more than half a year. Among other results, this block confirms the need to apply sampling in an operational environment, as well as the unacceptably high number of false positives and false negatives still reported by current commercial tools. On the whole, the presence of sampling in large networks for monitoring purposes has become almost mandatory and, therefore, all anomaly detection algorithms that do not take it into account might report incorrect results.
In the second block of this thesis, the dramatic impact of sampling on the performance of well-known anomaly detection techniques is analysed and confirmed. However, we show that the results change significantly depending on the sampling technique used and also on the metric selected to perform the comparison. In particular, we show that Packet Sampling outperforms Flow Sampling, unlike previously reported. Furthermore, we observe that Selective Sampling (SES), a sampling technique that focuses on small flows, obtains much better results than traditional sampling techniques for scan detection. Consequently, we propose Online Selective Sampling, a sampling technique that obtains the same good performance for scan detection as SES but works on a per-packet basis instead of keeping all flows in memory. We validate and evaluate our proposal and show that it can operate online and uses far fewer resources than SES.
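As a rough illustration of the per-packet idea (a simplified sketch, not the algorithm proposed in the thesis), a sampler can bias toward small flows, which is where scan traffic lives, while keeping only a bounded counter table instead of full per-flow state:

```python
import random

class OnlineSelectiveSampler:
    """Per-packet sketch of selective sampling biased toward small flows.

    Illustrative simplification: the sampling probability decays with
    the number of packets already seen for a flow, so small flows
    (e.g., scan probes) are kept with high probability while heavy
    flows are aggressively thinned. State is bounded, unlike a full
    flow table kept in memory.
    """

    def __init__(self, max_flows=10000, rng=random):
        self.counts = {}
        self.max_flows = max_flows
        self.rng = rng

    def sample(self, flow_key):
        n = self.counts.get(flow_key, 0)
        if flow_key not in self.counts and len(self.counts) >= self.max_flows:
            self.counts.clear()   # crude state bound; a real design would evict smarter
        self.counts[flow_key] = n + 1
        # Keep the packet with probability 1/(n+1): the first packet of
        # every flow is always kept, later packets increasingly rarely.
        return self.rng.random() < 1.0 / (n + 1)

sampler = OnlineSelectiveSampler()
print(sampler.sample("scan:10.0.0.1"))   # first packet of a flow is always kept
```

Because single-packet flows are never dropped, a port scan (many one-packet flows) survives sampling almost intact, matching the property that makes SES attractive for scan detection.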
Although the literature offers plenty of techniques for detecting anomalous events, research on anomaly classification and extraction (e.g., to further investigate what happened or to share evidence with involved third parties) is rather marginal. This makes it harder for network operators to analyse reported anomalies, because they depend solely on their experience to do the job. Furthermore, this task is an extremely time-consuming and error-prone process.
The third block of this thesis targets this issue and brings it together with the knowledge acquired in the previous blocks. In particular, it presents a system for automatic anomaly detection, extraction and classification with high accuracy and very low false positives. We deploy the system in an operational environment and show its usefulness in practice.
The fourth and last block of this thesis presents a generalisation of our system that focuses on analysing all the traffic, not only network anomalies. This new system seeks to further help network operators by summarising the most significant traffic patterns in their network. In particular, we generalise our system to deal with big network traffic data: it handles src/dst IPs, src/dst ports, protocol, src/dst Autonomous Systems, layer-7 application and src/dst geolocation. We first deploy a prototype in the European backbone network of GÉANT and show that it can process large amounts of data quickly and build highly informative and compact reports that are very useful for comprehending what is happening in the network. Second, we deploy it in a completely different scenario and show how it can also be successfully used in a real-world use case, analysing the behaviour of highly distributed devices related to a critical infrastructure sector.
Community-Based Behavioral Understanding of Mobility Trends and Public Attitude through Transportation User and Agency Interactions on Social Media in the Emergence of COVID-19
The increased availability of technology-enabled transportation options and modern communication devices (smartphones, in particular) is transforming travel-related decision-making differently across places, points in time, modes of transportation, and socio-economic groups. The emergence of COVID-19 made the dynamics of passenger travel behavior more complex, forcing a worldwide, unparalleled change in human travel behavior and introducing a new normal into people's existence. This dissertation explores the potential of social media platforms (SMPs) as a viable alternative to traditional approaches (e.g., travel surveys) for understanding the complex dynamics of people's mobility patterns in the emergence of COVID-19. In this dissertation, we focus on three objectives. First, a novel approach to developing comparative infographics of emerging transportation trends is introduced, applying natural language processing and data-driven techniques to large-scale social media data. Second, a methodology is developed to model community-based travel behavior under different socioeconomic and demographic factors at the community level in the emergence of COVID-19 on Twitter, inferring users' demographics to overcome sampling bias. Third, the communication patterns of different transportation agencies on Twitter, regarding message types, communication sufficiency, consistency, and coordination, are examined by applying text mining techniques and dynamic network analysis.
The methodologies and findings of the dissertation will allow real-time monitoring of transportation trends by agencies, researchers, and professionals. Potential applications of the work include: (1) identifying the spatial diversity of public mobility needs and concerns through social media platforms; (2) developing new policies that would satisfy the diverse needs at different locations; (3) introducing new plans to support and celebrate equity, diversity, and inclusion in the transportation sector that would improve the efficient flow of goods and services; (4) designing new methods to model community-based travel behavior at different scales (e.g., census block, zip code, etc.) using social media data, inferring users' socio-economic and demographic properties; and (5) implementing efficient policies to improve existing communication plans, the efficacy of critical information dissemination, and the coordination of different transportation actors to raise awareness among passengers, both in general and during unprecedented health crises in a fragmented communication world.
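As a toy illustration of the first objective, trend infographics ultimately rest on counting topic mentions over time; the keyword lexicon below is hypothetical and far simpler than the NLP pipeline the dissertation describes:

```python
from collections import Counter, defaultdict

# Hypothetical keyword lexicon for mode-related mobility mentions
MODE_KEYWORDS = {
    "transit": {"bus", "subway", "metro", "train"},
    "active":  {"bike", "biking", "walk", "walking"},
    "driving": {"car", "drive", "driving", "traffic"},
}

def mode_trends(posts):
    """Count mode-related mentions per day from (date, text) posts."""
    trends = defaultdict(Counter)
    for date, text in posts:
        tokens = set(text.lower().split())
        for mode, words in MODE_KEYWORDS.items():
            if tokens & words:   # any lexicon word appears in the post
                trends[date][mode] += 1
    return trends

posts = [("2020-03-01", "Took the bus to work"),
         ("2020-03-01", "Walking everywhere now"),
         ("2020-04-01", "Driving instead of the subway")]
print(dict(mode_trends(posts)))
```

A real pipeline would add tokenization, geotagging, and demographic inference to correct sampling bias, but the per-day, per-topic counts are the raw material the comparative infographics summarize.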
Detection of threat propagation and data exfiltration in enterprise networks
Modern corporations nowadays face multiple threats within their networks. In an era where companies depend tightly on information, these threats can seriously compromise the safety and integrity of sensitive data. Unauthorized access and illicit programs are common ways of penetrating corporate networks, capable of traversing and propagating to other terminals across the private network in search of confidential data and business secrets. The efficiency of traditional security defenses is being questioned by the number of data breaches occurring nowadays, making it essential to develop new active monitoring systems, backed by artificial intelligence, capable of near-perfect detection in very short time frames. However, network monitoring and the storage of network activity records are restricted and limited by law and by privacy strategies, such as encryption, aiming to protect the confidentiality of private parties. This dissertation proposes methodologies to infer behavior patterns and disclose anomalies from network traffic analysis, detecting slight variations compared with the normal profile. Bounded by OSI layers 1 to 4, raw data are modeled into features representing network observations and subsequently processed by machine learning algorithms to classify network activity. Assuming the inevitability of a network terminal being compromised, this work comprises two scenarios: a self-spreading force that propagates over the internal network, and a data-exfiltration payload that dispatches confidential information to the public network. Although the features and modeling processes have been tested for these two cases, the operation is generic and can be used in more complex scenarios as well as in different domains. The last chapter describes the proof-of-concept scenario and how the data were generated, along with evaluation metrics to assess the model's performance. The tests showed promising results, ranging from 96% to 99% for the propagation case and from 86% to 97% for data exfiltration.
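A minimal sketch of the profile-based idea (hypothetical features and thresholds, not the dissertation's models): learn per-feature statistics from normal-traffic windows, then flag windows that deviate sharply from that profile.

```python
import statistics

def build_profile(windows):
    """Learn a per-feature (mean, stdev) profile from normal-traffic windows."""
    profile = []
    for feature in zip(*windows):
        profile.append((statistics.mean(feature), statistics.pstdev(feature) or 1.0))
    return profile

def is_anomalous(window, profile, threshold=3.0):
    """Flag a window if any feature deviates > threshold stdevs from the profile."""
    return any(abs(v - mu) / sd > threshold
               for v, (mu, sd) in zip(window, profile))

# Hypothetical layer-3/4 features per window: [packets, bytes, distinct dst ports]
normal = [[100, 60_000, 5], [110, 66_000, 6], [95, 58_000, 4], [105, 63_000, 5]]
profile = build_profile(normal)

print(is_anomalous([102, 61_000, 5], profile))    # ordinary-looking window
print(is_anomalous([104, 62_000, 300], profile))  # burst of distinct dst ports
```

The second window trips the detector on the destination-port feature alone, the kind of slight-but-systematic deviation a self-spreading payload scanning internal hosts would produce.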
Knowledge discovery from trajectories
Dissertation submitted in partial fulfilment of the requirements for the Degree of Master of Science in Geospatial Technologies.
As a newly proliferating study area, knowledge discovery from trajectories has attracted more and more researchers from different backgrounds. Until now, however, there has been no theoretical framework giving researchers a systematic view of the ongoing research. The complexity of spatial and temporal information, along with their combination, is producing numerous spatio-temporal patterns. In addition, it is very probable that a pattern has different definitions and mining methodologies for researchers from different backgrounds, such as Geographic Information Science, Data Mining, Databases, and Computational Geometry. How can these patterns be systematically defined, so that the whole community can make better use of previous research? This paper tackles this challenge in three steps. First, the input trajectory data are classified; second, a taxonomy of spatio-temporal patterns is developed from a data mining point of view; lastly, the spatio-temporal patterns appearing in previous publications are discussed and placed into the theoretical framework. In this way, researchers can easily find the methodology needed to mine a specific pattern within this framework, and the algorithms that still need to be developed can be identified for further research. Under the guidance of this framework, an application to a real data set from the Starkey Project is performed. Two questions are answered by applying data mining algorithms: first, where the elk prefer to stay within their whole range, and second, whether there are corridors among these regions of interest.
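The regions-of-interest question maps naturally onto classic stay-point detection; the following is a simplified planar sketch (thresholds and track data are illustrative, not taken from the Starkey analysis):

```python
import math

def stay_points(track, dist_thresh=0.2, time_thresh=3):
    """Detect stay points: maximal runs where the animal remains within
    `dist_thresh` of the run's start for at least `time_thresh` fixes.

    Simplified planar version of classic stay-point detection;
    `track` is a list of (x, y) fixes recorded at regular intervals.
    """
    points, i = [], 0
    while i < len(track):
        j = i + 1
        while j < len(track) and math.dist(track[i], track[j]) <= dist_thresh:
            j += 1
        if j - i >= time_thresh:   # dwelled long enough: a region of interest
            xs, ys = zip(*track[i:j])
            points.append((sum(xs) / len(xs), sum(ys) / len(ys)))
        i = j
    return points

track = [(0, 0), (0.05, 0.02), (0.03, 0.08), (1, 1), (2, 2),
         (2.02, 2.01), (2.05, 1.98)]
print(stay_points(track))   # two dwell regions, linked by a movement segment
```

The fixes between consecutive stay points, here the run through (1, 1), are exactly the candidate "corridors" the second question asks about.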
IEEE 802.11 wireless LAN traffic analysis: a cross-layer approach
The deployment of broadband wireless data networks, e.g., wireless local area networks (WLANs) [29], has experienced tremendous growth in the last several years, and this trend continues to gain momentum. In fact, WLAN is becoming an indispensable component of the modern telecommunication infrastructure. Despite this optimistic outlook, however, little is known about the impact of the wireless channel on the characteristics of WLAN traffic. This dissertation characterizes the correlation structures of the WLAN channel and traffic statistics from a cross-layer point of view, and provides new measurement methodologies and statistical models for WLAN networks.
Current WLAN standards are designed within the paradigm of the layered network architecture. For example, the architecture of IEEE 802.11 is almost identical to that of Ethernet. However, wireless networks are fundamentally different from their wired peers due to the shift of the transmission medium from cables to over-the-air radio waves. This transition exposes wireless systems to the influence of radio propagation and, more importantly, to temporal and spatial fluctuations of the radio channel that can propagate up to the upper layers. The current WLAN architecture, however, isolates network layers and largely ignores this impact. We therefore believe that a cross-layer approach is necessary to understand and reflect this underlying impact of the channel on the upper layers of the network, especially in relation to WLAN traffic behavior.
Measurement is one of the fundamental tools used to quantify radio propagation. As part of this dissertation, a complete framework for a measurement methodology, including hardware, software, and measurement procedures, is established. Characteristics of the propagation channel are estimated from measurement data, and the channel knowledge is applied to the upper layers for more realistic and accurate modeling.
In WLAN environments, knowledge of the traffic characteristics is essential for proper network provisioning and for improving the performance of the IEEE 802.11 standard and network devices, e.g., to design improved MAC schemes or to build better buffer-scheduling algorithms with channel knowledge. Built upon extensive WLAN traffic traces, this dissertation presents cross-layer models for WLAN throughput predictions, traffic statistics, and link-layer characteristics.
The main goal of this dissertation is to experiment with and develop new methods for identifying channel characteristics; utilizing this knowledge, we show how to predict and improve WLAN performance. Within the framework of the developed cross-layer measurement methodology, we conducted extensive measurements in different physical environments and settings, such as office buildings and stores, and (1) show that the impact of the propagation channel can be quantified using a simple large-scale channel metric (throughput over a longer period of time), and (2) demonstrate the existence of a Doppler effect within today's WLAN packet traffic at sub-second time scales. We also show real-world WLAN usage patterns from our measurement results. From these data, we conclude that the key issues in studying WLAN networks include accurate site-specific propagation channel modeling and real-time autonomous traffic control.
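The "simple large-scale channel metric" mentioned above, throughput over longer periods of time, can be computed from a packet trace with nothing more than time bucketing (a sketch with an illustrative trace, not the dissertation's measurement code):

```python
from collections import defaultdict

def throughput_series(packets, window=1.0):
    """Aggregate a packet trace into per-window throughput (bytes/sec).

    `packets` is an iterable of (timestamp_sec, size_bytes) pairs; the
    result is a coarse, large-scale view of the channel: throughput
    averaged over each `window`-second interval.
    """
    buckets = defaultdict(int)
    for ts, size in packets:
        buckets[int(ts // window)] += size   # assign bytes to a time bucket
    return [buckets[k] / window for k in sorted(buckets)]

# Illustrative trace: (seconds, bytes) per captured frame
trace = [(0.1, 1500), (0.4, 1500), (1.2, 500), (2.7, 1500), (2.9, 1500)]
print(throughput_series(trace))   # [3000.0, 500.0, 3000.0]
```

Shrinking the window toward sub-second scales is what exposes the finer channel dynamics, such as the Doppler effect, that the coarse metric averages away.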