1,723 research outputs found

    A taxonomy framework for unsupervised outlier detection techniques for multi-type data sets

    Get PDF
    The term "outlier" can generally be defined as an observation that is significantly different from the other values in a data set. The outliers may be instances of error or indicate events. The task of outlier detection aims at identifying such outliers in order to improve the analysis of data and further discover interesting and useful knowledge about unusual events within numerous applications domains. In this paper, we report on contemporary unsupervised outlier detection techniques for multiple types of data sets and provide a comprehensive taxonomy framework and two decision trees to select the most suitable technique based on data set. Furthermore, we highlight the advantages, disadvantages and performance issues of each class of outlier detection techniques under this taxonomy framework

    Web usage mining for click fraud detection

    Get PDF
    Estágio realizado na AuditMark e orientado pelo Eng.º Pedro FortunaTese de mestrado integrado. Engenharia Informática e Computação. Faculdade de Engenharia. Universidade do Porto. 201

    Multi-Sensor Data Fusion for Travel Time Estimation

    Get PDF
    The importance of travel time estimation has increased due to the central role it plays in a number of emerging intelligent transport systems and services including Advanced Traveller Information Systems (ATIS), Urban Traffic Control (UTC), Dynamic Route Guidance (DRG), Active Traffic Management (ATM), and network performance monitoring. Along with the emerging of new sensor technologies, the much greater volumes of near real time data provided by these new sensor systems create opportunities for significant improvement in travel time estimation. Data fusion as a recent technique leads to a promising solution to this problem. This thesis presents the development and testing of new methods of multi-sensor data fusion for the accurate, reliable and robust estimation of travel time. This thesis reviews the state-of-art data fusion approaches and its application in transport domain, and discusses both of opportunities and challenging of applying data fusion into travel time estimation in a heterogeneous real time data environment. For a particular England highway scenario where ILDs and ANPR data are largely available, a simple but practical fusion method is proposed to estimate the travel time based on a novel relationship between space-mean-speed and time-mean-speed. In developing a general fusion framework which is able to fuse ILDs, GPS and ANPR data, the Kalman filter is identified as the most appropriate fundamental fusion technique upon which to construct the required framework. This is based both on the ability of the Kalman filter to flexibly accommodate well-established traffic flow models which describe the internal physical relation between the observed variables and objective estimates and on its ability to integrate and propagate in a consistent fashion the uncertainty associated with different data sources. Although the standard linear Kalman filter has been used for multi-sensor travel time estimation in the previous research, the novelty of this research is to develop a nonlinear Kalman filter (EKF and UKF) fusion framework which improves the estimation performance over those methods based on the linear Kalman filter. This proposed framework is validated by both of simulation and real-world scenarios, and is demonstrated the effectiveness of estimating travel time by fusing multi-sensor sources

    Performance Analytics of Cloud Networks

    Get PDF
    As the world becomes more inter-connected and dependent on the Internet, networks become ever more pervasive, and the stresses placed upon them more demanding. Similarly, the expectations of networks to maintain a high level of performance have also increased. Network performance is highly important to any business that operates online, depends on web traffic, runs any part of their infrastructure in a cloud environment, or even hosts their own network infrastructure. Depending upon the exact nature of a network, whether it be local or wide-area, 10 or 100 Gigabit, it will have distinct performance characteristics and it is important for a business or individual operating on the network to understand those performance characteristics and how they affect operations. To better understand our networks, it is necessary that we test them to measure their performance capabilities and track these metrics over time. In our work, we provide an in-depth analysis of how best to run cloud benchmarks to increase our network intelligence and how we can use the results of those benchmarks to predict future performance and identify performance anomalies. To achieve this, we explain how to effectively run cloud benchmarks and propose a scheduling algorithm for running large numbers of cloud benchmarks daily. We then use the performance data gathered from this method to conduct a thorough analysis of the performance characteristics of a cloud network, train neural networks to forecast future throughput based on historical results and detect performance anomalies as they occur

    Denial of Service in Web-Domains: Building Defenses Against Next-Generation Attack Behavior

    Get PDF
    The existing state-of-the-art in the field of application layer Distributed Denial of Service (DDoS) protection is generally designed, and thus effective, only for static web domains. To the best of our knowledge, our work is the first that studies the problem of application layer DDoS defense in web domains of dynamic content and organization, and for next-generation bot behaviour. In the first part of this thesis, we focus on the following research tasks: 1) we identify the main weaknesses of the existing application-layer anti-DDoS solutions as proposed in research literature and in the industry, 2) we obtain a comprehensive picture of the current-day as well as the next-generation application-layer attack behaviour and 3) we propose novel techniques, based on a multidisciplinary approach that combines offline machine learning algorithms and statistical analysis, for detection of suspicious web visitors in static web domains. Then, in the second part of the thesis, we propose and evaluate a novel anti-DDoS system that detects a broad range of application-layer DDoS attacks, both in static and dynamic web domains, through the use of advanced techniques of data mining. The key advantage of our system relative to other systems that resort to the use of challenge-response tests (such as CAPTCHAs) in combating malicious bots is that our system minimizes the number of these tests that are presented to valid human visitors while succeeding in preventing most malicious attackers from accessing the web site. The results of the experimental evaluation of the proposed system demonstrate effective detection of current and future variants of application layer DDoS attacks

    Bayesian Models Applied to Cyber Security Anomaly Detection Problems

    Get PDF
    Cyber security is an important concern for all individuals, organisations and governments globally. Cyber attacks have become more sophisticated, frequent and dangerous than ever, and traditional anomaly detection methods have been proved to be less effective when dealing with these new classes of cyber threats. In order to address this, both classical and Bayesian models offer a valid and innovative alternative to the traditional signature-based methods, motivating the increasing interest in statistical research that it has been observed in recent years. In this review, we provide a description of some typical cyber security challenges, typical types of data and statistical methods, paying special attention to Bayesian approaches for these problems

    Self-learning Anomaly Detection in Industrial Production

    Get PDF

    Network anomalies detection via event analysis and correlation by a smart system

    Get PDF
    The multidisciplinary of contemporary societies compel us to look at Information Technology (IT) systems as one of the most significant grants that we can remember. However, its increase implies a mandatory security force for users, a force in the form of effective and robust tools to combat cybercrime to which users, individual or collective, are ex-posed almost daily. Monitoring and detection of this kind of problem must be ensured in real-time, allowing companies to intervene fruitfully, quickly and in unison. The proposed framework is based on an organic symbiosis between credible, affordable, and effective open-source tools for data analysis, relying on Security Information and Event Management (SIEM), Big Data and Machine Learning (ML) techniques commonly applied for the development of real-time monitoring systems. Dissecting this framework, it is composed of a system based on SIEM methodology that provides monitoring of data in real-time and simultaneously saves the information, to assist forensic investigation teams. Secondly, the application of the Big Data concept is effective in manipulating and organising the flow of data. Lastly, the use of ML techniques that help create mechanisms to detect possible attacks or anomalies on the network. This framework is intended to provide a real-time analysis application in the institution ISCTE – Instituto Universitário de Lisboa (Iscte), offering a more complete, efficient, and secure monitoring of the data from the different devices comprising the network.A multidisciplinaridade das sociedades contemporâneas obriga-nos a perspetivar os sistemas informáticos como uma das maiores dádivas de que há memória. Todavia o seu incremento implica uma mandatária força de segurança para utilizadores, força essa em forma de ferramentas eficazes e robustas no combate ao cibercrime a que os utilizadores, individuais ou coletivos, são sujeitos quase diariamente. A monitorização e deteção deste tipo de problemas tem de ser assegurada em tempo real, permitindo assim, às empresas intervenções frutuosas, rápidas e em uníssono. A framework proposta é alicerçada numa simbiose orgânica entre ferramentas open source credíveis, acessíveis pecuniariamente e eficazes na monitorização de dados, recorrendo a um sistema baseado em técnicas de Security Information and Event Management (SIEM), Big Data e Machine Learning (ML) comumente aplicadas para a criação de sistemas de monitorização em tempo real. Dissecando esta framework, é composta pela metodologia SIEM que possibilita a monitorização de dados em tempo real e em simultâneo guardar a informação, com o objetivo de auxiliar as equipas de investigação forense. Em segundo lugar, a aplicação do conceito Big Data eficaz na manipulação e organização do fluxo dos dados. Por último, o uso de técnicas de ML que ajudam a criação de mecanismos de deteção de possíveis ataques ou anomalias na rede. Esta framework tem como objetivo uma aplicação de análise em tempo real na instituição ISCTE – Instituto Universitário de Lisboa (Iscte), apresentando uma monitorização mais completa, eficiente e segura dos dados dos diversos dispositivos presentes na mesma

    Comparing Anomaly-Based Network Intrusion Detection Approaches Under Practical Aspects

    Get PDF
    While many of the currently used network intrusion detection systems (NIDS) employ signature-based approaches, there is an increasing research interest in the examination of anomaly-based detection methods, which seem to be more suited for recognizing zero-day attacks. Nevertheless, requirements for their practical deployment, as well as objective and reproducible evaluation methods, are hereby often neglected. The following thesis defines aspects that are crucial for a practical evaluation of anomaly-based NIDS, such as the focus on modern attack types, the restriction to one-class classification methods, the exclusion of known attacks from the training phase, a low false detection rate, and consideration of the runtime efficiency. Based on those principles, a framework dedicated to developing, testing and evaluating models for the detection of network anomalies is proposed. It is applied to two datasets featuring modern traffic, namely the UNSW-NB15 and the CIC-IDS-2017 datasets, in order to compare and evaluate commonly-used network intrusion detection methods. The implemented approaches include, among others, a highly configurable network flow generator, a payload analyser, a one-hot encoder, a one-class support vector machine, and an autoencoder. The results show a significant difference between the two chosen datasets: While for the UNSW-NB15 dataset several reasonably well performing model combinations for both the autoencoder and the one-class SVM can be found, most of them yield unsatisfying results when the CIC-IDS-2017 dataset is used.Obwohl viele der derzeit genutzten Systeme zur Erkennung von Netzwerkangriffen (engl. NIDS) signaturbasierte Ansätze verwenden, gibt es ein wachsendes Forschungsinteresse an der Untersuchung von anomaliebasierten Erkennungsmethoden, welche zur Identifikation von Zero-Day-Angriffen geeigneter erscheinen. Gleichwohl werden hierbei Bedingungen für deren praktischen Einsatz oft vernachlässigt, ebenso wie objektive und reproduzierbare Evaluationsmethoden. Die folgende Arbeit definiert Aspekte, die für eine praxisorientierte Evaluation unabdingbar sind. Dazu zählen ein Schwerpunkt auf modernen Angriffstypen, die Beschränkung auf One-Class Classification Methoden, der Ausschluss von bereits bekannten Angriffen aus dem Trainingsdatensatz, niedrige Falscherkennungsraten sowie die Berücksichtigung der Laufzeiteffizienz. Basierend auf diesen Prinzipien wird ein Rahmenkonzept vorgeschlagen, das für das Entwickeln, Testen und Evaluieren von Modellen zur Erkennung von Netzwerkanomalien bestimmt ist. Dieses wird auf zwei Datensätze mit modernem Netzwerkverkehr, namentlich auf den UNSW-NB15 und den CIC-IDS- 2017 Datensatz, angewendet, um häufig genutzte NIDS-Methoden zu vergleichen und zu evaluieren. Die für diese Arbeit implementierten Ansätze beinhalten, neben anderen, einen weit konfigurierbaren Netzwerkflussgenerator, einen Nutzdatenanalysierer, einen One-Hot-Encoder, eine One-Class Support Vector Machine sowie einen Autoencoder. Die Resultate zeigen einen großen Unterschied zwischen den beiden ausgewählten Datensätzen: Während für den UNSW-NB15 Datensatz verschiedene angemessen gut funktionierende Modellkombinationen, sowohl für den Autoencoder als auch für die One-Class SVM, gefunden werden können, bringen diese für den CIC-IDS-2017 Datensatz meist unbefriedigende Ergebnisse
    corecore