3,257 research outputs found

    Classification hardness for supervised learners on 20 years of intrusion detection data

    Get PDF
    This article consolidates analysis of established (NSL-KDD) and new intrusion detection datasets (ISCXIDS2012, CICIDS2017, CICIDS2018) through the use of supervised machine learning (ML) algorithms. The uniformity in analysis procedure opens up the option to compare the obtained results. It also provides a stronger foundation for the conclusions about the efficacy of supervised learners on the main classification task in network security. This research is motivated in part to address the lack of adoption of these modern datasets. Starting with a broad scope that includes classification by algorithms from different families on both established and new datasets has been done to expand the existing foundation and reveal the most opportune avenues for further inquiry. After obtaining baseline results, the classification task was increased in difficulty, by reducing the available data to learn from, both horizontally and vertically. The data reduction has been included as a stress-test to verify if the very high baseline results hold up under increasingly harsh constraints. Ultimately, this work contains the most comprehensive set of results on the topic of intrusion detection through supervised machine learning. Researchers working on algorithmic improvements can compare their results to this collection, knowing that all results reported here were gathered through a uniform framework. This work's main contributions are the outstanding classification results on the current state of the art datasets for intrusion detection and the conclusion that these methods show remarkable resilience in classification performance even when aggressively reducing the amount of data to learn from

    A robust, reliable and deployable framework for In-vehicle security

    Full text link
    Cyber attacks on financial and government institutions, critical infrastructure, voting systems, businesses, modern vehicles, etc., are on the rise. Fully connected autonomous vehicles are more vulnerable than ever to hacking and data theft. This is due to the fact that the protocols used for in-vehicle communication i.e. controller area network (CAN), FlexRay, local interconnect network (LIN), etc., lack basic security features such as message authentication, which makes it vulnerable to a wide range of attacks including spoofing attacks. This research presents methods to protect the vehicle against spoofing attacks. The proposed methods exploit uniqueness in the electronic control unit electronic control unit (ECU) and the physical channel between transmitting and destination nodes for linking the received packet to the source. Impurities in the digital device, physical channel, imperfections in design, material, and length of the channel contribute to the uniqueness of artifacts. I propose novel techniques for electronic control unit (ECU) identification in this research to address security vulnerabilities of the in-vehicle communication. The reliable ECU identification has the potential to prevent spoofing attacks launched over the CAN due to the inconsideration of the message authentication. In this regard, my techniques models the ECU-specific random distortion caused by the imperfections in digital-to-analog converter digital to analog converter (DAC), and semiconductor impurities in the transmitting ECU for fingerprinting. I also model the channel-specific random distortion, impurities in the physical channel, imperfections in design, material, and length of the channel are contributing factors behind physically unclonable artifacts. The lumped element model is used to characterize channel-specific distortions. This research exploits the distortion of the device (ECU) and distortion due to the channel to identify the transmitter and hence authenticate the transmitter.Ph.D.College of Engineering & Computer ScienceUniversity of Michigan-Dearbornhttps://deepblue.lib.umich.edu/bitstream/2027.42/154568/1/Azeem Hafeez Final Disseration.pdfDescription of Azeem Hafeez Final Disseration.pdf : Dissertatio

    Modélisation formelle des systèmes de détection d'intrusions

    Get PDF
    L’écosystème de la cybersécurité évolue en permanence en termes du nombre, de la diversité, et de la complexité des attaques. De ce fait, les outils de détection deviennent inefficaces face à certaines attaques. On distingue généralement trois types de systèmes de détection d’intrusions : détection par anomalies, détection par signatures et détection hybride. La détection par anomalies est fondée sur la caractérisation du comportement habituel du système, typiquement de manière statistique. Elle permet de détecter des attaques connues ou inconnues, mais génère aussi un très grand nombre de faux positifs. La détection par signatures permet de détecter des attaques connues en définissant des règles qui décrivent le comportement connu d’un attaquant. Cela demande une bonne connaissance du comportement de l’attaquant. La détection hybride repose sur plusieurs méthodes de détection incluant celles sus-citées. Elle présente l’avantage d’être plus précise pendant la détection. Des outils tels que Snort et Zeek offrent des langages de bas niveau pour l’expression de règles de reconnaissance d’attaques. Le nombre d’attaques potentielles étant très grand, ces bases de règles deviennent rapidement difficiles à gérer et à maintenir. De plus, l’expression de règles avec état dit stateful est particulièrement ardue pour reconnaître une séquence d’événements. Dans cette thèse, nous proposons une approche stateful basée sur les diagrammes d’état-transition algébriques (ASTDs) afin d’identifier des attaques complexes. Les ASTDs permettent de représenter de façon graphique et modulaire une spécification, ce qui facilite la maintenance et la compréhension des règles. Nous étendons la notation ASTD avec de nouvelles fonctionnalités pour représenter des attaques complexes. Ensuite, nous spécifions plusieurs attaques avec la notation étendue et exécutons les spécifications obtenues sur des flots d’événements à l’aide d’un interpréteur pour identifier des attaques. Nous évaluons aussi les performances de l’interpréteur avec des outils industriels tels que Snort et Zeek. Puis, nous réalisons un compilateur afin de générer du code exécutable à partir d’une spécification ASTD, capable d’identifier de façon efficiente les séquences d’événements.Abstract : The cybersecurity ecosystem continuously evolves with the number, the diversity, and the complexity of cyber attacks. Generally, we have three types of Intrusion Detection System (IDS) : anomaly-based detection, signature-based detection, and hybrid detection. Anomaly detection is based on the usual behavior description of the system, typically in a static manner. It enables detecting known or unknown attacks but also generating a large number of false positives. Signature based detection enables detecting known attacks by defining rules that describe known attacker’s behavior. It needs a good knowledge of attacker behavior. Hybrid detection relies on several detection methods including the previous ones. It has the advantage of being more precise during detection. Tools like Snort and Zeek offer low level languages to represent rules for detecting attacks. The number of potential attacks being large, these rule bases become quickly hard to manage and maintain. Moreover, the representation of stateful rules to recognize a sequence of events is particularly arduous. In this thesis, we propose a stateful approach based on algebraic state-transition diagrams (ASTDs) to identify complex attacks. ASTDs allow a graphical and modular representation of a specification, that facilitates maintenance and understanding of rules. We extend the ASTD notation with new features to represent complex attacks. Next, we specify several attacks with the extended notation and run the resulting specifications on event streams using an interpreter to identify attacks. We also evaluate the performance of the interpreter with industrial tools such as Snort and Zeek. Then, we build a compiler in order to generate executable code from an ASTD specification, able to efficiently identify sequences of events

    Web server load prediction and anomaly detection from hypertext transfer protocol logs

    Get PDF
    As network traffic increases and new intrusions occur, anomaly detection solutions based on machine learning are necessary to detect previously unknown intrusion patterns. Most of the developed models require a labelled dataset, which can be challenging owing to a shortage of publicly available datasets. These datasets are often too small to effectively train machine learning models, which further motivates the use of real unlabeled traffic. By using real traffic, it is possible to more accurately simulate the types of anomalies that might occur in a real-world network and improve the performance of the detection model. We present a method able to predict and categorize anomalies without the aid of a labelled dataset, demonstrating the model’s usability while also gathering a dataset from real noisy network traffic. The proposed long short-term memory (LTSM) based intrusion detection system was tested in a real-world setting of an antivirus company and was successful in detecting various intrusions using 5-minute windowing over both the predicted and real update curves thereby demonstrating its usefulness. Our contribution was the development of a robust model generally applicable to any hypertext transfer protocol (HTTP) traffic with almost real-time anomaly detection, while also outperforming earlier studies in terms of prediction accuracy

    Click Fraud Detection in Online and In-app Advertisements: A Learning Based Approach

    Get PDF
    Click Fraud is the fraudulent act of clicking on pay-per-click advertisements to increase a site’s revenue, to drain revenue from the advertiser, or to inflate the popularity of content on social media platforms. In-app advertisements on mobile platforms are among the most common targets for click fraud, which makes companies hesitant to advertise their products. Fraudulent clicks are supposed to be caught by ad providers as part of their service to advertisers, which is commonly done using machine learning methods. However: (1) there is a lack of research in current literature addressing and evaluating the different techniques of click fraud detection and prevention, (2) threat models composed of active learning systems (smart attackers) can mislead the training process of the fraud detection model by polluting the training data, (3) current deep learning models have significant computational overhead, (4) training data is often in an imbalanced state, and balancing it still results in noisy data that can train the classifier incorrectly, and (5) datasets with high dimensionality cause increased computational overhead and decreased classifier correctness -- while existing feature selection techniques address this issue, they have their own performance limitations. By extending the state-of-the-art techniques in the field of machine learning, this dissertation provides the following solutions: (i) To address (1) and (2), we propose a hybrid deep-learning-based model which consists of an artificial neural network, auto-encoder and semi-supervised generative adversarial network. (ii) As a solution for (3), we present Cascaded Forest and Extreme Gradient Boosting with less hyperparameter tuning. (iii) To overcome (4), we propose a row-wise data reduction method, KSMOTE, which filters out noisy data samples both in the raw data and the synthetically generated samples. (iv) For (5), we propose different column-reduction methods such as multi-time-scale Time Series analysis for fraud forecasting, using binary labeled imbalanced datasets and hybrid filter-wrapper feature selection approaches

    A systematic review of data quality issues in knowledge discovery tasks

    Get PDF
    Hay un gran crecimiento en el volumen de datos porque las organizaciones capturan permanentemente la cantidad colectiva de datos para lograr un mejor proceso de toma de decisiones. El desafío mas fundamental es la exploración de los grandes volúmenes de datos y la extracción de conocimiento útil para futuras acciones por medio de tareas para el descubrimiento del conocimiento; sin embargo, muchos datos presentan mala calidad. Presentamos una revisión sistemática de los asuntos de calidad de datos en las áreas del descubrimiento de conocimiento y un estudio de caso aplicado a la enfermedad agrícola conocida como la roya del café.Large volume of data is growing because the organizations are continuously capturing the collective amount of data for better decision-making process. The most fundamental challenge is to explore the large volumes of data and extract useful knowledge for future actions through knowledge discovery tasks, nevertheless many data has poor quality. We presented a systematic review of the data quality issues in knowledge discovery tasks and a case study applied to agricultural disease named coffee rust

    Unsupervised Intrusion Detection with Cross-Domain Artificial Intelligence Methods

    Get PDF
    Cybercrime is a major concern for corporations, business owners, governments and citizens, and it continues to grow in spite of increasing investments in security and fraud prevention. The main challenges in this research field are: being able to detect unknown attacks, and reducing the false positive ratio. The aim of this research work was to target both problems by leveraging four artificial intelligence techniques. The first technique is a novel unsupervised learning method based on skip-gram modeling. It was designed, developed and tested against a public dataset with popular intrusion patterns. A high accuracy and a low false positive rate were achieved without prior knowledge of attack patterns. The second technique is a novel unsupervised learning method based on topic modeling. It was applied to three related domains (network attacks, payments fraud, IoT malware traffic). A high accuracy was achieved in the three scenarios, even though the malicious activity significantly differs from one domain to the other. The third technique is a novel unsupervised learning method based on deep autoencoders, with feature selection performed by a supervised method, random forest. Obtained results showed that this technique can outperform other similar techniques. The fourth technique is based on an MLP neural network, and is applied to alert reduction in fraud prevention. This method automates manual reviews previously done by human experts, without significantly impacting accuracy

    A Big Data Analytical Framework for Intrusion Detection Based On Novel Elephant Herding Optimized Finite Dirichlet Mixture Models

    Get PDF
    For the purpose of identifying a wide variety of hostile activity in cyberspace, an Intrusion Detection System (IDS) is a crucial instrument. However, traditional IDSs have limitations in detecting zero-day attacks, which can lead to high false alarm rates. To address this issue, it is crucial to integrate the monitoring and analysis of network data with decision-making methods that can identify anomalous events accurately. By combining these approaches, organizations can develop more effective cybersecurity measures and better protect their networks from cyber threats. In this study, we proposed a novel called the Elephant Herding Optimized Finite Dirichlet Mixture Model (EHO-FDMM). This framework consists of three modules: capture and logging, pre-processing, and an innovative IDS method based on the EHO-FDMM. The NSL-KDD and UNSW-NB15 datasets are used to assess this framework's performance. The empirical findings show that selecting the optimum model that accurately fits the network data is aided by statistical analysis of the data. The EHO-FDMM-based intrusion detection method also offers a lower False Alarm Rate (FPR) and greater Detection Rate (DR) than the other three strong methods. The EHO-FDMM and exact interval of confidence bounds were used to create the suggested method's ability to detect even minute variations between legal and attack routes. These methods are based on correlations and proximity measurements, which are ineffective against contemporary assaults that imitate everyday actions

    Performance Evaluation of Network Anomaly Detection Systems

    Get PDF
    Nowadays, there is a huge and growing concern about security in information and communication technology (ICT) among the scientific community because any attack or anomaly in the network can greatly affect many domains such as national security, private data storage, social welfare, economic issues, and so on. Therefore, the anomaly detection domain is a broad research area, and many different techniques and approaches for this purpose have emerged through the years. Attacks, problems, and internal failures when not detected early may badly harm an entire Network system. Thus, this thesis presents an autonomous profile-based anomaly detection system based on the statistical method Principal Component Analysis (PCADS-AD). This approach creates a network profile called Digital Signature of Network Segment using Flow Analysis (DSNSF) that denotes the predicted normal behavior of a network traffic activity through historical data analysis. That digital signature is used as a threshold for volume anomaly detection to detect disparities in the normal traffic trend. The proposed system uses seven traffic flow attributes: Bits, Packets and Number of Flows to detect problems, and Source and Destination IP addresses and Ports, to provides the network administrator necessary information to solve them. Via evaluation techniques, addition of a different anomaly detection approach, and comparisons to other methods performed in this thesis using real network traffic data, results showed good traffic prediction by the DSNSF and encouraging false alarm generation and detection accuracy on the detection schema. The observed results seek to contribute to the advance of the state of the art in methods and strategies for anomaly detection that aim to surpass some challenges that emerge from the constant growth in complexity, speed and size of today’s large scale networks, also providing high-value results for a better detection in real time.Atualmente, existe uma enorme e crescente preocupação com segurança em tecnologia da informação e comunicação (TIC) entre a comunidade científica. Isto porque qualquer ataque ou anomalia na rede pode afetar a qualidade, interoperabilidade, disponibilidade, e integridade em muitos domínios, como segurança nacional, armazenamento de dados privados, bem-estar social, questões econômicas, e assim por diante. Portanto, a deteção de anomalias é uma ampla área de pesquisa, e muitas técnicas e abordagens diferentes para esse propósito surgiram ao longo dos anos. Ataques, problemas e falhas internas quando não detetados precocemente podem prejudicar gravemente todo um sistema de rede. Assim, esta Tese apresenta um sistema autônomo de deteção de anomalias baseado em perfil utilizando o método estatístico Análise de Componentes Principais (PCADS-AD). Essa abordagem cria um perfil de rede chamado Assinatura Digital do Segmento de Rede usando Análise de Fluxos (DSNSF) que denota o comportamento normal previsto de uma atividade de tráfego de rede por meio da análise de dados históricos. Essa assinatura digital é utilizada como um limiar para deteção de anomalia de volume e identificar disparidades na tendência de tráfego normal. O sistema proposto utiliza sete atributos de fluxo de tráfego: bits, pacotes e número de fluxos para detetar problemas, além de endereços IP e portas de origem e destino para fornecer ao administrador de rede as informações necessárias para resolvê-los. Por meio da utilização de métricas de avaliação, do acrescimento de uma abordagem de deteção distinta da proposta principal e comparações com outros métodos realizados nesta tese usando dados reais de tráfego de rede, os resultados mostraram boas previsões de tráfego pelo DSNSF e resultados encorajadores quanto a geração de alarmes falsos e precisão de deteção. Com os resultados observados nesta tese, este trabalho de doutoramento busca contribuir para o avanço do estado da arte em métodos e estratégias de deteção de anomalias, visando superar alguns desafios que emergem do constante crescimento em complexidade, velocidade e tamanho das redes de grande porte da atualidade, proporcionando também alta performance. Ainda, a baixa complexidade e agilidade do sistema proposto contribuem para que possa ser aplicado a deteção em tempo real
    corecore