1,515 research outputs found

    An autonomous labeling approach to support vector machines algorithms for network traffic anomaly detection

    In recent years, several support vector machine (SVM) novelty detection approaches have been applied in the network intrusion detection field. The main advantage of these approaches is that they can characterize normal traffic even when trained with datasets containing not only normal traffic but also a number of attacks. Unfortunately, these algorithms seem to be accurate only when normal traffic vastly outnumbers the attacks present in the dataset, a condition that does not always hold. This work presents an approach for autonomous labeling of normal traffic as a way of dealing with situations where the class distribution does not present the imbalance required by SVM algorithms. In this case, the autonomous labeling process is performed by SNORT, a misuse-based intrusion detection system. Experiments conducted on the 1998 DARPA dataset show that the proposed autonomous labeling approach not only outperforms existing SVM alternatives but also, under some attack distributions, obtains improvements over SNORT itself.
    Affiliation: Catania, Carlos Adrian. Universidad Nacional de Cuyo; Argentina
    Affiliation: Bromberg, Facundo. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Mendoza; Argentina. Universidad Tecnológica Nacional. Facultad Regional Mendoza. Departamento de Sistemas de Información. Laboratorio DHARMA; Argentina
    Affiliation: Garcia Garino, Carlos Gabriel. Universidad Nacional de Cuyo. Facultad de Ingeniería; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Mendoza; Argentin
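
The labeling idea in this abstract can be sketched in a few lines. This is a hypothetical illustration, not the authors' implementation: `signature_alert` stands in for SNORT, the records and field names are invented, and a centroid-distance threshold stands in for the SVM novelty detector.

```python
import math

# Hypothetical connection records: (duration_s, bytes_sent, payload_tag).
# All values and names here are invented for illustration.
TRAFFIC = [
    (1.0, 200.0, "http"), (1.2, 220.0, "http"), (0.9, 180.0, "http"),
    (1.1, 210.0, "http"), (9.0, 5000.0, "known_exploit"),
    (8.5, 5200.0, "known_exploit"), (9.2, 4900.0, "known_exploit"),
]

def signature_alert(record):
    """Stand-in for a misuse-based IDS such as SNORT: flags known signatures."""
    return record[2] == "known_exploit"

def features(record):
    """Keep only the numeric fields used by the novelty detector."""
    return (record[0], record[1])

def fit_novelty_detector(points):
    """Stand-in for a one-class model: centroid plus a distance threshold."""
    n = len(points)
    centroid = tuple(sum(p[i] for p in points) / n for i in range(2))
    radius = max(math.dist(p, centroid) for p in points)
    return centroid, radius * 1.5  # the slack factor is an arbitrary choice

def is_anomalous(model, point):
    centroid, threshold = model
    return math.dist(point, centroid) > threshold

# Step 1: autonomous labeling. Records without an alert are presumed normal.
presumed_normal = [features(r) for r in TRAFFIC if not signature_alert(r)]

# Step 2: train the novelty detector only on the presumed-normal records.
model = fit_novelty_detector(presumed_normal)

# Step 3: score new traffic, including an attack with no known signature.
unseen_attack = (7.0, 4000.0)
normal_probe = (1.05, 205.0)
print(is_anomalous(model, unseen_attack), is_anomalous(model, normal_probe))
```

In this toy data, attacks make up three of seven records, so normal traffic does not vastly outnumber them; filtering with the signature detector first means the novelty model is fitted on the four presumed-normal records only.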

    Improving Accuracy of Intrusion Detection Model Using PCA and optimized SVM

    Intrusion detection is essential for providing security to different network domains and is mostly used for locating and tracing intruders. Traditional intrusion detection models (IDS) have many problems, such as low detection capability against unknown network attacks, a high false alarm rate, and insufficient analysis capability. Hence the major scope of research in this domain is to develop an intrusion detection model with improved accuracy and reduced training time. This paper proposes a hybrid intrusion detection model that integrates principal component analysis (PCA) and a support vector machine (SVM). The novelty of the paper is the optimization of the kernel parameters of the SVM classifier using an automatic parameter selection technique. This technique optimizes the punishment factor (C) and the kernel parameter gamma (γ), thereby improving the accuracy of the classifier and reducing the training and testing time. The experimental results obtained on the NSL KDD and gurekddcup datasets show that the proposed technique performs better, with higher accuracy, faster convergence speed, and better generalization. Minimal resources are consumed because the classifier input requires a reduced feature set for optimum classification. A comparative analysis of hybrid models with the proposed model is also performed.

    Spatiotemporal anomaly detection: streaming architecture and algorithms

    Includes bibliographical references. 2020 Summer. Anomaly detection is the science of identifying one or more rare or unexplainable samples or events in a dataset or data stream. The field of anomaly detection has been extensively studied by mathematicians, statisticians, economists, engineers, and computer scientists. One open research question remains the design of distributed cloud-based architectures and algorithms that can accurately identify anomalies in previously unseen, unlabeled, streaming, multivariate spatiotemporal data. With streaming data, time is of the essence, and insights are perishable. Real-world streaming spatiotemporal data originate from many sources, including mobile phones, supervisory control and data acquisition (SCADA) enabled devices, the internet-of-things (IoT), distributed sensor networks, and social media. Baseline experiments are performed on four non-streaming, static, multivariate anomaly detection datasets using unsupervised offline traditional machine learning (TML) and unsupervised neural network techniques. Multiple architectures, including autoencoders, generative adversarial networks, convolutional networks, and recurrent networks, are adapted for experimentation. Extensive experimentation demonstrates that neural networks produce superior detection accuracy over TML techniques. These same neural network architectures can be extended to process unlabeled spatiotemporal streaming data using online learning. Space and time relationships are further exploited to provide additional insights and increased anomaly detection accuracy. A novel domain-independent architecture and set of algorithms called the Spatiotemporal Anomaly Detection Environment (STADE) is formulated. STADE is based on a federated learning architecture. STADE streaming algorithms are based on geographically unique, persistently executing neural networks using online stochastic gradient descent (SGD). STADE is designed to be pluggable, meaning that alternative algorithms may be substituted or combined to form an ensemble. STADE incorporates a Stream Anomaly Detector (SAD) and a Federated Anomaly Detector (FAD). The SAD executes at multiple locations on streaming data, while the FAD executes at a single server and identifies global patterns and relationships among the site anomalies. Each STADE site streams anomaly scores to the centralized FAD server for further spatiotemporal dependency analysis and logging. The FAD is based on recent advances in DNN-based federated learning. A STADE testbed is implemented to facilitate globally distributed experimentation using low-cost, commercial cloud infrastructure provided by Microsoft™. STADE testbed sites are situated in the cloud within each continent: Africa, Asia, Australia, Europe, North America, and South America. Communication occurs over the commercial internet. Three STADE case studies are investigated. The first case study processes commercial air traffic flows, the second processes global earthquake measurements, and the third processes social media (i.e., Twitter™) feeds. These case studies confirm that STADE is a viable architecture for the near real-time identification of anomalies in streaming data originating from (possibly) computationally disadvantaged, geographically dispersed sites. Moreover, the addition of the FAD provides enhanced anomaly detection capability. Since STADE is domain-independent, these findings can be easily extended to additional application domains and use cases.
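
The split between per-site stream detectors and a central federated detector can be illustrated with a small sketch. Everything here is an assumption made for illustration: the class names `SiteDetector` and `FederatedDetector`, the site names, the data, and the thresholds are invented, and a running z-score (Welford's method) stands in for the online-SGD neural networks the abstract describes.

```python
import math

class SiteDetector:
    """Stand-in for a per-site Stream Anomaly Detector (SAD).

    Keeps a running mean/variance (Welford's method), updated one sample
    at a time, in place of the online-SGD neural networks in the abstract.
    """
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0

    def score_and_update(self, x):
        # Score against the statistics seen so far, then learn from x.
        if self.n >= 2:
            std = math.sqrt(self.m2 / (self.n - 1))
            score = abs(x - self.mean) / std if std > 0 else 0.0
        else:
            score = 0.0
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return score

class FederatedDetector:
    """Stand-in for the central Federated Anomaly Detector (FAD)."""
    def __init__(self, threshold=3.0, min_sites=2):
        self.threshold = threshold
        self.min_sites = min_sites

    def global_anomaly(self, site_scores):
        # Flag a time step when several sites report high scores together.
        flagged = [s for s, v in site_scores.items() if v > self.threshold]
        return sorted(flagged) if len(flagged) >= self.min_sites else []

# Invented streams: site_a and site_b spike together at the last time step.
streams = {
    "site_a": [10.0, 10.2, 9.9, 10.1, 10.0, 30.0],
    "site_b": [5.0, 5.1, 4.9, 5.0, 5.1, 20.0],
    "site_c": [7.0, 7.1, 6.9, 7.0, 7.1, 7.0],
}
sites = {name: SiteDetector() for name in streams}
fad = FederatedDetector()

alerts = []
for t in range(6):
    # Each site scores its own sample and streams only the score upstream.
    scores = {name: sites[name].score_and_update(streams[name][t])
              for name in streams}
    alerts.append(fad.global_anomaly(scores))
print(alerts)
```

Only anomaly scores travel to the central component in this sketch, which mirrors the abstract's description of sites streaming scores to the FAD server; the co-occurrence rule used for the global decision is a deliberately simple placeholder.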

    TOWARDS A HOLISTIC EFFICIENT STACKING ENSEMBLE INTRUSION DETECTION SYSTEM USING NEWLY GENERATED HETEROGENEOUS DATASETS

    With the exponential growth of network-based applications globally, there has been a transformation in organizations' business models. Furthermore, cost reductions in both computational devices and the internet have led people to become more technology dependent. Consequently, due to the inordinate use of computer networks, new risks have emerged. Therefore, the process of improving the speed and accuracy of security mechanisms has become crucial. Although abundant new security tools have been developed, the rapid growth of malicious activities continues to be a pressing issue, as their ever-evolving attacks continue to create severe threats to network security. Classical security techniques (for instance, firewalls) are used as a first line of defense against security problems but remain unable to detect internal intrusions or adequately provide security countermeasures. Thus, network administrators tend to rely predominantly on Intrusion Detection Systems to detect such intrusive network activities. Machine Learning is one of the practical approaches to intrusion detection that learns from data to differentiate between normal and malicious traffic. Although Machine Learning approaches are used frequently, an in-depth analysis of Machine Learning algorithms in the context of intrusion detection has received less attention in the literature. Moreover, adequate datasets are necessary to train and evaluate anomaly-based network intrusion detection systems. There exist a number of such datasets (such as DARPA, KDDCUP, and NSL-KDD) that have been widely adopted by researchers to train and evaluate the performance of their proposed intrusion detection approaches. Based on several studies, many such datasets are outdated and unreliable to use. Furthermore, some of these datasets suffer from a lack of traffic diversity and volume, do not cover the variety of attacks, have anonymized packet information and payloads that cannot reflect current trends, or lack feature sets and metadata. This thesis provides a comprehensive analysis of some of the existing Machine Learning approaches for identifying network intrusions. Specifically, it analyzes the algorithms along various dimensions (namely, feature selection, sensitivity to hyper-parameter selection, and class imbalance problems) that are inherent to intrusion detection. It also produces a new reliable dataset labeled Game Theory and Cyber Security (GTCS) that matches real-world criteria, contains normal traffic and different classes of attacks, and reflects current network traffic trends. The GTCS dataset is used to evaluate the performance of the different approaches, and a detailed experimental evaluation summarizing the effectiveness of each approach is presented. Finally, the thesis proposes an ensemble classifier model composed of multiple classifiers with different learning paradigms to address the issue of detection accuracy and false alarm rate in intrusion detection systems.

    Machine learning for Internet of Things data analysis: A survey

    Rapid developments in hardware, software, and communication technologies have allowed the emergence of Internet-connected sensory devices that provide observation and data measurement from the physical world. By 2020, it is estimated that the total number of Internet-connected devices being used will be between 25 and 50 billion. As the numbers grow and technologies become more mature, the volume of data published will increase. Internet-connected device technology, referred to as the Internet of Things (IoT), continues to extend the current Internet by providing connectivity and interaction between the physical and cyber worlds. In addition to increased volume, the IoT generates Big Data characterized by velocity in terms of time and location dependency, with a variety of multiple modalities and varying data quality. Intelligent processing and analysis of this Big Data is the key to developing smart IoT applications. This article assesses the different machine learning methods that deal with the challenges in IoT data by considering smart cities as the main use case. The key contribution of this study is the presentation of a taxonomy of machine learning algorithms explaining how different techniques are applied to the data in order to extract higher-level information. The potential and challenges of machine learning for IoT data analytics are also discussed. A use case applying a Support Vector Machine (SVM) to Aarhus Smart City traffic data is presented for a more detailed exploration.
    Comment: Digital Communications and Networks (2017

    Detecção de anomalias na partilha de ficheiros em ambientes empresariais (Anomaly detection in file sharing in enterprise environments)

    File sharing is the activity of making files (documents, videos, photos) available to other users. Enterprises use file sharing to make files available to their employees or clients. These files can be made available through an internal network, an (external) cloud service, or even Peer-to-Peer (P2P). Most of the time, the files within the file sharing service contain sensitive information that cannot be disclosed. The Equifax data breach exploited a zero-day vulnerability that allowed arbitrary code execution, leading to a huge data breach in which the information of over 143 million users was presumed compromised. Ransomware is a type of malware that encrypts computer data (documents, media, ...), making it inaccessible to the user and demanding a ransom for its decryption. This type of malware has been a serious threat to enterprises. WannaCry and NotPetya are examples of ransomware that had a huge impact on enterprises, with large ransom amounts; for example, WannaCry reached more than $142,361.51 in ransoms. In this dissertation, we propose a system that can detect file sharing anomalies such as ransomware (WannaCry, NotPetya) and data theft (Equifax breach), as well as their propagation. The solution consists of monitoring the enterprise network, creating communication profiles for each user/machine, an analysis algorithm using machine learning, and a countermeasure mechanism that blocks the affected machine if an anomaly is detected.
    Mestrado em Engenharia de Computadores e Telemátic

    A near-autonomous and incremental intrusion detection system through active learning of known and unknown attacks

    Intrusion detection is a traditional practice of security experts; however, several issues still need to be tackled. Therefore, in this paper, after highlighting these issues, we present an architecture for a hybrid Intrusion Detection System (IDS) for adaptive and incremental detection of both known and unknown attacks. The IDS is composed of supervised and unsupervised modules, namely, a Deep Neural Network (DNN) and the K-Nearest Neighbors (KNN) algorithm, respectively. The proposed system is near-autonomous, since the intervention of the expert is minimized through the active learning (AL) approach. A query strategy for the labeling process is presented; it aims at teaching the supervised module to detect unknown attacks and to improve the detection of already-known attacks. This teaching is achieved through sliding windows (SW) in an incremental fashion, where the DNN is retrained as data becomes available over time, thus rendering the IDS adaptive to the evolving nature of network traffic. A set of experiments was conducted on the CICIDS2017 dataset in order to evaluate the performance of the IDS, and promising results were obtained.
    Comment: 6 pages, 3 figures, 32 references, conferenc
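
The active learning loop described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the paper's method: a 1-D nearest-class-mean model stands in for the DNN, `oracle` stands in for the human expert, the query strategy shown is plain least-confidence (margin) sampling, the windows are non-overlapping rather than sliding, and the unsupervised KNN module is omitted. All names and data are invented.

```python
def train(labeled):
    """Stand-in for retraining the supervised module: per-class means (1-D)."""
    means = {}
    for label in {y for _, y in labeled}:
        xs = [x for x, y in labeled if y == label]
        means[label] = sum(xs) / len(xs)
    return means

def predict_with_margin(model, x):
    """Return (predicted label, margin); a small margin means low confidence."""
    dists = sorted((abs(x - m), label) for label, m in model.items())
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else float("inf")
    return dists[0][1], margin

def oracle(x):
    """Stand-in for the human expert; this ground-truth rule is invented."""
    return "attack" if x >= 5.0 else "normal"

def windows(stream, size):
    """Split the stream into consecutive, non-overlapping windows."""
    for start in range(0, len(stream), size):
        yield stream[start:start + size]

labeled = [(1.0, "normal"), (9.0, "attack")]   # small seed set
stream = [1.2, 8.8, 4.9, 0.8, 5.6, 9.1, 5.2, 1.1, 4.4]
queries = []

model = train(labeled)
for window in windows(stream, size=3):
    # Query strategy: ask the expert about the least confident sample only.
    uncertain = min(window, key=lambda x: predict_with_margin(model, x)[1])
    queries.append(uncertain)
    labeled.append((uncertain, oracle(uncertain)))
    model = train(labeled)   # incremental retraining after each window
print(queries)
```

The expert labels one sample per window instead of all nine, which is the sense in which such a system is near-autonomous; which samples are worth querying is exactly what a query strategy has to decide.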