
    A Survey of Methods for Encrypted Traffic Classification and Analysis

    With the widespread use of encrypted data transport, network traffic encryption is becoming a standard nowadays. This presents a challenge for traffic measurement, especially for analysis and anomaly detection methods which are dependent on the type of network traffic. In this paper, we survey existing approaches for classification and analysis of encrypted traffic. First, we describe the most widespread encryption protocols used throughout the Internet. We show that the initiation of an encrypted connection and the protocol structure give away a lot of information for encrypted traffic classification and analysis. Then, we survey payload- and feature-based classification methods for encrypted traffic and categorize them using an established taxonomy. The advantage of some of the described classification methods is the ability to recognize the encrypted application protocol in addition to the encryption protocol. Finally, we make a comprehensive comparison of the surveyed feature-based classification methods and present their weaknesses and strengths.
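    The claim that connection initiation leaks useful signal can be made concrete with a small, hypothetical sketch: assuming handshake metadata such as the Server Name Indication (SNI), ALPN value and offered cipher-suite count have already been extracted from a TLS ClientHello, a few rules are enough to assign a coarse application label. The field names and label rules below are illustrative only and are not taken from the survey.

# Minimal illustrative sketch: classify a flow from TLS handshake metadata.
# The metadata dict and label rules are hypothetical examples, not from the survey.

def classify_tls_handshake(meta):
    """Return a coarse application label from ClientHello metadata."""
    sni = (meta.get("sni") or "").lower()

    # The Server Name Indication is sent in clear text and is often enough
    # to label the application behind an encrypted connection.
    if sni.endswith("mail.google.com"):
        return "webmail"
    if "youtube" in sni or "googlevideo" in sni:
        return "video-streaming"

    # Fall back to protocol-structure hints when no SNI is present.
    if meta.get("alpn") == "h2":
        return "http2-web"
    if meta.get("cipher_suite_count", 0) <= 3:
        return "legacy-or-constrained-client"
    return "unknown-encrypted"


if __name__ == "__main__":
    example = {"sni": "r3---sn-4g5e6nz7.googlevideo.com",
               "alpn": "h2", "cipher_suite_count": 16}
    print(classify_tls_handshake(example))   # -> video-streaming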

    APIC: A method for automated pattern identification and classification

    Machine Learning (ML) is a transformative technology at the forefront of many modern research endeavours. The technology is generating a tremendous amount of attention from researchers and practitioners, providing new approaches to solving complex classification and regression tasks. While concepts such as Deep Learning have existed for many years, the computational power for realising the utility of these algorithms in real-world applications has only recently become available. This dissertation investigated the efficacy of a novel, general method for deploying ML in a variety of complex tasks, where the best feature selection, data-set labelling, model definition and training processes were determined automatically. Models were developed in an iterative fashion and evaluated using both training and validation data sets. The proposed method was evaluated using three distinct case studies describing complex classification tasks that often require significant input from human experts. The results achieved demonstrate that the proposed method compares with, and often outperforms, less general, comparable methods designed specifically for each task. Feature selection, data-set annotation, model design and training processes were optimised by the method, producing less complex, comparatively accurate classifiers with a lower dependency on computational power and human expert intervention. In chapter 4, the proposed method demonstrated improved efficacy over comparable systems, automatically identifying and classifying complex application protocols traversing IP networks. In chapter 5, the proposed method was able to discriminate between normal and anomalous traffic, maintaining accuracy in excess of 99% while reducing false alarms to a mere 0.08%. Finally, in chapter 6, the proposed method discovered more optimal classifiers than those implemented by comparable methods, with classification scores rivalling those achieved by state-of-the-art systems. The findings of this research concluded that developing a fully automated, general method exhibiting efficacy in a wide variety of complex classification tasks with minimal expert intervention was possible. The method and the various artefacts produced in each case study of this dissertation are thus significant contributions to the field of ML.
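    The automated feature-selection and model-tuning loop described above can be approximated with a generic pipeline; the sketch below is not the dissertation's APIC implementation, only a minimal stand-in that lets a grid search choose the feature count and tree depth, scoring each candidate with cross-validation on synthetic data.

# Hedged sketch: automated feature selection and model tuning in one search.
# This is a generic scikit-learn pipeline, not the APIC method itself.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for labelled flow features.
X, y = make_classification(n_samples=2000, n_features=40, n_informative=8,
                           random_state=0)

pipeline = Pipeline([
    ("select", SelectKBest(score_func=f_classif)),    # automatic feature selection
    ("clf", DecisionTreeClassifier(random_state=0)),  # comparatively simple classifier
])

# The search iterates over feature counts and model complexity,
# scoring each candidate with cross-validation.
search = GridSearchCV(
    pipeline,
    param_grid={"select__k": [5, 10, 20], "clf__max_depth": [3, 5, None]},
    cv=5,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))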

    Towards the Deployment of Machine Learning Solutions in Network Traffic Classification: A Systematic Survey

    Traffic analysis is a compound of strategies intended to find relationships, patterns, anomalies, and misconfigurations, among other things, in Internet traffic. In particular, traffic classification is a subgroup of strategies in this field that aims at identifying the application's name or type of Internet traffic. Nowadays, traffic classification has become a challenging task due to the rise of new technologies, such as traffic encryption and encapsulation, which decrease the performance of classical traffic classification strategies. Machine Learning gains interest as a new direction in this field, showing signs of future success, such as knowledge extraction from encrypted traffic and more accurate Quality of Service management. Machine Learning is fast becoming a key tool to build traffic classification solutions in real network traffic scenarios; in this sense, the purpose of this investigation is to explore the elements that allow this technique to work in the traffic classification field. Therefore, a systematic review is introduced based on the steps to achieve traffic classification by using Machine Learning techniques. The main aim is to understand and to identify the procedures followed by the existing works to achieve their goals. As a result, this survey paper finds a set of trends derived from the analysis performed on this domain; in this manner, the authors expect to outline future directions for Machine Learning based traffic classification.
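    Most of the surveyed workflows start by converting a raw flow (a sequence of packet timestamps and sizes) into a fixed-length statistical feature vector before any Machine Learning is applied. The sketch below shows one common, minimal choice of such features; the exact feature set differs from work to work and is assumed here purely for illustration.

# Illustrative flow-feature extraction; the chosen statistics are a common
# minimal set, not the feature list of any specific surveyed paper.
from statistics import mean, pstdev

def flow_features(packets):
    """packets: list of (timestamp_seconds, size_bytes) tuples for one flow."""
    sizes = [size for _, size in packets]
    gaps = [t2 - t1 for (t1, _), (t2, _) in zip(packets, packets[1:])]
    return {
        "pkt_count": len(packets),
        "bytes_total": sum(sizes),
        "size_mean": mean(sizes),
        "size_std": pstdev(sizes) if len(sizes) > 1 else 0.0,
        "iat_mean": mean(gaps) if gaps else 0.0,      # inter-arrival time stats
        "iat_std": pstdev(gaps) if len(gaps) > 1 else 0.0,
    }

if __name__ == "__main__":
    flow = [(0.000, 74), (0.012, 1514), (0.013, 1514), (0.250, 66)]
    print(flow_features(flow))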

    TOWARDS A HOLISTIC EFFICIENT STACKING ENSEMBLE INTRUSION DETECTION SYSTEM USING NEWLY GENERATED HETEROGENEOUS DATASETS

    With the exponential growth of network-based applications globally, there has been a transformation in organizations' business models. Furthermore, cost reduction of both computational devices and the internet has led people to become more technology dependent. Consequently, due to the inordinate use of computer networks, new risks have emerged. Therefore, the process of improving the speed and accuracy of security mechanisms has become crucial. Although abundant new security tools have been developed, the rapid growth of malicious activities continues to be a pressing issue, as their ever-evolving attacks continue to create severe threats to network security. Classical security techniques, for instance firewalls, are used as a first line of defense against security problems but remain unable to detect internal intrusions or adequately provide security countermeasures. Thus, network administrators tend to rely predominantly on Intrusion Detection Systems to detect such intrusive network activities. Machine Learning is one of the practical approaches to intrusion detection that learns from data to differentiate between normal and malicious traffic. Although Machine Learning approaches are used frequently, an in-depth analysis of Machine Learning algorithms in the context of intrusion detection has received less attention in the literature. Moreover, adequate datasets are necessary to train and evaluate anomaly-based network intrusion detection systems. There exist a number of such datasets, such as DARPA, KDDCUP, and NSL-KDD, that have been widely adopted by researchers to train and evaluate the performance of their proposed intrusion detection approaches. Based on several studies, many such datasets are outdated and unreliable to use. Furthermore, some of these datasets suffer from a lack of traffic diversity and volume, do not cover the variety of attacks, have anonymized packet information and payload that cannot reflect the current trends, or lack a feature set and metadata. This thesis provides a comprehensive analysis of some of the existing Machine Learning approaches for identifying network intrusions. Specifically, it analyzes the algorithms along various dimensions, namely feature selection, sensitivity to hyper-parameter selection, and class imbalance problems, that are inherent to intrusion detection. It also produces a new reliable dataset labeled Game Theory and Cyber Security (GTCS) that matches real-world criteria, contains normal and different classes of attacks, and reflects the current network traffic trends. The GTCS dataset is used to evaluate the performance of the different approaches, and a detailed experimental evaluation to summarize the effectiveness of each approach is presented. Finally, the thesis proposes an ensemble classifier model composed of multiple classifiers with different learning paradigms to address the issue of detection accuracy and false alarm rate in intrusion detection systems.
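    The ensemble classifier proposed in the final contribution combines classifiers with different learning paradigms; a minimal sketch of that idea, assuming scikit-learn's stacking support, illustrative base and meta learners, and synthetic imbalanced data in place of the GTCS dataset, is shown below.

# Hedged sketch of a stacking ensemble mixing different learning paradigms.
# Base/meta learners are illustrative; synthetic data stands in for GTCS.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=3000, n_features=30, weights=[0.9, 0.1],
                           random_state=0)   # imbalanced, like intrusion data
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

ensemble = StackingClassifier(
    estimators=[
        ("nb", GaussianNB()),                      # probabilistic learner
        ("knn", KNeighborsClassifier()),           # instance-based learner
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),  # meta-classifier
    cv=5,
)
ensemble.fit(X_tr, y_tr)
print("test accuracy:", round(ensemble.score(X_te, y_te), 3))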

    Profiling and Identification of Web Applications in Computer Network

    Characterising network traffic is a critical step for detecting network intrusion or misuse. The traditional way to identify the application associated with a set of traffic flows typically focuses on IANA-assigned port numbers and DPI (Deep Packet Inspection). However, an increasing number of Internet applications nowadays frequently use dynamic port assignments and encrypted data traffic, rendering these techniques ineffective for real-time traffic identification. The research community has proposed models for traffic classification that determined the most important requirements and recommendations for a successful approach. The suggested alternatives can be categorised into four techniques: port-based, packet payload based, host behavioural, and statistical-based. In recent years, the latter two techniques, focusing on host behaviour and statistical methods, have been introduced to avoid these limitations. The former is based on the idea that hosts generate different communication patterns at the transport layer; by extracting these behavioural patterns, activities and applications can be classified. However, it cannot correctly identify the application names, classifying both Yahoo and Gmail as email. Therefore, studies have focused on using a statistical features approach for identifying traffic associated with applications based on machine learning algorithms. This method relies on characteristics of IP flows, minimising the overhead limitations associated with other schemes. Classification accuracy of statistical flow-based approaches, however, depends on the discrimination ability of the traffic features used. NetFlow represents the de-facto standard in monitoring and analysing network traffic, but the information it provides is not enough to describe the application behaviour. The primary challenge is to describe the activity within and among network flows to understand application usage and user behaviour. This thesis proposes novel features to describe precisely a web application's behaviour in order to segregate various user activities. Extracting the most discriminative features, which characterise web applications, is key to gaining higher accuracy without being biased by either users or network circumstances. This work investigates novel and superior features that characterise the behaviour of an application based on the timing of arriving packets and flows. As part of describing the application behaviour, the research considered the on/off data transfer, defining characteristics for many typical applications, and the amount of data transferred or exchanged. Furthermore, the research considered timing and patterns of user events as part of a network application session. Using an extended set of traffic features output from traffic captures, a supervised machine learning classifier was developed. To this effect, the present work customised the popular tcptrace utility to generate classification features based on traffic burstiness and periods of inactivity for everyday Internet usage. A C5.0 decision tree classifier was applied using the proposed features for eleven different Internet applications generated by ten users. Overall, the newly proposed features reported a significant level of accuracy (~98%) in classifying the respective applications. Afterwards, uncontrolled data collected from a real environment for a group of 20 users while accessing different applications was used to evaluate the proposed features. The evaluation tests indicated that the method has an accuracy of 87% in identifying the correct network application.
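    The burstiness and inactivity features can be illustrated with a short sketch that derives on/off statistics from packet timestamps and trains a decision tree on them; the idle threshold, the toy sessions and the use of scikit-learn's CART tree in place of C5.0 are assumptions made only for this example.

# Illustrative burst/idle feature extraction; the threshold and classifier choice
# (CART instead of the C5.0 used in the thesis) are assumptions.
from sklearn.tree import DecisionTreeClassifier

IDLE_THRESHOLD = 1.0  # seconds of silence that ends a burst (assumed value)

def burst_features(timestamps):
    """Return [burst_count, mean_burst_len_pkts, total_idle_time] for one session."""
    bursts, idle_time, current = [], 0.0, 1
    for prev, cur in zip(timestamps, timestamps[1:]):
        gap = cur - prev
        if gap > IDLE_THRESHOLD:          # an idle period closes the current burst
            bursts.append(current)
            idle_time += gap
            current = 1
        else:
            current += 1
    bursts.append(current)
    return [len(bursts), sum(bursts) / len(bursts), idle_time]

# Toy training data with two labelled sessions (the real experiments used
# traffic from ten users and eleven applications).
X = [burst_features([0, 0.1, 0.2, 5.0, 5.1]), burst_features([0, 2.0, 4.0, 6.0])]
y = ["bulk-transfer", "interactive"]
clf = DecisionTreeClassifier().fit(X, y)
print(clf.predict([burst_features([0, 0.05, 0.1, 3.0])]))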

    Denial-of-service attack modelling and detection for HTTP/2 services

    Businesses and society alike have been heavily dependent on Internet-based services, albeit with experiences of constant and annoying disruptions caused by the adversary class. A malicious attack that can prevent establishment of Internet connections to web servers, initiated from legitimate client machines, is termed a Denial of Service (DoS) attack, the volume and intensity of which are rapidly growing thanks to readily available attack tools and ever-increasing network bandwidths. A majority of contemporary web servers are built on the HTTP/1.1 communication protocol. As a consequence, all literature found on DoS attack modelling and the corresponding detection techniques addresses only HTTP/1.x network traffic. This thesis presents a model of DoS attack traffic against servers employing the new communication protocol, namely HTTP/2. The HTTP/2 protocol significantly differs from its predecessor and introduces new messaging formats and data exchange mechanisms. This creates an urgent need to understand how malicious attacks, including Denial of Service, can be launched against HTTP/2 services. Moreover, the ability of attackers to vary their traffic models to stealthily affect web services requires extensive research and modelling. This research work not only provides a novel model for DoS attacks against HTTP/2 services, but also provides a model of stealthy variants of such attacks that can disrupt routine web services. Specifically, HTTP/2 traffic patterns that consume computing resources of a server, such as CPU utilisation and memory consumption, were thoroughly explored and examined. The study presents four HTTP/2 attack models: the first is a flooding-based attack model, the second a distributed model, and the third and fourth are variant DoS attack models. The attack traffic analysis conducted in this study employed four machine learning techniques, namely Naïve Bayes, Decision Tree, JRip and Support Vector Machines. The HTTP/2 normal traffic model portrays online activities of human users. The model thus formulated was also employed to generate flash-crowd traffic, i.e. a large volume of normal traffic that incapacitates a web server, similar in fashion to a DoS attack, albeit with non-malicious intent. Flash-crowd traffic generated based on the defined model was used to populate the dataset of legitimate network traffic, to fuzz the machine learning-based attack detection process. The two variants of DoS attack traffic differed in terms of the traffic intensities and the inter-packet arrival delays introduced, to better analyse the type and quality of DoS attacks that can be launched against HTTP/2 services. A detailed analysis of HTTP/2 features is also presented to rank relevant network traffic features for all four traffic models presented. These features were ranked based on legitimate as well as attack traffic observations conducted in this study. The study shows that machine learning-based analysis yields better classification performance, i.e. a lower percentage of incorrectly classified instances, when the proposed HTTP/2 features are employed compared to when HTTP/1.1 features alone are used. The study shows how an HTTP/2 DoS attack can be modelled, and how future work can extend the proposed model to create variant attack traffic models that can bypass intrusion-detection systems. 
Likewise, as the Internet traffic and the heterogeneity of Internet-connected devices are projected to increase significantly, legitimate traffic can yield varying traffic patterns, demanding further analysis. The significance of having current legitimate traffic datasets, together with the scope to extend the DoS attack models presented herewith, suggest that research in the DoS attack analysis and detection area will benefit from the work presented in this thesis
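    The feature-ranking step described above can be approximated by scoring each candidate traffic feature against the traffic class with mutual information, as in the sketch below; the feature names, synthetic values and labels are placeholders rather than the thesis's HTTP/2 feature set or data.

# Hedged sketch: ranking candidate traffic features by mutual information.
# Feature names and data are placeholders, not the thesis's HTTP/2 dataset.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
n = 1000
# Hypothetical per-connection features (e.g. frame counts, inter-packet delays).
features = {
    "headers_frames_per_conn": rng.poisson(20, n),
    "mean_inter_packet_delay": rng.exponential(0.05, n),
    "bytes_per_stream": rng.lognormal(8, 1, n),
}
X = np.column_stack(list(features.values()))
# Toy labels: 0 = legitimate / flash-crowd traffic, 1 = DoS attack traffic.
y = (features["headers_frames_per_conn"] > 25).astype(int)

scores = mutual_info_classif(X, y, random_state=0)
for name, score in sorted(zip(features, scores), key=lambda t: -t[1]):
    print(f"{name:28s} {score:.3f}")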

    Using Botnet Technologies to Counteract Network Traffic Analysis

    Botnets have been problematic for over a decade. They are used to launch malicious activities including DDoS (Distributed Denial of Service), spamming, identity theft, unauthorized bitcoin mining and malware distribution. A nation-wide DDoS attack caused by the Mirai botnet on 21 October 2016, involving tens of millions of IP addresses, took down Twitter, Spotify, Reddit, The New York Times, Pinterest, PayPal and other major websites. In response to take-down campaigns by security personnel, botmasters have developed technologies to evade detection. The most widely used evasion technique is DNS fast-flux, where the botmaster frequently changes the mapping between domain names and IP addresses of the C&C server so that it becomes too late or too costly to trace the C&C server locations. Domain names generated with Domain Generation Algorithms (DGAs) are used as the 'rendezvous' points between botmasters and bots. This work focuses on how to apply botnet technologies (fast-flux and DGA) to counteract network traffic analysis, thereby protecting user privacy. A better understanding of botnet technologies also helps us be proactive in defending against botnets. First, we proposed two new DGAs using hidden Markov models (HMMs) and Probabilistic Context-Free Grammars (PCFGs) which can evade current detection methods and systems. We also developed two HMM-based DGA detection methods that can detect botnet DGA-generated domain names with or without training sets. This helps security personnel understand the botnet phenomenon and develop proactive tools to detect botnets. Second, we developed a distributed proxy system using fast-flux to evade national censorship and surveillance. The goal is to help journalists, human rights advocates and NGOs in West Africa have secure and free Internet access. We then developed a covert data transport protocol that transforms arbitrary messages into real DNS traffic. We encode the message into benign-looking domain names generated by an HMM, which represents the statistical features of legitimate domain names. This can be used to evade Deep Packet Inspection (DPI) and protect user privacy in two-way communication. Both applications serve as examples of applying botnet technologies to legitimate use. Finally, we proposed a new protocol obfuscation technique that transforms an arbitrary network protocol into another (the Network Time Protocol and the video game protocol of Minecraft as examples) in terms of packet syntax and side-channel features (inter-packet delay and packet size). This research uses botnet technologies to help normal users have secure and private communications over the Internet. From our botnet research, we conclude that network traffic is a malleable and artificial construct. Although existing patterns are easy to detect and characterize, they are also subject to modification and mimicry. This means that we can construct transducers to make any communication pattern look like any other communication pattern. This is neither bad nor good for security; it is a fact that we need to accept and use as best we can.
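    A toy illustration of the DGA idea, much simpler than the HMM and PCFG generators developed in this work, is a first-order character Markov chain fitted on a few benign-looking names and sampled to produce new domain labels; the seed list and parameters below are made up for the example.

# Toy first-order Markov-chain domain generator; a simplification of the
# HMM/PCFG DGAs described above. Seed names and parameters are illustrative.
import random
from collections import defaultdict

SEEDS = ["weatherupdate", "cloudreports", "newsarchive", "travelphotos"]

def fit_chain(words):
    chain = defaultdict(list)
    for w in words:
        for a, b in zip("^" + w, w + "$"):   # ^ = start marker, $ = end marker
            chain[a].append(b)
    return chain

def generate(chain, rng, max_len=15):
    out, state = [], "^"
    while len(out) < max_len:
        state = rng.choice(chain[state])
        if state == "$":
            break
        out.append(state)
    return "".join(out)

if __name__ == "__main__":
    # Both ends can reproduce the same names from a shared seed, date or counter.
    rng = random.Random(42)
    chain = fit_chain(SEEDS)
    for _ in range(5):
        print(generate(chain, rng) + ".com")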

    Cyber Security and Critical Infrastructures 2nd Volume

    The second volume of the book contains the manuscripts that were accepted for publication in the MDPI Special Topic "Cyber Security and Critical Infrastructure" after a rigorous peer-review process. Authors from academia, government and industry contributed their innovative solutions, consistent with the interdisciplinary nature of cybersecurity. The book contains 16 articles: an editorial that explains the current challenges, innovative solutions and real-world experiences involving critical infrastructure, and 15 original papers that present state-of-the-art innovative solutions to attacks on critical systems.

    Generative Methods, Meta-learning, and Meta-heuristics for Robust Cyber Defense

    Cyberspace is the digital communications network that supports the Internet of Battlefield Things (IoBT), the model by which defense-centric sensors, computers, actuators and humans are digitally connected. A secure IoBT infrastructure facilitates real-time implementation of the observe, orient, decide, act (OODA) loop across distributed subsystems. Successful hacking efforts by cyber criminals and strategic adversaries suggest that cyber systems such as the IoBT are not secure. Three lines of effort demonstrate a path towards a more robust IoBT. First, a baseline data set of enterprise cyber network traffic was collected and modelled with generative methods, allowing the generation of realistic, synthetic cyber data. Next, adversarial examples of cyber packets were algorithmically crafted to fool network intrusion detection systems while maintaining packet functionality. Finally, a framework is presented that uses meta-learning to combine the predictive power of various weak models. This resulted in a meta-model that outperforms all baseline classifiers with respect to overall packet classification accuracy and adversarial example detection rate. The National Defense Strategy underscores cybersecurity as an imperative to defend the homeland and maintain a military advantage in the information age. This research provides both academic perspective and applied techniques to further the cybersecurity posture of the Department of Defense into the information age.
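    The adversarial-example step, crafting inputs that cross a detector's decision boundary while staying within valid ranges, can be sketched with a single gradient-sign step against a logistic-regression surrogate; this is a textbook simplification with synthetic data, not the packet-level crafting method used in this research.

# Hedged sketch: one gradient-sign step against a logistic-regression detector.
# A textbook simplification, not the packet-level crafting used in the research.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
detector = LogisticRegression(max_iter=1000).fit(X, y)

def fgsm_step(x, label, model, eps=0.5):
    """Move x in the direction that increases the detector's loss."""
    p = model.predict_proba(x.reshape(1, -1))[0, 1]
    grad = (p - label) * model.coef_[0]        # d(cross-entropy)/dx for logistic reg.
    x_adv = x + eps * np.sign(grad)
    # Clip to the observed feature ranges, a stand-in for "keep the packet valid".
    return np.clip(x_adv, X.min(axis=0), X.max(axis=0))

# Perturb malicious samples (label 1) and measure how many still get detected.
malicious = X[y == 1][:200]
adv = np.array([fgsm_step(x, 1, detector) for x in malicious])
print("detected before:", detector.predict(malicious).mean())
print("detected after: ", detector.predict(adv).mean())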

    Detecção de anomalias na partilha de ficheiros em ambientes empresariais

    File sharing is the activity of making archives (documents, videos, photos) available to other users. Enterprises use file sharing to make archives available to their employees or clients. The availability of these files can be provided through an internal network, a cloud service (external) or even Peer-to-Peer (P2P). Most of the time, the files within the file sharing service contain sensitive information that cannot be disclosed. The Equifax data breach exploited a zero-day vulnerability that allowed arbitrary code execution, leading to a huge data breach in which the information of over 143 million users was presumed compromised. Ransomware is a type of malware that encrypts computer data (documents, media, ...), making it inaccessible to the user and demanding a ransom for the decryption of the data. This type of malware has been a serious threat to enterprises. WannaCry and NotPetya are some examples of ransomware that had a huge impact on enterprises, with large amounts paid in ransoms; for example, WannaCry reached more than $142,361.51 in ransoms. In this dissertation, we propose a system that can detect file sharing anomalies like ransomware (WannaCry, NotPetya) and theft (the Equifax breach), as well as their propagation. The solution consists of network monitoring, the creation of communication profiles for each user/machine, an analysis algorithm using machine learning and a countermeasure mechanism in case an anomaly is detected.
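    The profiling idea can be sketched as a per-machine baseline of file-transfer volume per time window, with windows that deviate strongly from the baseline flagged as anomalous; the window statistic, threshold and z-score test below are assumptions for illustration and stand in for the machine-learning analysis algorithm proposed in the dissertation.

# Illustrative per-host profile with a z-score anomaly check; the threshold and
# statistic are assumptions, not the dissertation's machine-learning algorithm.
from statistics import mean, pstdev

Z_THRESHOLD = 3.0  # assumed sensitivity

class HostProfile:
    """Baseline of bytes written to the file share per time window, per host."""
    def __init__(self):
        self.history = []

    def observe(self, bytes_in_window):
        """Return True if this window looks anomalous, then update the baseline."""
        anomalous = False
        if len(self.history) >= 10:        # need some history before judging
            mu, sigma = mean(self.history), pstdev(self.history)
            if sigma > 0 and abs(bytes_in_window - mu) / sigma > Z_THRESHOLD:
                anomalous = True           # e.g. ransomware rewriting many files
        self.history.append(bytes_in_window)
        return anomalous

if __name__ == "__main__":
    profile = HostProfile()
    normal = [5_000_000 + 200_000 * (i % 3) for i in range(20)]
    for value in normal + [80_000_000]:    # sudden burst of writes
        if profile.observe(value):
            print("anomaly flagged for window with", value, "bytes")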