Spatiotemporal Patterns and Predictability of Cyberattacks
Y.C.L. was supported by the Air Force Office of Scientific Research (AFOSR) under grant no. FA9550-10-1-0083 and the Army Research Office (ARO) under grant no. W911NF-14-1-0504. S.X. was supported by the Army Research Office (ARO) under grant no. W911NF-13-1-0141. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
DAG-Based Attack and Defense Modeling: Don't Miss the Forest for the Attack Trees
This paper presents the current state of the art in attack and defense modeling approaches that are based on directed acyclic graphs (DAGs). DAGs allow for a hierarchical decomposition of complex scenarios into simple, easily understandable and quantifiable actions. Methods based on threat trees and Bayesian networks are two well-known approaches to security modeling. However, there exist more than 30 DAG-based methodologies, each having different features and goals. The objective of this survey is to present a complete overview of graphical attack and defense modeling techniques based on DAGs. This consists of summarizing the existing methodologies, comparing their features, and proposing a taxonomy of the described formalisms. This article also supports the selection of an adequate modeling technique depending on user requirements.
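As a concrete illustration of the hierarchical decomposition these formalisms share, the following sketch computes the minimal attacker cost on a toy attack tree; the node names and cost values are hypothetical, and AND gates sum their children while OR gates take the cheapest child.

```python
# A minimal sketch of a DAG-based attack tree with bottom-up cost
# propagation; all node names and leaf costs are illustrative assumptions.

AND, OR = "AND", "OR"

# Each node maps to (gate, children) for internal nodes,
# or to a leaf cost for basic attacker actions.
tree = {
    "steal_data":     (OR,  ["phish_admin", "exploit_server"]),
    "exploit_server": (AND, ["scan_network", "run_exploit"]),
    "phish_admin":    50.0,   # leaf: estimated attacker cost
    "scan_network":   10.0,
    "run_exploit":    25.0,
}

def attack_cost(node):
    """Recursively compute the minimal attacker cost to achieve `node`."""
    value = tree[node]
    if isinstance(value, float):              # leaf action
        return value
    gate, children = value
    costs = [attack_cost(c) for c in children]
    return sum(costs) if gate == AND else min(costs)

print(attack_cost("steal_data"))  # -> 35.0 (scan + exploit beats phishing)
```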
Exploiting tactics, techniques, and procedures for malware detection
There has been a meteoric rise in the use of malware to perpetrate cybercrime and, more generally, to serve the interests of malicious actors. As a result, malware has evolved both in its sheer variety and in its sophistication. There is hence a need for developing effective malware detection systems to counter this surge. Typically, most such systems nowadays are purely data-driven: they utilise Machine Learning (ML) based approaches which rely on large volumes of data to spot patterns, detect anomalies, and thus detect malware. In this thesis, we propose a methodology for malware detection on networks that combines human domain knowledge with conventional malware detection approaches to more effectively identify, reason about, and be resilient to malware. Specifically, we use domain knowledge in the form of the Tactics, Techniques, and Procedures (TTPs) described in the MITRE ATT&CK ontology of adversarial behaviour to build Network Intrusion Detection Systems (NIDS). Through the course of our research, we design and evaluate the first such NIDS that can effectively exploit TTPs for the purpose of malware detection. We then attempt to expand the scope of usability of these TTPs to systems other than our specialised NIDS, and develop a methodology that lets any generic ML-based NIDS exploit these TTPs as model features. We further expand and generalise our approach by modelling it as a multi-label classification problem, which enables us to: (i) detect malware more precisely on the basis of individual TTPs, and (ii) identify the malicious usage of uncommon or rarely-used TTPs. Throughout all our experiments, we rigorously evaluate all our systems on several metrics using large datasets of real-world malware and benign samples. We empirically demonstrate the usefulness of TTPs in the malware detection process, the benefits of a TTP-based approach in reasoning about malware and responding to various challenging conditions, and the overall robustness of our systems to adversarial attack. As a consequence, we establish and improve the state of the art in detecting network-based malware using TTP-based information. This thesis overall represents a step forward in building automated systems that combine purely data-driven approaches with human expertise in the field of malware analysis.
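A minimal sketch of the multi-label formulation described above, using scikit-learn; the feature columns and TTP labels are toy placeholders, not the thesis's actual feature set or dataset.

```python
# A sketch of multi-label TTP detection: each network flow is a feature
# vector, and each label column marks one (hypothetical) ATT&CK technique.
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.linear_model import LogisticRegression

# Toy training data: rows = flows, columns = behavioural features.
X = np.array([[1, 0, 1, 0], [0, 1, 1, 1], [1, 1, 0, 0], [0, 0, 0, 1]])
# Multi-label targets: one column per TTP (e.g. T1071, T1048, T1568).
Y = np.array([[1, 0, 0], [1, 1, 0], [0, 0, 1], [0, 1, 0]])

# One binary classifier per TTP, so rare techniques get their own model.
clf = OneVsRestClassifier(LogisticRegression()).fit(X, Y)
print(clf.predict(np.array([[1, 0, 1, 1]])))  # predicted TTPs for a new flow
```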
An Artificial Intelligence Framework for Supporting Coarse-Grained Workload Classification in Complex Virtual Environments
This work extends "Cloud-based machine learning tools for enhanced Big Data applications", where the main idea is that of predicting the "next" workload occurring against the target Cloud infrastructure via an innovative ensemble-based approach that combines the effectiveness of different well-known classifiers in order to enhance the accuracy of the final classification, which is very relevant at present in the specific context of Big Data. The so-called workload categorization problem plays a critical role in improving the efficiency and reliability of Cloud-based big data applications. Implementation-wise, our method proposes deploying the Cloud entities that participate in the distributed classification approach on top of virtual machines, which represent classical "commodity" settings for Cloud-based big data applications. Given a number of known reference workloads and an unknown workload, in this paper we deal with the problem of finding the reference workload which is most similar to the unknown one. The depicted scenario turns out to be useful in a plethora of modern information system applications. We name this problem coarse-grained workload classification because, instead of characterizing the unknown workload in terms of finer behaviors, such as CPU-, memory-, disk-, or network-intensive patterns, we classify the whole unknown workload as one of the (possible) reference workloads. Reference workloads represent a category of workloads that are relevant in a given applicative environment. In particular, we focus our attention on the classification problem described above in the special case of virtualized environments. Today, Virtual Machines (VMs) have become very popular because they offer important advantages to modern computing environments such as cloud computing or server farms. In virtualization frameworks, workload classification is very useful for accounting, security, or user profiling. Hence, our research is especially relevant in such environments, and it turns out to be very useful in the emerging context of Cloud Computing. In this respect, our approach consists of running several machine-learning-based classifiers of different workload models, and then deriving the best classification produced by Dempster-Shafer Fusion, in order to magnify the accuracy of the final classification. Experimental assessment and analysis clearly confirm the benefits derived from our classification framework. The running programs which produce the unknown workloads to be classified are treated in a similar way. A fundamental aspect of this paper concerns the successful use of data fusion in workload classification: different types of metrics are fused together using the Dempster-Shafer theory of evidence combination, giving a classification accuracy of slightly less than … The acquisition of data from the running process, the pre-processing algorithms, and the workload classification are described in detail. Various classical algorithms have been used to classify the workloads, and the results are compared.
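The Dempster-Shafer combination step can be sketched as follows for the special case where each classifier assigns mass only to singleton workload classes; the class names and mass values below are hypothetical, not taken from the paper.

```python
# A minimal sketch of Dempster-Shafer fusion over singleton hypotheses:
# agreeing masses multiply, and the result is renormalised by the
# non-conflicting mass 1 - K (K = total conflicting mass).

def ds_combine(m1, m2):
    """Combine two mass functions defined on the same singleton classes."""
    combined = {c: m1[c] * m2[c] for c in m1}     # agreeing evidence
    conflict = 1.0 - sum(combined.values())       # K: conflicting mass
    return {c: v / (1.0 - conflict) for c, v in combined.items()}

# Hypothetical outputs of two workload classifiers built on different metrics.
cpu_clf = {"cpu-bound": 0.7, "io-bound": 0.2, "mixed": 0.1}
net_clf = {"cpu-bound": 0.6, "io-bound": 0.3, "mixed": 0.1}

print(ds_combine(cpu_clf, net_clf))  # fused belief; the argmax is the class
```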
Avatar CAPTCHA: telling computers and humans apart via face classification and mouse dynamics.
Bots are automated computer programs that execute malicious scripts and predefined functions on an affected computer. They pose cybersecurity threats and are one of the most sophisticated and common types of cybercrime tools today. They spread viruses, generate spam, steal personal sensitive information, rig online polls and commit other types of online crime and fraud. They sneak into unprotected systems through the Internet by seeking vulnerable entry points, and they access the system's resources like a human user does. The question then arises: how do we counter this? How do we keep bots out while allowing human users to access system resources? One solution is to design a CAPTCHA (Completely Automated Public Turing Test to tell Computers and Humans Apart), a program that can generate and grade tests that most humans can pass but computers cannot. It is used as a tool to distinguish humans from malicious bots. CAPTCHAs are a class of Human Interactive Proofs (HIPs) meant to be easily solvable by humans and economically infeasible for computers. Text CAPTCHAs are very popular and commonly used. For each challenge, they generate a sequence of characters by distorting standard fonts, requesting users to identify them and type them out. However, they are vulnerable to character segmentation attacks by bots, dependent on the English language, and increasingly too complex for people to solve. A solution to this is to design image CAPTCHAs that use images instead of text and require users to identify certain images to solve the challenges. They are user-friendly and convenient for human users and a much more challenging problem for bots to solve. In today's Internet world, the role of user profiling or user identification has gained a lot of significance. Identity theft and similar crimes can be prevented by restricting access to authorized users. To achieve a timely response to a security breach, frequent user verification is needed. However, this process must be passive, transparent and non-obtrusive. In order for such a system to be practical, it must be accurate, efficient and difficult to forge. Behavioral biometric systems are usually less prominent; however, they provide numerous and significant advantages over traditional biometric systems. Collection of behavior data is non-obtrusive and cost-effective as it requires no special hardware. While these systems are not unique enough to provide reliable human identification, they have been shown to be highly accurate in identity verification. In accomplishing everyday tasks, human beings use different styles and strategies and apply unique skills and knowledge; these define the behavioral traits of the user. Behavioral biometrics attempts to quantify these traits to profile users and establish their identity. Human-computer interaction (HCI)-based biometrics comprise the interaction strategies and styles between a human and a computer. These unique user traits are quantified to build profiles for identification. A specific category of HCI-based biometrics, based on recording human interactions with the mouse as the input device, is known as Mouse Dynamics. By monitoring the mouse usage activities produced by a user during interaction with the GUI, a unique profile can be created for that user, which can help identify him/her. Mouse-based verification approaches do not record sensitive user credentials like usernames and passwords, and thus avoid privacy issues.
An image CAPTCHA is proposed that incorporates Mouse Dynamics to help fortify it. It displays random images obtained from Yahoo's Flickr. To solve the challenge, the user must identify and select a certain class of images. Two theme-based challenges have been designed: Avatar CAPTCHA and Zoo CAPTCHA. The former displays human and avatar faces whereas the latter displays different animal species. In addition to the dynamically selected images, while the user attempts to solve the CAPTCHA, the way he or she interacts with the mouse, i.e. mouse clicks, mouse movements, mouse cursor screen co-ordinates, etc., is recorded non-obtrusively at regular time intervals. These recorded mouse movements constitute the Mouse Dynamics Signature (MDS) of the user. This MDS provides an additional secure technique to segregate humans from bots. The security of the CAPTCHA is tested by an adversary executing a mouse bot that attempts to solve the CAPTCHA challenges.
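A sketch of how such a Mouse Dynamics Signature might be derived from the recorded samples; the log format and the two features shown are illustrative assumptions, not the system's actual feature set.

```python
# A minimal sketch: turn a hypothetical log of (timestamp, x, y) mouse
# samples, captured at regular intervals, into two simple MDS features.
import math

samples = [(0.00, 100, 100), (0.05, 112, 104), (0.10, 130, 115),
           (0.15, 141, 130), (0.20, 150, 152)]

def mds_features(points):
    """Average pointer speed and total path length between samples."""
    speeds, path = [], 0.0
    for (t0, x0, y0), (t1, x1, y1) in zip(points, points[1:]):
        dist = math.hypot(x1 - x0, y1 - y0)   # segment length in pixels
        path += dist
        speeds.append(dist / (t1 - t0))       # pixels per second
    return {"avg_speed": sum(speeds) / len(speeds), "path_len": path}

print(mds_features(samples))  # feature vector fed to the verifier
```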
Improving the accuracy of spoofed traffic inference in inter-domain traffic
Ascertaining that a network will forward spoofed traffic usually requires an active probing vantage point in that network, effectively preventing a comprehensive view of this global Internet vulnerability. We argue that broader visibility into the spoofing problem may lie in the capability to infer lack of Source Address Validation (SAV) compliance from large, heavily aggregated Internet traffic data, such as traffic observable at Internet Exchange Points (IXPs). The key idea is to use IXPs as observatories to detect spoofed packets, by leveraging Autonomous System (AS) topology knowledge extracted from Border Gateway Protocol (BGP) data to infer which source addresses should legitimately appear across parts of the IXP switch fabric. In this thesis, we demonstrate that the existing literature does not capture several fundamental challenges to this approach, including noise in BGP data sources, heuristic AS relationship inference, and idiosyncrasies in IXP interconnectivity fabrics. We propose Spoofer-IX, a novel methodology to navigate these challenges, leveraging Customer Cone semantics of AS relationships to guide precise classification of inter-domain traffic as In-cone, Out-of-cone (spoofed), Unverifiable, Bogon, and Unassigned. We apply our methodology in extensive data analysis using real traffic data from two distinct IXPs in Brazil, a mid-size and a large-size infrastructure. In the mid-size IXP, with more than 200 members, we find an upper bound on the volume of Out-of-cone traffic more than an order of magnitude lower than the previous method inferred on the same data, revealing the practical importance of Customer Cone semantics in such analysis. We also found no significant improvement in the deployment of SAV in networks using the mid-size IXP between 2017 and 2019. In hopes that our methods and tools generalize for use by other IXPs that want to prevent use of their infrastructure for launching spoofed-source DoS attacks, we explore the feasibility of scaling the system to larger and more diverse IXP infrastructures. To promote this goal, and broad replicability of our results, we make the source code of Spoofer-IX publicly available. This thesis illustrates the subtleties of scientific assessments of operational Internet infrastructure, and the need for a community focus on reproducing and repeating previous methods.
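A simplified sketch of the Customer Cone check, covering only the In-cone/Out-of-cone distinction and omitting the Unverifiable, Bogon, and Unassigned classes; the AS numbers and prefixes are hypothetical.

```python
# A minimal sketch: a packet entering the IXP through a member AS is
# In-cone if its source address falls inside a prefix that some AS in
# the member's customer cone legitimately originates.
import ipaddress

customer_cone = {  # hypothetical member AS -> prefixes its cone may source
    64500: [ipaddress.ip_network("198.51.100.0/24")],
    64501: [ipaddress.ip_network("203.0.113.0/24"),
            ipaddress.ip_network("192.0.2.0/24")],
}

def classify(ingress_as, src_ip):
    """Return 'in-cone' or 'out-of-cone' for one observed packet."""
    addr = ipaddress.ip_address(src_ip)
    cone = customer_cone.get(ingress_as, [])
    return "in-cone" if any(addr in p for p in cone) else "out-of-cone"

print(classify(64500, "198.51.100.7"))  # in-cone
print(classify(64500, "203.0.113.9"))   # out-of-cone (possibly spoofed)
```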
A Multi-Agent Systems Approach for Analysis of Stepping Stone Attacks
Stepping stone attacks are among the most sophisticated cyber-attacks, in which attackers build a chain of compromised hosts to reach a victim target. In this Dissertation, an analytic model with a multi-agent systems approach is proposed to analyze the propagation of stepping stone attacks in dynamic vulnerability graphs. Because the vulnerability configuration of a network is inherently dynamic, this Dissertation proposes a biased min-consensus technique for dynamic graphs with fixed and switching topology as a distributed technique to calculate the most vulnerable path for stepping stone attacks in dynamic vulnerability graphs. We use min-plus algebra to analyze and provide necessary and sufficient conditions for convergence to the shortest path in the fixed-topology case. A necessary condition for the switching-topology case is provided.
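The biased min-consensus update can be sketched on a toy directed vulnerability graph; the nodes and edge weights are hypothetical (lower weight meaning easier exploitation), the target node is biased to zero, and the estimates converge to shortest-path distances, i.e. the most vulnerable attack path.

```python
# A minimal sketch of biased min-consensus for shortest paths: every node
# repeatedly takes the minimum of neighbour estimate + edge weight, while
# the target node stays pinned (biased) at 0.

edges = {("A", "B"): 2.0, ("B", "T"): 1.0,   # hypothetical exploit costs
         ("A", "C"): 1.0, ("C", "T"): 3.0}
nodes, target = {"A", "B", "C", "T"}, "T"

x = {n: float("inf") for n in nodes}
x[target] = 0.0                               # bias at the target node
for _ in range(len(nodes)):                   # enough rounds to converge
    for (u, v), w in edges.items():
        x[u] = min(x[u], x[v] + w)            # min-consensus update

print(x)  # A: 3.0 via A->B->T, the most vulnerable stepping stone path
```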
Most cyber-attacks involve an attacker launching a multi-stage attack by exploiting a sequence of hosts. This multi-stage attack generates a chain of "stepping stones" from the origin to the target. The choice of stepping stones is a function of the degree of exploitability, the impact, the attacker's capability, masking of the origin location, and intent. In this Dissertation, we model and analyze scenarios wherein an attacker employs multiple strategies to choose stepping stones. The problem is modeled as an Adjacency Quadratic Shortest Path problem using dynamic vulnerability graphs with a multi-agent dynamic system approach. With this approach, the shortest stepping stone path with maximum node degree and the shortest stepping stone path with maximum impact are modeled and analyzed.
Because embedded controllers are omnipresent in networks, this Dissertation also proposes, as a risk mitigation strategy, a cyber-attack-tolerant control strategy for embedded controllers. A dual redundant control architecture is proposed that combines two identical controllers switched periodically between active and restart modes. The strategy is intended to mitigate the impact of corruption of the controller software by an adversary. We analyze the impact of resetting and restarting the controller software and the performance of the switching process. The minimum requirements on the control design for effective mitigation of cyber-attacks on the control software, which imply a "fast" switching period, are provided. The simulation results demonstrate the effectiveness of the proposed strategy when the time to fully reset and restart the controller is shorter than the time taken by an adversary to compromise the controller. The results also provide insights into the stability and safety regions and the factors that determine the effectiveness of the proposed strategy.
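A minimal sketch of the timing condition behind the switching strategy; the parameter names, values, and the exact inequality chain are illustrative assumptions, not the Dissertation's formal requirements.

```python
# A sketch of the restart-based mitigation condition: the standby
# controller must finish its clean reset-and-restart before each swap,
# and each swap must happen before an adversary can finish compromising
# the currently active controller.

def safe_switching(period, restart_time, compromise_time):
    """True if a corrupted controller is always flushed before it
    matters: restart completes within one period, and the period is
    shorter than the adversary's time-to-compromise."""
    return restart_time < period < compromise_time

# Hypothetical timings in seconds.
print(safe_switching(period=5.0, restart_time=2.0, compromise_time=30.0))
```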
Multibiometric security in wireless communication systems
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University, 05/08/2010. This thesis explores an application of multibiometrics to secured wireless communications. The media of study for this purpose included Wi-Fi, 3G, and WiMAX, over which simulations and experimental studies were carried out to assess performance. Specifically, restriction of access to authorized users only is provided by a technique referred to hereafter as a multibiometric cryptosystem. In brief, the system is built upon a complete challenge/response methodology in order to obtain a high level of security, on the basis of user identification by fingerprint and further confirmation by verification of the user through text-dependent speaker recognition.
First is the enrolment phase, in which the database of watermarked fingerprints, with memorable texts and the voice features based on the same texts, is created by sending them to the server through the wireless channel. Later is the verification stage, at which claimed users (those who claim to be genuine) are verified against the database; it consists of five steps. At the initial identification level, one is asked to present one's fingerprint and a memorable word, the word watermarked into the fingerprint, in order for the system to authenticate the fingerprint, verify its validity, and retrieve the challenge for an accepted user. The following three steps then involve speaker recognition: the user responds to the challenge with text-dependent voice, the server authenticates the response, and finally the server accepts or rejects the user.
In order to implement fingerprint watermarking, i.e. incorporating the memorable word as a watermark message into the fingerprint image, a five-step algorithm has been developed. The first three novel steps, concerned with fingerprint image enhancement (CLAHE with 'Clip Limit', standard deviation analysis, and sliding neighborhood), are followed by two further steps for embedding the watermark into, and extracting it from, the enhanced fingerprint image utilising the Discrete Wavelet Transform (DWT).
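A sketch of the DWT embedding step, assuming the PyWavelets package and a hypothetical embedding strength alpha; this is an illustrative additive scheme in one sub-band, not the thesis's exact algorithm.

```python
# A minimal sketch: spread the memorable word's bits into the diagonal
# detail (HH) coefficients of a one-level Haar decomposition of the
# (already enhanced) fingerprint image, then invert the transform.
import numpy as np
import pywt

def embed(image, word, alpha=8.0):            # alpha: assumed strength
    """Additively embed the word's bits into the HH sub-band."""
    ll, (lh, hl, hh) = pywt.dwt2(image.astype(float), "haar")
    bits = np.unpackbits(np.frombuffer(word.encode(), dtype=np.uint8))
    flat = hh.ravel()
    flat[: bits.size] += alpha * (2.0 * bits - 1.0)   # +/- alpha per bit
    return pywt.idwt2((ll, (lh, hl, flat.reshape(hh.shape))), "haar")

fingerprint = np.random.rand(64, 64) * 255    # stand-in for a real image
watermarked = embed(fingerprint, "sunflower") # hypothetical memorable word
```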
In the speaker recognition stage, the limitations of this technique in wireless communication have been addressed by sending voice features (cepstral coefficients) instead of raw samples. This scheme reaps the advantages of reduced transmission time and reduced dependency of the data on the communication channel, together with no packet loss. Finally, the obtained results have verified the claims.
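A sketch of the client-side feature extraction, assuming the librosa package; its bundled example clip (fetched on first use) stands in for the spoken response, and the 13-coefficient choice is an assumption.

```python
# A minimal sketch: extract cepstral features (MFCCs) from the utterance
# and transmit only those, rather than the raw waveform, shrinking the
# payload and decoupling recognition from raw-sample channel effects.
import numpy as np
import librosa

signal, sr = librosa.load(librosa.ex("trumpet"), sr=16000)  # stand-in audio
mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13)     # 13 x frames

print(mfcc.shape)                              # e.g. (13, T) feature matrix
payload = mfcc.astype(np.float32).tobytes()    # what would be sent over air
```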
Dynamic Protocol Reverse Engineering: A Grammatical Inference Approach
Round-trip engineering of software from source code and reverse engineering of software from binary files have both been extensively studied, and the state of practice has documented tools and techniques. Forward engineering of protocols has also been extensively studied, and there are firmly established techniques for generating correct protocols. While observation of protocol behavior for performance testing has been studied and techniques established, reverse engineering of protocol control flow from observations of protocol behavior has not received the same level of attention. The state of practice in reverse engineering the control flow of computer network protocols comprises mostly ad hoc approaches. We examine state-of-practice tools and techniques used in three open source projects: Pidgin, Samba, and rdesktop. We examine techniques proposed by computational learning researchers for grammatical inference. We propose to extend the state of the art by inferring protocol control flow using grammatical-inference-inspired techniques to reverse engineer automata representations from captured data flows. We present evidence that grammatical inference is applicable to the problem domain under consideration.
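As an illustration of grammatical-inference-inspired automaton reconstruction, the following sketch builds a prefix tree acceptor (PTA) from hypothetical captured message-type sequences; state-merging algorithms such as RPNI would then generalise the PTA into a compact protocol automaton.

```python
# A minimal sketch of the first step in inferring protocol control flow:
# fold observed session traces (hypothetical message types) into a
# prefix tree acceptor whose states are shared message prefixes.

sessions = [["SYN", "SYNACK", "ACK", "DATA", "FIN"],
            ["SYN", "SYNACK", "ACK", "FIN"]]

def build_pta(traces):
    """Return the transition map {(state, symbol): state} of the PTA."""
    transitions, next_state = {}, 1
    for trace in traces:
        state = 0                          # all traces share root state q0
        for symbol in trace:
            if (state, symbol) not in transitions:
                transitions[(state, symbol)] = next_state
                next_state += 1
            state = transitions[(state, symbol)]
    return transitions

for (s, sym), t in sorted(build_pta(sessions).items()):
    print(f"q{s} --{sym}--> q{t}")         # automaton edges, one per line
```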