152 research outputs found
A pattern-driven corpus to predictive analytics in mitigating SQL injection attack
The back-end database provides accessible and structured storage for each web application’s big data internet web traffic exchanges stemming from cloud-hosted web applications to the Internet of Things (IoT) smart devices in emerging computing. Structured Query Language Injection Attack (SQLIA) remains an intruder’s exploit of choice to steal confidential information from the database of vulnerable front-end web applications with potentially damaging security ramifications.Existing solutions to SQLIA still follows the on-premise web applications server hosting concept which were primarily developed before the recent challenges of the big data mining and as such lack the functionality and ability to cope with new attack signatures concealed in a large volume of web requests. Also, most organisations’ databases and services infrastructure no longer reside on-premise as internet cloud-hosted applications and services are increasingly used which limit existing Structured Query Language Injection (SQLI) detection and prevention approaches that rely on source code scanning. A bio-inspired approach such as Machine Learning (ML) predictive analytics provides functional and scalable mining for big data in the detection and prevention of SQLI in intercepting large volumes of web requests. Unfortunately, lack of availability of robust ready-made data set with patterns and historical data items to train a classifier are issues well known in SQLIA research applying ML in the field of Artificial Intelligence (AI). The purpose-built competition-driven test case data sets are antiquated and not pattern-driven to train a classifier for real-world application. Also, the web application types are so diverse to have an all-purpose generic data set for ML SQLIA mitigation.This thesis addresses the lack of pattern-driven data set by deriving one to predict SQLIA of any size and proposing a technique to obtain a data set on the fly and break the circle of relying on few outdated competitions-driven data sets which exist are not meant to benchmark real-world SQLIA mitigation. The thesis in its contributions derived pattern-driven data set of related member strings that are used in training a supervised learning model with validation through Receiver Operating Characteristic (ROC) curve and Confusion Matrix (CM) with results of low false positives and negatives. We further the evaluations with cross-validation to have obtained a low variance in accuracy that indicates of a successful trained model using the derived pattern-driven data set capable of generalisation of unknown data in the real-world with reduced biases. Also, we demonstrated a proof of concept with a test application by implementing an ML Predictive Analytics to SQLIA detection and prevention using this pattern-driven data set in a test web application. We observed in the experiments carried out in the course of this thesis, a data set of related member strings can be generated from a web expected input data and SQL tokens, including known SQLI signatures. The data set extraction ontology proposed in this thesis for applied ML in SQLIA mitigation in the context of emerging computing of big data internet, and cloud-hosted services set our proposal apart from existing approaches that were mostly on-premise source code scanning and queries structure comparisons of some sort
Recommended from our members
The entangled cyberspace: an integrated approach for predicting cyber-attacks
This thesis was submitted for the award of Doctor of Philosophy and was awarded by Brunel University LondonSignificant studies in cyber defence analysis have predominantly revolved around a single linear analysis of information from a single source of evidence (The Network). These studies were limited in their ability to understand the dynamics of entanglements related to cyber-incidents. This research integrates evidence beyond the network in an attempt to understand and predict phases of the kill-chain across the information space.
This research provides a multi-dimensional phased analysis of the traditional kill-chain model using structural vector autoregressive models. In the ‘Entangled Cyberspace Framework’, each phase of the kill-chain corresponds to a single dimension of the information space based on time observations of certain events. Events are represented as time signals, where each phase is characterised by multiple time signals representing multiple events on that phase. Multiple time signals are analysed using structural models for multiple time series analysis (Vector Auto-Regressive models). At each phase of the kill-chain, we perform a lagged co-integration analysis of events across the information space. This nature of analysis detects hidden entanglements that characterise events in the kill-chain beyond the network. The measured prediction accuracy and error measured at each stage of the experiment represents the usefulness of selected events in characterising the defined stage of the kill-chain.
The entangled cyberspace, in theory, is the fusion of three conceptual foundations: a) A multi-dimensional characterisation of cyberspace, b) A sequential phased model for perpetrating cyber-attacks and c) A structural model for integrating and simultaneously analysing multiple sources of evidence. It starts with the characterisation of the information space into different dimensions of interest. The framework goes further to identify evidence sources across these characterised dimensions and integrates them in the analytical context under consideration (e.g. Malware Injection).
The concrete findings show that our approach and analytical methodology are capable of detecting entanglements when applied to a set of entangled activities across the information space. The findings also prove that activities beyond the network have significant effects on the nature of the unfolding cyber-attack vector. The predictive features of events across the kill-chain were also presented in this research as opinion and emotion drivers on the social dimension, packet data details and social and cultural events on the economic layer. Finally, co-integration detected between events across and within dimensions of the information space proves the existence of both inter-dimensional and intra-dimensional entanglements that affect the nature of events unfolding during the kill-chain (from the adversary’s point of view).
The novelty of this research rests in the ability to hop across the information space for detecting evidential clues of activities that are related-to cyber-incidents. This research also expands the standard multi-dimensional information space to include SPEC factors as indicators of cyber-incidents. This research improves the current information security management model, specifically in the monitoring, analysis and detection phases. This research provides a methodology that accommodates a robust evidence base for understanding the attack surface. Practically, this research provides a basis for creating applications and tools for protecting critical national infrastructure by integrating data from social platforms, real-world political, cultural and economic events and the cyber-physical
Unraveling Attacks in Machine Learning-based IoT Ecosystems: A Survey and the Open Libraries Behind Them
The advent of the Internet of Things (IoT) has brought forth an era of
unprecedented connectivity, with an estimated 80 billion smart devices expected
to be in operation by the end of 2025. These devices facilitate a multitude of
smart applications, enhancing the quality of life and efficiency across various
domains. Machine Learning (ML) serves as a crucial technology, not only for
analyzing IoT-generated data but also for diverse applications within the IoT
ecosystem. For instance, ML finds utility in IoT device recognition, anomaly
detection, and even in uncovering malicious activities. This paper embarks on a
comprehensive exploration of the security threats arising from ML's integration
into various facets of IoT, spanning various attack types including membership
inference, adversarial evasion, reconstruction, property inference, model
extraction, and poisoning attacks. Unlike previous studies, our work offers a
holistic perspective, categorizing threats based on criteria such as adversary
models, attack targets, and key security attributes (confidentiality,
availability, and integrity). We delve into the underlying techniques of ML
attacks in IoT environment, providing a critical evaluation of their mechanisms
and impacts. Furthermore, our research thoroughly assesses 65 libraries, both
author-contributed and third-party, evaluating their role in safeguarding model
and data privacy. We emphasize the availability and usability of these
libraries, aiming to arm the community with the necessary tools to bolster
their defenses against the evolving threat landscape. Through our comprehensive
review and analysis, this paper seeks to contribute to the ongoing discourse on
ML-based IoT security, offering valuable insights and practical solutions to
secure ML models and data in the rapidly expanding field of artificial
intelligence in IoT
Advanced Threat Intelligence: Interpretation of Anomalous Behavior in Ubiquitous Kernel Processes
Targeted attacks on digital infrastructures are a rising threat against the confidentiality, integrity, and availability of both IT systems and sensitive data. With the emergence of advanced persistent threats (APTs), identifying and understanding such attacks has become an increasingly difficult task. Current signature-based systems are heavily reliant on fixed patterns that struggle with unknown or evasive applications, while behavior-based solutions usually leave most of the interpretative work to a human analyst.
This thesis presents a multi-stage system able to detect and classify anomalous behavior within a user session by observing and analyzing ubiquitous kernel processes. Application candidates suitable for monitoring are initially selected through an adapted sentiment mining process using a score based on the log likelihood ratio (LLR). For transparent anomaly detection within a corpus of associated events, the author utilizes star structures, a bipartite representation designed to approximate the edit distance between graphs. Templates describing nominal behavior are generated automatically and are used for the computation of both an anomaly score and a report containing all deviating events. The extracted anomalies are classified using the Random Forest (RF) and Support Vector Machine (SVM) algorithms. Ultimately, the newly labeled patterns are mapped to a dedicated APT attacker–defender model that considers objectives, actions, actors, as well as assets, thereby bridging the gap between attack indicators and detailed threat semantics. This enables both risk assessment and decision support
for mitigating targeted attacks.
Results show that the prototype system is capable of identifying 99.8% of all star structure anomalies as benign or malicious. In multi-class scenarios that seek to associate each anomaly with a distinct attack pattern belonging to a particular APT stage we achieve a solid accuracy of 95.7%. Furthermore, we demonstrate that 88.3% of observed attacks could be identified by analyzing and classifying a single ubiquitous Windows process for a mere 10 seconds, thereby eliminating the necessity to monitor each and every (unknown) application running on a system.
With its semantic take on threat detection and classification, the proposed system offers a formal as well as technical solution to an information security challenge of great significance.The financial support by the Christian Doppler Research Association, the Austrian Federal Ministry for Digital and Economic Affairs, and the National Foundation for Research, Technology and Development is gratefully acknowledged
Cyber Security of Critical Infrastructures
Critical infrastructures are vital assets for public safety, economic welfare, and the national security of countries. The vulnerabilities of critical infrastructures have increased with the widespread use of information technologies. As Critical National Infrastructures are becoming more vulnerable to cyber-attacks, their protection becomes a significant issue for organizations as well as nations. The risks to continued operations, from failing to upgrade aging infrastructure or not meeting mandated regulatory regimes, are considered highly significant, given the demonstrable impact of such circumstances. Due to the rapid increase of sophisticated cyber threats targeting critical infrastructures with significant destructive effects, the cybersecurity of critical infrastructures has become an agenda item for academics, practitioners, and policy makers. A holistic view which covers technical, policy, human, and behavioural aspects is essential to handle cyber security of critical infrastructures effectively. Moreover, the ability to attribute crimes to criminals is a vital element of avoiding impunity in cyberspace. In this book, both research and practical aspects of cyber security considerations in critical infrastructures are presented. Aligned with the interdisciplinary nature of cyber security, authors from academia, government, and industry have contributed 13 chapters. The issues that are discussed and analysed include cybersecurity training, maturity assessment frameworks, malware analysis techniques, ransomware attacks, security solutions for industrial control systems, and privacy preservation methods
Security and trust in cloud computing and IoT through applying obfuscation, diversification, and trusted computing technologies
Cloud computing and Internet of Things (IoT) are very widely spread and commonly used technologies nowadays. The advanced services offered by cloud computing have made it a highly demanded technology.
Enterprises and businesses are more and more relying on the cloud to deliver services to their customers. The prevalent use of cloud means that more data is stored outside the organization’s premises, which raises concerns about the security and privacy of the stored and processed data. This highlights the significance of effective security practices to secure the cloud infrastructure.
The number of IoT devices is growing rapidly and the technology is being employed in a wide range of sectors including smart healthcare, industry automation, and smart environments. These devices collect and exchange a great deal of information, some of which may contain critical and personal data of the users of the device. Hence, it is highly significant to protect the collected and shared data over the network; notwithstanding, the studies signify that attacks on these devices are increasing, while a high percentage of IoT devices lack proper security measures to protect the devices, the data, and the privacy of the users.
In this dissertation, we study the security of cloud computing and IoT and propose software-based security approaches supported by the hardware-based technologies to provide robust measures for enhancing the security of these environments. To achieve this goal, we use obfuscation and diversification as the potential software security techniques. Code obfuscation protects the software from malicious reverse engineering and diversification mitigates the risk of large-scale exploits. We study trusted computing and Trusted Execution Environments (TEE) as the hardware-based security solutions. Trusted Platform Module (TPM) provides security and trust through a hardware root of trust, and assures the integrity of a platform. We also study Intel SGX which is a TEE solution that guarantees the integrity and confidentiality of the code and data loaded onto its protected container, enclave.
More precisely, through obfuscation and diversification of the operating systems and APIs of the IoT devices, we secure them at the application level, and by obfuscation and diversification of the communication protocols, we protect the communication of data between them at the network level. For securing the cloud computing, we employ obfuscation and diversification techniques for securing the cloud computing software at the client-side. For an enhanced level of security, we employ hardware-based security solutions, TPM and SGX. These solutions, in addition to security, ensure layered trust in various layers from hardware to the application.
As the result of this PhD research, this dissertation addresses a number of security risks targeting IoT and cloud computing through the delivered publications and presents a brief outlook on the future research directions.Pilvilaskenta ja esineiden internet ovat nykyään hyvin tavallisia ja laajasti sovellettuja tekniikkoja. Pilvilaskennan pitkälle kehittyneet palvelut ovat tehneet siitä hyvin kysytyn teknologian. Yritykset enenevässä määrin nojaavat pilviteknologiaan toteuttaessaan palveluita asiakkailleen. Vallitsevassa pilviteknologian soveltamistilanteessa yritykset ulkoistavat tietojensa käsittelyä yrityksen ulkopuolelle, minkä voidaan nähdä nostavan esiin huolia taltioitavan ja käsiteltävän tiedon turvallisuudesta ja yksityisyydestä. Tämä korostaa tehokkaiden turvallisuusratkaisujen merkitystä osana pilvi-infrastruktuurin turvaamista.
Esineiden internet -laitteiden lukumäärä on nopeasti kasvanut. Teknologiana sitä sovelletaan laajasti monilla sektoreilla, kuten älykkäässä terveydenhuollossa, teollisuusautomaatiossa ja älytiloissa. Sellaiset laitteet keräävät ja välittävät suuria määriä informaatiota, joka voi sisältää laitteiden käyttäjien kannalta kriittistä ja yksityistä tietoa. Tästä syystä johtuen on erittäin merkityksellistä suojata verkon yli kerättävää ja jaettavaa tietoa. Monet tutkimukset osoittavat esineiden internet -laitteisiin kohdistuvien tietoturvahyökkäysten määrän olevan nousussa, ja samaan aikaan suuri osuus näistä laitteista ei omaa kunnollisia teknisiä ominaisuuksia itse laitteiden tai niiden käyttäjien yksityisen tiedon suojaamiseksi.
Tässä väitöskirjassa tutkitaan pilvilaskennan sekä esineiden internetin tietoturvaa ja esitetään ohjelmistopohjaisia tietoturvalähestymistapoja turvautumalla osittain laitteistopohjaisiin teknologioihin. Esitetyt lähestymistavat tarjoavat vankkoja keinoja tietoturvallisuuden kohentamiseksi näissä konteksteissa. Tämän saavuttamiseksi työssä sovelletaan obfuskaatiota ja diversifiointia potentiaalisiana ohjelmistopohjaisina tietoturvatekniikkoina. Suoritettavan koodin obfuskointi suojaa pahantahtoiselta ohjelmiston takaisinmallinnukselta ja diversifiointi torjuu tietoturva-aukkojen laaja-alaisen hyödyntämisen riskiä. Väitöskirjatyössä tutkitaan luotettua laskentaa ja luotettavan laskennan suoritusalustoja laitteistopohjaisina tietoturvaratkaisuina. TPM (Trusted Platform Module) tarjoaa turvallisuutta ja luottamuksellisuutta rakentuen laitteistopohjaiseen luottamukseen. Pyrkimyksenä on taata suoritusalustan eheys. Työssä tutkitaan myös Intel SGX:ää yhtenä luotettavan suorituksen suoritusalustana, joka takaa suoritettavan koodin ja datan eheyden sekä luottamuksellisuuden pohjautuen suojatun säiliön, saarekkeen, tekniseen toteutukseen.
Tarkemmin ilmaistuna työssä turvataan käyttöjärjestelmä- ja sovellusrajapintatasojen obfuskaation ja diversifioinnin kautta esineiden internet -laitteiden ohjelmistokerrosta. Soveltamalla samoja tekniikoita protokollakerrokseen, työssä suojataan laitteiden välistä tiedonvaihtoa verkkotasolla. Pilvilaskennan turvaamiseksi työssä sovelletaan obfuskaatio ja diversifiointitekniikoita asiakaspuolen ohjelmistoratkaisuihin. Vankemman tietoturvallisuuden saavuttamiseksi työssä hyödynnetään laitteistopohjaisia TPM- ja SGX-ratkaisuja. Tietoturvallisuuden lisäksi nämä ratkaisut tarjoavat monikerroksisen luottamuksen rakentuen laitteistotasolta ohjelmistokerrokseen asti.
Tämän väitöskirjatutkimustyön tuloksena, osajulkaisuiden kautta, vastataan moniin esineiden internet -laitteisiin ja pilvilaskentaan kohdistuviin tietoturvauhkiin. Työssä esitetään myös näkemyksiä jatkotutkimusaiheista
Towards an Improved Understanding of Software Vulnerability Assessment Using Data-Driven Approaches
Software Vulnerabilities (SVs) can expose software systems to cyber-attacks, potentially causing enormous financial and reputational damage for organizations. There have been significant research efforts to detect these SVs so that developers can promptly fix them. However, fixing SVs is complex and time-consuming in practice, and thus developers usually do not have sufficient time and resources to fix all SVs at once. As a result, developers often need SV information, such as exploitability, impact, and overall severity, to prioritize fixing more critical SVs. Such information required for fixing planning and prioritization is typically provided in the SV assessment step of the SV lifecycle. Recently, data-driven methods have been increasingly proposed to automate SV assessment tasks. However, there are still numerous shortcomings with the existing studies on data-driven SV assessment that would hinder their application in practice. This PhD thesis aims to contribute to the growing literature in data-driven SV assessment by investigating and addressing the constant changes in SV data as well as the lacking considerations of source code and developers’ needs for SV assessment that impede the practical applicability of the field. Particularly, we have made the following five contributions in this thesis. (1) We systematize the knowledge of data-driven SV assessment to reveal the best practices of the field and the main challenges affecting its application in practice. Subsequently, we propose various solutions to tackle these challenges to better support the real-world applications of data-driven SV assessment. (2) We first demonstrate the existence of the concept drift (changing data) issue in descriptions of SV reports that current studies have mostly used for predicting the Common Vulnerability Scoring System (CVSS) metrics. We augment report-level SV assessment models with subwords of terms extracted from SV descriptions to help the models more effectively capture the semantics of ever-increasing SVs. (3) We also identify that SV reports are usually released after SV fixing. Thus, we propose using vulnerable code to enable earlier SV assessment without waiting for SV reports. We are the first to use Machine Learning techniques to predict CVSS metrics on the function level leveraging vulnerable statements directly causing SVs and their context in code functions. The performance of our function-level SV assessment models is promising, opening up research opportunities in this new direction. (4) To facilitate continuous integration of software code nowadays, we present a novel deep multi-task learning model, DeepCVA, to simultaneously and efficiently predict multiple CVSS assessment metrics on the commit level, specifically using vulnerability-contributing commits. DeepCVA is the first work that enables practitioners to perform SV assessment as soon as vulnerable changes are added to a codebase, supporting just-in-time prioritization of SV fixing. (5) Besides code artifacts produced from a software project of interest, SV assessment tasks can also benefit from SV crowdsourcing information on developer Question and Answer (Q&A) websites. We automatically retrieve large-scale security/SVrelated posts from these Q&A websites. We then apply a topic modeling technique on these posts to distill developers’ real-world SV concerns that can be used for data-driven SV assessment. Overall, we believe that this thesis has provided evidence-based knowledge and useful guidelines for researchers and practitioners to automate SV assessment using data-driven approaches.Thesis (Ph.D.) -- University of Adelaide, School of Computer Science, 202
Application of Big Data Technology, Text Classification, and Azure Machine Learning for Financial Risk Management Using Data Science Methodology
Data science plays a crucial role in enabling organizations to optimize data-driven opportunities within financial risk management. It involves identifying, assessing, and mitigating risks, ultimately safeguarding investments, reducing uncertainty, ensuring regulatory compliance, enhancing decision-making, and fostering long-term sustainability. This thesis explores three facets of Data Science projects: enhancing customer understanding, fraud prevention, and predictive analysis, with the goal of improving existing tools and enabling more informed decision-making. The first project examined leveraged big data technologies, such as Hadoop and Spark, to enhance financial risk management by accurately predicting loan defaulters and their repayment likelihood. In the second project, we investigated risk assessment and fraud prevention within the financial sector, where Natural Language Processing and machine learning techniques were applied to classify emails into categories like spam, ham, and phishing. After training various models, their performance was rigorously evaluated. In the third project, we explored the utilization of Azure machine learning to identify loan defaulters, emphasizing the comparison of different machine learning algorithms for predictive analysis. The results aimed to determine the best-performing model by evaluating various performance metrics for the dataset. This study is important because it offers a strategy for enhancing risk management, preventing fraud, and encouraging innovation in the financial industry, ultimately resulting in better financial outcomes and enhanced customer protection
- …