    Detecting AI generated text using neural networks

    For humans, distinguishing machine-generated text from human-written text is mentally taxing and slow. NLP models have been created to do this faster and more effectively. But what if adversarial changes have been added to the machine-generated text? This thesis discusses this issue and text detectors in general. Its primary goal is to describe the current state of text detectors in research and to discuss a key adversarial issue in modern NLP transformers. To describe the current state of text detectors, a Systematic Literature Review of 50 papers relevant to machine-centric detection was conducted in chapter 2. As for the key adversarial issue, chapter 3 describes an experiment in which RoBERTa was used to test transformers against simple mutations that cause mislabelling. The state of the literature, covered at length in chapter 2, shows how viable text detection has become as a research subject. Lastly, RoBERTa was shown to be vulnerable to mutation attacks. The solution was to fine-tune it on suitable heuristics: as long as the mutations can be predicted, the model can be fine-tuned to detect them.
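    As a concrete illustration of the kind of mutation attack examined in chapter 3, the sketch below applies simple character-level edits to a text and queries a RoBERTa-based detector before and after. This is a minimal sketch under stated assumptions: the checkpoint name is one publicly available detector, and the homoglyph substitution stands in for the thesis's unspecified mutation set.

```python
# Minimal sketch of a character-level mutation attack on a transformer text
# detector. Assumptions: the HuggingFace checkpoint below is one example of a
# RoBERTa-based detector; the homoglyph edits stand in for the thesis's
# (unspecified) mutations.
import random
from transformers import pipeline

detector = pipeline("text-classification",
                    model="openai-community/roberta-base-openai-detector")

def mutate(text: str, rate: float = 0.05) -> str:
    """Swap some Latin letters for visually similar Cyrillic homoglyphs."""
    homoglyphs = {"a": "\u0430", "e": "\u0435", "o": "\u043e", "i": "\u0456"}
    return "".join(homoglyphs[c] if c in homoglyphs and random.random() < rate
                   else c for c in text)

sample = "The results demonstrate that the proposed method outperforms prior work."
print(detector(sample))          # label for the unmodified text
print(detector(mutate(sample)))  # the label can flip after trivial edits
```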

    Artificial intelligence in the cyber domain: Offense and defense

    Artificial intelligence techniques have grown rapidly in recent years, and their applications in practice can be seen in many fields, ranging from facial recognition to image analysis. In the cybersecurity domain, AI-based techniques can provide better cyber defense tools and help adversaries improve methods of attack. However, malicious actors are aware of the new prospects too and will probably attempt to use them for nefarious purposes. This survey paper aims at providing an overview of how artificial intelligence can be used in the context of cybersecurity in both offense and defense.

    Artificial Intelligence and Machine Learning in Cybersecurity: Applications, Challenges, and Opportunities for MIS Academics

    The availability of massive amounts of data, fast computers, and superior machine learning (ML) algorithms has spurred interest in artificial intelligence (AI). It is no surprise, then, that we observe an increase in the application of AI in cybersecurity. Our survey of AI applications in cybersecurity shows most of the present applications are in the areas of malware identification and classification, intrusion detection, and cybercrime prevention. We should, however, be aware that AI-enabled cybersecurity is not without its drawbacks. Challenges to AI solutions include a shortage of good-quality data to train machine learning models, the potential for exploits via adversarial AI/ML, and limited human expertise in AI. However, the rewards in terms of increased accuracy of cyberattack predictions, faster response to cyberattacks, and improved cybersecurity make it worthwhile to overcome these challenges. We present a summary of the current research on the application of AI and ML to improve cybersecurity, challenges that need to be overcome, and research opportunities for academics in management information systems.

    Methods for Computing Effective Perturbations in Adversarial Machine Learning Attacks

    In recent years, the widespread adoption of Machine Learning (ML) at the core of complex information technology systems has driven researchers to investigate the security and reliability of ML techniques. A very specific kind of threat concerns the adversarial mechanisms through which an attacker can induce a classification algorithm to produce a desired output. Such strategies, known as Adversarial Machine Learning (AML), have a twofold purpose: to compute a perturbation that, applied to the classifier's input, subverts the outcome, while maintaining the underlying intent of the original data. Although any manipulation that accomplishes these goals is theoretically acceptable, in real scenarios perturbations must correspond to a set of permissible manipulations of the input, which is rarely considered in the literature. This thesis considers two problems related to generating effective perturbations in an AML attack. First, an e-health scenario is addressed, in which an automatic prescription system can be deceived by inputs forged to subvert the model's prediction. Patients' clinical records are typically based on binary features representing the presence or absence of certain symptoms. This work presents an algorithm capable of generating the precise sequence of moves that the adversary has to take in order to elude the automatic prescription service. Secondly, the thesis outlines an AML technique specifically designed to fool the spam-account detection system of an Online Social Network (OSN). The proposed black-box evasion attack is formulated as an optimization problem that computes the adversarial sample while maintaining two important properties of the feature space, namely statistical correlation and semantic dependency.
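    The e-health scenario above amounts to flipping bits in a binary symptom vector until the classifier's prediction is subverted. Below is a minimal, self-contained sketch of that idea as a greedy search; the logistic-regression classifier, toy data, and flip budget are illustrative assumptions, not the thesis's actual algorithm.

```python
# Greedy "sequence of moves" sketch for binary feature vectors. Assumptions:
# toy data, a logistic-regression stand-in for the prescription model, and a
# small flip budget; the thesis's real algorithm is more involved.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(500, 20))      # toy records: 20 binary symptoms
y = (X[:, :5].sum(axis=1) > 2).astype(int)  # toy prescription rule
clf = LogisticRegression().fit(X, y)

def greedy_bit_flips(x, target, budget=3):
    """Flip one bit at a time, greedily choosing the flip that most raises
    the target-class probability, until the prediction is subverted."""
    x, moves = x.copy(), []
    for _ in range(budget):
        scores = []
        for j in range(len(x)):
            x_try = x.copy()
            x_try[j] ^= 1                   # one "move": toggle a symptom bit
            scores.append(clf.predict_proba([x_try])[0][target])
        best = int(np.argmax(scores))
        x[best] ^= 1
        moves.append(best)
        if clf.predict([x])[0] == target:
            break
    return x, moves

x0 = X[0]
x_adv, moves = greedy_bit_flips(x0, target=1 - clf.predict([x0])[0])
print("bits flipped (in order):", moves)
```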

    An Evasion Attack against ML-based Phishing URL Detectors

    Background: Over the years, Machine Learning Phishing URL classification (MLPU) systems have gained tremendous popularity as a way to detect phishing URLs proactively. Despite this popularity, the security vulnerabilities of MLPUs remain mostly unknown. Aim: To address this concern, we conduct a study of the test-time security vulnerabilities of state-of-the-art MLPU systems, with the aim of providing guidelines for the future development of these systems. Method: In this paper, we propose an evasion attack framework against MLPU systems. We first develop an algorithm to generate adversarial phishing URLs. We then reproduce 41 MLPU systems and record their baseline performance. Finally, we simulate an evasion attack to evaluate these MLPU systems against our generated adversarial URLs. Results: In comparison to previous works, our attack is (i) effective, as it evades all the models with an average success rate of 66% and 85% for famous (e.g., Netflix, Google) and less popular phishing targets (e.g., Wish, JBHIFI, Officeworks) respectively; and (ii) realistic, as it requires only 23 ms to produce a new adversarial URL variant that is available for registration at a median cost of only $11.99/year. We also found that popular online services such as Google Safe Browsing and VirusTotal are unable to detect these URLs. (iii) We find that adversarial training (a common defence against evasion attacks) does not significantly improve the robustness of these systems, as it decreases the success rate of our attack by only 6% on average across all models. (iv) Further, we identify the security vulnerabilities of the considered MLPU systems. Our findings point to promising directions for future research. Conclusion: Our study not only illustrates vulnerabilities in MLPU systems but also highlights implications for future work on assessing and improving these systems. Comment: Draft for ACM TOP
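    The attack above hinges on cheaply generating plausible adversarial variants of a target brand's domain. The sketch below shows the general flavour with a few simple perturbation rules; these rules and the keyword list are assumptions for illustration, not the paper's actual generation algorithm.

```python
# Illustrative adversarial URL variant generator. The transposition, omission,
# and hyphen-keyword rules here are assumed examples, not the paper's method.
import itertools

def url_variants(domain: str):
    """Yield simple perturbed variants of a target domain name."""
    name, _, tld = domain.rpartition(".")
    for i in range(len(name) - 1):             # adjacent-character transposition
        yield f"{name[:i]}{name[i + 1]}{name[i]}{name[i + 2:]}.{tld}"
    for i in range(len(name)):                 # character omission
        yield f"{name[:i]}{name[i + 1:]}.{tld}"
    for kw in ("login", "secure", "account"):  # hyphenated keyword (assumed list)
        yield f"{name}-{kw}.{tld}"

for v in itertools.islice(url_variants("netflix.com"), 8):
    print(v)  # e.g. entflix.com, nteflix.com, ... candidates for registration
```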

    Machine Generated Text: A Comprehensive Survey of Threat Models and Detection Methods

    Machine generated text is increasingly difficult to distinguish from human authored text. Powerful open-source models are freely available, and user-friendly tools that democratize access to generative models are proliferating. ChatGPT, which was released shortly after the first preprint of this survey, epitomizes these trends. The great potential of state-of-the-art natural language generation (NLG) systems is tempered by the multitude of avenues for abuse. Detection of machine generated text is a key countermeasure for reducing abuse of NLG models, with significant technical challenges and numerous open problems. We provide a survey that includes both 1) an extensive analysis of threat models posed by contemporary NLG systems, and 2) the most complete review of machine generated text detection methods to date. This survey places machine generated text within its cybersecurity and social context, and provides strong guidance for future work addressing the most critical threat models, and ensuring detection systems themselves demonstrate trustworthiness through fairness, robustness, and accountability. Comment: Manuscript submitted to ACM Special Session on Trustworthy AI. 2022/11/19 - Updated reference

    A Real-Time and Adaptive-Learning Malware Detection Method Based on API-Pair Graph

    Malware detection has been developing for many years, and the appearance of new machine learning and deep learning techniques has improved the effectiveness of detectors. However, most current research has focused on general features of malware and has ignored the evolution of malware itself, so those features can become useless as time passes and malware techniques advance. Besides, detection methods based on machine learning are mainly static detection and analysis, while studies of real-time malware detection are relatively rare. In this article, we propose a new model that can, in principle, detect malware in real time and learn new features adaptively. First, a new API-pair data structure is adopted, and the constructed data is trained with a Maximum Entropy model, which satisfies the goals of feature weighting and adaptive learning. Then clustering is applied to filter out relatively unrelated and confusing features. Moreover, a detector based on a Long Short-Term Memory (LSTM) network is devised to achieve real-time detection. Finally, a series of experiments was designed to verify our method. The experimental results show that our model achieves a top accuracy of 99.07% in general tests and keeps accuracy above 97% as malware evolves; a simulation experiment also demonstrates the feasibility of our model for real-time detection and its robustness against a typical adversarial attack.
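    In the spirit of the model above, the sketch below turns an API call trace into consecutive-call pairs and classifies the pair sequence with an LSTM. It is a minimal sketch under stated assumptions: "API pairs" are read as adjacent calls in a trace, and the paper's Maximum Entropy weighting and clustering stages are omitted.

```python
# Minimal API-pair + LSTM sketch (PyTorch). Assumptions: pairs are adjacent
# calls in a trace; the Maximum Entropy and clustering stages are omitted.
import torch
import torch.nn as nn

def api_pairs(trace):
    """Turn an API call trace into its consecutive call pairs."""
    return list(zip(trace, trace[1:]))

# Toy traces standing in for real sandbox logs (illustrative only).
traces = [["CreateFile", "WriteFile", "RegSetValue", "CreateProcess"],
          ["OpenProcess", "VirtualAllocEx", "WriteProcessMemory", "CreateRemoteThread"]]
vocab = {p: i for i, p in enumerate({p for t in traces for p in api_pairs(t)})}

class PairLSTM(nn.Module):
    def __init__(self, n_pairs, dim=32):
        super().__init__()
        self.emb = nn.Embedding(n_pairs, dim)
        self.lstm = nn.LSTM(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, 2)       # benign vs. malicious

    def forward(self, ids):                # ids: (batch, seq_len) of pair ids
        h, _ = self.lstm(self.emb(ids))
        return self.out(h[:, -1])          # classify from the last hidden state

model = PairLSTM(len(vocab))
ids = torch.tensor([[vocab[p] for p in api_pairs(traces[0])]])
print(model(ids))                          # unnormalised scores for two classes
```

    Since each pair is available as soon as the next call arrives, the model can in principle score a trace incrementally, which matches the real-time framing of the paper.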