43 research outputs found

    Adaptive Honeypot Engagement through Reinforcement Learning of Semi-Markov Decision Processes

    A honeynet is a promising active cyber defense mechanism. It reveals fundamental Indicators of Compromise (IoCs) by luring attackers into conducting adversarial behaviors in a controlled and monitored environment. The active interaction at the honeynet brings a high reward but also introduces high implementation costs and risks of adversarial honeynet exploitation. In this work, we apply an infinite-horizon Semi-Markov Decision Process (SMDP) to characterize the stochastic transitions and sojourn times of attackers in the honeynet and to quantify the reward-risk trade-off. In particular, we design adaptive long-term engagement policies shown to be risk-averse, cost-effective, and time-efficient. Numerical results demonstrate that our adaptive engagement policies can quickly attract attackers to the target honeypot and engage them long enough to obtain worthy threat information, while keeping the penetration probability at a low level. The results show that the expected utility is robust against attackers with a wide range of persistence and intelligence. Finally, we apply reinforcement learning to the SMDP to overcome the curse of modeling. Under a prudent choice of learning rate and exploration policy, we achieve quick and robust convergence to the optimal policy and value.
    Comment: The presentation can be found at https://youtu.be/GPKT3uJtXqk. arXiv admin note: text overlap with arXiv:1907.0139
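    Below is a minimal sketch of the kind of semi-Markov Q-learning update the abstract describes, assuming hypothetical honeynet states, engagement actions, and reward values (none of these names or numbers come from the paper). The distinguishing feature relative to ordinary Q-learning is that the discount factor is raised to the power of the random sojourn time spent in a state.

    import random
    from collections import defaultdict

    # Hypothetical honeynet states and engagement actions (illustrative
    # names, not taken from the paper).
    ACTIONS = ["low_interaction", "high_interaction", "eject"]

    GAMMA = 0.95  # per-unit-time discount factor
    ALPHA = 0.1   # learning rate

    Q = defaultdict(float)  # Q-values over (state, action) pairs

    def smdp_q_update(s, a, reward, sojourn, s_next):
        """SMDP Q-learning update: unlike plain Q-learning, the discount
        is raised to the random sojourn time before the transition."""
        best_next = max(Q[(s_next, a2)] for a2 in ACTIONS)
        target = reward + GAMMA ** sojourn * best_next
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])

    def epsilon_greedy(s, eps=0.1):
        """Exploration policy; a prudent eps/ALPHA schedule aids the
        quick and robust convergence the abstract mentions."""
        if random.random() < eps:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: Q[(s, a)])

    # Example update: the attacker stayed 2.5 time units in "entry"
    # before moving to "emulated_service", yielding a reward of 1.0.
    smdp_q_update("entry", "high_interaction", 1.0, 2.5, "emulated_service")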

    Strategic Learning for Active, Adaptive, and Autonomous Cyber Defense

    The increasing instances of advanced attacks call for a new defense paradigm that is active, autonomous, and adaptive, named the '3A' defense paradigm. This chapter introduces three defense schemes that actively interact with attackers to increase the attack cost and gather threat information: defensive deception for detection and counter-deception, feedback-driven Moving Target Defense (MTD), and adaptive honeypot engagement. Due to cyber deception, external noise, and the absence of knowledge about the other players' behaviors and goals, these schemes possess three progressive levels of information restriction: parameter uncertainty, payoff uncertainty, and environmental uncertainty. To estimate the unknowns and reduce uncertainty, we adopt three different strategic learning schemes that fit the associated information restrictions. All three learning schemes share the same feedback structure of sensation, estimation, and action, so that the most rewarding policies are reinforced and converge to the optimal ones in an autonomous and adaptive fashion. This work aims to shed light on proactive defense strategies, lay a solid foundation for strategic learning under incomplete information, and quantify the trade-off between security and cost.
    Comment: arXiv admin note: text overlap with arXiv:1906.1218
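    As a rough illustration of the shared sensation-estimation-action feedback structure mentioned above, here is a minimal Python sketch in which running payoff estimates are updated by stochastic approximation and the most rewarding action is progressively reinforced; the action names and payoff values are invented for the example and do not come from the chapter.

    import random

    # Hypothetical defense actions; payoffs are unknown and estimated online.
    ACTIONS = ["deceive", "move_target", "engage_honeypot"]

    estimates = {a: 0.0 for a in ACTIONS}  # estimation: running payoff averages
    counts = {a: 0 for a in ACTIONS}

    def observe_payoff(action):
        """Sensation: noisy payoff feedback from the environment (stubbed)."""
        true_mean = {"deceive": 0.4, "move_target": 0.5, "engage_honeypot": 0.7}
        return true_mean[action] + random.gauss(0.0, 0.1)

    for t in range(1, 5001):
        # Action: exploit current estimates, explore with decaying probability.
        if random.random() < t ** -0.5:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=estimates.get)
        r = observe_payoff(a)
        counts[a] += 1
        estimates[a] += (r - estimates[a]) / counts[a]  # stochastic approximation

    print(max(ACTIONS, key=estimates.get))  # the most rewarding action wins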

    RL-IoT: Reinforcement Learning to Interact with IoT Devices

    Our lives are increasingly filled with Internet of Things (IoT) devices. These devices often rely on closed or poorly documented protocols with unknown formats and semantics. Learning how to interact with such devices autonomously is key to interoperability and to the automatic verification of their capabilities. In this paper, we propose RL-IoT, a system that explores how to automatically interact with possibly unknown IoT devices. We leverage reinforcement learning (RL) to recover the semantics of protocol messages and to take control of the device to reach a given goal, while minimizing the number of interactions. We assume knowledge only of a database of possible IoT protocol messages, whose semantics are unknown. RL-IoT exchanges messages with the target IoT device, learning which commands are useful for reaching the given goal. Our results show that RL-IoT is able to solve both simple and complex tasks. With properly tuned parameters, RL-IoT learns how to perform actions with the target device, a Yeelight smart bulb in our case study, completing non-trivial patterns with as few as 400 interactions. RL-IoT paves the way for automatic interaction with poorly documented IoT protocols, thus enabling interoperable systems.
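    The following is a toy sketch, not the RL-IoT implementation, of how Q-learning over a message database with unknown semantics might discover a goal-reaching command sequence; the message names, goal pattern, reward stub, and hyperparameters are all hypothetical.

    import random
    from collections import defaultdict

    # Hypothetical message database; the semantics are unknown to the agent.
    MESSAGES = ["msg_%02d" % i for i in range(20)]
    GOAL_SEQUENCE = ["msg_03", "msg_11"]  # hidden pattern the agent must find

    ALPHA, GAMMA, EPS = 0.2, 0.9, 0.3
    Q = defaultdict(float)  # Q-values over (message-history, message) pairs

    def device_response(history, msg):
        """Stub for the real IoT device: reward 1.0 only once the hidden
        goal pattern has been completed, 0.0 otherwise."""
        return 1.0 if history + [msg] == GOAL_SEQUENCE else 0.0

    for episode in range(20000):
        history = []
        for _ in range(len(GOAL_SEQUENCE)):
            state = tuple(history)
            if random.random() < EPS:                    # explore
                msg = random.choice(MESSAGES)
            else:                                        # exploit
                msg = max(MESSAGES, key=lambda m: Q[(state, m)])
            reward = device_response(history, msg)
            history.append(msg)
            best_next = max(Q[(tuple(history), m)] for m in MESSAGES)
            Q[(state, msg)] += ALPHA * (reward + GAMMA * best_next - Q[(state, msg)])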

    To What Extent Are Honeypots and Honeynets Autonomic Computing Systems?

    Cyber threats, such as advanced persistent threats (APTs), ransomware, and zero-day exploits, are rapidly evolving and demand improved security measures. Honeypots and honeynets, as deceptive systems, offer valuable insights into attacker behavior, helping researchers and practitioners develop innovative defense strategies and enhance detection mechanisms. However, their deployment involves significant maintenance and overhead expenses. At the same time, the complexity of modern computing has prompted the rise of autonomic computing, aiming for systems that can operate without human intervention. Recent honeypot and honeynet research claims to incorporate autonomic computing principles, often using terms like adaptive, dynamic, intelligent, and learning. This study investigates such claims by measuring the extent to which autonomic principles are expressed in honeypot and honeynet literature. The findings reveal that autonomic computing keywords are present in the literature sample, suggesting an evolution from self-adaptation to autonomic computing implementations. Yet, despite these findings, the analysis also shows low frequencies of self-configuration, self-healing, and self-protection keywords. Interestingly, self-optimization appeared prominently in the literature. While this study presents a foundation for the convergence of autonomic computing and deceptive systems, future research could explore technical implementations in sample articles and test them for autonomic behavior. Additionally, investigating the design and implementation of individual autonomic computing principles in honeypots, and determining the ratio of these principles necessary for a system to exhibit autonomic behavior, could provide valuable insights for both researchers and practitioners.
    Comment: 18 pages, 3 figures, 5 tables
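    A minimal sketch of the keyword-frequency measurement such a study might perform, assuming invented keyword groups for the four classic self-* properties (the paper's actual keyword lists and corpus are not reproduced here):

    from collections import Counter

    # Hypothetical keyword stems mapped to the four self-* properties.
    SELF_STAR = {
        "self-configuration": ["self-configur", "adaptive", "dynamic"],
        "self-healing": ["self-heal", "recover"],
        "self-optimization": ["self-optimi", "learning", "intelligent"],
        "self-protection": ["self-protect"],
    }

    def keyword_frequencies(text):
        """Count how often each self-* keyword group appears in an article."""
        lowered = text.lower()
        counts = Counter()
        for prop, stems in SELF_STAR.items():
            counts[prop] = sum(lowered.count(stem) for stem in stems)
        return counts

    print(keyword_frequencies(
        "An adaptive, learning honeynet that recovers from tampering."))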

    Farsighted Risk Mitigation of Lateral Movement Using Dynamic Cognitive Honeypots

    The lateral movement of advanced persistent threats poses a severe security challenge. Due to the stealthy and persistent nature of lateral movement, defenders need to consider time and spatial locations holistically to discover latent attack paths across a large time scale and achieve long-term security for the target assets. In this work, we propose a time-expanded random network to model the stochastic service links in the user-host enterprise network and the adversarial lateral movement. We design cognitive honeypots at idle production nodes and disguise honey links as service links to detect and deter adversarial lateral movement. The location of the honeypot changes randomly over time, which increases the honeypots' stealthiness. Since the defender does not know whether, when, and where the initial intrusion and the lateral movement occur, the honeypot policy aims to reduce the target assets' Long-Term Vulnerability (LTV) for proactive and persistent protection. We further characterize three trade-offs, i.e., the probability of interference, the stealthiness level, and the roaming cost. To counter the curse of multiple attack paths, we propose an iterative algorithm and approximate the LTV with the union bound for computationally efficient deployment of cognitive honeypots. The results of the vulnerability analysis illustrate the bounds, trends, and residual value of the LTV when the adversarial lateral movement has an infinite duration. Besides honeypot policies, we obtain a critical threshold of compromisability to guide the design and modification of the current system parameters for a higher level of long-term security. We show that the target node can achieve zero vulnerability under infinite stages of lateral movement if the probability of movement deterrence is not less than the threshold.
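    The union-bound approximation mentioned above rests on P(A_1 or ... or A_n) <= P(A_1) + ... + P(A_n): summing per-path compromise probabilities upper-bounds the probability that any attack path succeeds, avoiding enumeration of path overlaps. A minimal sketch with invented link probabilities (the paper's actual network model and parameters are not reproduced):

    # Hypothetical per-link compromise probabilities on a tiny time-expanded
    # network; each attack path succeeds only if every link on it is
    # compromised (links assumed independent for this sketch).
    ATTACK_PATHS = [
        [0.3, 0.2, 0.5],
        [0.4, 0.4],
        [0.1, 0.6, 0.3, 0.2],
    ]

    def path_probability(path):
        """Probability that a single attack path is fully compromised."""
        p = 1.0
        for link in path:
            p *= link
        return p

    def ltv_union_bound(paths):
        """Union bound: P(any path succeeds) <= sum of per-path
        probabilities, capped at 1. Cheap even with many attack paths."""
        return min(1.0, sum(path_probability(p) for p in paths))

    print(ltv_union_bound(ATTACK_PATHS))  # 0.03 + 0.16 + 0.0036 = 0.1936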

    The Power of MEME: Adversarial Malware Creation with Model-Based Reinforcement Learning

    Due to the proliferation of malware, defenders are increasingly turning to automation and machine learning as part of the malware detection tool-chain. However, machine learning models are susceptible to adversarial attacks, requiring tests of model and product robustness. Meanwhile, attackers also seek to automate malware generation and evasion of antivirus systems, and defenders try to gain insight into their methods. This work proposes a new algorithm that combines Malware Evasion and Model Extraction (MEME) attacks. MEME uses model-based reinforcement learning to adversarially modify Windows executable binary samples while simultaneously training a surrogate model in high agreement with the target model being evaded. To evaluate this method, we compare it with two state-of-the-art attacks in adversarial malware creation, using three well-known published models and one antivirus product as targets. Results show that MEME outperforms the state-of-the-art methods in terms of evasion capabilities in almost all cases, producing evasive malware with an evasion rate in the range of 32-73%. It also produces surrogate models whose prediction labels agree with the respective target models 97-99% of the time. The surrogate could be used to fine-tune and improve the evasion rate in the future.
    Comment: 12 pages, 3 figures, 3 tables. Accepted at ESORICS 202
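    A heavily stubbed sketch of the combined evasion-plus-extraction loop the abstract describes: modify a sample, query the black-box target, and reuse the harvested labels to fit a surrogate. The action names, detector stub, and trivial surrogate below are invented for illustration; the paper's actual feature space, agent, and models are not reproduced.

    import random
    from collections import defaultdict

    # Hypothetical binary-modification actions (illustrative only).
    ACTIONS = ["append_bytes", "add_section", "modify_header", "pack"]

    def target_label(sample):
        """Stub black-box detector: 1 = flagged malicious. More
        modifications lower the (simulated) detection probability."""
        return int(random.random() < max(0.1, 0.9 - 0.15 * len(sample)))

    def train_surrogate(queries):
        """Trivial surrogate: empirical detection rate per modification
        count, fit on labels already harvested while attacking."""
        stats = defaultdict(lambda: [0, 0])
        for sample, label in queries:
            stats[len(sample)][0] += label
            stats[len(sample)][1] += 1
        return {n: hits / total for n, (hits, total) in stats.items()}

    queries = []
    for episode in range(200):
        sample = []                                     # stand-in for a binary
        for _ in range(6):
            sample = sample + [random.choice(ACTIONS)]  # a real agent plans here
            label = target_label(sample)
            queries.append((tuple(sample), label))
            if label == 0:                              # evasion achieved
                break
    surrogate = train_surrogate(queries)  # model extraction comes "for free"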