Optimal Timing in Dynamic and Robust Attacker Engagement During Advanced Persistent Threats
Advanced persistent threats (APTs) are stealthy attacks which make use of
social engineering and deception to give adversaries insider access to
networked systems. Against APTs, active defense technologies aim to create and
exploit information asymmetry for defenders. In this paper, we study a scenario
in which a powerful defender uses honeynets for active defense in order to
observe an attacker who has penetrated the network. Rather than immediately
eject the attacker, the defender may elect to gather information. We introduce
an undiscounted, infinite-horizon Markov decision process on a continuous state
space in order to model the defender's problem. We find a threshold of
information that the defender should gather about the attacker before ejecting
him. Then we study the robustness of this policy using a Stackelberg game.
Finally, we simulate the policy for a conceptual network. Our results provide a
quantitative foundation for studying optimal timing for attacker engagement in
network defense.
Comment: Submitted to the 2019 Intl. Symp. on Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks (WiOpt).
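As a rough, hypothetical sketch of the threshold policy this abstract describes: the defender keeps gathering information about the attacker and ejects him once the accumulated information crosses a threshold. The information dynamics, per-step risk, and threshold value below are invented for illustration; the paper's continuous-state MDP is not reproduced here.

import random

INFO_THRESHOLD = 5.0   # assumed threshold of information to gather before ejection
RISK_PER_STEP = 0.02   # assumed per-step probability that engagement backfires

def engage_attacker(seed=0):
    """Gather information until the threshold is crossed, then eject."""
    rng = random.Random(seed)
    info = 0.0    # continuous state: information gathered so far
    steps = 0
    while info < INFO_THRESHOLD:
        if rng.random() < RISK_PER_STEP:
            return steps, info, "engagement risk realized"
        info += rng.expovariate(1.0)   # stochastic information gain per step
        steps += 1
    return steps, info, "attacker ejected"

print(engage_attacker())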
Adaptive Honeypot Engagement through Reinforcement Learning of Semi-Markov Decision Processes
A honeynet is a promising active cyber defense mechanism. It reveals the
fundamental Indicators of Compromise (IoCs) by luring attackers to conduct
adversarial behaviors in a controlled and monitored environment. The active
interaction at the honeynet brings a high reward but also introduces high
implementation costs and risks of adversarial honeynet exploitation. In this
work, we apply an infinite-horizon Semi-Markov Decision Process (SMDP) to
characterize the stochastic transitions and sojourn times of attackers in the
honeynet and quantify the reward-risk trade-off. In particular, we design
adaptive long-term engagement policies shown to be risk-averse, cost-effective,
and time-efficient. Numerical results demonstrate that our adaptive
engagement policies can quickly attract attackers to the target honeypot and
engage them for a sufficiently long period to obtain valuable threat information.
Meanwhile, the penetration probability is kept at a low level. The results show
that the expected utility is robust against attackers with a wide range of
persistence and intelligence. Finally, we apply reinforcement learning to the
SMDP to overcome the curse of modeling. Under a prudent choice of the learning
rate and exploration policy, we achieve quick and robust convergence to the
optimal policy and value.
Comment: The presentation can be found at https://youtu.be/GPKT3uJtXqk.
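To make the method concrete, here is a minimal sketch of Q-learning on an SMDP, where the discount factor depends on the random sojourn time. The states, actions, rates, and rewards are assumptions made for illustration, not the paper's honeynet model.

import math, random

STATES = ["normal_node", "honeypot", "target_honeypot"]
ACTIONS = ["low_interaction", "high_interaction", "eject"]
RHO = 0.1   # assumed continuous-time discount rate

Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}

def step(state, action, rng):
    """Hypothetical environment: returns reward, sojourn time, next state."""
    tau = rng.expovariate(1.0)   # random sojourn time in the current state
    base = {"low_interaction": 0.5, "high_interaction": 2.0, "eject": 0.0}[action]
    # High interaction yields more threat information but risks exploitation.
    risk = 1.0 if action == "high_interaction" and rng.random() < 0.2 else 0.0
    return base - risk, tau, rng.choice(STATES)

def smdp_q_learning(steps=5000, alpha=0.05, eps=0.1, seed=0):
    rng = random.Random(seed)
    state = STATES[0]
    for _ in range(steps):
        # Epsilon-greedy exploration, per the abstract's "prudent choice of
        # the learning rate and exploration policy".
        if rng.random() < eps:
            action = rng.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        reward, tau, nxt = step(state, action, rng)
        discount = math.exp(-RHO * tau)   # discount depends on the sojourn time
        target = reward + discount * max(Q[(nxt, a)] for a in ACTIONS)
        Q[(state, action)] += alpha * (target - Q[(state, action)])
        state = nxt
    return Q

smdp_q_learning()
print(max(Q, key=Q.get), round(max(Q.values()), 3))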
Farsighted Risk Mitigation of Lateral Movement Using Dynamic Cognitive Honeypots
Lateral movement of advanced persistent threats poses a severe security
challenge. Due to the stealthy and persistent nature of lateral movement,
defenders need to consider time and spatial locations holistically to discover
latent attack paths across a large time-scale and achieve long-term security
for the target assets. In this work, we propose a time-expanded random network
to model the stochastic service links in the user-host enterprise network and
the adversarial lateral movement. We design cognitive honeypots at idle
production nodes and disguise honey links as service links to detect and deter
the adversarial lateral movement. The honeypot's location changes randomly
over time, which increases its stealthiness. Since the
defender does not know whether, when, and where the initial intrusion and the
lateral movement occur, the honeypot policy aims to reduce the target assets'
Long-Term Vulnerability (LTV) for proactive and persistent protection. We
further characterize three tradeoffs, i.e., the probability of interference,
the stealthiness level, and the roaming cost. To counter the curse of multiple
attack paths, we propose an iterative algorithm and approximate the LTV with
the union bound for computationally efficient deployment of cognitive
honeypots. The results of the vulnerability analysis illustrate the bounds,
trends, and residue of the LTV when the adversarial lateral movement has an
infinite duration. Besides honeypot policies, we obtain a critical threshold of
compromisability to guide the design and modification of the current system
parameters for a higher level of long-term security. We show that the target
node can achieve zero vulnerability under infinite stages of lateral movement
if the probability of movement deterrence is not less than the threshold.
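As an illustrative sketch of the union-bound approximation of the LTV over multiple attack paths: the network, the per-link compromise probabilities, and the honeypot's deterrence effect below are assumed for demonstration; the paper's time-expanded model is richer than this static snapshot.

def path_compromise_prob(path_links, honey_links, deter_prob):
    """Success probability of one attack path; a honey link on the path
    deters the movement with probability deter_prob (assumed model)."""
    p = 1.0
    for link, p_link in path_links:
        p *= p_link
        if link in honey_links:
            p *= (1.0 - deter_prob)
    return p

def ltv_union_bound(paths, honey_links, deter_prob):
    # Union bound: sum the per-path success probabilities, capped at 1.
    return min(1.0, sum(path_compromise_prob(p, honey_links, deter_prob)
                        for p in paths))

# Two hypothetical attack paths toward the target asset.
paths = [
    [("u1-h1", 0.6), ("h1-h2", 0.5), ("h2-target", 0.4)],
    [("u2-h3", 0.7), ("h3-target", 0.3)],
]
print(ltv_union_bound(paths, honey_links={"h1-h2"}, deter_prob=0.9))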
Strategic Learning for Active, Adaptive, and Autonomous Cyber Defense
The increasing instances of advanced attacks call for a new defense paradigm
that is active, autonomous, and adaptive, which we call the '3A' defense
paradigm. This chapter introduces three defense schemes that actively interact
with attackers to increase the attack cost and gather threat information, i.e.,
defensive deception for detection and counter-deception, feedback-driven Moving
Target Defense (MTD), and adaptive honeypot engagement. Due to cyber
deception, external noise, and the absence of knowledge about the other
players' behaviors and goals, these schemes face three progressively stricter
levels of information restriction: parameter uncertainty, payoff uncertainty,
and environmental uncertainty. To estimate the unknowns and
reduce uncertainty, we adopt three different strategic learning schemes that
fit the associated information restrictions. All three learning schemes share
the same feedback structure of sensation, estimation, and action, so that the
most rewarding policies are reinforced and converge to the optimal ones in an
autonomous and adaptive fashion. This work aims to shed light on proactive
defense strategies, lay a solid foundation for strategic learning under
incomplete information, and quantify the tradeoff between security and costs.
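A minimal sketch of the shared sensation-estimation-action feedback loop described above. The scalar estimator and policy update are generic stand-ins invented here, not the chapter's specific learning schemes.

import random

def strategic_learning_loop(steps=1000, lr=0.1, seed=0):
    rng = random.Random(seed)
    estimate = 0.0   # belief about an unknown attacker parameter
    policy = 0.0     # defender's current policy parameter
    for _ in range(steps):
        # Sensation: noisy observation of the attacker's behavior.
        observation = 1.0 + rng.gauss(0.0, 0.5)   # true value assumed to be 1.0
        # Estimation: average out the noise to reduce uncertainty.
        estimate += lr * (observation - estimate)
        # Action: move the policy toward the best response to the estimate,
        # so the most rewarding policies are reinforced over time.
        policy += lr * (estimate - policy)
    return round(estimate, 3), round(policy, 3)

print(strategic_learning_loop())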