Optimal Timing in Dynamic and Robust Attacker Engagement During Advanced Persistent Threats
Advanced persistent threats (APTs) are stealthy attacks which make use of
social engineering and deception to give adversaries insider access to
networked systems. Against APTs, active defense technologies aim to create and
exploit information asymmetry for defenders. In this paper, we study a scenario
in which a powerful defender uses honeynets for active defense in order to
observe an attacker who has penetrated the network. Rather than immediately
eject the attacker, the defender may elect to gather information. We introduce
an undiscounted, infinite-horizon Markov decision process on a continuous state
space in order to model the defender's problem. We find a threshold of
information that the defender should gather about the attacker before ejecting
him. Then we study the robustness of this policy using a Stackelberg game.
Finally, we simulate the policy for a conceptual network. Our results provide a
quantitative foundation for studying optimal timing for attacker engagement in
network defense.
Comment: Submitted to the 2019 Intl. Symp. on Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks (WiOpt).
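As a rough, hypothetical sketch of the threshold policy this abstract describes: the defender keeps gathering information about the attacker and ejects him once the accumulated information crosses a threshold. The information dynamics, per-step risk, and threshold value below are invented for illustration; the paper's continuous-state MDP is not reproduced here.

import random

INFO_THRESHOLD = 5.0   # assumed threshold of information to gather before ejection
RISK_PER_STEP = 0.02   # assumed per-step probability that engagement backfires

def engage_attacker(seed=0):
    """Gather information until the threshold is crossed, then eject."""
    rng = random.Random(seed)
    info = 0.0    # continuous state: information gathered so far
    steps = 0
    while info < INFO_THRESHOLD:
        if rng.random() < RISK_PER_STEP:
            return steps, info, "engagement risk realized"
        info += rng.expovariate(1.0)   # stochastic information gain per step
        steps += 1
    return steps, info, "attacker ejected"

print(engage_attacker())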
Adaptive Honeypot Engagement through Reinforcement Learning of Semi-Markov Decision Processes
A honeynet is a promising active cyber defense mechanism. It reveals the
fundamental Indicators of Compromise (IoCs) by luring attackers to conduct
adversarial behaviors in a controlled and monitored environment. The active
interaction at the honeynet brings a high reward but also introduces high
implementation costs and risks of adversarial honeynet exploitation. In this
work, we apply an infinite-horizon Semi-Markov Decision Process (SMDP) to
characterize the stochastic transitions and sojourn times of attackers in the
honeynet and quantify the reward-risk trade-off. In particular, we design
adaptive long-term engagement policies shown to be risk-averse, cost-effective,
and time-efficient. Numerical results demonstrate that our adaptive
engagement policies can quickly attract attackers to the target honeypot and
engage them for a sufficiently long period to obtain valuable threat information.
Meanwhile, the penetration probability is kept at a low level. The results show
that the expected utility is robust against attackers with a wide range of
persistence and intelligence. Finally, we apply reinforcement learning to the
SMDP to overcome the curse of modeling. Under a prudent choice of the learning
rate and exploration policy, we achieve quick and robust convergence to the
optimal policy and value.
Comment: The presentation can be found at https://youtu.be/GPKT3uJtXqk.
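To make the method concrete, here is a minimal sketch of Q-learning on an SMDP, where the discount factor depends on the random sojourn time. The states, actions, rates, and rewards are assumptions made for illustration, not the paper's honeynet model.

import math, random

STATES = ["normal_node", "honeypot", "target_honeypot"]
ACTIONS = ["low_interaction", "high_interaction", "eject"]
RHO = 0.1   # assumed continuous-time discount rate

Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}

def step(state, action, rng):
    """Hypothetical environment: returns reward, sojourn time, next state."""
    tau = rng.expovariate(1.0)   # random sojourn time in the current state
    base = {"low_interaction": 0.5, "high_interaction": 2.0, "eject": 0.0}[action]
    # High interaction yields more threat information but risks exploitation.
    risk = 1.0 if action == "high_interaction" and rng.random() < 0.2 else 0.0
    return base - risk, tau, rng.choice(STATES)

def smdp_q_learning(steps=5000, alpha=0.05, eps=0.1, seed=0):
    rng = random.Random(seed)
    state = STATES[0]
    for _ in range(steps):
        # Epsilon-greedy exploration, per the abstract's "prudent choice of
        # the learning rate and exploration policy".
        if rng.random() < eps:
            action = rng.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        reward, tau, nxt = step(state, action, rng)
        discount = math.exp(-RHO * tau)   # discount depends on the sojourn time
        target = reward + discount * max(Q[(nxt, a)] for a in ACTIONS)
        Q[(state, action)] += alpha * (target - Q[(state, action)])
        state = nxt
    return Q

smdp_q_learning()
print(max(Q, key=Q.get), round(max(Q.values()), 3))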
Farsighted Risk Mitigation of Lateral Movement Using Dynamic Cognitive Honeypots
Lateral movement of advanced persistent threats poses a severe security
challenge. Due to the stealthy and persistent nature of lateral movement,
defenders need to consider time and spatial locations holistically to discover
latent attack paths across a large time-scale and achieve long-term security
for the target assets. In this work, we propose a time-expanded random network
to model the stochastic service links in the user-host enterprise network and
the adversarial lateral movement. We design cognitive honeypots at idle
production nodes and disguise honey links as service links to detect and deter
the adversarial lateral movement. The honeypot's location changes randomly
over time, which increases its stealthiness. Since the
defender does not know whether, when, and where the initial intrusion and the
lateral movement occur, the honeypot policy aims to reduce the target assets'
Long-Term Vulnerability (LTV) for proactive and persistent protection. We
further characterize three tradeoffs, i.e., the probability of interference,
the stealthiness level, and the roaming cost. To counter the curse of multiple
attack paths, we propose an iterative algorithm and approximate the LTV with
the union bound for computationally efficient deployment of cognitive
honeypots. The results of the vulnerability analysis illustrate the bounds,
trends, and residue of the LTV when the adversarial lateral movement has an
infinite duration. Besides honeypot policies, we obtain a critical threshold of
compromisability to guide the design and modification of the current system
parameters for a higher level of long-term security. We show that the target
node can achieve zero vulnerability under infinite stages of lateral movement
if the probability of movement deterrence is not less than the threshold.
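As an illustrative sketch of the union-bound approximation of the LTV over multiple attack paths: the network, the per-link compromise probabilities, and the honeypot's deterrence effect below are assumed for demonstration; the paper's time-expanded model is richer than this static snapshot.

def path_compromise_prob(path_links, honey_links, deter_prob):
    """Success probability of one attack path; a honey link on the path
    deters the movement with probability deter_prob (assumed model)."""
    p = 1.0
    for link, p_link in path_links:
        p *= p_link
        if link in honey_links:
            p *= (1.0 - deter_prob)
    return p

def ltv_union_bound(paths, honey_links, deter_prob):
    # Union bound: sum the per-path success probabilities, capped at 1.
    return min(1.0, sum(path_compromise_prob(p, honey_links, deter_prob)
                        for p in paths))

# Two hypothetical attack paths toward the target asset.
paths = [
    [("u1-h1", 0.6), ("h1-h2", 0.5), ("h2-target", 0.4)],
    [("u2-h3", 0.7), ("h3-target", 0.3)],
]
print(ltv_union_bound(paths, honey_links={"h1-h2"}, deter_prob=0.9))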
Strategic Learning for Active, Adaptive, and Autonomous Cyber Defense
The increasing instances of advanced attacks call for a new defense paradigm
that is active, autonomous, and adaptive, which we call the '3A' defense
paradigm. This chapter introduces three defense schemes that actively interact
with attackers to increase the attack cost and gather threat information, i.e.,
defensive deception for detection and counter-deception, feedback-driven Moving
Target Defense (MTD), and adaptive honeypot engagement. Due to cyber
deception, external noise, and the absence of knowledge about the other
players' behaviors and goals, these schemes face three progressively stricter
levels of information restriction: parameter uncertainty, payoff uncertainty,
and environmental uncertainty. To estimate the unknowns and
reduce uncertainty, we adopt three different strategic learning schemes that
fit the associated information restrictions. All three learning schemes share
the same feedback structure of sensation, estimation, and action, so that the
most rewarding policies are reinforced and converge to the optimal ones in an
autonomous and adaptive fashion. This work aims to shed light on proactive
defense strategies, lay a solid foundation for strategic learning under
incomplete information, and quantify the tradeoff between security and costs.
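A minimal sketch of the shared sensation-estimation-action feedback loop described above. The scalar estimator and policy update are generic stand-ins invented here, not the chapter's specific learning schemes.

import random

def strategic_learning_loop(steps=1000, lr=0.1, seed=0):
    rng = random.Random(seed)
    estimate = 0.0   # belief about an unknown attacker parameter
    policy = 0.0     # defender's current policy parameter
    for _ in range(steps):
        # Sensation: noisy observation of the attacker's behavior.
        observation = 1.0 + rng.gauss(0.0, 0.5)   # true value assumed to be 1.0
        # Estimation: average out the noise to reduce uncertainty.
        estimate += lr * (observation - estimate)
        # Action: move the policy toward the best response to the estimate,
        # so the most rewarding policies are reinforced over time.
        policy += lr * (estimate - policy)
    return round(estimate, 3), round(policy, 3)

print(strategic_learning_loop())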