10 research outputs found
TSM: Measuring the Enticement of Honeyfiles with Natural Language Processing
Honeyfile deployment is a useful breach detection method in cyber deception that can also inform defenders about the intent and interests of intruders and malicious insiders. A key property of a honeyfile, enticement, is the extent to which the file can attract an intruder to interact with it. We introduce a novel metric, Topic Semantic Matching (TSM), which uses topic modelling to represent files in the repository and semantic matching in an embedding vector space to compare honeyfile text and topic words robustly. We also present a honeyfile corpus created with different Natural Language Processing (NLP) methods. Experiments show that TSM is effective in inter-corpus comparisons and is a promising tool to measure the enticement of honeyfiles. TSM is the first measure to use NLP techniques to quantify the enticement of honeyfile content that compares the essential topical content of local contexts to honeyfiles and is robust to paraphrasing
HoneyCode: Automating Deceptive Software Repositories with Deep Generative Models
We propose HoneyCode, an architecture for the generation of synthetic software repositories for cyber deception. The synthetic repositories have the characteristics of real software, including language features, file names and extensions, but contain no real intellectual property. The fake software can be used as a honeypot or form part of a deceptive environment. Existing approaches to software repository generation lack scalability due to reliance on hand-crafted structures for specific languages. Our approach is language agnostic and learns the underlying representations of repository structures, filenames and file content through a novel Tree Recurrent Network (TRN) and two recurrent networks (RNN) respectively. Each stage of the sequential generation process utilises features from prior steps, which increases the honey repository’s authenticity and consistency. Experiments show TRN generates tree samples that reduce degree mean maximal distance (MMD) by 90-92% and depth MMD by 75-86% to a held out test data set in comparison to recent deep graph generators and a baseline random tree generator. In addition, our RNN models generate convincing filenames with authentic syntax and realistic file content
Modelling Direct Messaging Networks with Multiple Recipients for Cyber Deception
Cyber deception is emerging as a promising approach to defending networks and
systems against attackers and data thieves. However, despite being relatively
cheap to deploy, the generation of realistic content at scale is very costly,
due to the fact that rich, interactive deceptive technologies are largely
hand-crafted. With recent improvements in Machine Learning, we now have the
opportunity to bring scale and automation to the creation of realistic and
enticing simulated content. In this work, we propose a framework to automate
the generation of email and instant messaging-style group communications at
scale. Such messaging platforms within organisations contain a lot of valuable
information inside private communications and document attachments, making them
an enticing target for an adversary. We address two key aspects of simulating
this type of system: modelling when and with whom participants communicate, and
generating topical, multi-party text to populate simulated conversation
threads. We present the LogNormMix-Net Temporal Point Process as an approach to
the first of these, building upon the intensity-free modeling approach of
Shchur et al. to create a generative model for unicast and multi-cast
communications. We demonstrate the use of fine-tuned, pre-trained language
models to generate convincing multi-party conversation threads. A live email
server is simulated by uniting our LogNormMix-Net TPP (to generate the
communication timestamp, sender and recipients) with the language model, which
generates the contents of the multi-party email threads. We evaluate the
generated content with respect to a number of realism-based properties, that
encourage a model to learn to generate content that will engage the attention
of an adversary to achieve a deception outcome
Three Decades of Deception Techniques in Active Cyber Defense -- Retrospect and Outlook
Deception techniques have been widely seen as a game changer in cyber
defense. In this paper, we review representative techniques in honeypots,
honeytokens, and moving target defense, spanning from the late 1980s to the
year 2021. Techniques from these three domains complement with each other and
may be leveraged to build a holistic deception based defense. However, to the
best of our knowledge, there has not been a work that provides a systematic
retrospect of these three domains all together and investigates their
integrated usage for orchestrated deceptions. Our paper aims to fill this gap.
By utilizing a tailored cyber kill chain model which can reflect the current
threat landscape and a four-layer deception stack, a two-dimensional taxonomy
is developed, based on which the deception techniques are classified. The
taxonomy literally answers which phases of a cyber attack campaign the
techniques can disrupt and which layers of the deception stack they belong to.
Cyber defenders may use the taxonomy as a reference to design an organized and
comprehensive deception plan, or to prioritize deception efforts for a budget
conscious solution. We also discuss two important points for achieving active
and resilient cyber defense, namely deception in depth and deception lifecycle,
where several notable proposals are illustrated. Finally, some outlooks on
future research directions are presented, including dynamic integration of
different deception techniques, quantified deception effects and deception
operation cost, hardware-supported deception techniques, as well as techniques
developed based on better understanding of the human element.Comment: 19 page
DECEPTION BASED TECHNIQUES AGAINST RANSOMWARES: A SYSTEMATIC REVIEW
Ransomware is the most prevalent emerging business risk nowadays. It seriously affects business continuity and operations. According to Deloitte Cyber Security Landscape 2022, up to 4000 ransomware attacks occur daily, while the average number of days an organization takes to identify a breach is 191. Sophisticated cyber-attacks such as ransomware typically must go through multiple consecutive phases (initial foothold, network propagation, and action on objectives) before accomplishing its final objective. This study analyzed decoy-based solutions as an approach (detection, prevention, or mitigation) to overcome ransomware. A systematic literature review was conducted, in which the result has shown that deception-based techniques have given effective and significant performance against ransomware with minimal resources. It is also identified that contrary to general belief, deception techniques mainly involved in passive approaches (i.e., prevention, detection) possess other active capabilities such as ransomware traceback and obstruction (thwarting), file decryption, and decryption key recovery. Based on the literature review, several evaluation methods are also analyzed to measure the effectiveness of these deception-based techniques during the implementation process
Deep Down the Rabbit Hole: On References in Networks of Decoy Elements
Deception technology has proven to be a sound approach against threats to
information systems. Aside from well-established honeypots, decoy elements,
also known as honeytokens, are an excellent method to address various types of
threats. Decoy elements are causing distraction and uncertainty to an attacker
and help detecting malicious activity. Deception is meant to be complementing
firewalls and intrusion detection systems. Particularly insider threats may be
mitigated with deception methods. While current approaches consider the use of
multiple decoy elements as well as context-sensitivity, they do not
sufficiently describe a relationship between individual elements. In this work,
inter-referencing decoy elements are introduced as a plausible extension to
existing deception frameworks, leading attackers along a path of decoy
elements. A theoretical foundation is introduced, as well as a stochastic model
and a reference implementation. It was found that the proposed system is
suitable to enhance current decoy frameworks by adding a further dimension of
inter-connectivity and therefore improve intrusion detection and prevention
Modeling Deception for Cyber Security
In the era of software-intensive, smart and connected systems, the growing power and so-
phistication of cyber attacks poses increasing challenges to software security. The reactive
posture of traditional security mechanisms, such as anti-virus and intrusion detection
systems, has not been sufficient to combat a wide range of advanced persistent threats
that currently jeopardize systems operation. To mitigate these extant threats, more ac-
tive defensive approaches are necessary. Such approaches rely on the concept of actively
hindering and deceiving attackers. Deceptive techniques allow for additional defense by
thwarting attackers’ advances through the manipulation of their perceptions. Manipu-
lation is achieved through the use of deceitful responses, feints, misdirection, and other
falsehoods in a system. Of course, such deception mechanisms may result in side-effects
that must be handled. Current methods for planning deception chiefly portray attempts
to bridge military deception to cyber deception, providing only high-level instructions
that largely ignore deception as part of the software security development life cycle. Con-
sequently, little practical guidance is provided on how to engineering deception-based
techniques for defense. This PhD thesis contributes with a systematic approach to specify
and design cyber deception requirements, tactics, and strategies. This deception approach
consists of (i) a multi-paradigm modeling for representing deception requirements, tac-
tics, and strategies, (ii) a reference architecture to support the integration of deception
strategies into system operation, and (iii) a method to guide engineers in deception mod-
eling. A tool prototype, a case study, and an experimental evaluation show encouraging
results for the application of the approach in practice. Finally, a conceptual coverage map-
ping was developed to assess the expressivity of the deception modeling language created.Na era digital o crescente poder e sofisticação dos ataques cibernéticos apresenta constan-
tes desafios para a segurança do software. A postura reativa dos mecanismos tradicionais
de segurança, como os sistemas antivírus e de detecção de intrusão, não têm sido suficien-
tes para combater a ampla gama de ameaças que comprometem a operação dos sistemas
de software actuais. Para mitigar estas ameaças são necessárias abordagens ativas de
defesa. Tais abordagens baseiam-se na ideia de adicionar mecanismos para enganar os
adversários (do inglês deception). As técnicas de enganação (em português, "ato ou efeito
de enganar, de induzir em erro; artimanha usada para iludir") contribuem para a defesa
frustrando o avanço dos atacantes por manipulação das suas perceções. A manipula-
ção é conseguida através de respostas enganadoras, de "fintas", ou indicações erróneas
e outras falsidades adicionadas intencionalmente num sistema. É claro que esses meca-
nismos de enganação podem resultar em efeitos colaterais que devem ser tratados. Os
métodos atuais usados para enganar um atacante inspiram-se fundamentalmente nas
técnicas da área militar, fornecendo apenas instruções de alto nível que ignoram, em
grande parte, a enganação como parte do ciclo de vida do desenvolvimento de software
seguro. Consequentemente, há poucas referências práticas em como gerar técnicas de
defesa baseadas em enganação. Esta tese de doutoramento contribui com uma aborda-
gem sistemática para especificar e desenhar requisitos, táticas e estratégias de enganação
cibernéticas. Esta abordagem é composta por (i) uma modelação multi-paradigma para re-
presentar requisitos, táticas e estratégias de enganação, (ii) uma arquitetura de referência
para apoiar a integração de estratégias de enganação na operação dum sistema, e (iii) um
método para orientar os engenheiros na modelação de enganação. Uma ferramenta protó-
tipo, um estudo de caso e uma avaliação experimental mostram resultados encorajadores
para a aplicação da abordagem na prática. Finalmente, a expressividade da linguagem
de modelação de enganação é avaliada por um mapeamento de cobertura de conceitos
Recommended from our members
Traffic Analysis Attacks and Defenses in Low Latency Anonymous Communication
The recent public disclosure of mass surveillance of electronic communication, involving powerful government authorities, has drawn the public's attention to issues regarding Internet privacy. For almost a decade now, there have been several research efforts towards designing and deploying open source, trustworthy and reliable systems that ensure users' anonymity and privacy. These systems operate by hiding the true network identity of communicating parties against eavesdropping adversaries. Tor, acronym for The Onion Router, is an example of such a system. Such systems relay the traffic of their users through an overlay of nodes that are called Onion Routers and are operated by volunteers distributed across the globe. Such systems have served well as anti-censorship and anti-surveillance tools. However, recent publications have disclosed that powerful government organizations are seeking means to de-anonymize such systems and have deployed distributed monitoring infrastructure to aid their efforts.
Attacks against anonymous communication systems, like Tor, often involve trac analysis. In such attacks, an adversary, capable of observing network traffic statistics in several different networks, correlates the trac patterns in these networks, and associates otherwise seemingly unrelated network connections. The process can lead an adversary to the source of an anonymous connection. However, due to their design, consisting of globally distributed relays, the users of anonymity networks like Tor, can route their traffic virtually via any network; hiding their tracks and true identities from their communication peers and eavesdropping adversaries. De-anonymization of a random anonymous connection is hard, as the adversary is required to correlate traffic patterns in one network link to those in virtually all other networks. Past research mostly involved reducing the complexity of this process by rst reducing the set of relays or network routers to monitor, and then identifying the actual source of anonymous traffic among network connections that are routed via this reduced set of relays or network routers to monitor. A study of various research efforts in this field reveals that there have been many more efforts to reduce the set of relays or routers to be searched than to explore methods for actually identifying an anonymous user amidst the network connections using these routers and relays. Few have tried to comprehensively study a complete attack, that involves reducing the set of relays and routers to monitor and identifying the source of an anonymous connection. Although it is believed that systems like Tor are trivially vulnerable to traffic analysis, there are various technical challenges and issues that can become obstacles to accurately identifying the source of anonymous connection. It is hard to adjudge the vulnerability of anonymous communication systems without adequately exploring the issues involved in identifying the source of anonymous traffic.
We take steps to ll this gap by exploring two novel active trac analysis attacks, that solely rely on measurements of network statistics. In these attacks, the adversary tries to identify the source of an anonymous connection arriving to a server from an exit node. This generally involves correlating traffic entering and leaving the Tor network, linking otherwise unrelated connections. To increase the accuracy of identifying the victim connection among several connections, the adversary injects a traffic perturbation pattern into a connection arriving to the server from a Tor node, that the adversary wants to de-anonymize. One way to achieve this is by colluding with the server and injecting a traffic perturbation pattern using common traffic shaping tools. Our first attack involves a novel remote bandwidth estimation technique to conrm the identity of Tor relays and network routers along the path connecting a Tor client and a server by observing network bandwidth fluctuations deliberately injected by the server. The second attack involves correlating network statistics, for connections entering and leaving the Tor network, available from existing network infrastructure, such as Cisco's NetFlow, for identifying the source of an anonymous connection. Additionally, we explored a novel technique to defend against the latter attack. Most research towards defending against traffic analysis attacks, involving transmission of dummy traffic, have not been implemented due to fears of potential performance degradation. Our novel technique involves transmission of dummy traffic, consisting of packets with IP headers having small Time-to-Live (TTL) values. Such packets are discarded by the routers before they reach their destination. They distort NetFlow statistics, without degrading the client's performance. Finally, we present a strategy that employs transmission of unique plain-text decoy traffic, that appears sensitive, such as fake user credentials, through Tor nodes to decoy servers under our control. Periodic tallying of client and server logs to determine unsolicited connection attempts at the server is used to identify the eavesdropping nodes. Such malicious Tor node operators, eavesdropping on users' traffic, could be potential traffic analysis attackers
Automating the Generation of Enticing Text Content for High-Interaction Honeyfiles
While advanced defenders have successfully used honeyfiles to detect unauthorized intruders and insider threats for more than 30 years, the complexity associated with adaptively devising enticing content has limited their diffusion. This paper presents four new designs for automating the construction of honeyfile content. The new designs select a document from the target directory as a template and employ word transposition and substitution based on parts of speech tagging and n-grams collected from both the target directory and the surrounding file system. These designs were compared to previous methods using a new theory to quantitatively evaluate honeyfile enticement. The new designs were able to successfully mimic the content from the target directory, whilst minimizing the introduction of material from other sources. The designs may also hold potential to match many of the characteristics of nearby documents, whilst minimizing the replication of copyrighted or classified material from documents they are protectin