
    Design requirements for generating deceptive content to protect document repositories

    For nearly 30 years, fake digital documents have been used to identify external intruders and malicious insider threats. Unfortunately, while fake files hold potential to assist in data theft detection, there is little evidence of their application outside of niche organisations and academic institutions. The barrier to wider adoption appears to be the difficulty of constructing deceptive content. The current generation of solutions principally: (1) use unrealistic random data; (2) output heavily formatted or specialised content that is difficult to apply to other environments; (3) require users to manually build the content, which does not scale; or (4) employ an existing production file, which creates a protection paradox. This paper introduces a set of requirements for automatically generating fake file content: (1) enticing, (2) realistic, (3) minimise disruption, (4) adaptive, (5) scalable protective coverage, (6) minimise sensitive artefacts and copyright infringement, and (7) contain no distinguishable characteristics. These requirements are drawn from literature on the natural sciences, magical performances, human deceit, military operations, intrusion detection and previous fake file solutions. They guide the design of an automated fake file content construction system, providing an opportunity for the next generation of solutions to find greater commercial application and widespread adoption.
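
    As a rough illustration only, the sketch below encodes the seven requirements listed in the abstract as a simple pass/fail checklist that a decoy-content generator might score candidates against. The Decoy fields, the predicates and the scoring scheme are illustrative assumptions, not the system described in the paper.

```python
from dataclasses import dataclass


@dataclass
class Decoy:
    """A candidate fake document produced by some hypothetical generator."""
    title: str
    body: str
    source_is_production_file: bool = False   # reusing a real file creates the "protection paradox"
    generated_automatically: bool = True      # manual construction does not scale
    contains_random_filler: bool = False      # unrealistic random data undermines realism


# One illustrative pass/fail predicate per requirement named in the abstract.
REQUIREMENTS = {
    "enticing": lambda d: len(d.title) > 0,
    "realistic": lambda d: not d.contains_random_filler,
    "minimise disruption": lambda d: True,    # placeholder: depends on how decoys are deployed
    "adaptive": lambda d: d.generated_automatically,
    "scalable protective coverage": lambda d: d.generated_automatically,
    "minimise sensitive artefacts and copyright": lambda d: not d.source_is_production_file,
    "no distinguishable characteristics": lambda d: not d.contains_random_filler,
}


def evaluate(decoy: Decoy) -> dict:
    """Return a per-requirement pass/fail map for one candidate decoy."""
    return {name: check(decoy) for name, check in REQUIREMENTS.items()}


if __name__ == "__main__":
    candidate = Decoy(title="Q3 Payroll Summary", body="...")
    for requirement, passed in evaluate(candidate).items():
        print(f"{requirement:45s} {'PASS' if passed else 'FAIL'}")
```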

    Towards a set of metrics to guide the generation of fake computer file systems

    Fake file systems are used in the field of cyber deception to bait intruders and fool forensic investigators. File system researchers also frequently generate their own synthetic document repositories, owing to the data privacy and copyright concerns associated with experimenting on real-world corpora. For both fields, realism is critical. Unfortunately, once a set of files and folders has been created, there are no current testing standards that can validate their authenticity or, conversely, reliably automate their detection. This paper reviews the previous 30 years of file system surveys on real-world corpora to identify a set of discrete measures for generating synthetic file systems. Statistical distributions, such as the size, age and lifetime of files, common file types, compression and duplication ratios, and directory distribution and depth (and its relationship with the numbers of files and sub-directories), were identified and their respective merits discussed. Additionally, this paper highlights notable absences from these surveys that could be beneficial, such as analysing, en masse, text content distributions and file naming habits, and comparing file access times against traditional working hours.
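
    To give a sense of how such survey-derived measures might drive a generator, the sketch below builds a synthetic directory tree from assumed distributions: lognormal file sizes, a uniform age window, a fixed file-type mix and a branching factor that decays with depth. None of the parameter values are taken from the paper; they are placeholders for whatever a chosen survey would supply.

```python
import random
import time

# Assumed type mix; a real generator would take these weights from a corpus survey.
FILE_TYPES = {".docx": 0.25, ".pdf": 0.20, ".xlsx": 0.15, ".txt": 0.15,
              ".pptx": 0.10, ".jpg": 0.15}


def sample_size_bytes(rng: random.Random) -> int:
    # File sizes are commonly modelled as heavy-tailed; lognormal is one assumption.
    return max(1, int(rng.lognormvariate(mu=9.0, sigma=2.0)))


def sample_age_seconds(rng: random.Random, max_age_days: int = 3 * 365) -> int:
    # Uniform age over a fixed window; real corpora tend to skew towards recent files.
    return rng.randint(0, max_age_days * 86400)


def generate_tree(rng: random.Random, depth: int = 0, max_depth: int = 4) -> dict:
    """Recursively build a directory node containing files and sub-directories."""
    files = []
    for i in range(rng.randint(1, 8)):
        ext = rng.choices(list(FILE_TYPES), weights=list(FILE_TYPES.values()))[0]
        files.append({
            "name": f"file_{depth}_{i}{ext}",
            "size": sample_size_bytes(rng),
            "mtime": int(time.time()) - sample_age_seconds(rng),
        })
    # Branching probability decays with depth, keeping very deep trees rare.
    n_subdirs = rng.randint(0, max(0, 3 - depth)) if depth < max_depth else 0
    subdirs = {f"dir_{depth}_{j}": generate_tree(rng, depth + 1, max_depth)
               for j in range(n_subdirs)}
    return {"files": files, "dirs": subdirs}


if __name__ == "__main__":
    tree = generate_tree(random.Random(42))
    print(f"top-level files: {len(tree['files'])}, sub-directories: {len(tree['dirs'])}")
```

    The depth-dependent branching factor is one simple way to expose the relationship between directory depth, file counts and sub-directory counts as a tunable parameter; measures such as duplication and compression ratios would need a separate post-processing pass over the generated content.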