
    Decoding Substitution Ciphers by Means of Word Matching with Application to OCR

    A substitution cipher consists of a block of natural-language text in which each letter of the alphabet has been replaced by a distinct symbol. As a problem in cryptography, the substitution cipher is of limited interest, but it has an important application in optical character recognition. Recent advances render it quite feasible to scan documents with a fairly complex layout and to classify (cluster) the printed characters into distinct groups according to their shape. However, given the immense variety of type styles and forms in current use, it is not possible to assign alphabetical identities to characters of arbitrary size and typeface. This gap can be bridged by solving the equivalent of a substitution cipher problem, thereby opening up the possibility of automatic translation of a scanned document into a standard character code, such as ASCII. Earlier methods relying on letter n-gram frequencies require a substantial amount of ciphertext for accurate n-gram estimates. A dictionary-based approach solves the problem using relatively small ciphertext samples and a dictionary of fewer than 500 words. Our heuristic backtrack algorithm typically visits only a few hundred of the 26! possible nodes on sample texts ranging from 100 to 600 words.
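    As a rough sketch of the dictionary-based backtracking idea described above (the word list, pruning rule, and function names are illustrative, not the paper's exact heuristic):

```python
# Sketch: decode a substitution cipher by matching cipher words against a
# small dictionary, growing a partial letter mapping and backtracking on
# conflicts. Illustrative only -- not the paper's exact heuristic.

def extend(cipher_word, plain_word, mapping):
    """Return an extended mapping if plain_word fits, else None."""
    if len(cipher_word) != len(plain_word):
        return None
    trial = dict(mapping)
    used = set(trial.values())
    for c, p in zip(cipher_word, plain_word):
        if c in trial:
            if trial[c] != p:
                return None      # contradicts an earlier assignment
        elif p in used:
            return None          # the mapping must stay one-to-one
        else:
            trial[c] = p
            used.add(p)
    return trial

def solve(cipher_words, dictionary, mapping=None):
    """Depth-first search over cipher words; returns a full mapping or None."""
    mapping = mapping or {}
    if not cipher_words:
        return mapping
    first, rest = cipher_words[0], cipher_words[1:]
    for candidate in dictionary:
        trial = extend(first, candidate, mapping)
        if trial is not None:
            result = solve(rest, dictionary, trial)
            if result is not None:
                return result
    return None                  # dead end: backtrack

# Toy run: two cipher words against a three-word dictionary.
print(solve(["ab", "bc"], ["to", "on", "he"]))   # {'a': 't', 'b': 'o', 'c': 'n'}
```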

    Dynamic block encryption with self-authenticating key exchange

    One of the greatest challenges facing cryptographers is the mechanism used for key exchange. When secret data is transmitted, there may be an attacker who tries to intercept and decrypt the message. Having done so, the attacker might simply exploit the information obtained, or attempt to tamper with the message and thus mislead the recipient. Both cases are equally damaging and may cause great harm. In cryptography, there are two commonly used methods of exchanging secret keys between parties. In the first, symmetric cryptography, the key is sent in advance over some secure channel that only the intended recipient can read. The second method is public key exchange, where each party has a private and a public key: the public key is shared and the private key is kept locally. In both cases, keys are exchanged between the two parties. In this thesis, we propose a method whereby the risk of exchanging keys is minimised. The key is embedded in the encrypted text using a process that we call 'chirp coding', and recovered by the recipient using a process based on correlation. The chirp coding parameters are exchanged between users by employing a USB flash memory retained by each user. Even if the keys are compromised, they are not usable, because an attacker can only gain access to part of the key. Alternatively, the software can be configured to operate in a one-time parameter mode, in which the parameters are agreed upon in advance; there is then no parameter exchange during file transmission, except, of course, the key embedded in the ciphertext. The thesis also introduces a method of encryption that utilises dynamic blocks, where the block size differs from block to block. Prime numbers are used to drive two random number generators: a Linear Congruential Generator (LCG), which takes in the seed and initialises the system, and a Blum-Blum-Shub (BBS) generator, which produces the random streams used to encrypt messages, images or video clips, for example. In each case, the key created is text-dependent and therefore changes with each message sent. The scheme presented in this research is composed of five basic modules. The first is the key generation module, where the key to be generated is message-dependent. The second, the encryption module, performs data encryption. The third, the key exchange module, embeds the key into the encrypted text. Once this is done, the message is transmitted; the recipient uses the key extraction module to retrieve the key, and finally the decryption module is executed to decrypt the message and authenticate it. In addition, the message may be compressed before encryption and decompressed by the recipient after decryption using standard compression tools.
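    A minimal sketch of the two generators named above, an LCG seeding stage feeding a Blum-Blum-Shub keystream used here as a simple XOR stream cipher (the parameters are toy values and the composition is illustrative, not the thesis's actual scheme):

```python
# Sketch of the two generators named in the abstract: a Linear Congruential
# Generator (LCG) expands a seed, and a Blum-Blum-Shub (BBS) generator
# produces the keystream. Toy parameters only -- real use needs large
# secret primes.

def lcg(seed, a=1103515245, c=12345, m=2**31):
    """Classic LCG recurrence: x_{n+1} = (a*x_n + c) mod m."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x

def bbs_bits(seed, p=499, q=547):
    """BBS recurrence x_{n+1} = x_n^2 mod n, with n = p*q and
    p, q both congruent to 3 (mod 4); emits each state's low bit."""
    n = p * q
    x = seed % n
    while True:
        x = (x * x) % n
        yield x & 1

def keystream(seed, nbytes):
    """Pack BBS output bits into bytes."""
    bits = bbs_bits(seed)
    out = bytearray()
    for _ in range(nbytes):
        byte = 0
        for _ in range(8):
            byte = (byte << 1) | next(bits)
        out.append(byte)
    return bytes(out)

message = b"dynamic block"
seed = next(lcg(42))                    # the LCG initialises the system
cipher = bytes(m ^ k for m, k in zip(message, keystream(seed, len(message))))
plain = bytes(c ^ k for c, k in zip(cipher, keystream(seed, len(cipher))))
assert plain == message                 # XOR stream cipher is symmetric
```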

    Proactive Biometric-Enabled Forensic Imprinting System

    Insider threats are a significant security issue. The last decade has witnessed countless instances of data loss and exposure in which leaked data have become publicly available and easily accessible. Losing or disclosing sensitive data or confidential information may cause substantial financial and reputational damage to a company, so preventing or responding to such incidents has become a challenging task. While recent research has focused explicitly on the problem of insider misuse, it has tended to concentrate on the information itself, either through its protection or through approaches to detecting leakage. Although digital forensics has become a de facto standard in the investigation of criminal activities, a fundamental problem is the inability to associate a specific person with particular electronic evidence, especially when stolen credentials or the Trojan defence can be invoked. There is therefore an urgent need for a more innovative and robust technique that can inextricably link the use of information (e.g., images and documents) to the users who access and use it. This research project investigates the role that transparent and multimodal biometrics could play in providing this link, leveraging individuals' biometric information for the attribution of insider misuse. The thesis examines the existing literature in the domain of data loss prevention, detection, and proactive digital forensics, including traceability techniques; a gap identified in that literature motivates the proposed solution. Although most of the existing methods and tools used by investigators help significantly in collecting, analysing and presenting digital evidence, it is essential that investigators establish a link between the notable or stolen digital object and the identity of the individual who used it, as opposed to merely relying on an electronic record or log indicating that a user interacted with the object in question. The approach proposed in this study therefore provides a novel technique for capturing individuals' biometric identifiers/signals (e.g., face or keystroke dynamics) and embedding them into the digital objects users are interacting with. This can operate in one of two modes, centralised or decentralised. The centralised mode stores the mapped information alongside digital object identifiers in a central repository; the decentralised mode avoids the need for central storage by embedding all the necessary information within the digital object itself. Moreover, no explicit biometric information is stored; only the correlation that points to those locations within the imprinted object is preserved. Comprehensive experiments show that this correlation can be established even when the original version of the examined object has undergone significant modification. In many scenarios, such as changing or removing part of an image or document, including words and sentences, it was possible to extract and reconstruct the correlated biometric information from a modified object with a high success rate.
    Using the generated imprints, the feature vector could be reconstructed from unmodified images with 100% accuracy, simply by reversing the imprinting process. Under a modification attack, in which the imprinted object is manipulated, at least one imprinted feature vector was successfully retrieved from an average of 97 out of 100 images, even when the modification percentage was as high as 80%. For the decentralised approach, initial experimental results showed that the embedded biometric signals could be retrieved successfully even when 75% of the original file (i.e., an image) had been modified. The research has proposed and validated a number of approaches to embedding biometric data within digital objects to enable successful user attribution of information leakage attacks.
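    The abstract does not spell out the imprinting mechanism itself, but the recovery-under-modification behaviour it reports can be illustrated generically. The sketch below tiles a checksummed feature vector across an image's least-significant bits so that at least one intact copy survives partial overwriting; the scheme and parameters are invented for illustration, not the thesis's method:

```python
# Illustration only: tile a biometric feature vector, plus a short checksum,
# across an image's LSB plane so an intact copy survives partial modification.
import hashlib
import numpy as np

def with_checksum(bits):
    """Append a 16-bit hash tag so a copy can be validated on recovery."""
    digest = hashlib.sha256(bytes(bits)).digest()
    tag = [(digest[b] >> i) & 1 for b in range(2) for i in range(8)]
    return np.concatenate([bits, tag]).astype(np.uint8)

def imprint(image, bits):
    """Tile the checksummed bits across the LSB plane of a uint8 image."""
    flat = image.reshape(-1)
    idx = np.arange(flat.size)
    return ((flat & 0xFE) | bits[idx % len(bits)]).reshape(image.shape)

def recover(image, n_bits):
    """Scan every embedded copy; return the first whose checksum verifies."""
    lsbs = (image.reshape(-1) & 1).astype(np.uint8)
    span = n_bits + 16
    for start in range(0, lsbs.size - span + 1, span):
        copy = lsbs[start : start + span]
        if np.array_equal(with_checksum(copy[:n_bits]), copy):
            return copy[:n_bits]
    return None                                  # no intact copy found

rng = np.random.default_rng(1)
img = rng.integers(0, 256, (64, 64), dtype=np.uint8)
fv = rng.integers(0, 2, 112, dtype=np.uint8)     # toy biometric feature vector
stamped = imprint(img, with_checksum(fv))
stamped[:48, :] = 255                            # overwrite 75% of the image
assert np.array_equal(recover(stamped, 112), fv) # an intact copy survives
```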

    Entropy in Image Analysis II

    Image analysis is a fundamental task for any application where extracting information from images is required. The analysis requires highly sophisticated numerical and analytical methods, particularly for applications in medicine, security, and other fields where the results of the processing consist of data of vital importance. This is evident from the articles composing the Special Issue "Entropy in Image Analysis II", in which the authors used widely tested methods to verify their results. In reading the present volume, the reader will appreciate the richness of the methods and applications, in particular for medical imaging and image security, and a remarkable cross-fertilization among the proposed research areas.
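    As a reference point for the volume's unifying quantity, a minimal computation of Shannon entropy from an image's gray-level histogram (toy code, not drawn from any article in the issue):

```python
# Shannon entropy of a grayscale image: H = -sum_i p_i * log2(p_i),
# where p_i is the relative frequency of gray level i.
import numpy as np

def image_entropy(image):
    """Entropy in bits per pixel from the 8-bit gray-level histogram."""
    hist = np.bincount(image.reshape(-1), minlength=256)
    p = hist / hist.sum()
    p = p[p > 0]                        # treat 0 * log 0 as 0
    return float(-(p * np.log2(p)).sum())

flat = np.full((32, 32), 128, dtype=np.uint8)                 # constant image
noisy = np.random.default_rng(0).integers(0, 256, (32, 32), dtype=np.uint8)
print(image_entropy(flat))     # 0.0 bits: no information
print(image_entropy(noisy))    # close to 8 bits: near-maximal randomness
```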

    The Cryptographic Imagination

    Originally published in 1996. In The Cryptographic Imagination, Shawn Rosenheim uses the writings of Edgar Allan Poe to pose a set of questions pertaining to literary genre, cultural modernity, and technology. Rosenheim argues that Poe's cryptographic writing—his essays on cryptography and the short stories that grew out of them—requires that we rethink the relation of poststructural criticism to Poe's texts and, more generally, reconsider the relation of literature to communication. Cryptography serves not only as a template for the language, character, and themes of much of Poe's late fiction (including his creation, the detective story) but also as a "secret history" of literary modernity itself. "Both postwar fiction and literary criticism," the author writes, "are deeply indebted to the rise of cryptography in World War II." Still more surprising, in Rosenheim's view, Poe is not merely a source for such literary instances of cryptography as the codes in Conan Doyle's "The Dancing Men" or in Jules Verne; rather, through his effect on real cryptographers, Poe's writing influenced the outcome of World War II and the development of the Cold War. However unlikely such ideas sound, The Cryptographic Imagination offers compelling evidence that Poe's cryptographic writing clarifies one important avenue by which the twentieth century called itself into being. "The strength of Rosenheim's work extends to a revisionistic understanding of the entirety of literary history (as a repression of cryptography) and then, in a breathtaking shift of register, interlinks Poe's exercises in cryptography with the hyperreality of the CIA, the Cold War, and the Internet. What enables this extensive range of applications is the stipulated tension Rosenheim discerns in the relationship between the forms of the literary imagination and the condition of its mode of production. Cryptography, in this account, names the technology of literary production—the diacritical relationship between decoding and encoding—that the literary imagination dissimulates as hieroglyphics—the hermeneutic relationship between a sign and its content."—Donald E. Pease, Dartmouth College

    Dictionary of Privacy, Data Protection and Information Security

    The Dictionary of Privacy, Data Protection and Information Security explains the complex technical terms, legal concepts, privacy management techniques, conceptual matters and vocabulary that inform public debate about privacy. The revolutionary and pervasive influence of digital technology affects numerous disciplines and sectors of society, and concerns about its potential threats to privacy are growing. With over a thousand terms meticulously set out, described and cross-referenced, this Dictionary enables productive discussion by covering the full range of fields accessibly and comprehensively. In the ever-evolving debate surrounding privacy, this Dictionary takes a longer view, transcending the details of today's problems, technology, and the law to examine the wider principles that underlie privacy discourse. Interdisciplinary in scope, this Dictionary is invaluable to students, scholars and researchers in law, technology and computing, cybersecurity, sociology, public policy and administration, and regulation. It is also a vital reference for diverse practitioners, including data scientists, lawyers, policymakers and regulators.

    From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities

    Multi-modal Large Language Models (MLLMs) have shown impressive abilities in generating reasonable responses to multi-modal content. However, there is still a wide gap between the performance of recent MLLM-based applications and the expectations of the broad public, even though the most powerful models, OpenAI's GPT-4 and Google's Gemini, have been deployed. This paper seeks to deepen understanding of that gap through a qualitative study of the generalizability, trustworthiness, and causal reasoning capabilities of recent proprietary and open-source MLLMs across four modalities (text, code, image, and video), ultimately aiming to improve the transparency of MLLMs. We believe these properties are representative factors defining the reliability of MLLMs in supporting various downstream applications. Specifically, we evaluate the closed-source GPT-4 and Gemini and six open-source LLMs and MLLMs on 230 manually designed cases, with the qualitative results summarized into 12 scores (4 modalities × 3 properties). In total, we uncover 14 empirical findings that are useful for understanding the capabilities and limitations of both proprietary and open-source MLLMs, towards more reliable downstream multi-modal applications.
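    The aggregation shape described above (per-case qualitative results rolled up into 12 scores) can be sketched as follows; the case data and pass/fail scoring rule are invented placeholders, not the paper's protocol:

```python
# Roll per-case results up into 12 scores, one per (modality, property) pair.
from collections import defaultdict
from itertools import product

MODALITIES = ["text", "code", "image", "video"]
PROPERTIES = ["generalizability", "trustworthiness", "causality"]

# Each case: (modality, property, passed) -- placeholder results, not the
# paper's 230 manually designed cases.
cases = [("text", "causality", True), ("image", "trustworthiness", False)]

totals = defaultdict(lambda: [0, 0])            # (passes, total) per cell
for modality, prop, passed in cases:
    totals[(modality, prop)][0] += int(passed)
    totals[(modality, prop)][1] += 1

for cell in product(MODALITIES, PROPERTIES):    # 4 x 3 = 12 score cells
    passes, total = totals[cell]
    print(cell, passes / total if total else "no cases")
```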