260 research outputs found

    Cryptanalysis of Classic Ciphers Using Hidden Markov Models

    Get PDF
    Cryptanalysis is the study of identifying weaknesses in the implementation of cryptographic algorithms. This process would improve the complexity of such algo- rithms, making the system secure. In this research, we apply Hidden Markov Models (HMMs) to classic cryptanaly- sis problems. We show that with sufficient ciphertext, an HMM can be used to break a simple substitution cipher. We also show that when limited ciphertext is avail- able, using multiple random restarts for the HMM increases our chance of successful decryption

    Generative Adversarial Networks for Classic Cryptanalysis

    Get PDF
    The necessity of protecting critical information has been understood for millennia. Although classic ciphers have inherent weaknesses in comparison to modern ciphers, many classic ciphers are extremely challenging to break in practice. Machine learning techniques, such as hidden Markov models (HMM), have recently been applied with success to various classic cryptanalysis problems. In this research, we consider the effectiveness of the deep learning technique CipherGAN---which is based on the well- established generative adversarial network (GAN) architecture---for classic cipher cryptanalysis. We experiment extensively with CipherGAN on a number of classic ciphers, and we compare our results to those obtained using HMMs

    Classifying Classic Ciphers using Machine Learning

    Get PDF
    We consider the problem of identifying the classic cipher that was used to generate a given ciphertext message. We assume that the plaintext is English and we restrict our attention to ciphertext consisting only of alphabetic characters. Among the classic ciphers considered are the simple substitution, Vigenère cipher, playfair cipher, and column transposition cipher. The problem of classification is approached in two ways. The first method uses support vector machines (SVM) trained directly on ciphertext to classify the ciphers. In the second approach, we train hidden Markov models (HMM) on each ciphertext message, then use these trained HMMs as features for classifiers. Under this second approach, we compare two classification strategies, namely, convolutional neural networks (CNN) and SVMs. For the CNN classifier, we convert the trained HMMs into images. Extensive experimental results are provided for each of these classification techniques

    Cryptanalysis of Homophonic Substitution Cipher Using Hidden Markov Models

    Get PDF
    We investigate the effectiveness of a Hidden Markov Model (HMM) with random restarts as a mean of breaking a homophonic substitution cipher. Based on extensive experiments, we find that such an HMM-based attack outperforms a previously de- veloped nested hill climb approach, particularly when the ciphertext message is short. We then consider a combination cipher, consisting of a homophonic substitution and a column transposition. We develop and analyze an attack on such a cipher. This attack employs an HMM (with random restarts), together with a hill climb to recover the column permutation. We show that this attack can succeed on relatively short ci- phertext messages. Finally, we test this combined attack on the unsolved Zodiac 340 cipher

    Cryptanalysis of the Purple Cipher using Random Restarts

    Get PDF
    Cryptanalysis is the process of trying to analyze ciphers, cipher text, and crypto systems, which may exploit any loopholes or weaknesses in the systems, leading us to an understanding of the key used to encrypt the data. This project uses Expectation Maximization (EM) approach using numerous restarts to attack decipherment problems such as the Purple Cipher. In this research, we perform cryptanalysis of the Purple cipher using genetic algorithms and hidden Markov models (HMM). If the Purple cipher has a fixed plugboard, we show that genetic algorithms are successful in retrieving the plaintext from cipher text with high accuracy. On the other hand, if the cipher has a plugboard that is not fixed, we can decrypt the cipher text with increasing accuracy given an increase in population size and restarts. We performed the cryptanalysis of PseudoPurple, which is less complex but more powerful than Purple using HMMs. Though we could not decrypt cipher text produced by PseudoPurple with good accuracy, there is an increase in accuracy of the decrypted plaintext with an increase in the number of restarts

    Hidden Markov Models with Random Restarts vs Boosting for Malware Detection

    Full text link
    Effective and efficient malware detection is at the forefront of research into building secure digital systems. As with many other fields, malware detection research has seen a dramatic increase in the application of machine learning algorithms. One machine learning technique that has been used widely in the field of pattern matching in general-and malware detection in particular-is hidden Markov models (HMMs). HMM training is based on a hill climb, and hence we can often improve a model by training multiple times with different initial values. In this research, we compare boosted HMMs (using AdaBoost) to HMMs trained with multiple random restarts, in the context of malware detection. These techniques are applied to a variety of challenging malware datasets. We find that random restarts perform surprisingly well in comparison to boosting. Only in the most difficult "cold start" cases (where training data is severely limited) does boosting appear to offer sufficient improvement to justify its higher computational cost in the scoring phase

    A Spectral Algorithm for Latent Dirichlet Allocation

    Get PDF
    Topic modeling is a generalization of clustering that posits that observations (words in a document) are generated by multiple latent factors (topics), as opposed to just one. The increased representational power comes at the cost of a more challenging unsupervised learning problem for estimating the topic-word distributions when only words are observed, and the topics are hidden. This work provides a simple and efficient learning procedure that is guaranteed to recover the parameters for a wide class of topic models, including Latent Dirichlet Allocation (LDA). For LDA, the procedure correctly recovers both the topic-word distributions and the parameters of the Dirichlet prior over the topic mixtures, using only trigram statistics (i.e., third order moments, which may be estimated with documents containing just three words). The method, called Excess Correlation Analysis, is based on a spectral decomposition of low-order moments via two singular value decompositions (SVDs). Moreover, the algorithm is scalable, since the SVDs are carried out only on k × k matrices, where k is the number of latent factors (topics) and is typically much smaller than the dimension of the observation (word) space

    Vigenère Score for Malware Detection

    Get PDF
    Previous research has applied classic cryptanalytic techniques to the malware detection problem. Speci cally, scores based on simple substitution cipher cryptanal- ysis and various generalizations have been considered. In this research, we analyze two new malware scoring techniques based on classic cryptanalysis. Our rst ap- proach relies on the Index of Coincidence, which is used, for example, to determine the length of the keyword in a Vigenère ciphertext. We also consider a score based on a more complete cryptanalysis of a Vigenère cipher. We nd that the Vigenère score is competitive with previous statistical-based malware scores
    • …
    corecore