
8 research outputs found

Automated Search for Block Cipher Differentials: A GPU-Accelerated Branch-and-Bound Algorithm

Author: A Biryukov
A Borisenko
J Guo
K Chen
L Song
M ElSheikh
M Matsui
N Mouha
S Banik
S Sun
X Lai
Publication venue: International Association for Cryptologic Research (IACR)
Publication date: 10/08/2020

Differential cryptanalysis of block ciphers requires the identification of differential characteristics with high probability. For block ciphers with large block sizes and many rounds, identifying these characteristics is computationally intensive. The branch-and-bound algorithm was proposed by Matsui to automate this task. Since then, numerous improvements have been made to the branch-and-bound algorithm by bounding the number of active S-boxes, incorporating a meet-in-the-middle approach, and adapting it to various block cipher architectures. Although mixed-integer linear programming (MILP) has been widely used to evaluate the differential resistance of block ciphers, MILP is still inefficient at clustering individual differential characteristics into differentials (also known as the differential effect). The branch-and-bound method remains better suited to trail clustering. However, it requires enhancements before it becomes feasible for block ciphers with large block sizes, especially over a large number of rounds. Motivated by the need for a more efficient branch-and-bound algorithm to search for block cipher differentials, we propose a GPU-accelerated branch-and-bound algorithm. The proposed approach substantially increases the performance of the differential cluster search: the derived branch enumeration and evaluation kernel is 5.95 times faster than its CPU counterpart. To showcase its practicality, the proposed algorithm is applied to TRIFLE-BC, a 128-bit block cipher. By combining a meet-in-the-middle approach with the proposed GPU kernel, we improved the search efficiency on 20 rounds of TRIFLE-BC by approximately 58 times compared to the CPU-based approach. Differentials consisting of up to 50 million individual characteristics can be constructed for 20 rounds of TRIFLE, leading to slight improvements in the overall differential probabilities.
Even for a larger number of rounds (43 rounds), the proposed algorithm can still construct large clusters of over 500 thousand characteristics. This result demonstrates the practicality of the proposed algorithm in constructing large differentials even for a 128-bit block cipher, which could be used to improve cryptanalytic findings against other block ciphers in the future.

Crossref

Cryptology ePrint Archive
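The GPU-accelerated branch-and-bound abstract describes Matsui-style branch-and-bound applied to differential clustering but gives no implementation details. The Python sketch below illustrates the general idea on a hypothetical 16-bit toy cipher (four copies of the PRESENT 4-bit S-box followed by a PRESENT-style bit permutation, no key schedule); it is not the authors' GPU kernel, and every function name is illustrative. Each round enumerates the reachable S-box output differences (the branches), drops partial trails whose probability has already fallen below a threshold (the bound), and sums the probabilities of surviving trails that reach the target output difference (the cluster).

```python
from typing import List, Tuple

# Hypothetical toy cipher: 16-bit state, four copies of the PRESENT 4-bit S-box,
# then a PRESENT-style bit permutation. Round keys are omitted: XORing a key
# does not change an XOR difference.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def build_ddt(sbox: List[int]) -> List[List[int]]:
    """Difference distribution table: ddt[a][b] = #{x : S(x) ^ S(x ^ a) == b}."""
    n = len(sbox)
    ddt = [[0] * n for _ in range(n)]
    for a in range(n):
        for x in range(n):
            ddt[a][sbox[x] ^ sbox[x ^ a]] += 1
    return ddt


DDT = build_ddt(SBOX)


def permute(state: int) -> int:
    """Bit permutation on 16 bits: bit i moves to 4*i mod 15; bit 15 stays in place."""
    out = 0
    for i in range(16):
        if (state >> i) & 1:
            out |= 1 << (15 if i == 15 else (4 * i) % 15)
    return out


def sbox_layer(diff: int, p_in: float = 1.0, threshold: float = 0.0) -> List[Tuple[int, float]]:
    """Branches of one S-box layer, starting from a partial trail of probability p_in.

    Returns (output difference, trail probability) pairs. A partial branch that
    is already below `threshold` is dropped early, because the remaining
    S-boxes can only lower its probability further.
    """
    branches = [(0, p_in)]
    for k in range(4):
        a = (diff >> (4 * k)) & 0xF  # input difference of the k-th S-box
        branches = [
            (acc | (b << (4 * k)), p * DDT[a][b] / 16)
            for acc, p in branches
            for b in range(16)
            if DDT[a][b] and p * DDT[a][b] / 16 >= threshold
        ]
    return branches


def cluster_search(din: int, dout: int, rounds: int, threshold: float) -> Tuple[float, int]:
    """Depth-first branch-and-bound over all trails din -> dout.

    Only trails whose probability stays >= threshold are explored; complete
    trails ending in `dout` are summed into the clustered probability.
    """
    total, count = 0.0, 0

    def dfs(diff: int, r: int, p: float) -> None:
        nonlocal total, count
        if r == rounds:
            if diff == dout:
                total += p
                count += 1
            return
        for out, p_next in sbox_layer(diff, p, threshold):
            dfs(permute(out), r + 1, p_next)

    dfs(din, 0, 1.0)
    return total, count


if __name__ == "__main__":
    prob, n_trails = cluster_search(0x0001, 0x0001, 3, 2 ** -12)
    print(f"trails: {n_trails}, clustered probability: {prob}")
```

Lowering the threshold admits more trails into the cluster at the cost of a larger search tree; that growth in the number of branches is what makes parallel evaluation attractive for 128-bit ciphers.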

Automated enumeration of block cipher differentials: An optimized branch-and-bound GPU framework

Author: Je Sen Teh
Jiageng Chen
Wei-Zhu Yeoh
Publication venue: International Association for Cryptologic Research (IACR)
Publication date: 20/01/2022

Block ciphers are prevalent in security protocols used daily, such as TLS, OpenPGP, and SSH. Their primary purpose is to protect user data, both in transit and at rest. One of the de facto methods for evaluating block cipher security is differential cryptanalysis, which observes how input patterns (input differences) propagate through the cipher to produce output patterns (output differences). This probabilistic propagation is known as a differential, and identifying differentials gives a measure of a block cipher's security margin. This paper introduces an optimized GPU-based branch-and-bound framework for differential search. We optimize search efficiency by parallelizing all branch-and-bound operations, so that the entire search is completed on the GPU without communicating with the CPU. The meet-in-the-middle (MITM) approach is also adopted for further performance gains. We analyze the financial and computational costs of the proposed framework on Google Cloud VMs to showcase its practicality. When optimized for performance, the framework attains up to 90x speedup while saving up to 47% of the running cost compared to a single CPU core. When optimized for cost, it saves up to 83% of the financial cost while retaining a speedup of up to 40x. As a proof of concept, the proposed framework was applied to 128-bit TRIFLE-BC, 64-bit PRESENT, and 64-bit GIFT. Notably, we identified the best differentials to date for PRESENT (16 rounds) and 64-bit GIFT (13 rounds), with estimated probabilities of $2^{-61.7964}$ and $2^{-60.66}$, respectively. Although the differential results for TRIFLE-BC were incremental, the proposed framework was able to construct differentials for 43 rounds consisting of approximately 5.8x more individual trails than previous work, making it one of the most efficient approaches for larger block ciphers.

Cryptology ePrint Archive
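Both TRIFLE-BC abstracts rely on a meet-in-the-middle (MITM) step to reduce search cost. A minimal sketch of the general MITM idea follows, assuming a hypothetical 16-bit toy cipher (PRESENT 4-bit S-box, PRESENT-style 16-bit bit permutation, no key): search a few rounds forward from the input difference and a few rounds backward from the output difference, group the partial trails by the difference at which they meet, and multiply and sum the matching halves. The per-half thresholds and all function names are assumptions for illustration, not the framework's actual parameters.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical toy cipher: 16-bit state, four PRESENT 4-bit S-boxes, then a
# PRESENT-style bit permutation (bit i -> 4*i mod 15, bit 15 fixed). No key.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
DDT = [[0] * 16 for _ in range(16)]
for a in range(16):
    for x in range(16):
        DDT[a][SBOX[x] ^ SBOX[x ^ a]] += 1


def perm(state: int, inverse: bool = False) -> int:
    """Apply the bit permutation, or its inverse."""
    out = 0
    for i in range(16):
        j = 15 if i == 15 else (4 * i) % 15
        src, dst = (j, i) if inverse else (i, j)
        if (state >> src) & 1:
            out |= 1 << dst
    return out


def layer(diff: int, p_in: float, threshold: float, backward: bool) -> List[Tuple[int, float]]:
    """One S-box layer, forward (a -> b) or backward (a <- b), pruning below threshold."""
    branches = [(0, p_in)]
    for k in range(4):
        v = (diff >> (4 * k)) & 0xF
        # Candidate nibbles on the other side of this S-box and their transition probabilities.
        cands = [(u, (DDT[u][v] if backward else DDT[v][u]) / 16) for u in range(16)]
        branches = [(acc | (u << (4 * k)), p * q)
                    for acc, p in branches for u, q in cands
                    if q > 0 and p * q >= threshold]
    return branches


def half_search(start: int, rounds: int, threshold: float, backward: bool) -> Dict[int, float]:
    """Search `rounds` rounds from `start`; sum surviving partial trails by end difference."""
    ends: Dict[int, float] = defaultdict(float)

    def dfs(diff: int, r: int, p: float) -> None:
        if r == rounds:
            ends[diff] += p
            return
        if backward:  # undo the permutation, then step back through the S-boxes
            for prev, p_next in layer(perm(diff, inverse=True), p, threshold, True):
                dfs(prev, r + 1, p_next)
        else:  # S-box layer, then the permutation
            for out, p_next in layer(diff, p, threshold, False):
                dfs(perm(out), r + 1, p_next)

    dfs(start, 0, 1.0)
    return dict(ends)


def mitm_differential(din: int, dout: int, r_fwd: int, r_bwd: int,
                      t_fwd: float, t_bwd: float) -> float:
    """Combine forward and backward halves that meet at the same middle difference."""
    fwd = half_search(din, r_fwd, t_fwd, backward=False)
    bwd = half_search(dout, r_bwd, t_bwd, backward=True)
    return sum(p * bwd[mid] for mid, p in fwd.items() if mid in bwd)


if __name__ == "__main__":
    prob = mitm_differential(0x0001, 0x0001, 2, 2, 2 ** -8, 2 ** -8)
    print(f"4-round toy differential, MITM estimate: {prob}")
```

With both thresholds set to 0, the MITM result equals exhaustive enumeration of the same differential. With nonzero thresholds each half is pruned independently, so the value is a lower bound on the exhaustive cluster probability.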

REAL-TIME DATA MINING FOR PROCESS OPERATIONS USING GRAPHICS PROCESSING UNIT (GPU)-BASED HIGH PERFORMANCE COMPUTING

Author: LAU MAI CHAN
Publication date: 11/08/2014

Ph.D. (Doctor of Philosophy)

ScholarBank@NUS

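Massively multi-task optimization on heterogeneous computing clusters: application to permutation problems (original French title: Optimisation massivement multi-tâche sur grappes de calcul hétérogènes – Application aux problèmes de permutation)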

Author: Gmys Jan
Publication venue: HAL CCSD
Publication date: 19/12/2017

Branch-and-Bound (B&B) is a frequently used tree-search exploratory method for the exact resolution of combinatorial optimization problems (COPs). In practice, however, only small problem instances can be solved on a sequential computer, as B&B often generates a huge number of subproblems to be evaluated. To solve large COPs, we revisit the design and implementation of massively parallel B&B on top of large heterogeneous clusters integrating multi-core CPUs, many-core processors, and GPUs. For the efficient storage and management of subproblems, an original data structure (IVM) dedicated to permutation problems is used. Because of the highly irregular and unpredictable shape of the B&B tree, dynamic load balancing between parallel exploration processes is one of the main issues addressed in this thesis. Based on a compact encoding of the search space in the form of intervals, work-stealing strategies for multi-core CPUs and GPUs are proposed, as well as hierarchical approaches for load balancing in distributed-memory multi-CPU/multi-GPU systems. Three permutation problems, the Flowshop Scheduling Problem (FSP), the Quadratic Assignment Problem (QAP), and the n-Queens puzzle, are used as test cases. Solving, in 9 hours, an FSP instance with an estimated sequential execution time of 22 years demonstrates the scalability of the proposed algorithms on a cluster of 36 GPUs.

INRIA a CCSD electronic archive server
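The heterogeneous-cluster thesis encodes the permutation search space as intervals so that idle workers can steal work. The sketch below is not the IVM data structure itself; it only illustrates the interval idea under stated assumptions: permutations of n jobs are numbered 0 to n! - 1 in lexicographic order (factorial number system), a worker explores a rank interval [begin, end), pruning a prefix of length d skips the contiguous block of (n - d)! ranks that share it, and a thief takes the right half of a victim's interval. The flowshop lower bound (prefix completion time on the last machine plus the remaining jobs' last-machine processing times) is a simple valid bound chosen for illustration, and all names and the instance are hypothetical.

```python
from math import factorial
from typing import Callable, List, Optional, Sequence, Tuple


def unrank(rank: int, n: int) -> List[int]:
    """Lexicographic rank (0 .. n!-1) -> permutation of range(n), via factorial digits."""
    items = list(range(n))
    perm = []
    for i in range(n, 0, -1):
        digit, rank = divmod(rank, factorial(i - 1))
        perm.append(items.pop(digit))
    return perm


def rank_of(perm: Sequence[int]) -> int:
    """Inverse of unrank."""
    items = sorted(perm)
    r = 0
    for i, v in enumerate(perm):
        digit = items.index(v)
        r += digit * factorial(len(perm) - 1 - i)
        items.pop(digit)
    return r


def steal(interval: Tuple[int, int]) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """Work stealing: the victim keeps the left half of [begin, end), the thief takes the right."""
    begin, end = interval
    mid = begin + (end - begin) // 2
    return (begin, mid), (mid, end)


def explore(begin: int, end: int, n: int,
            bound: Callable[[List[int]], float],
            cost: Callable[[List[int]], float],
            best: float = float("inf")) -> Tuple[float, Optional[List[int]]]:
    """Branch-and-bound over the ranks in [begin, end).

    All permutations sharing a prefix of length d occupy (n - d)! consecutive
    ranks, so pruning that prefix is a jump to the start of the next block.
    """
    best_perm = None
    r = begin
    while r < end:
        perm = unrank(r, n)
        for d in range(1, n):
            if bound(perm[:d]) >= best:  # no completion of this prefix can improve
                block = factorial(n - d)
                r = (r // block + 1) * block
                break
        else:  # no prefix was pruned: evaluate the complete permutation
            c = cost(perm)
            if c < best:
                best, best_perm = c, perm
            r += 1
    return best, best_perm


def makespan(seq: Sequence[int], p: List[List[int]]) -> int:
    """Permutation flowshop: completion time of the last job on the last machine."""
    c = [0] * len(p[0])
    for j in seq:
        for k in range(len(c)):
            c[k] = max(c[k], c[k - 1] if k else 0) + p[j][k]
    return c[-1]


if __name__ == "__main__":
    # Hypothetical 5-job x 3-machine instance: P[job][machine] processing times.
    P = [[3, 2, 4], [1, 4, 2], [2, 3, 3], [4, 1, 2], [2, 2, 5]]
    n = len(P)

    def lower_bound(prefix: List[int]) -> int:
        return makespan(prefix, P) + sum(P[j][-1] for j in range(n) if j not in prefix)

    def cost(seq: List[int]) -> int:
        return makespan(seq, P)

    left, right = steal((0, factorial(n)))
    results = [explore(*left, n, lower_bound, cost), explore(*right, n, lower_bound, cost)]
    print(min(results, key=lambda res: res[0]))
```

Because each worker's state is just two integers, handing half of an interval to a thief is cheap, which is the property that makes this encoding attractive for dynamic load balancing.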

Quayside Operations Planning Under Uncertainty

Author: Iris Cagatay
Jin Jian Gang
Lee Der-Hong
Publication date: 01/01/2015

Online Research Database In Technology

A novel dynamic and social perspective of multiple criteria decision making

Author: Corrente Salvatore
Di Stefano Alessandro
Giacchi Evelina
Greco Salvatore
La Corte Aurelio
Scatá Marialisa
Publication date: 01/01/2015

Teesside University's Research Repository


A comparison of single runs versus multistart for simulated annealing and threshold accepting

Author: Crews Matthew
Publication venue: Oregon State University

Simulated Annealing and Threshold Accepting are two stochastic search algorithms that have been used successfully on a variety of complex and difficult problem sets. Because of their stochastic nature, they are not guaranteed to produce the same result on each run. In effect, these techniques produce a distribution of solutions that depends on the input parameters and the problem instance. Most research on Simulated Annealing and Threshold Accepting focuses on the single-run performance of these algorithms and does not consider sampling multiple runs and taking the best result, known as multisampling. Previous work on multisampling did not explore a variety of settings or problem instances, which has left gaps in the understanding of the multisampling performance of Simulated Annealing and Threshold Accepting. This work examines single runs and multisampling on four instances of the Traveling Salesman Problem. The variables that affect the performance of these two heuristics are explored systematically. A pairwise analysis is then performed to identify whether there are cases where it is advantageous to employ a multistart method instead of a single run. The conclusion is that in a majority of cases a single run outperforms the multistart method, but there are cases in which the multistart method outperforms single runs.

ScholarsArchive@OSU
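A small sketch of the single-run versus multistart comparison described in the Oregon State abstract, under assumptions not taken from the thesis: a random Euclidean TSP instance, 2-opt segment-reversal moves, a linear cooling schedule, and an equal total iteration budget for one long run and for several shorter restarts. Simulated Annealing accepts an uphill move with probability exp(-delta/T); Threshold Accepting accepts any move whose cost increase is below the current threshold T. Function names and parameter values are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def tour_length(tour: Sequence[int], pts: Sequence[Point]) -> float:
    """Length of the closed tour (index -1 wraps around to the last city)."""
    return sum(math.dist(pts[tour[i]], pts[tour[i - 1]]) for i in range(len(tour)))


def search(pts: Sequence[Point], iters: int, rng: random.Random,
           rule: str, t0: float) -> Tuple[float, List[int]]:
    """One run of Simulated Annealing ("SA") or Threshold Accepting ("TA") with 2-opt moves."""
    n = len(pts)
    tour = list(range(n))
    rng.shuffle(tour)
    cur = tour_length(tour, pts)
    best, best_tour = cur, tour[:]
    for k in range(iters):
        t = t0 * (1 - k / iters)  # linear schedule (an assumption); stays > 0
        i, j = sorted(rng.sample(range(n), 2))
        cand = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]  # 2-opt: reverse a segment
        cand_len = tour_length(cand, pts)
        delta = cand_len - cur
        if rule == "SA":  # accept uphill moves with probability exp(-delta / t)
            accept = delta <= 0 or rng.random() < math.exp(-delta / t)
        else:  # "TA": accept any move that worsens the tour by less than t
            accept = delta < t
        if accept:
            tour, cur = cand, cand_len
            if cur < best:
                best, best_tour = cur, tour[:]
    return best, best_tour


def multistart(pts: Sequence[Point], budget: int, starts: int, rng: random.Random,
               rule: str, t0: float) -> Tuple[float, List[int]]:
    """Split the same iteration budget across `starts` independent runs and keep the best."""
    runs = [search(pts, budget // starts, rng, rule, t0) for _ in range(starts)]
    return min(runs, key=lambda run: run[0])


if __name__ == "__main__":
    rng = random.Random(42)
    cities = [(rng.random(), rng.random()) for _ in range(20)]
    for rule in ("SA", "TA"):
        single, _ = search(cities, 4000, random.Random(1), rule, t0=0.5)
        multi, _ = multistart(cities, 4000, 4, random.Random(1), rule, t0=0.5)
        print(f"{rule}: single run {single:.3f}, multistart (4 x 1000) {multi:.3f}")
```

Repeating the script over many seeds yields the kind of result distribution the thesis compares; which strategy wins depends on the instance, the schedule, and the budget split.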

Essentials of Business Analytics

Author: Bhimasankaram Pochiraju
Sridhar Seshadri
Publication venue: Springer Fachmedien Wiesbaden GmbH
Publication date: 22/04/2020

Open Library