
    Utilizing the Double-Precision Floating-Point Computing Power of GPUs for RSA Acceleration

    Implementations of asymmetric cryptographic algorithms (e.g., RSA and Elliptic Curve Cryptography) on Graphics Processing Units (GPUs) have been researched for over a decade. The basic idea of most previous contributions is to exploit the highly parallel GPU architecture by porting integer-based algorithms from general-purpose CPUs to GPUs to achieve high performance. However, the cryptographic computing potential of GPUs, especially that of their more powerful floating-point instructions, has not been comprehensively investigated. In this paper, we fully exploit the floating-point computing power of GPUs through several designs, including a floating-point-based Montgomery multiplication/exponentiation algorithm and a Chinese Remainder Theorem (CRT) implementation on the GPU. For practical use of the proposed algorithm, a new method converts the input/output between octet strings and floating-point numbers, fully utilizing the GPU and further improving overall performance by about 5%. RSA-2048/3072/4096 decryption on an NVIDIA GeForce GTX TITAN reaches 42,211/12,151/5,790 operations per second, respectively, which is 13 times the performance of the previous fastest floating-point-based implementation (published in Eurocrypt 2009). The RSA-4096 decryption outperforms the fastest existing integer-based result by 23%.
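
    The two building blocks named in the abstract above, Montgomery multiplication/exponentiation and CRT-based decryption, can be illustrated with a minimal sketch. It uses Python's arbitrary-precision integers, not the floating-point limb representation the paper proposes, and the function names (`mont_mul`, `mont_pow`, `rsa_crt_decrypt`) are illustrative assumptions rather than anything taken from the paper.

```python
# Textbook integer Montgomery arithmetic plus RSA-CRT decryption (Python 3.8+).
# Assumption: this is NOT the paper's floating-point GPU design, only the
# underlying mathematics it accelerates.

def mont_mul(a, b, n, bits, n_prime):
    """Return a * b * R^-1 mod n, where R = 2**bits and a, b < n."""
    mask = (1 << bits) - 1
    t = a * b
    m = ((t & mask) * n_prime) & mask   # m = (t mod R) * n' mod R
    u = (t + m * n) >> bits             # exact division by R
    return u - n if u >= n else u       # single conditional subtraction


def mont_pow(base, exp, n):
    """Left-to-right square-and-multiply in the Montgomery domain (n odd)."""
    bits = n.bit_length()               # R = 2**bits > n
    r = 1 << bits
    n_prime = (-pow(n, -1, r)) % r      # n * n' = -1 (mod R)
    x = (base % n) * r % n              # base in Montgomery form
    acc = r % n                         # 1 in Montgomery form
    for bit in bin(exp)[2:]:
        acc = mont_mul(acc, acc, n, bits, n_prime)
        if bit == "1":
            acc = mont_mul(acc, x, n, bits, n_prime)
    return mont_mul(acc, 1, n, bits, n_prime)  # leave the Montgomery domain


def rsa_crt_decrypt(c, p, q, d):
    """RSA decryption via CRT with Garner's recombination."""
    dp, dq = d % (p - 1), d % (q - 1)
    q_inv = pow(q, -1, p)
    m1 = mont_pow(c, dp, p)             # half-size exponentiation mod p
    m2 = mont_pow(c, dq, q)             # half-size exponentiation mod q
    h = (q_inv * (m1 - m2)) % p
    return m2 + h * q
```

    CRT replaces one full-size exponentiation with two half-size ones, which is why it is attractive for decryption throughput. With toy parameters, `rsa_crt_decrypt(pow(m, e, p * q), p, q, d)` recovers `m` for any matching key pair.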

    HI-Kyber: A novel high-performance implementation scheme of Kyber based on GPU

    CRYSTALS-Kyber, as the only public key encryption (PKE) algorithm selected by the National Institute of Standards and Technology (NIST) in the third round, is considered one of the most promising post-quantum cryptography (PQC) schemes. Lattice-based cryptography builds secure encryption and decryption systems on hard mathematical problems over lattices to resist attacks from quantum computers. Performance is a major bottleneck hindering the adoption of post-quantum cryptography. In this paper, we present a High-performance Implementation of Kyber (named HI-Kyber) on NVIDIA GPUs, which raises the key-exchange performance of Kyber to the million-operations-per-second level. Firstly, we propose a lattice-based PQC implementation architecture based on kernel fusion, which avoids redundant global-memory access operations. Secondly, we optimize and implement the core operations of CRYSTALS-Kyber, including the Number Theoretic Transform (NTT), the inverse NTT (INTT), pointwise multiplication, etc. For the NTT in particular, which is the computational bottleneck, three novel methods are proposed to push performance to the limit: sliced layer merging (SLM), sliced depth-first search (SDFS-NTT), and entire depth-first search (EDFS-NTT), which achieve speedups of 7.5%, 28.5%, and 41.6%, respectively, compared to the native implementation. Thirdly, we conduct comprehensive performance experiments with different parallel dimensions based on the above optimizations. Finally, our key-exchange performance reaches 1,664 kops/s. Specifically, on the same platform, HI-Kyber is 3.52× as fast as the GPU implementation based on the same instruction set and 1.78× as fast as the state-of-the-art implementation based on the AI-accelerated Tensor Core.
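
    The NTT, INTT, and pointwise multiplication pipeline mentioned in the abstract above can be sketched as follows. This is a textbook recursive radix-2 NTT used for negacyclic polynomial multiplication, assuming Kyber's modulus q = 3329 and root 17 but a simplified ring of degree 128. It is not Kyber's actual incomplete 7-layer transform, and it does not reproduce the paper's SLM, SDFS-NTT, or EDFS-NTT schedules.

```python
# Negacyclic multiplication in Z_Q[x]/(x^N + 1) via NTT (Python 3.8+).
# Assumptions: N = 128 is a simplification chosen so that a full NTT exists
# for Q = 3329; all function names here are illustrative.

Q = 3329
N = 128
PSI = 17                    # primitive 256th root of unity mod Q
OMEGA = PSI * PSI % Q       # primitive 128th root of unity mod Q
assert pow(PSI, N, Q) == Q - 1   # PSI**N = -1, as x^N + 1 requires


def ntt(a, omega):
    """Recursive Cooley-Tukey: out[i] = sum_j a[j] * omega**(i*j) mod Q."""
    n = len(a)
    if n == 1:
        return a[:]
    omega_sq = omega * omega % Q
    even = ntt(a[0::2], omega_sq)
    odd = ntt(a[1::2], omega_sq)
    out = [0] * n
    w = 1
    for i in range(n // 2):
        t = w * odd[i] % Q
        out[i] = (even[i] + t) % Q            # butterfly, upper half
        out[i + n // 2] = (even[i] - t) % Q   # omega**(n/2) = -1
        w = w * omega % Q
    return out


def intt(a_hat, omega):
    """Inverse transform: forward NTT with omega^-1, scaled by n^-1."""
    inv_n = pow(len(a_hat), -1, Q)
    return [x * inv_n % Q for x in ntt(a_hat, pow(omega, -1, Q))]


def negacyclic_mul(a, b):
    """Multiply a and b modulo (x^N + 1, Q)."""
    # Twisting by powers of PSI turns the negacyclic product into a cyclic one.
    a_hat = ntt([x * pow(PSI, i, Q) % Q for i, x in enumerate(a)], OMEGA)
    b_hat = ntt([x * pow(PSI, i, Q) % Q for i, x in enumerate(b)], OMEGA)
    c_hat = [x * y % Q for x, y in zip(a_hat, b_hat)]   # pointwise product
    c = intt(c_hat, OMEGA)
    psi_inv = pow(PSI, -1, Q)
    return [x * pow(psi_inv, i, Q) % Q for i, x in enumerate(c)]
```

    The transform costs O(N log N) butterflies instead of the O(N^2) multiplications of schoolbook multiplication, which is why the NTT dominates the optimization effort in lattice-based schemes.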

    A Novel High-performance Implementation of CRYSTALS-Kyber with AI Accelerator

    Public-key cryptography, including conventional cryptosystems and post-quantum cryptography, involves computation-intensive workloads. Noting the extraordinary computing power of AI accelerators, in this paper we further explore the feasibility of introducing AI accelerators into high-performance cryptographic computing. Since AI accelerators are dedicated to machine learning and neural networks, the biggest challenge is how to transform cryptographic workloads into their operations while ensuring the correctness of the results and delivering convincing performance gains. After investigating and analysing the workload of the NVIDIA AI accelerator, Tensor Core, we choose to use it to accelerate polynomial multiplication, usually the most time-consuming part of lattice-based cryptography. We take measures to accommodate the matrix-multiply-and-add mode of Tensor Core and make a trade-off between precision and performance, so as to leverage it as a high-performance NTT box performing NTT/INTT through the CUDA C++ WMMA APIs. Meanwhile, we take CRYSTALS-Kyber, the candidate to be standardized by NIST, as a case study on an RTX 3080 with the Ampere Tensor Core. The empirical results show that the customized NTT of a polynomial vector (n=256, k=4) with our NTT box achieves a speedup of around 6.47x over the state-of-the-art implementation on the same GPU platform. Compared with the AVX2 implementation submitted to NIST, our Kyber-1024 achieves speedups of 26x, 36x, and 35x for its respective phases.
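
    The reason an NTT fits a matrix-multiply-and-add unit is that the transform is a matrix product with a fixed matrix of root-of-unity powers, so a batch of polynomials can be transformed in one multiplication. The sketch below shows only that reformulation in plain Python. The size N = 16, the helper names, and the list-based matrix product are assumptions for illustration; the paper's WMMA kernels and its precision handling are not reproduced here.

```python
# Batched NTT expressed as a matrix product modulo Q (Python 3.8+).
# Assumption: a toy length N = 16 with Q = 3329; not the paper's NTT box.

Q = 3329
N = 16
OMEGA = pow(17, 256 // N, Q)   # 17 has order 256 mod Q, so OMEGA has order N


def ntt_matrix(n, omega):
    """W[i][j] = omega**(i*j) mod Q, the fixed transform matrix."""
    return [[pow(omega, i * j, Q) for j in range(n)] for i in range(n)]


def matmul_mod(w, a):
    """Plain matrix product (w @ a) mod Q on nested lists."""
    inner, cols = len(a), len(a[0])
    return [[sum(row[k] * a[k][j] for k in range(inner)) % Q
             for j in range(cols)] for row in w]


def batched_ntt(polys):
    """Transform every polynomial in the batch with one matrix product."""
    columns = [list(col) for col in zip(*polys)]      # N x batch layout
    out = matmul_mod(ntt_matrix(N, OMEGA), columns)
    return [list(col) for col in zip(*out)]           # back to batch x N


def batched_intt(polys_hat):
    """Inverse: multiply by the matrix of omega^-1 powers, scale by N^-1."""
    inv_n = pow(N, -1, Q)
    columns = [list(col) for col in zip(*polys_hat)]
    out = matmul_mod(ntt_matrix(N, pow(OMEGA, -1, Q)), columns)
    return [[x * inv_n % Q for x in col] for col in zip(*out)]
```

    The matrix form does more arithmetic than a butterfly network, so it only pays off on hardware where dense matrix products are far cheaper per operation than scalar instructions.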

    Leukadherin-1-Mediated Activation of CD11b Inhibits LPS-Induced Pro-inflammatory Response in Macrophages and Protects Mice Against Endotoxic Shock by Blocking LPS-TLR4 Interaction

    Macrophage dysregulation has been demonstrated to contribute to aberrant immune responses and inflammatory diseases. CD11b, expressed on macrophages, plays a critical role in regulating pathogen recognition, phagocytosis, and cell survival. In the present study, we explored the effect of leukadherin-1 (LA1), an agonist of CD11b, on the LPS-induced pro-inflammatory response in macrophages and on endotoxic shock. Intriguingly, we found that LA1 significantly reduced mortality in mice and alleviated pathological injury of the liver and lung in endotoxic shock. In vivo studies showed that LA1-induced activation of CD11b significantly inhibited the LPS-induced pro-inflammatory response in macrophages of mice. Moreover, LA1-induced activation of CD11b significantly inhibited the LPS/IFN-γ-induced pro-inflammatory response in macrophages by inhibiting the MAPK and NF-κB signaling pathways in vitro. Furthermore, mice injected with LA1-treated BMDMs showed fewer pathological lesions than those injected with vehicle-treated BMDMs in endotoxic shock. In addition, we found that activation of TLR4 by LPS could induce endocytosis of CD11b, and activation of CD11b by LA1 could induce endocytosis of TLR4, in vitro and in vivo, subsequently blocking the binding of LPS to TLR4. Based on these findings, we concluded that LA1-induced activation of CD11b negatively regulates the LPS-induced pro-inflammatory response in macrophages and subsequently protects mice from endotoxic shock by partially blocking the LPS-TLR4 interaction. Our study provides new insight into the role of CD11b in the pathogenesis of inflammatory diseases.

    Automated Windows domain penetration method based on reinforcement learning

    A Windows domain provides a unified system service for resource sharing and information interaction among users. However, while this facilitates intranet management, it also introduces significant security risks. In recent years, intranet attacks targeting domain controllers have become increasingly prevalent, necessitating automated penetration testing to detect vulnerabilities and keep office networks operating. Efficient identification of attack paths within the domain environment is therefore crucial. The penetration process was first modeled using reinforcement learning, and attack paths were then discovered and verified through the interaction of the model with the domain environment. Furthermore, unnecessary states in the reinforcement learning model were trimmed based on the differing contributions of hosts to the penetration process, aiming to optimize the path selection strategy and improve actual attack efficiency. Q-learning algorithms with solution space refinement and exploration policy optimization were used to select the optimal attack path. With this method, all security threats in the domain can be automatically verified, providing a valuable basis for protection to domain administrators. Experiments were conducted on typical Windows domain scenarios, and the results show that the optimal path is selected from the thirteen efficient paths generated by the proposed method, which also performs better than other approaches in terms of domain controller intrusion, domain host intrusion, attack steps, convergence, and time cost.
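
    The idea of learning an attack path with Q-learning can be illustrated on a toy graph. Everything in this sketch is an assumption made for illustration: the host names, the edges, the reward values, and the hyperparameters are hypothetical, and the state pruning and exploration-policy refinements described in the abstract above are not reproduced.

```python
# Tabular Q-learning on a hypothetical five-host graph.
# Assumption: states are hosts, actions are lateral moves along an edge.
import random

GRAPH = {                                   # hypothetical topology
    "foothold": ["workstation", "fileserver"],
    "workstation": ["foothold", "fileserver"],
    "fileserver": ["workstation", "admin_host"],
    "admin_host": ["fileserver", "dc"],
    "dc": [],                               # domain controller: terminal state
}
GOAL = "dc"


def train(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn Q[state][next_host] with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = {s: {a: 0.0 for a in nbrs} for s, nbrs in GRAPH.items()}
    for _ in range(episodes):
        state = "foothold"
        for _ in range(50):                 # cap on steps per episode
            if state == GOAL:
                break
            if rng.random() < epsilon:
                action = rng.choice(GRAPH[state])        # explore
            else:
                action = max(q[state], key=q[state].get)  # exploit
            reward = 10.0 if action == GOAL else -1.0     # step cost, goal bonus
            future = max(q[action].values(), default=0.0)
            q[state][action] += alpha * (reward + gamma * future
                                         - q[state][action])
            state = action
    return q


def greedy_path(q, start="foothold", max_steps=10):
    """Follow the highest-valued move from each host until the goal."""
    path = [start]
    while path[-1] != GOAL and len(path) <= max_steps:
        path.append(max(q[path[-1]], key=q[path[-1]].get))
    return path
```

    The per-step cost of -1 is what makes shorter paths score higher, so the greedy policy read from the learned table favours the route with the fewest moves to the domain controller.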