149 research outputs found

    BASALISC: Programmable Hardware Accelerator for BGV Fully Homomorphic Encryption

    Get PDF
    Fully Homomorphic Encryption (FHE) allows for secure computation on encrypted data. Unfortunately, huge memory size, computational cost and bandwidth requirements limit its practicality. We present BASALISC, an architecture family of hardware accelerators that aims to substantially accelerate FHE computations in the cloud. BASALISC is the first to implement the BGV scheme with fully-packed bootstrapping – the noise removal capability necessary for arbitrary-depth computation. It supports a customized version of bootstrapping that can be instantiated with hardware multipliers optimized for area and power. BASALISC is a three-abstraction-layer RISC architecture, designed for a 1 GHz ASIC implementation and underway toward 150mm2 die tape-out in a 12nm GF process. BASALISC\u27s four-layer memory hierarchy includes a two-dimensional conflict-free inner memory layer that enables 32 Tb/s radix-256 NTT computations without pipeline stalls. Its conflict-resolution permutation hardware is generalized and re-used to compute BGV automorphisms without throughput penalty. BASALISC also has a custom multiply-accumulate unit to accelerate BGV key switching. The BASALISC toolchain comprises a custom compiler and a joint performance and correctness simulator. To evaluate BASALISC, we study its physical realizability, emulate and formally verify its core functional units, and we study its performance on a set of benchmarks. Simulation results show a speedup of more than 5,000× over HElib – a popular software FHE library

    Blockchain-Coordinated Frameworks for Scalable and Secure Supply Chain Networks

    Full text link
    Supply chains have progressed through time from being limited to a few regional traders to becoming complicated business networks. As a result, supply chain management systems now rely significantly on the digital revolution for the privacy and security of data. Due to key qualities of blockchain, such as transparency, immutability and decentralization, it has recently gained a lot of interest as a way to solve security, privacy and scalability problems in supply chains. However conventional blockchains are not appropriate for supply chain ecosystems because they are computationally costly, have a limited potential to scale and fail to provide trust. Consequently, due to limitations with a lack of trust and coordination, supply chains tend to fail to foster trust among the network’s participants. Assuring data privacy in a supply chain ecosystem is another challenge. If information is being shared with a large number of participants without establishing data privacy, access control risks arise in the network. Protecting data privacy is a concern when sending corporate data, including locations, manufacturing supplies and demand information. The third challenge in supply chain management is scalability, which continues to be a significant barrier to adoption. As the amount of transactions in a supply chain tends to increase along with the number of nodes in a network. So scalability is essential for blockchain adoption in supply chain networks. This thesis seeks to address the challenges of privacy, scalability and trust by providing frameworks for how to effectively combine blockchains with supply chains. This thesis makes four novel contributions. It first develops a blockchain-based framework with Attribute-Based Access Control (ABAC) model to assure data privacy by adopting a distributed framework to enable fine grained, dynamic access control management for supply chain management. To solve the data privacy challenge, AccessChain is developed. This proposed AccessChain model has two types of ledgers in the system: local and global. Local ledgers are used to store business contracts between stakeholders and the ABAC model management, whereas the global ledger is used to record transaction data. AccessChain can enable decentralized, fine-grained and dynamic access control management in SCM when combined with the ABAC model and blockchain technology (BCT). The framework enables a systematic approach that advantages the supply chain, and the experiments yield convincing results. Furthermore, the results of performance monitoring shows that AccessChain’s response time with four local ledgers is acceptable, and therefore it provides significantly greater scalability. Next, a framework for reducing the bullwhip effect (BWE) in SCM is proposed. The framework also focuses on combining data visibility with trust. BWE is first observed in SC and then a blockchain architecture design is used to minimize it. Full sharing of demand data has been shown to help improve the robustness of overall performance in a multiechelon SC environment, especially for BWE mitigation and cumulative cost reduction. It is observed that when it comes to providing access to data, information sharing using a blockchain has some obvious benefits in a supply chain. Furthermore, when data sharing is distributed, parties in the supply chain will have fair access to other parties’ data, even though they are farther downstream. Sharing customer demand is important in a supply chain to enhance decision-making, reduce costs and promote the final end product. This work also explores the ability of BCT as a solution in a distributed ledger approach to create a trust-enhanced environment where trust is established so that stakeholders can share their information effectively. To provide visibility and coordination along with a blockchain consensus process, a new consensus algorithm, namely Reputation-based proof-of cooperation (RPoC), is proposed for blockchain-based SCM, which does not involve validators to solve any mathematical puzzle before storing a new block. The RPoC algorithm is an efficient and scalable consensus algorithm that selects the consensus node dynamically and permits a large number of nodes to participate in the consensus process. The algorithm decreases the workload on individual nodes while increasing consensus performance by allocating the transaction verification process to specific nodes. Through extensive theoretical analyses and experimentation, the suitability of the proposed algorithm is well grounded in terms of scalability and efficiency. The thesis concludes with a blockchain-enabled framework that addresses the issue of preserving privacy and security for an open-bid auction system. This work implements a bid management system in a private BC environment to provide a secure bidding scheme. The novelty of this framework derives from an enhanced approach for integrating BC structures by replacing the original chain structure with a tree structure. Throughout the online world, user privacy is a primary concern, because the electronic environment enables the collection of personal data. Hence a suitable cryptographic protocol for an open-bid auction atop BC is proposed. Here the primary aim is to achieve security and privacy with greater efficiency, which largely depends on the effectiveness of the encryption algorithms used by BC. Essentially this work considers Elliptic Curve Cryptography (ECC) and a dynamic cryptographic accumulator encryption algorithm to enhance security between auctioneer and bidder. The proposed e-bidding scheme and the findings from this study should foster the further growth of BC strategies

    Learning-Based Ubiquitous Sensing For Solving Real-World Problems

    Get PDF
    Recently, as the Internet of Things (IoT) technology has become smaller and cheaper, ubiquitous sensing ability within these devices has become increasingly accessible. Learning methods have also become more complex in the field of computer science ac- cordingly. However, there remains a gap between these learning approaches and many problems in other disciplinary fields. In this dissertation, I investigate four different learning-based studies via ubiquitous sensing for solving real-world problems, such as in IoT security, athletics, and healthcare. First, I designed an online intrusion detection system for IoT devices via power auditing. To realize the real-time system, I created a lightweight power auditing device. With this device, I developed a distributed Convolutional Neural Network (CNN) for online inference. I demonstrated that the distributed system design is secure, lightweight, accurate, real-time, and scalable. Furthermore, I characterized potential Information-stealer attacks via power auditing. To defend against this potential exfiltration attack, a prototype system was built on top of the botnet detection system. In a testbed environment, I defined and deployed an IoT Information-stealer attack. Then, I designed a detection classifier. Altogether, the proposed system is able to identify malicious behavior on endpoint IoT devices via power auditing. Next, I enhanced athletic performance via ubiquitous sensing and machine learning techniques. I first designed a metric called LAX-Score to quantify a collegiate lacrosse team’s athletic performance. To derive this metric, I utilized feature selection and weighted regression. Then, the proposed metric was statistically validated on over 700 games from the last three seasons of NCAA Division I women’s lacrosse. I also exam- ined the biometric sensing dataset obtained from a collegiate team’s athletes over the course of a season. I then identified the practice features that are most correlated with high-performance games. Experimental results indicate that LAX-Score provides insight into athletic performance quality beyond wins and losses. Finally, I studied the data of patients with Parkinson’s Disease. I secured the Inertial Measurement Unit (IMU) sensing data of 30 patients while they conducted pre-defined activities. Using this dataset, I measured tremor events during drawing activities for more convenient tremor screening. Our preliminary analysis demonstrates that IMU sensing data can identify potential tremor events in daily drawing or writing activities. For future work, deep learning-based techniques will be used to extract features of the tremor in real-time. Overall, I designed and applied learning-based methods across different fields to solve real-world problems. The results show that combining learning methods with domain knowledge enables the formation of solutions

    GME: GPU-based Microarchitectural Extensions to Accelerate Homomorphic Encryption

    Full text link
    Fully Homomorphic Encryption (FHE) enables the processing of encrypted data without decrypting it. FHE has garnered significant attention over the past decade as it supports secure outsourcing of data processing to remote cloud services. Despite its promise of strong data privacy and security guarantees, FHE introduces a slowdown of up to five orders of magnitude as compared to the same computation using plaintext data. This overhead is presently a major barrier to the commercial adoption of FHE. In this work, we leverage GPUs to accelerate FHE, capitalizing on a well-established GPU ecosystem available in the cloud. We propose GME, which combines three key microarchitectural extensions along with a compile-time optimization to the current AMD CDNA GPU architecture. First, GME integrates a lightweight on-chip compute unit (CU)-side hierarchical interconnect to retain ciphertext in cache across FHE kernels, thus eliminating redundant memory transactions. Second, to tackle compute bottlenecks, GME introduces special MOD-units that provide native custom hardware support for modular reduction operations, one of the most commonly executed sets of operations in FHE. Third, by integrating the MOD-unit with our novel pipelined 6464-bit integer arithmetic cores (WMAC-units), GME further accelerates FHE workloads by 19%19\%. Finally, we propose a Locality-Aware Block Scheduler (LABS) that exploits the temporal locality available in FHE primitive blocks. Incorporating these microarchitectural features and compiler optimizations, we create a synergistic approach achieving average speedups of 796Ă—796\times, 14.2Ă—14.2\times, and 2.3Ă—2.3\times over Intel Xeon CPU, NVIDIA V100 GPU, and Xilinx FPGA implementations, respectively

    LIPIcs, Volume 261, ICALP 2023, Complete Volume

    Get PDF
    LIPIcs, Volume 261, ICALP 2023, Complete Volum

    A quantum-resistant advanced metering infrastructure

    Get PDF
    This dissertation focuses on discussing and implementing a Quantum-Resistant Advanced Metering Infrastructure (QR-AMI) that employs quantum-resistant asymmetric and symmetric cryptographic schemes to withstand attacks from both quantum and classical computers. The proposed solution involves the integration of Quantum-Resistant Dedicated Cryptographic Modules (QR-DCMs) within Smart Meters (SMs). These QR-DCMs are designed to embed quantum-resistant cryptographic schemes suitable for AMI applications. In this sense, it investigates quantum-resistant asymmetric cryptographic schemes based on strong cryptographic principles and a lightweight approach for AMIs. In addition, it examines the practical deployment of quantum-resistant schemes in QR-AMIs. Two candidates from the National Institute of Standards and Technology (NIST) post-quantum cryptography (PQC) standardization process, FrodoKEM and CRYSTALS-Kyber, are assessed due to their adherence to strong cryptographic principles and lightweight approach. The feasibility of embedding these schemes within QRDCMs in an AMI context is evaluated through software implementations on low-cost hardware, such as microcontroller and processor, and hardware/software co-design implementations using System-on-a-Chip (SoC) devices with Field-Programmable Gate Array (FPGA) components. Experimental results show that the execution time for FrodoKEM and CRYSTALS-Kyber schemes on SoC FPGA devices is at least one-third faster than software implementations. Furthermore, the achieved execution time and resource usage demonstrate the viability of these schemes for AMI applications. The CRYSTALS-Kyber scheme appears to be a superior choice in all scenarios, except when strong cryptographic primitives are necessitated, at least theoretically. Due to the lack of off-the-shelf SMs supporting quantum-resistant asymmetric cryptographic schemes, a QRDCM embedding quantum-resistant scheme is implemented and evaluated. Regarding hardware selection for QR-DCMs, microcontrollers are preferable in situations requiring reduced processing power, while SoC FPGA devices are better suited for those demanding high processing power. The resource usage and execution time outcomes demonstrate the feasibility of implementing AMI based on QR-DCMs (i.e., QR-AMI) using microcontrollers or SoC FPGA devices.Esta tese de doutorado foca na discussão e implementação de uma Infraestrutura de Medição Avançada com Resistência Quântica (do inglês, Quantum-Resistant Advanced Metering Infrastructure - QR-AMI), que emprega esquemas criptográficos assimétricos e simétricos com resistência quântica para suportar ataques proveniente tanto de computadores quânticos, como clássicos. A solução proposta envolve a integração de um Módulo Criptográfico Dedicado com Resistência Quântica (do inglês, Quantum-Resistant Dedicated Cryptographic Modules - QR-DCMs) com Medidores Inteligentes (do inglês, Smart Meter - SM). Os QR-DCMs são projetados para embarcar esquemas criptográficos com resistência quântica adequados para aplicação em AMI. Nesse sentido, é investigado esquemas criptográficos assimétricos com resistência quântica baseado em fortes princípios criptográficos e abordagem com baixo uso de recursos para AMIs. Além disso, é analisado a implantação prática de um esquema com resistência quântica em QR-AMIs. Dois candidatos do processo de padronização da criptografia pós-quântica (do inglês, post-quantum cryptography - PQC) do Instituto Nacional de Padrões e Tecnologia (do inglês, National Institute of Standards and Technology - NIST), FrodoKEM e CRYSTALS-Kyber, são avaliados devido à adesão a fortes princípios criptográficos e abordagem com baixo uso de recursos. A viabilidade de embarcar esses esquemas em QR-DCMs em um contexto de AMI é avaliado por meio de implementação em software em hardwares de baixo custo, como um microcontrolador e processador, e implementações conjunta hardware/software usando um sistema em um chip (do inglês, System-on-a-Chip - SoC) com Arranjo de Porta Programável em Campo (do inglês, Field-Programmable Gate Array - FPGA). Resultados experimentais mostram que o tempo de execução para os esquemas FrodoKEM e CRYSTALSKyber em dispositivos SoC FPGA é, ao menos, um terço mais rápido que implementações em software. Além disso, os tempos de execuções atingidos e o uso de recursos demonstram a viabilidade desses esquemas para aplicações em AMI. O esquema CRYSTALS-Kyber parece ser uma escolha superior em todos os cenários, exceto quando fortes primitivas criptográficas são necessárias, ao menos teoricamente. Devido à falta de SMs no mercado que suportem esquemas criptográficos assimétricos com resistência quântica, um QR-DCM embarcando esquemas com resistência quântica é implementado e avaliado. Quanto à escolha do hardware para os QR-DCMs, microcontroladores são preferíveis em situações que requerem poder de processamento reduzido, enquanto dispositivos SoC FPGA são mais adequados para quando é demandado maior poder de processamento. O uso de recurso e o resultado do tempo de execução demonstram a viabilidade da implementação de AMI baseada em QR-DCMs, ou seja, uma QR-AMI, usando microcontroladores e dispositivos SoC FPGA

    CiFHER: A Chiplet-Based FHE Accelerator with a Resizable Structure

    Full text link
    Fully homomorphic encryption (FHE) is in the spotlight as a definitive solution for privacy, but the high computational overhead of FHE poses a challenge to its practical adoption. Although prior studies have attempted to design ASIC accelerators to mitigate the overhead, their designs require excessive amounts of chip resources (e.g., areas) to contain and process massive data for FHE operations. We propose CiFHER, a chiplet-based FHE accelerator with a resizable structure, to tackle the challenge with a cost-effective multi-chip module (MCM) design. First, we devise a flexible architecture of a chiplet core whose configuration can be adjusted to conform to the global organization of chiplets and design constraints. The distinctive feature of our core is a recomposable functional unit providing varying computational throughput for number-theoretic transform (NTT), the most dominant function in FHE. Then, we establish generalized data mapping methodologies to minimize the network overhead when organizing the chips into the MCM package in a tiled manner, which becomes a significant bottleneck due to the technology constraints of MCMs. Also, we analyze the effectiveness of various algorithms, including a novel limb duplication algorithm, on the MCM architecture. A detailed evaluation shows that a CiFHER package composed of 4 to 64 compact chiplets provides performance comparable to state-of-the-art monolithic ASIC FHE accelerators with significantly lower package-wide power consumption while reducing the area of a single core to as small as 4.28mm2^2.Comment: 15 pages, 9 figure

    Ensembles of Pruned Deep Neural Networks for Accurate and Privacy Preservation in IoT Applications

    Get PDF
    The emergence of the AIoT (Artificial Intelligence of Things) represents the powerful convergence of Artificial Intelligence (AI) with the expansive realm of the Internet of Things (IoT). By integrating AI algorithms with the vast network of interconnected IoT devices, we open new doors for intelligent decision-making and edge data analysis, transforming various domains from healthcare and transportation to agriculture and smart cities. However, this integration raises pivotal questions: How can we ensure deep learning models are aptly compressed and quantised to operate seamlessly on devices constrained by computational resources, without compromising accuracy? How can these models be effectively tailored to cope with the challenges of statistical heterogeneity and the uneven distribution of class labels inherent in IoT applications? Furthermore, in an age where data is a currency, how do we uphold the sanctity of privacy for the sensitive data that IoT devices incessantly generate while also ensuring the unhampered deployment of these advanced deep learning models? Addressing these intricate challenges forms the crux of this thesis, with its contributions delineated as follows: Ensyth: A novel approach designed to synthesise pruned ensembles of deep learning models, which not only makes optimal use of limited IoT resources but also ensures a notable boost in predictability. Experimental evidence gathered from CIFAR-10, CIFAR-5, and MNIST-FASHION datasets solidify its merit, especially given its capacity to achieve high predictability. MicroNets: Venturing into the realms of efficiency, this is a multi-phase pruning pipeline that fuses the principles of weight pruning, channel pruning. Its objective is clear: foster efficient deep ensemble learning, specially crafted for IoT devices. Benchmark tests conducted on CIFAR-10 and CIFAR-100 datasets demonstrate its prowess, highlighting a compression ratio of nearly 92%, with these pruned ensembles surpassing the accuracy metrics set by conventional models. FedNets: Recognising the challenges of statistical heterogeneity in federated learning and the ever-growing concerns of data privacy, this innovative federated learning framework is introduced. It facilitates edge devices in their collaborative quest to train ensembles of pruned deep neural networks. More than just training, it ensures data privacy remains uncompromised. Evaluations conducted on the Federated CIFAR-100 dataset offer a testament to its efficacy. In this thesis, substantial contributions have been made to the AIoT application domain. Ensyth, MicroNets, and FedNets collaboratively tackle the challenges of efficiency, accuracy, statistical heterogeneity arising from distributed class labels, and privacy concerns inherent in deploying AI applications on IoT devices. The experimental results underscore the effectiveness of these approaches, paving the way for their practical implementation in real-world scenarios. By offering an integrated solution that satisfies multiple key requirements simultaneously, this research brings us closer to the realisation of effective and privacy-preserved AIoT systems

    Aloha-HE: A Low-Area Hardware Accelerator for Client-Side Operations in Homomorphic Encryption

    Get PDF
    Homomorphic encryption (HE) has gained broad attention in recent years as it allows computations on encrypted data enabling secure cloud computing. Deploying HE presents a notable challenge since it introduces a performance overhead by orders of magnitude. Hence, most works target accelerating server-side operations on hardware platforms, while little attention has been given to client-side operations. In this paper, we present a novel design methodology to implement and accelerate the client-side HE operations on area-constrained hardware. We show how to design an optimized floating-point unit tailored for the encoding of complex values. In addition, we introduce a novel hardware-friendly algorithm for modulo-reduction of floating-point numbers and propose various concepts for achieving efficient resource sharing between modular ring and floating-point arithmetic. Finally, we use this methodology to implement an end-to-end hardware accelerator, Aloha-HE, for the client-side operations of the CKKS scheme. In contrast to existing work, Aloha-HE supports both encoding and encryption and their counterparts within a unified architecture. Aloha-HE achieves a speedup of up to 59x compared to prior hardware solutions

    KaLi: A Crystal for Post-Quantum Security using Kyber and Dilithium

    Get PDF
    Quantum computers pose a threat to the security of communications over the internet. This imminent risk has led to the standardization of cryptographic schemes for protection in a post-quantum scenario. We present a design methodology for future implementations of such algorithms. This is manifested using the NIST selected digital signature scheme CRYSTALS-Dilithium and key encapsulation scheme CRYSTALS-Kyber. A unified architecture, \crystal, is proposed that can perform key generation, encapsulation, decapsulation, signature generation, and signature verification for all the security levels of CRYSTALS-Dilithium, and CRYSTALS-Kyber. A unified yet flexible polynomial arithmetic unit is designed that can processes Kyber operations twice as fast as Dilithium operations. Efficient memory management is proposed to achieve optimal latency. \crystal is explicitly tailored for ASIC platforms using multiple clock domains. On ASIC 28nm/65nm technology, it occupies 0.263/1.107 mm2^2 and achieves a clock frequency of 2GHz/560MHz for the fast clock used for memory unit. On Xilinx Zynq Ultrascale+ZCU102 FPGA, the proposed architecture uses 23,277 LUTs, 9,758 DFFs, 4 DSPs, and 24 BRAMs, at 270 MHz clock frequency. \crystal performs better than the standalone implementations of either of the two schemes. This is the first work to provide a unified design in hardware for both schemes
    • …
    corecore