1,795 research outputs found

    Performance and Efficiency Exploration of Hardware Polynomial Multipliers for Post-Quantum Lattice-Based Cryptosystems

    Get PDF
    The significant effort in the research and design of large-scale quantum computers has spurred a transition to post-quantum cryptographic primitives worldwide. The post-quantum cryptographic primitive standardization effort led by the US NIST has recently selected the asymmetric encryption primitive Kyber as its candidate for standardization and indicated NTRU, as a valid alternative if intellectual property issues are not solved. Finally, a more conservative alternative to NTRU, NTRUPrime was also considered as an alternate candidate, due to its design choices that remove the possibility for a large set of attacks preemptively. All the aforementioned asymmetric primitives provide good performances, and are prime choices to provide IoT devices with post-quantum confidentiality services. In this work, we present a comprehensive exploration of hardware designs for the computation of polynomial multiplications, the workhorse operation in all the aforementioned cryptosystems, with a thorough analysis of performance, compactness and efficiency. The presented designs cope with the differences in the arithmetics of polynomial rings employed by distinct cryptosystems, benefiting from configurations and optimizations that are applicable at synthesis time and/or run time. In this context, we target a use case scenario where long-term key pairs are used, such as the ones for VPNs (e.g., over IPSec), secure shell protocols and instant messaging applications. Our high-performance design variants exhibit figures of latency comparable to the ones needed for the execution of the symmetric cryptographic primitives also included in the Post-Quantum schemes. Notably, the performance figures of the designs proposed for NTRU and NTRU Prime surpass the ones described in the related literature

    Southern Adventist University Undergraduate Catalog 2023-2024

    Get PDF
    Southern Adventist University\u27s undergraduate catalog for the academic year 2023-2024.https://knowledge.e.southern.edu/undergrad_catalog/1123/thumbnail.jp

    Memory built-in self-repair and correction for improving yield: a review

    Get PDF
    Nanometer memories are highly prone to defects due to dense structure, necessitating memory built-in self-repair as a must-have feature to improve yield. Today’s system-on-chips contain memories occupying an area as high as 90% of the chip area. Shrinking technology uses stricter design rules for memories, making them more prone to manufacturing defects. Further, using 3D-stacked memories makes the system vulnerable to newer defects such as those coming from through-silicon-vias (TSV) and micro bumps. The increased memory size is also resulting in an increase in soft errors during system operation. Multiple memory repair techniques based on redundancy and correction codes have been presented to recover from such defects and prevent system failures. This paper reviews recently published memory repair methodologies, including various built-in self-repair (BISR) architectures, repair analysis algorithms, in-system repair, and soft repair handling using error correcting codes (ECC). It provides a classification of these techniques based on method and usage. Finally, it reviews evaluation methods used to determine the effectiveness of the repair algorithms. The paper aims to present a survey of these methodologies and prepare a platform for developing repair methods for upcoming-generation memories

    Architecture and Circuit Design Optimization for Compute-In-Memory

    Get PDF
    The objective of the proposed research is to optimize computing-in-memory (CIM) design for accelerating Deep Neural Network (DNN) algorithms. As compute peripheries such as analog-to-digital converter (ADC) introduce significant overhead in CIM inference design, the research first focuses on the circuit optimization for inference acceleration and proposes a resistive random access memory (RRAM) based ADC-free in-memory compute scheme. We comprehensively explore the trade-offs involving different types of ADCs and investigate a new ADC design especially suited for the CIM, which performs the analog shift-add for multiple weight significance bits, improving the throughput and energy efficiency under similar area constraints. Furthermore, we prototype an ADC-free CIM inference chip design with a fully-analog data processing manner between sub-arrays, which can significantly improve the hardware performance over the conventional CIM designs and achieve near-software classification accuracy on ImageNet and CIFAR-10/-100 dataset. Secondly, the research focuses on hardware support for CIM on-chip training. To maximize hardware reuse of CIM weight stationary dataflow, we propose the CIM training architectures with the transpose weight mapping strategy. The cell design and periphery circuitry are modified to efficiently support bi-directional compute. A novel solution of signed number multiplication is also proposed to handle the negative input in backpropagation. Finally, we propose an SRAM-based CIM training architecture and comprehensively explore the system-level hardware performance for DNN on-chip training based on silicon measurement results.Ph.D

    Potential of machine learning/Artificial Intelligence (ML/AI) for verifying configurations of 5G multi Radio Access Technology (RAT) base station

    Get PDF
    Abstract. The enhancements in mobile networks from 1G to 5G have greatly increased data transmission reliability and speed. However, concerns with 5G must be addressed. As system performance and reliability improve, ML and AI integration in products and services become more common. The integration teams in cellular network equipment creation test devices from beginning to end to ensure hardware and software parts function correctly. Radio unit integration is typically the first integration phase, where the radio is tested independently without additional network components like the BBU and UE. 5G architecture and the technology that it is using are explained further. The architecture defined by 3GPP for 5G differs from previous generations, using Network Functions (NFs) instead of network entities. This service-based architecture offers NF reusability to reduce costs and modularity, allowing for the best vendor options for customer radio products. 5G introduced the O-RAN concept to decompose the RAN architecture, allowing for increased speed, flexibility, and innovation. NG-RAN provided this solution to speed up the development and implementation process of 5G. The O-RAN concept aims to improve the efficiency of RAN by breaking it down into components, allowing for more agility and customization. The four protocols, the eCPRI interface, and the functionalities of fronthaul that NGRAN follows are expressed further. Additionally, the significance of NR is described with an explanation of its benefits. Some benefits are high data rates, lower latency, improved spectral efficiency, increased network flexibility, and improved energy efficiency. The timeline for 5G development is provided along with different 3GPP releases. Stand-alone and non-stand-alone architecture is integral while developing the 5G architecture; hence, it is also defined with illustrations. The two frequency bands that NR utilizes, FR1 and FR2, are expressed further. FR1 is a sub-6 GHz frequency band. It contains frequencies of low and high values; on the other hand, FR2 contains frequencies above 6GHz, comprising high frequencies. FR2 is commonly known as the mmWave band. Data collection for implementing the ML approaches is expressed that contains the test setup, data collection, data description, and data visualization part of the thesis work. The Test PC runs tests, executes test cases using test libraries, and collects data from various logs to analyze the system’s performance. The logs contain information about the test results, which can be used to identify issues and evaluate the system’s performance. The data collection part describes that the data was initially present in JSON files and extracted from there. The extraction took place using the Python code script and was then fed into an Excel sheet for further analysis. The data description explains the parameters that are taken while training the models. Jupyter notebook has been used for visualizing the data, and the visualization is carried out with the help of graphs. Moreover, the ML techniques used for analyzing the data are described. In total, three methods are used here. All the techniques come under the category of supervised learning. The explained models are random forest, XG Boost, and LSTM. These three models form the basis of ML techniques applied in the thesis. The results and discussion section explains the outcomes of the ML models and discusses how the thesis will be used in the future. The results include the parameters that are considered to apply the ML models to them. SINR, noise power, rxPower, and RSSI are the metrics that are being monitored. These parameters have variance, which is essential in evaluating the quality of the product test setup, the quality of the software being tested, and the state of the test environment. The discussion section of the thesis explains why the following parameters are taken, which ML model is most appropriate for the data being analyzed, and what the next steps are in implementation

    ACiS: smart switches with application-level acceleration

    Full text link
    Network performance has contributed fundamentally to the growth of supercomputing over the past decades. In parallel, High Performance Computing (HPC) peak performance has depended, first, on ever faster/denser CPUs, and then, just on increasing density alone. As operating frequency, and now feature size, have levelled off, two new approaches are becoming central to achieving higher net performance: configurability and integration. Configurability enables hardware to map to the application, as well as vice versa. Integration enables system components that have generally been single function-e.g., a network to transport data—to have additional functionality, e.g., also to operate on that data. More generally, integration enables compute-everywhere: not just in CPU and accelerator, but also in network and, more specifically, the communication switches. In this thesis, we propose four novel methods of enhancing HPC performance through Advanced Computing in the Switch (ACiS). More specifically, we propose various flexible and application-aware accelerators that can be embedded into or attached to existing communication switches to improve the performance and scalability of HPC and Machine Learning (ML) applications. We follow a modular design discipline through introducing composable plugins to successively add ACiS capabilities. In the first work, we propose an inline accelerator to communication switches for user-definable collective operations. MPI collective operations can often be performance killers in HPC applications; we seek to solve this bottleneck by offloading them to reconfigurable hardware within the switch itself. We also introduce a novel mechanism that enables the hardware to support MPI communicators of arbitrary shape and that is scalable to very large systems. In the second work, we propose a look-aside accelerator for communication switches that is capable of processing packets at line-rate. Functions requiring loops and states are addressed in this method. The proposed in-switch accelerator is based on a RISC-V compatible Coarse Grained Reconfigurable Arrays (CGRAs). To facilitate usability, we have developed a framework to compile user-provided C/C++ codes to appropriate back-end instructions for configuring the accelerator. In the third work, we extend ACiS to support fused collectives and the combining of collectives with map operations. We observe that there is an opportunity of fusing communication (collectives) with computation. Since the computation can vary for different applications, ACiS support should be programmable in this method. In the fourth work, we propose that switches with ACiS support can control and manage the execution of applications, i.e., that the switch be an active device with decision-making capabilities. Switches have a central view of the network; they can collect telemetry information and monitor application behavior and then use this information for control, decision-making, and coordination of nodes. We evaluate the feasibility of ACiS through extensive RTL-based simulation as well as deployment in an open-access cloud infrastructure. Using this simulation framework, when considering a Graph Convolutional Network (GCN) application as a case study, a speedup of on average 3.4x across five real-world datasets is achieved on 24 nodes compared to a CPU cluster without ACiS capabilities

    Hardware Architecture of a QAM Receiver for Short-Range Optical Communications

    Full text link
    [EN] Short-reach optical fiber communications systems aim to achieve high throughput, in the order of tens of Gbps. The implementation of these high-speed systems requires parallel processing, which makes low-complexity designs of their subsystems a key to the successful large-scale deployment of this technology. Half-Cycle Nyquist Subcarrier Modulation (HC-SCM) was originally suggested for these systems with the goal of using as much bandwidth as possible and, therefore, achieving high communication rates. Recently, Oversampled Subcarrier Modulation (OVS-SCM) was proposed as an alternative more computational efficient than HC-SCM and also with a better spectral efficiency. This paper proposes a hardware-efficient architecture for an OVS-SCM receiver, which takes into account the inherent parallel processing of these systems. This receiver takes 16 samples in parallel from a 5 GSa/s analog-to-digital converter with a 3.2 GHz 3 dB bandwidth. Design solutions for the frame detection block, the mixer, the resampler, the fractional interpolator, the matched filter and the timing estimator are presented. Our results show that, compared to the HC-SCM receiver, this proposal reduces the computational load of the downconverter stages by 90%. FPGA implementation results are given to demonstrate that our proposal can be implemented in state-of-the-art devices.This work was supported in part by MCIN/AEI/10.13039/501100011033 under Grants RTI2018-101658-B100 and PID2021-126514OB-I00, and in part by the European Union through "ERDF Away of making Europe."Valls Coquillat, J.; Torres Carot, V.; Pérez Pascual, MA.; Almenar Terre, V. (2023). Hardware Architecture of a QAM Receiver for Short-Range Optical Communications. Journal of Lightwave Technology. 41(2):451-461. https://doi.org/10.1109/JLT.2022.321735745146141

    QoS-aware architectures, technologies, and middleware for the cloud continuum

    Get PDF
    The recent trend of moving Cloud Computing capabilities to the Edge of the network is reshaping how applications and their middleware supports are designed, deployed, and operated. This new model envisions a continuum of virtual resources between the traditional cloud and the network edge, which is potentially more suitable to meet the heterogeneous Quality of Service (QoS) requirements of diverse application domains and next-generation applications. Several classes of advanced Internet of Things (IoT) applications, e.g., in the industrial manufacturing domain, are expected to serve a wide range of applications with heterogeneous QoS requirements and call for QoS management systems to guarantee/control performance indicators, even in the presence of real-world factors such as limited bandwidth and concurrent virtual resource utilization. The present dissertation proposes a comprehensive QoS-aware architecture that addresses the challenges of integrating cloud infrastructure with edge nodes in IoT applications. The architecture provides end-to-end QoS support by incorporating several components for managing physical and virtual resources. The proposed architecture features: i) a multilevel middleware for resolving the convergence between Operational Technology (OT) and Information Technology (IT), ii) an end-to-end QoS management approach compliant with the Time-Sensitive Networking (TSN) standard, iii) new approaches for virtualized network environments, such as running TSN-based applications under Ultra-low Latency (ULL) constraints in virtual and 5G environments, and iv) an accelerated and deterministic container overlay network architecture. Additionally, the QoS-aware architecture includes two novel middlewares: i) a middleware that transparently integrates multiple acceleration technologies in heterogeneous Edge contexts and ii) a QoS-aware middleware for Serverless platforms that leverages coordination of various QoS mechanisms and virtualized Function-as-a-Service (FaaS) invocation stack to manage end-to-end QoS metrics. Finally, all architecture components were tested and evaluated by leveraging realistic testbeds, demonstrating the efficacy of the proposed solutions

    Segment Routing based Traffic Engineering

    Get PDF
    In modern networks, the increasing volume of network traffic and the diverse range of services with varying requirements necessitate the implementation of more advanced routing decisions and traffic engineering. This academic study proposes a QoS adaptive mechanism called ”Sepitto”, which utilizes Segment routing protocols, specifically SRv6, to address network-traffic control and congestion avoidance. Sepitto leverages data-plane traffic to convey Linux Qdisc statistics, such as queue size, packet drops, and buffer occupancy, in each Linux-based virtual router. By incorporating this information, edge routers become aware of the current network status, enabling them to make informed decisions regarding traffic paths based on QoS classes. SRv6 is employed to direct traffic along desired paths, avoiding congested links and minimizing queuing delays and overall latency. Moreover, Sepitto offers network administrators an interface to customize decision-making processes based on their policies, assigning costs to network graph edges by associating the provided statistics to a certain cost. To incorporate these costs, the implementation employs the Dijkstra algorithm to determine the path with the lowest cost. Performance analysis of Sepitto reveals minimal overhead compared to traditional routing methods, while effectively mitigating network congestion. The results demonstrate that Sepitto reduces traffic round-trip time during congestion while maintaining differentiated treatment for various QoS classes

    Southern Adventist University Undergraduate Catalog 2022-2023

    Get PDF
    Southern Adventist University\u27s undergraduate catalog for the academic year 2022-2023.https://knowledge.e.southern.edu/undergrad_catalog/1121/thumbnail.jp
    corecore