
    An Overview on Application of Machine Learning Techniques in Optical Networks

    Today's telecommunication networks have become sources of enormous amounts of widely heterogeneous data. This information can be retrieved from network traffic traces, network alarms, signal quality indicators, users' behavioral data, etc. Advanced mathematical tools are required to extract meaningful information from these data and to make decisions about the proper functioning of the networks. Among these tools, Machine Learning (ML) is regarded as one of the most promising methodological approaches for performing network-data analysis and enabling automated network self-configuration and fault management. The adoption of ML techniques in the field of optical communication networks is motivated by the unprecedented growth in network complexity that optical networks have faced in recent years. This increase in complexity is due to the introduction of a huge number of adjustable and interdependent system parameters (e.g., routing configurations, modulation format, symbol rate, coding schemes) enabled by coherent transmission/reception technologies, advanced digital signal processing, and compensation of nonlinear effects in optical fiber propagation. In this paper we provide an overview of the application of ML to optical communications and networking. We classify and survey the relevant literature on the topic, and we also provide an introductory tutorial on ML for researchers and practitioners interested in this field. Although a good number of research papers have appeared recently, the application of ML to optical networks is still in its infancy; to stimulate further work in this area, we conclude the paper by proposing possible new research directions.

    Efficient machine learning: models and accelerations

    One of the key enablers of the recent unprecedented success of machine learning is the adoption of very large models. Modern machine learning models typically consist of multiple cascaded layers, as in deep neural networks, and contain from millions to hundreds of millions of parameters (i.e., weights). Larger-scale models tend to enable the extraction of more complex high-level features and therefore lead to a significant improvement in overall accuracy. On the other hand, the layered deep structure and large model sizes also demand increased computational capability and memory. To achieve higher scalability, performance, and energy efficiency for deep learning systems, two orthogonal research and development trends have attracted enormous interest: acceleration and model compression. The underlying goal of both is to maintain high model quality so that predictions remain accurate. In this thesis, we address these two problems and utilize different computing paradigms to solve real-life deep learning problems. This thesis first presents the cogent confabulation network for the sentence completion problem, using the Chinese language as a case study to describe our exploration of cogent-confabulation-based text recognition models. The exploration and optimization of these models were conducted through various comparisons, and the optimized network offered better accuracy on sentence completion. To accelerate sentence completion on a multi-processing system, we propose a parallel framework for the confabulation recall algorithm. The parallel implementation reduces runtime, improves recall accuracy by breaking the fixed evaluation order and introducing more generalization, and maintains balanced progress in status updates among all neurons. A lexicon scheduling algorithm is presented to further improve model performance. Since deep neural networks have proven effective for many real-life applications and are deployed on low-power devices, we then investigated accelerating neural network inference using a hardware-friendly computing paradigm: stochastic computing. This approximate computing paradigm requires a small hardware footprint and achieves high energy efficiency. Applying stochastic computing to deep convolutional neural networks, we design the functional hardware blocks and optimize them jointly to minimize the accuracy loss due to the approximation. Synthesis results show that the proposed design achieves remarkably low hardware cost and power/energy consumption. Modern neural networks usually contain a huge number of parameters that cannot fit into embedded devices, so compression of deep learning models, together with acceleration, attracts our attention. We introduce structured-matrix-based neural networks to address this problem. The circulant matrix is one such structure: the whole matrix can be represented by a single vector, so the matrix is compressed. We further investigate a more flexible structure based on the circulant matrix, called the block-circulant matrix, which partitions a matrix into several smaller blocks and makes each submatrix circulant; the compression ratio is thus controllable.
With the help of Fourier-transform-based equivalent computation, inference of the deep neural network can be accelerated energy-efficiently on FPGAs. We also optimize the training algorithm for block-circulant-matrix-based neural networks to obtain high accuracy after compression.
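
To make the Fourier-transform trick concrete, here is a minimal NumPy sketch of the equivalent computation for a block-circulant layer; the function names and block layout are illustrative assumptions, not the thesis's implementation. Each k-by-k circulant block is stored as a single length-k vector, and its matrix-vector product becomes a circular convolution computed via the FFT.

import numpy as np

def circulant_matvec_fft(c, x):
    # Multiply the circulant matrix whose first column is c by vector x.
    # A circulant matrix is diagonalized by the DFT, so the product is an
    # element-wise multiplication in the frequency domain: O(k log k) time
    # and only k stored weights instead of k*k.
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))

def block_circulant_matvec(first_cols, x, k):
    # y = W @ x, where W is partitioned into k-by-k circulant blocks and
    # first_cols[i][j] is the defining vector of block (i, j).
    p, q = len(first_cols), len(first_cols[0])
    x_blocks = x.reshape(q, k)
    y = np.zeros(p * k)
    for i in range(p):
        acc = np.zeros(k)
        for j in range(q):
            acc += circulant_matvec_fft(first_cols[i][j], x_blocks[j])
        y[i * k:(i + 1) * k] = acc
    return y

Compared with a dense layer, storage falls from (pk)(qk) weights to pq vectors of length k, and per-block work from O(k^2) to O(k log k), which is the source of the energy-efficient FPGA acceleration described above.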

    High-performance hardware accelerators for image processing in space applications

    Mars is a hard place to reach. While there have been many notable success stories in getting probes to the Red Planet, the historical record is full of bad news, and the success rate for actually landing on the Martian surface is even worse: roughly 30%. This low success rate must be credited mainly to the characteristics of the Martian environment. Strong winds frequently blow in the Martian atmosphere; this phenomenon typically perturbs the lander's descent, causing it to diverge from the target trajectory. Moreover, the Martian surface is not an easy place to land safely: it is pitted with many closely spaced craters and large stones, and characterized by huge mountains and hills (e.g., Olympus Mons is 648 km in diameter and 27 km tall). For these reasons, a mission failure due to landing in a large crater, on big stones, or on terrain with a steep slope is highly probable. In recent years, all space agencies have increased their research efforts to improve the success rate of Mars missions. In particular, the two hottest research topics are active debris removal and guided landing on Mars. The former aims at finding new methods to remove space debris using unmanned spacecraft, which must be able to autonomously detect a piece of debris, analyze it to extract its characteristics in terms of weight, speed, and dimensions, and eventually rendezvous with it. To perform these tasks, the spacecraft must have strong vision capabilities; in other words, it must be able to take pictures and process them with very complex image processing algorithms to detect, track, and analyze the debris. The latter aims at increasing the precision of the landing point (i.e., shrinking the landing ellipse) on Mars. Future space missions will increasingly adopt video-based navigation systems to assist the entry, descent, and landing (EDL) phase of space modules (e.g., spacecraft), enhancing the precision of automatic EDL navigation systems. For instance, recent exploration missions, e.g., Spirit, Opportunity, and Curiosity, used an EDL procedure that follows a fixed, precomputed descent trajectory to reach a precise landing point; this approach guarantees a landing-point precision of at best 20 km. Comparing this figure with the characteristics of the Martian environment makes it clear why the mission failure probability remains very high. A very challenging problem is to design an autonomous, guided EDL system able to further reduce the landing ellipse while guaranteeing avoidance of dangerous areas of the Martian surface (e.g., large craters or big stones) that could lead to mission failure. Autonomy is mandatory, since a manually driven approach is not feasible given the distance between Earth and Mars: even with signals traveling at the speed of light, the round-trip communication delay is comparable to or longer than the entire EDL phase. In both applications, the algorithms must guarantee self-adaptability to environmental conditions: since the harsh conditions of Mars (and of space in general) are difficult to predict at design time, the algorithms must automatically tune their internal parameters to the current conditions. Real-time performance is another key factor.
Since a software implementation of these computationally intensive tasks cannot reach the required performance, the algorithms must be accelerated in hardware. For these reasons, this thesis presents my research work on advanced image processing algorithms for space applications and the associated hardware accelerators. My research activity has focused on both the algorithms and their hardware implementations. Concerning the first aspect, I mainly focused on integrating self-adaptability features into existing algorithms. Concerning the second, I studied and validated a methodology to efficiently develop, verify, and validate hardware components aimed at accelerating video-based applications. This approach allowed me to develop and test high-performance hardware accelerators that significantly outperform current state-of-the-art implementations. The thesis is organized in four main chapters. Chapter 2 starts with a brief introduction to the history of digital image processing; its main content is a description of space missions in which digital image processing plays a key role. A major effort has been devoted to the missions on which my research activity has a substantial impact: for these missions, the chapter analyzes and evaluates the state-of-the-art approaches and algorithms in depth. Chapter 3 analyzes and compares the two technologies used to implement high-performance hardware accelerators, i.e., Application Specific Integrated Circuits (ASICs) and Field Programmable Gate Arrays (FPGAs). This information helps the reader understand why space agencies choose FPGAs over ASICs for high-performance hardware accelerators in space missions, even though FPGAs are more susceptible to Single Event Upsets (SEUs), i.e., transient errors induced in hardware components by alpha particles and solar radiation. Moreover, the chapter describes in detail the three available space-grade FPGA technologies (One-Time Programmable, Flash-based, and SRAM-based) and the main fault-mitigation techniques against SEUs that are mandatory for employing space-grade FPGAs in actual missions. Chapter 4 describes one of the main contributions of my research: a library of high-performance hardware accelerators for image processing in space applications. The basic idea behind this library is to offer designers a set of validated hardware components that significantly speed up the basic image processing operations commonly used in an image processing chain. In other words, these components can be used directly as elementary building blocks to easily create a complex image processing system, without wasting time in the debug and validation phase. The library groups the proposed hardware accelerators into IP-core families; the components in the same family share the same functionality and input/output interface. This harmonization of the I/O interface makes it possible to substitute components of the same family within a complex image processing system without modifying the system's communication infrastructure. In addition to the analysis of the internal architecture of the proposed components, another important aspect of this chapter is the methodology used to develop, verify, and validate the proposed high-performance image processing hardware accelerators.
This methodology involves different programming and hardware description languages in order to support the designer from algorithm modelling up to hardware implementation and validation. Chapter 5 presents the proposed complex image processing systems. In particular, it uses a set of actual case studies, drawn from the most recent space-agency needs, to show how the hardware accelerator components can be assembled to build a complex image processing system. In addition to the hardware accelerators contained in the library, the described systems embed innovative ad hoc hardware components and software routines that provide high-performance and self-adaptable image processing functionality. To prove the benefits of the proposed methodology, each case study concludes with a comparison against current state-of-the-art implementations, highlighting the benefits in terms of performance and self-adaptability to environmental conditions.
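
For concreteness, the one-way signal delay follows directly from dividing the Earth-Mars separation by the speed of light; at the minimum separation (rounded figures):

\[
t_{\text{one-way}} = \frac{d}{c} \approx \frac{56 \times 10^{6}\,\text{km}}{3.0 \times 10^{5}\,\text{km/s}} \approx 187\,\text{s} \approx 3.1\,\text{min},
\]

so even in the most favorable geometry a command-response round trip takes more than six minutes, on the order of the entire EDL phase, and at larger separations the delay grows to tens of minutes.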

    Extension of the L1Calo PreProcessor System for the ATLAS Phase-I Calorimeter Trigger Upgrade

    For the Run-3 data-taking period at the Large Hadron Collider (LHC), the hardware-based Level-1 Calorimeter Trigger (L1Calo) of the ATLAS experiment was upgraded. Through new and sophisticated algorithms, the upgrade increases trigger performance in a challenging, high-pileup environment while maintaining low selection thresholds. The Tile Rear Extension (TREX) modules are the latest addition to the L1Calo PreProcessor system. Hosting state-of-the-art FPGAs and high-speed optical transceivers, the TREX modules provide digitised hadronic transverse energies from the ATLAS Tile Calorimeter to the new feature extractor (FEX) processors every 25 ns. In addition, the modules are designed to maintain compatibility with the original trigger processors. The system of 32 TREX modules has been developed, produced, and successfully installed in ATLAS. This thesis describes the functional implementation of the modules and their detailed integration and commissioning in the ATLAS detector.

    Semiconductor Memory Applications in Radiation Environment, Hardware Security and Machine Learning System

    Semiconductor memory is a key component of computing systems. Beyond conventional memory and data storage applications, this dissertation explores both mainstream and emerging non-volatile memory (eNVM) technologies for radiation environments, hardware security systems, and machine learning applications. In a radiation environment, e.g. aerospace, memory devices are exposed to energetic particles. The strike of these particles generates electron-hole pairs (directly or indirectly) as they pass through the semiconductor device, resulting in photo-induced current that may change the memory state. First, the trend of radiation effects in mainstream memory technologies with technology-node scaling is reviewed. Then, single-event effects in oxide-based resistive random access memory (RRAM), one of the eNVM technologies, are investigated from the circuit level to the system level. The Physical Unclonable Function (PUF) has been widely investigated as a promising hardware security primitive; it employs the inherent randomness of a physical system (e.g. intrinsic semiconductor manufacturing variability). In this dissertation, two RRAM-based PUF implementations are proposed, for cryptographic key generation (weak PUF) and device authentication (strong PUF), respectively. The performance of the RRAM PUFs is evaluated with experiments and simulation. The impact of non-ideal circuit effects on PUF performance is also investigated, and optimization strategies are proposed to mitigate these effects; the resistance against modeling and machine learning attacks is analyzed as well. Deep neural networks (DNNs) have shown remarkable improvements in various intelligent applications such as image classification, speech classification, and object localization and detection, and increasing effort has been devoted to developing hardware accelerators for them. This dissertation proposes two compute-in-memory (CIM) hardware accelerator designs, with SRAM and eNVM technologies, for two binary neural networks, the hybrid BNN (HBNN) and the XNOR-BNN, respectively, targeted at hardware-resource-limited platforms such as edge devices. These designs feature high throughput, scalability, low latency, and high energy efficiency. Finally, the proposed SRAM-based design was successfully taped out and validated in TSMC 65 nm technology. Overall, this dissertation paves the path for new applications of memory technologies towards secure and energy-efficient artificial intelligence systems.
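
As an illustration of the arithmetic that an XNOR-BNN compute-in-memory array evaluates along its bitlines, the following sketch (with illustrative names and toy data, not the dissertation's taped-out design) shows how a dot product over bipolar weights and activations collapses to an XNOR followed by a popcount.

import numpy as np

def xnor_popcount_dot(w_bits, x_bits):
    # Weights and activations are bipolar (+1/-1), stored as bits
    # (1 -> +1, 0 -> -1). The product of two bipolar values is +1 exactly
    # when the bits match, so the dot product is 2*popcount(XNOR) - n.
    n = len(w_bits)
    matches = np.logical_not(np.logical_xor(w_bits, x_bits))  # XNOR
    return 2 * int(np.count_nonzero(matches)) - n

# Sanity check against ordinary bipolar arithmetic.
rng = np.random.default_rng(0)
w = rng.integers(0, 2, 64)
x = rng.integers(0, 2, 64)
assert xnor_popcount_dot(w, x) == int(np.dot(2 * w - 1, 2 * x - 1))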

    Improving the Scalability of XCS-Based Learning Classifier Systems

    Using evolutionary intelligence and machine learning techniques, a broad range of intelligent machines have been designed to perform different tasks. An intelligent machine learns by perceiving its environmental status and taking an action that maximizes its chances of success. Human beings have the ability to apply knowledge learned from smaller problems to more complex, large-scale problems of the same or a related domain, but currently the vast majority of evolutionary machine learning techniques lack this ability. This inability to reuse already-learned knowledge of a domain means that more than the necessary resources and time are consumed in solving complex, large-scale problems of that domain. As a problem increases in size, it becomes difficult and sometimes even impractical (if not impossible) to solve, given the resources and time needed. Therefore, in order to scale in a problem domain, a system is needed that can reuse the learned knowledge of the domain and/or encapsulate the underlying patterns of the domain. To extract and reuse building blocks of knowledge or to encapsulate the underlying patterns in a problem domain, a rich encoding is needed, but the search space can then expand undesirably and cause bloat, e.g. as in some forms of genetic programming (GP). Learning classifier systems (LCSs) are a well-structured, evolutionary-computation-based learning technique with pressures that implicitly avoid bloat, such as fitness sharing through niche-based reproduction. The proposed thesis is that an LCS can scale to complex problems in a domain by reusing the knowledge learnt from simpler problems of the domain and/or encapsulating the underlying patterns of the domain. Wilson's XCS, a well-tested, online-learning, accuracy-based LCS model, is used to implement and test the proposed systems. To extract reusable building blocks of knowledge, GP-tree-like code fragments are introduced, which are more than simply another representation (e.g. ternary or real-valued alphabets). The work is extended to capture the underlying patterns in a problem using a cyclic representation. Experiments on hard problems test the newly developed scalable systems and compare them with benchmark techniques. Specifically, this work develops four systems to improve the scalability of XCS-based classifier systems. (1) Building blocks of knowledge are extracted from smaller problems of a Boolean domain and reused in learning more complex, large-scale problems in the domain, for the first time. By utilizing knowledge learnt from small-scale problems, the developed XCSCFC (i.e. XCS with Code-Fragment Conditions) system readily solves problems of a scale that existing LCS and GP approaches cannot, e.g. the 135-bit MUX problem. (2) The introduction of code fragments in classifier actions in XCSCFA (i.e. XCS with Code-Fragment Actions) enables the rich representation of GP which, when coupled with the divide-and-conquer approach of LCS, successfully solves various complex, overlapping, and niche-imbalanced Boolean problems that are difficult to solve using numeric-action-based XCS. (3) The underlying patterns in a problem domain are encapsulated in classifier rules encoded by a cyclic representation. The developed XCSSMA system produces general solutions of any scale n for a number of important Boolean problems, e.g. parity problems, for the first time in the field of LCS.
(4) Optimal solutions for various real-valued problems are evolved by extending the existing real-valued XCSR system with code-fragment actions to XCSRCFA. Exploiting the combined power of GP and LCS techniques, XCSRCFA successfully learns various continuous-action and function approximation problems that are difficult to learn using the base techniques. This research has shown that LCSs can scale to complex, large-scale problems through reusing learnt knowledge. The messy nature, the disassociation of message order from condition order, masking, feature construction, and reuse of extracted knowledge add abilities to the XCS family of LCSs. The ability to use rich encodings, in antecedent GP-like code fragments or consequent cyclic representations, leads to the evolution of accurate, maximally general, and compact solutions for various complex Boolean as well as real-valued problems. By effectively exploiting the combined power of GP and LCS techniques, various continuous-action and function approximation problems are solved in a simple and straightforward manner. The analysis of the evolved rules reveals, for the first time in XCS, that no matter how specific or general the initial classifiers are, all optimal classifiers converge through the mechanism 'be specific then generalize' near the final stages of evolution. It also shows that standard XCS does not use all available information or all available genetic operators to evolve optimal rules, whereas the developed code-fragment-action-based systems effectively use figure and ground information during training. This work has created a platform to explore the reuse of learnt functionality, not just terminal knowledge as at present, which is needed to replicate human capabilities.
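
To give a flavor of what a GP-tree-like code fragment is, the sketch below shows a hypothetical fragment and a toy evaluator (illustrative only, not the thesis's actual system) matched against 6-bit multiplexer inputs, where bits 0-1 are the address and bits 2-5 are the data.

def evaluate(node, state):
    # A code fragment is a small expression tree over input bits and
    # Boolean operators; a classifier condition matches when the tree
    # evaluates to true for the given input.
    op = node[0]
    if op == "bit":                       # leaf: read one input bit
        return bool(state[node[1]])
    if op == "not":
        return not evaluate(node[1], state)
    left, right = (evaluate(child, state) for child in node[1:])
    return {"and": left and right, "or": left or right}[op]

# Fragment equivalent to (NOT a0) AND a1 AND d1: with address bits (0, 1)
# selecting data bit index 3 here, it matches inputs whose addressed bit is 1.
frag = ("and", ("and", ("not", ("bit", 0)), ("bit", 1)), ("bit", 3))
print(evaluate(frag, [0, 1, 0, 1, 0, 0]))  # True
print(evaluate(frag, [0, 1, 0, 0, 0, 0]))  # False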

    Testability and redundancy techniques for improved yield and reliability of CMOS VLSI circuits

    The research presented in this thesis is concerned with the design of fault-tolerant integrated circuits as a contribution to the design of fault-tolerant systems. The economical manufacture of very large area ICs will necessitate the incorporation of fault-tolerance features such as those routinely employed in current high-density dynamic random access memories. Furthermore, the growing use of ICs in safety-critical applications and/or hostile environments, in addition to the prospect of single-chip systems, will mandate the use of fault tolerance for improved reliability. A fault-tolerant IC must be able to detect and correct all possible faults that may affect its operation. The ability of a chip to detect its own faults is not only necessary for fault tolerance, but is also regarded as the ultimate solution to the problem of testing. Off-line periodic testing is selected for this research because it achieves better coverage of physical faults and requires less extra hardware than on-line error detection techniques. Tests for CMOS stuck-open faults are shown to detect all other faults. Simple test-sequence generation procedures for the detection of all faults are derived. The test sequences generated by these procedures produce a trivial output, thereby greatly simplifying the task of test response analysis. A further advantage of the proposed test generation procedures is that they do not require the enumeration of faults. The implementation of built-in self-test is considered, and it is shown that the hardware overhead is comparable to that of pseudo-random and pseudo-exhaustive techniques while achieving a much higher fault coverage through the use of the proposed test generation procedures. Consideration of the problem of testing the test circuitry leads to the conclusion that complete test coverage may be achieved if separate chips cooperate in testing each other's untested parts. An alternative approach towards complete test coverage is to design the test circuitry so that it is as distributed as possible and is tested as it performs its function. Fault correction relies on the provision of spare units and a means of reconfiguring the circuit so that faulty units are discarded. This raises the question: what is the optimum size of a unit? A mathematical model linking yield and reliability is therefore developed to answer this question and to study the effects of parameters such as the amount of redundancy, the size of the additional circuitry required for testing and reconfiguration, and the effect of periodic testing on reliability. The stringent requirement on the size of the reconfiguration logic is illustrated by applying the model to a typical example. Another important result concerns the effect of periodic testing on reliability: it is shown that periodic off-line testing can achieve approximately the same level of reliability as on-line testing, even when the time between tests is many hundreds of hours.
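
The flavor of such a yield-reliability model can be shown with the textbook Poisson-defect formulation (a hedged sketch, not necessarily the exact model developed in the thesis). If a unit of area A is fabricated with defect density D, and the chip works when at least m of its n identical units are fault-free, then

\[
Y_{\text{unit}} = e^{-AD}, \qquad
Y_{\text{chip}} = \sum_{k=m}^{n} \binom{n}{k}\, Y_{\text{unit}}^{\,k}\,(1 - Y_{\text{unit}})^{\,n-k}.
\]

Shrinking the units raises Y_unit but multiplies the per-unit overhead of test and reconfiguration circuitry, which is precisely the trade-off that motivates asking for the optimum unit size.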

    Logic perturbation based circuit partitioning and optimum FPGA switch-box designs

    Cheung Chak Chung. Thesis (M.Phil.)--Chinese University of Hong Kong, 2001. Includes bibliographical references (leaves 101-114). Abstracts in English and Chinese.
    Abstract --- p.i
    Acknowledgments --- p.iii
    Vita --- p.v
    Table of Contents --- p.vi
    List of Figures --- p.x
    List of Tables --- p.xiv
    Chapter 1 --- Introduction --- p.1
    Chapter 1.1 --- Motivation --- p.1
    Chapter 1.2 --- Aims and Contribution --- p.4
    Chapter 1.3 --- Thesis Overview --- p.5
    Chapter 2 --- VLSI Design Cycle --- p.6
    Chapter 2.1 --- Logic Synthesis --- p.7
    Chapter 2.1.1 --- Logic Minimization --- p.8
    Chapter 2.1.2 --- Technology Mapping --- p.8
    Chapter 2.1.3 --- Testability --- p.8
    Chapter 2.2 --- Physical Design Synthesis --- p.8
    Chapter 2.2.1 --- Partitioning --- p.9
    Chapter 2.2.2 --- Floorplanning & Placement --- p.10
    Chapter 2.2.3 --- Routing --- p.11
    Chapter 2.2.4 --- Compaction, Extraction & Verification --- p.12
    Chapter 2.2.5 --- Physical Design of FPGAs --- p.12
    Chapter 3 --- Alternative Wiring --- p.13
    Chapter 3.1 --- Introduction --- p.13
    Chapter 3.2 --- Notation and Definitions --- p.15
    Chapter 3.3 --- Application of Rewiring --- p.17
    Chapter 3.3.1 --- Logic Optimization --- p.17
    Chapter 3.3.2 --- Timing Optimization --- p.17
    Chapter 3.3.3 --- Circuit Partitioning and Routing --- p.18
    Chapter 3.4 --- Logic Optimization Analysis --- p.19
    Chapter 3.4.1 --- Global Flow Optimization --- p.19
    Chapter 3.4.2 --- OBDD Representation --- p.20
    Chapter 3.4.3 --- Automatic Test Pattern Generation (ATPG) --- p.22
    Chapter 3.4.4 --- Graph Based Alternative Wiring (GBAW) --- p.23
    Chapter 3.5 --- Augmented GBAW --- p.26
    Chapter 3.6 --- Logic Optimization by using GBAW --- p.28
    Chapter 3.7 --- Conclusions --- p.31
    Chapter 4 --- Multi-way Partitioning using Rewiring Techniques --- p.33
    Chapter 4.1 --- Introduction --- p.33
    Chapter 4.2 --- Circuit Partitioning Algorithm Analysis --- p.38
    Chapter 4.2.1 --- The Kernighan-Lin (KL) Algorithm --- p.39
    Chapter 4.2.2 --- The Fiduccia-Mattheyses (FM) Algorithm --- p.42
    Chapter 4.2.3 --- Geometric Representation Algorithm --- p.46
    Chapter 4.2.4 --- The Multi-level Partitioning Algorithm --- p.49
    Chapter 4.2.5 --- Hypergraph METIS - hMETIS --- p.51
    Chapter 4.3 --- The GBAW Partitioning Algorithm --- p.53
    Chapter 4.4 --- Experimental Results --- p.56
    Chapter 4.5 --- Conclusions --- p.58
    Chapter 5 --- Optimum FPGA Switch-Box Designs - HUSB --- p.62
    Chapter 5.1 --- Introduction --- p.62
    Chapter 5.2 --- Background and Definitions --- p.65
    Chapter 5.2.1 --- Routing Architectures --- p.65
    Chapter 5.2.2 --- Global Routing --- p.67
    Chapter 5.2.3 --- Detailed Routing --- p.67
    Chapter 5.3 --- FPGA Router Comparison --- p.69
    Chapter 5.3.1 --- CGE --- p.69
    Chapter 5.3.2 --- SEGA --- p.70
    Chapter 5.3.3 --- TRACER --- p.71
    Chapter 5.3.4 --- VPR --- p.72
    Chapter 5.4 --- Switch Box Design --- p.73
    Chapter 5.4.1 --- Disjoint type switch box (XC4000-type) --- p.73
    Chapter 5.4.2 --- Anti-symmetric switch box --- p.74
    Chapter 5.4.3 --- Universal Switch box --- p.74
    Chapter 5.4.4 --- Switch box Analysis --- p.75
    Chapter 5.5 --- Terminology --- p.77
    Chapter 5.6 --- Hyper-universal (4, W)-design analysis --- p.82
    Chapter 5.6.1 --- H3 is an optimum (4, 3)-design --- p.84
    Chapter 5.6.2 --- H4 is an optimum (4, 4)-design --- p.88
    Chapter 5.6.3 --- Hi is a hyper-universal (4, i)-design for i = 5, 6, 7 --- p.90
    Chapter 5.7 --- Experimental Results --- p.92
    Chapter 5.8 --- Conclusions --- p.95
    Chapter 6 --- Conclusions --- p.99
    Chapter 6.1 --- Thesis Summary --- p.99
    Chapter 6.2 --- Future work --- p.100
    Chapter 6.2.1 --- Alternative Wiring --- p.100
    Chapter 6.2.2 --- Partitioning Quality --- p.100
    Chapter 6.2.3 --- Routing Devices Studies --- p.100
    Bibliography --- p.101
    Chapter A --- 5xpl - Berkeley Logic Interchange Format (BLIF) --- p.115
    Chapter B --- Proof of some 2-local patterns --- p.122
    Chapter C --- Illustrations of FM algorithm --- p.124
    Chapter D --- HUSB Structures --- p.127
    Chapter E --- Primitive minimal 4-way global routing Structures --- p.13

    Nano-intrinsic security primitives for internet of everything

    With the advent of Internet-enabled electronic devices and mobile computer systems, maintaining data security is one of the most important challenges in modern civilization. Physically unclonable functions (PUFs) show great potential for enabling low-cost, low-power authentication, anti-counterfeiting, and more on semiconductor chips. This is because the secrets in a PUF are hidden in the randomness of the physical properties of nominally identical devices, making them extremely difficult, if not impossible, to extract. The basic idea of the PUF is thus to take advantage of inevitable non-idealities in the physical domain to create a system that secures device identities, sensitive information, and their communications in an innovative way. While physical variation exists everywhere, various materials, systems, and technologies have been considered as large-scale sources of unpredictable physical device variation for generating security primitives. The purpose of this project is to develop emerging solid-state-memory-based security primitives and examine their robustness and feasibility. First, the author gives an extensive overview of PUFs: the rationale, classification, and applications of PUFs are discussed. To objectively compare the quality of PUFs, the author formulates important PUF properties and evaluation metrics, and by reviewing previously proposed constructions, ranging from conventional standard complementary metal-oxide-semiconductor (CMOS) components to emerging non-volatile memories, the quality of the different PUF classes is discussed and summarized. Through a comparative analysis, emerging non-volatile redox-based resistive memories (ReRAMs) are shown to be promising candidates for the next generation of low-cost, low-power, compact, and secure PUFs. Next, the author presents novel approaches to building a PUF from two concatenated layers of ReRAM crossbar arrays. Concatenating the two layers introduces a nonlinear structure, which improves the uniformity and avalanche characteristics of the proposed PUF. A grouped-cell readout method is employed, supporting a massive pool of challenge-response pairs for the nonlinear ReRAM-based PUF. The nonlinear PUF construction is experimentally assessed using the evaluation metrics, and the quality of its randomness is verified using predictive analysis. Last but not least, random telegraph noise (RTN) is studied as a source of entropy for a true random number generator (TRNG). RTN is usually considered a disadvantageous feature in conventional CMOS designs; however, in combination with an appropriate readout scheme, RTN in ReRAM can be used as a novel technique to generate quality random numbers. The proposed differential-readout-based design maintains output quality by reducing the effect of undesired noise from the rest of the system, while significantly reducing the control difficulty of the conventional readout method. This is advantageous because the differential readout circuit can embrace the resistance-variation features of ReRAMs without extensive pre-calibration. The work in this thesis has the potential to enable the development of cost-efficient, lightweight security primitives that can be integrated into modern mobile computer systems and devices to provide a high level of security.
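
Two of the evaluation metrics mentioned above can be stated compactly. The sketch below (with randomly generated toy responses, not measured ReRAM data) computes uniformity, the fraction of 1s in a single device's response, and uniqueness, the mean pairwise fractional Hamming distance across devices; both should approach 0.5 for a good PUF.

import numpy as np
from itertools import combinations

def uniformity(response):
    # Fraction of 1s in one device's response bits; ideal value is 0.5.
    return float(np.mean(response))

def uniqueness(responses):
    # Mean pairwise inter-device Hamming distance (fractional); ideal 0.5.
    n = len(responses[0])
    dists = [np.count_nonzero(a != b) / n
             for a, b in combinations(responses, 2)]
    return float(np.mean(dists))

# Toy data: 4 hypothetical devices answering the same challenge with 128 bits.
rng = np.random.default_rng(1)
devices = [rng.integers(0, 2, 128) for _ in range(4)]
print(uniformity(devices[0]))  # close to 0.5
print(uniqueness(devices))     # close to 0.5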