154 research outputs found

    Doctor of Philosophy

    The continuous growth of wireless communication use has largely exhausted the limited spectrum available. Methods to improve spectral efficiency are in high demand and will continue to be for the foreseeable future. Several technologies have the potential to make large improvements to spectral efficiency and the total capacity of networks, including massive multiple-input multiple-output (MIMO), cognitive radio, and spatial-multiplexing MIMO. Of these, spatial-multiplexing MIMO has the largest near-term potential as it has already been adopted in the WiFi, WiMAX, and LTE standards. Although transmitting independent MIMO streams is cheap and easy, with a mere linear increase in cost with the number of streams, receiving MIMO is difficult since the optimal methods have exponentially increasing cost and power consumption. Suboptimal MIMO detectors such as K-Best have a drastically reduced complexity compared to optimal methods but still have an undesirable exponentially increasing cost with data rate. The Markov Chain Monte Carlo (MCMC) detector has been proposed as a near-optimal method with polynomial cost, but it has a history of unusual performance issues which have hindered its adoption. In this dissertation, we introduce a revised derivation of the bitwise MCMC MIMO detector. The new approach resolves the previously reported high-SNR stalling problem of MCMC without the need for hybridization with another detector method or adding heuristic temperature scaling terms. Another common problem with MCMC algorithms is an unknown convergence time, which makes predictable fixed-length implementations problematic. When an insufficient number of iterations is used on a slowly converging example, the output LLRs can be unstable and overconfident. We therefore develop a method to identify rare, slowly converging runs and mitigate their degrading effects on the soft-output information. 
This improves forward-error-correcting code performance and removes a symptomatic error floor in bit-error-rates. Next, pseudo-convergence is identified with a novel way to visualize the internal behavior of the Gibbs sampler. An effective and efficient pseudo-convergence detection and escape strategy is suggested. The resulting excited MCMC (X-MCMC) detector is shown to have near maximum-a-posteriori (MAP) performance even with challenging, realistic, highly-correlated channels at the maximum MIMO sizes and modulation rates supported by the 802.11ac WiFi specification, 8x8 256 QAM. Further, the X-MCMC detector is demonstrated on an 8-antenna MIMO testbed with the 802.11ac WiFi protocol, confirming its high performance. Finally, a VLSI implementation of the X-MCMC detector is presented which retains the near-optimal performance of the floating-point algorithm while having one of the lowest complexities found in the near-optimal MIMO detector literature.

    High-Performance Hardware and Software Implementations of the Cyclic Redundancy Check Computation

    The Cyclic Redundancy Check (CRC) is an error detection code used in many digital transmission and storage systems. The two major research areas surrounding CRCs concern developing computation approaches and studying error detection properties. This thesis aims to explore the various aspects of the CRC computation, with the primary objective being to propose novel computation approaches which outperform the existing ones. The work begins with a thorough examination of the formulations found throughout the literature. Then, their subsequent realizations as hardware architectures and software algorithms are investigated. During this investigation, some improvements are presented, including optimizations of the state-space transformed and primitive architectures. Afterward, novel formulations are derived, and the most significant contribution consists of a matrix decomposition that gives rise to a high-performance software algorithm. Simulation and implementation results are gathered for both hardware and software deployments of the investigated computation approaches. The theoretical results obtained by simulations are validated with implementation experiments. The proposed algorithm is shown to outperform the existing comparable low-memory algorithm in terms of time complexity.
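
For readers unfamiliar with the computation approaches this abstract refers to, the following is a minimal sketch of two textbook software formulations of CRC-32: a bit-serial loop and a byte-wise lookup table. It is not the matrix-decomposition algorithm proposed in the thesis. The choice of the reflected IEEE 802.3 polynomial and the function names are assumptions made for illustration only.

```python
import zlib

# Assumption: the reflected CRC-32 polynomial used by IEEE 802.3 and zlib.
POLY = 0xEDB88320


def crc32_bitwise(data: bytes) -> int:
    """Bit-serial CRC-32: one shift and conditional XOR per input bit."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (POLY if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def make_table() -> list:
    """Precompute the CRC of every possible byte value (256 entries)."""
    table = []
    for value in range(256):
        reg = value
        for _ in range(8):
            reg = (reg >> 1) ^ (POLY if reg & 1 else 0)
        table.append(reg)
    return table


TABLE = make_table()


def crc32_table(data: bytes) -> int:
    """Table-driven CRC-32: one lookup per input byte, trading memory for time."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = (crc >> 8) ^ TABLE[(crc ^ byte) & 0xFF]
    return crc ^ 0xFFFFFFFF


if __name__ == "__main__":
    message = b"123456789"
    # Both formulations should agree with the reference implementation in zlib.
    print(hex(crc32_bitwise(message)), hex(crc32_table(message)), hex(zlib.crc32(message)))
```

The table-driven variant illustrates the memory-versus-time trade-off that motivates comparing low-memory algorithms by their time complexity.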

    Vision Sensors and Edge Detection

    The book Vision Sensors and Edge Detection reflects a selection of recent developments within the area of vision sensors and edge detection. There are two sections in this book. The first section presents vision sensors with applications to panoramic vision sensors, wireless vision sensors, and automated vision sensor inspection, and the second one shows image processing techniques such as image measurements, image transformations, filtering, and parallel computing.

    PUF Modeling Attacks on Simulated and Silicon Data

    We discuss numerical modeling attacks on several proposed strong physical unclonable functions (PUFs). Given a set of challenge-response pairs (CRPs) of a Strong PUF, the goal of our attacks is to construct a computer algorithm which behaves indistinguishably from the original PUF on almost all CRPs. If successful, this algorithm can subsequently impersonate the Strong PUF, and can be cloned and distributed arbitrarily. It breaks the security of any applications that rest on the Strong PUF's unpredictability and physical unclonability. Our method is less relevant for other PUF types such as Weak PUFs. The Strong PUFs that we could attack successfully include standard Arbiter PUFs of essentially arbitrary sizes, and XOR Arbiter PUFs, Lightweight Secure PUFs, and Feed-Forward Arbiter PUFs up to certain sizes and complexities. We also investigate the hardness of certain Ring Oscillator PUF architectures in typical Strong PUF applications. Our attacks are based upon various machine learning techniques, including a specially tailored variant of logistic regression and evolution strategies. Our results are mostly obtained on CRPs from numerical simulations that use established digital models of the respective PUFs. For a subset of the considered PUFs (namely standard Arbiter PUFs and XOR Arbiter PUFs), we also carry out proofs of concept on silicon data from both FPGAs and ASICs. Over four million silicon CRPs are used in this process. The performance on silicon CRPs is very close to simulated CRPs, confirming a conjecture from earlier versions of this work. Our findings lead to new design requirements for secure electrical Strong PUFs, and will be useful to PUF designers and attackers alike. National Science Foundation (U.S.) (Grant CNS 0923313); National Science Foundation (U.S.) (Grant CNS 0964641)
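
To make the idea of a modeling attack concrete, here is a small sketch under stated assumptions. It simulates a standard Arbiter PUF with the commonly used additive linear delay model (response = sign of a weighted sum of parity features of the challenge) and fits it with plain logistic regression trained by stochastic gradient descent. This is not the authors' tailored logistic regression variant, and the stage count, CRP counts, learning rate, and all function names are hypothetical choices for illustration.

```python
import math
import random


def features(challenge):
    """Parity feature vector: phi_i = prod_{j >= i} (1 - 2 * c_j), plus a constant 1."""
    phi = []
    prod = 1
    for bit in reversed(challenge):
        prod *= 1 - 2 * bit
        phi.append(prod)
    phi.reverse()
    phi.append(1)
    return phi


def simulate_puf(weights, challenge):
    """Additive delay model: the response is 1 if the weighted delay sum is positive."""
    total = sum(w * x for w, x in zip(weights, features(challenge)))
    return 1 if total > 0 else 0


def train_logistic(crps, n_stages, epochs=30, lr=0.05):
    """Plain logistic regression fitted with stochastic gradient ascent."""
    w = [0.0] * (n_stages + 1)
    for _ in range(epochs):
        for challenge, response in crps:
            x = features(challenge)
            z = sum(wi * xi for wi, xi in zip(w, x))
            z = max(-30.0, min(30.0, z))  # clamp to keep exp() well behaved
            p = 1.0 / (1.0 + math.exp(-z))
            grad = response - p
            for i in range(len(w)):
                w[i] += lr * grad * x[i]
    return w


def run_attack(n_stages=16, n_train=2000, n_test=500, seed=0):
    """Train on simulated CRPs and return prediction accuracy on unseen CRPs."""
    rng = random.Random(seed)
    secret = [rng.gauss(0, 1) for _ in range(n_stages + 1)]  # hypothetical PUF instance

    def draw(count):
        pairs = []
        for _ in range(count):
            challenge = [rng.randint(0, 1) for _ in range(n_stages)]
            pairs.append((challenge, simulate_puf(secret, challenge)))
        return pairs

    train, test = draw(n_train), draw(n_test)
    model = train_logistic(train, n_stages)
    hits = sum(simulate_puf(model, c) == r for c, r in test)
    return hits / n_test


if __name__ == "__main__":
    print("prediction accuracy on unseen CRPs:", run_attack())
```

Because the simulated responses are a linear threshold function of the parity features, a linear classifier can in principle reproduce them, which is why this PUF type is the easiest target in the list above.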

    Design and Implementation of Hardware Accelerators for Neural Processing Applications

    The primary motivation for this work was the need to implement hardware accelerators for a newly proposed ANN structure called Auto Resonance Network (ARN) for robotic motion planning. ARN is an approximating, feed-forward, hierarchical, and explainable network. It can be used in various AI applications, but its application base was small. Therefore, the objective of the research was twofold: to develop a new application using ARN and to implement a hardware accelerator for ARN. As per the suggestions given by the Doctoral Committee, an image recognition system using ARN has been implemented. An accuracy of around 94% was achieved with only 2 layers of ARN. The network also required only a small training data set of about 500 images. The publicly available MNIST dataset was used for this experiment. All the coding was done in Python. The massive parallelism seen in ANNs presents several challenges to CPU design. For a given functionality, e.g., multiplication, several copies of serial modules can be realized within the same area as a parallel module. The advantage of using serial modules compared to parallel modules under area constraints has been discussed. One module often useful in ANNs is multi-operand addition. One problem in its implementation is estimating the number of carry bits when the number of operands changes. A theorem to calculate the exact number of carry bits required for a multi-operand addition has been presented in the thesis, which alleviates this problem. The main advantage of the modular approach to multi-operand addition is the possibility of pipelined addition with low reconfiguration overhead. This results in an overall increase in throughput for the large number of additions typically seen in several DNN configurations.
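
The carry-bit question can be illustrated with simple worst-case arithmetic. The sketch below is not the theorem from the thesis, whose statement is not given in the abstract. It only assumes unsigned binary operands of equal width and computes how many bits beyond the operand width are needed to hold the largest possible sum. The function names are hypothetical.

```python
def sum_width(num_operands: int, operand_bits: int) -> int:
    """Bits needed to hold the largest possible sum of unsigned operands."""
    max_operand = (1 << operand_bits) - 1
    max_sum = num_operands * max_operand
    return max_sum.bit_length()


def carry_bits(num_operands: int, operand_bits: int) -> int:
    """Extra bits required beyond the operand width (worst case)."""
    return sum_width(num_operands, operand_bits) - operand_bits


if __name__ == "__main__":
    # How the carry width grows as more 8-bit operands are added together.
    for n in (1, 2, 3, 4, 5, 16, 17):
        print(n, "operands ->", carry_bits(n, 8), "carry bits")
```

The count steps up whenever the operand count crosses a threshold, so an adder sized for one operand count can be too narrow or wastefully wide for another.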

    Algorithm Architecture Co-design for Dense and Sparse Matrix Computations

    With the end of Dennard scaling and Moore's law, architects have moved towards heterogeneous designs consisting of specialized cores to achieve higher performance and energy efficiency for a target application domain. Applications of linear algebra are ubiquitous in the field of scientific computing, machine learning, statistics, etc. with matrix computations being fundamental to these linear algebra based solutions. Design of multiple dense (or sparse) matrix computation routines on the same platform is quite challenging. Added to the complexity is the fact that dense and sparse matrix computations have large differences in their storage and access patterns and are difficult to optimize on the same architecture. This thesis addresses this challenge and introduces a reconfigurable accelerator that supports both dense and sparse matrix computations efficiently. The reconfigurable architecture has been optimized to execute the following linear algebra routines: GEMV (Dense General Matrix Vector Multiplication), GEMM (Dense General Matrix Matrix Multiplication), TRSM (Triangular Matrix Solver), LU Decomposition, Matrix Inverse, SpMV (Sparse Matrix Vector Multiplication), SpMM (Sparse Matrix Matrix Multiplication). It is a multicore architecture where each core consists of a 2D array of processing elements (PE). The 2D array of PEs is of size 4x4 and is scheduled to perform 4x4 sized matrix updates efficiently. A sequence of such updates is used to solve a larger problem inside a core. A novel partitioned block compressed sparse data structure (PBCSC/PBCSR) is used to perform sparse kernel updates. Scalable partitioning and mapping schemes are presented that map input matrices of any given size to the multicore architecture. Design trade-offs related to the PE array dimension, size of local memory inside a core and the bandwidth between on-chip memories and the cores have been presented. An optimal core configuration is developed from this analysis. 
Synthesis results using a 7nm PDK show that the proposed accelerator can achieve a performance of up to 32 GOPS using a single core. Dissertation/Thesis, Masters Thesis, Computer Engineering 201
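
As background for the sparse routines named in this abstract, the following sketch shows SpMV over the ordinary compressed sparse row (CSR) layout. It is not the partitioned block format (PBCSC/PBCSR) introduced in the thesis, whose details are not given here. The function names and the example matrix are assumptions for illustration.

```python
def dense_to_csr(matrix):
    """Convert a dense row-major matrix to CSR arrays (values, col_idx, row_ptr)."""
    values, col_idx, row_ptr = [], [], [0]
    for row in matrix:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        # row_ptr[i + 1] marks where row i ends in the values array
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def spmv(values, col_idx, row_ptr, x):
    """Compute y = A @ x, touching only the stored nonzeros of A."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


if __name__ == "__main__":
    A = [
        [4, 0, 0, 1],
        [0, 0, 2, 0],
        [0, 3, 0, 0],
        [5, 0, 0, 6],
    ]
    values, col_idx, row_ptr = dense_to_csr(A)
    print(spmv(values, col_idx, row_ptr, [1, 2, 3, 4]))
```

The indirect access `x[col_idx[k]]` is the irregular memory pattern that distinguishes sparse kernels from dense ones such as GEMV, where the operands are read in a fixed stride.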

    Development of a multilevel converter topology for transformer-less connection of renewable energy systems

    The global need to reduce dependence on fossil fuels for electricity production has become an ongoing research theme in the last decade. Clean energy sources (such as wind energy and solar energy) have considerable potential to reduce reliance on fossil fuels and mitigate climate change. However, wind energy is going to become more mainstream due to technological advancement and geographical availability. Therefore, various technologies exist to maximize the inherent advantages of using wind energy conversion systems (WECSs) to generate electrical power. One important technology is the power electronics interface that enables the transfer and effective control of electrical power from the renewable energy source to the grid through the filter and isolation transformer. However, the transformer is bulky, generates losses, and is also very costly. Therefore, the term "transformer-less connection" refers to eliminating a step-up transformer from the WECS, while the power conversion stage performs the conventional functions of a transformer. Existing power converter configurations for transformer-less connection of a WECS are either based on the generator-converter configuration or three-stage power converter configuration. These configurations consist of conventional multilevel converter topologies and two-stage power conversion between the generator-side converter topology and the high-order filter connected to the collection point of the wind power plant (WPP). Thus, the complexity and cost of these existing configurations are significant at higher voltage and power ratings. Therefore, a single-stage multilevel converter topology is proposed to simplify the power conversion stage of a transformer-less WECS. Furthermore, the primary design challenges – such as multiple clamping devices, multiple dc-link capacitors, and series-connected power semiconductor devices – have been mitigated by the proposed converter topology. 
The proposed converter topology, known as the "tapped inductor quasi-Z-source nested neutral-point-clamped (NNPC) converter," has been analyzed and designed, and a prototype of the topology was developed for experimental verification. A field-programmable gate array (FPGA)-based modulation technique and a voltage balancing control technique for maintaining the clamping capacitor voltages were developed. Hence, the proposed converter topology presents a single-stage power conversion configuration. The efficiency of the proposed converter topology has been studied and compared to the intermediate and grid-side converter topology of a three-stage power converter configuration. A direct current (DC) component minimization technique to minimize the dc component generated by the proposed converter topology was investigated, developed, and verified experimentally. The proposed dc component minimization technique consists of sensing and measurement circuitry with a digital notch filter. This thesis presents a detailed and comprehensive overview of the existing power converter configurations developed for transformer-less WECS applications. Based on the developed comparative benchmark factor (CBF), the merits and demerits of each power converter configuration in terms of the component counts and grid compliance have been presented. In terms of cost comparison, the three-stage power converter configuration is more cost-effective than the generator-converter configuration. Furthermore, the cost-benefit analysis of deploying transformer-less WECSs in a WPP is evaluated and compared with conventional WECSs in a WPP based on power converter configurations and collection system. Overall, the total cost of the collection system of a WPP with transformer-less WECSs is about 23% less than the total cost of a WPP with conventional WECSs. 
The derivation and theoretical analysis of the proposed five-level tapped inductor quasi-Z-source NNPC converter topology have been presented, emphasizing its operating principles, its steady-state analysis, and the equations used to calculate its inductance and capacitance values. Furthermore, the FPGA implementation of the proposed converter topology was verified experimentally with a developed prototype of the topology. The efficiency of the proposed converter topology has been evaluated by varying the switching frequency and loads. Furthermore, the proposed converter topology is more efficient than the five-level DC-DC converter with a five-level diode-clamped converter (DCC) topology under the three-stage power converter configuration. Also, the cost analysis of the proposed converter topology and the conventional converter topology shows that it is more economical to deploy the proposed converter topology at the grid side of a transformer-less WECS.