131 research outputs found

    Leveraging Signal Transfer Characteristics and Parasitics of Spintronic Circuits for Area and Energy-Optimized Hybrid Digital and Analog Arithmetic

    Get PDF
    While Internet of Things (IoT) sensors offer numerous benefits in diverse applications, they are limited by stringent constraints in energy, processing area and memory. These constraints are especially challenging within applications such as Compressive Sensing (CS) and Machine Learning (ML) via Deep Neural Networks (DNNs), which require dot product computations on large data sets. A solution to these challenges has been offered by the development of crossbar array architectures, enabled by recent advances in spintronic devices such as Magnetic Tunnel Junctions (MTJs). Crossbar arrays offer a compact, low-energy and in-memory approach to dot product computation in the analog domain by leveraging intrinsic signal-transfer characteristics of the embedded MTJ devices. The first phase of this dissertation research seeks to build on these benefits by optimizing resource allocation within spintronic crossbar arrays. A hardware approach to non-uniform CS is developed, which dynamically configures sampling rates by deriving necessary control signals using circuit parasitics. Next, an alternate approach to non-uniform CS based on adaptive quantization is developed, which reduces circuit area in addition to energy consumption. Adaptive quantization is then applied to DNNs by developing an architecture allowing for layer-wise quantization based on relative robustness levels. The second phase of this research focuses on extension of the analog computation paradigm by development of an operational amplifier-based arithmetic unit for generalized scalar operations. This approach allows for 95% area reduction in scalar multiplications, compared to the state-of-the-art digital alternative. Moreover, analog computation of enhanced activation functions allows for significant improvement in DNN accuracy, which can be harnessed through triple modular redundancy to yield 81.2% reduction in power at the cost of only 4% accuracy loss, compared to a larger network. Together these results substantiate promising approaches to several challenges facing the design of future IoT sensors within the targeted applications of CS and ML

    Learning-Based Hardware Design for Data Acquisition Systems

    Get PDF
    This multidisciplinary research work aims to investigate the optimized information extraction from signals or data volumes and to develop tailored hardware implementations that trade-off the complexity of data acquisition with that of data processing, conceptually allowing radically new device designs. The mathematical results in classical Compressive Sampling (CS) support the paradigm of Analog-to-Information Conversion (AIC) as a replacement for conventional ADC technologies. The AICs simultaneously perform data acquisition and compression, seeking to directly sample signals for achieving specific tasks as opposed to acquiring a full signal only at the Nyquist rate to throw most of it away via compression. Our contention is that in order for CS to live up its name, both theory and practice must leverage concepts from learning. This work demonstrates our contention in hardware prototypes, with key trade-offs, for two different fields of application as edge and big-data computing. In the framework of edge-data computing, such as wearable and implantable ecosystems, the power budget is defined by the battery capacity, which generally limits the device performance and usability. This is more evident in very challenging field, such as medical monitoring, where high performance requirements are necessary for the device to process the information with high accuracy. Furthermore, in applications like implantable medical monitoring, the system performances have to merge the small area as well as the low-power requirements, in order to facilitate the implant bio-compatibility, avoiding the rejection from the human body. Based on our new mathematical foundations, we built different prototypes to get a neural signal acquisition chip that not only rigorously trades off its area, energy consumption, and the quality of its signal output, but also significantly outperforms the state-of-the-art in all aspects. In the framework of big-data and high-performance computation, such as in high-end servers application, the RF circuits meant to transmit data from chip-to-chip or chip-to-memory are defined by low power requirements, since the heat generated by the integrated circuits is partially distributed by the chip package. Hence, the overall system power budget is defined by its affordable cooling capacity. For this reason, application specific architectures and innovative techniques are used for low-power implementation. In this work, we have developed a single-ended multi-lane receiver for high speed I/O link in servers application. The receiver operates at 7 Gbps by learning inter-symbol interference and electromagnetic coupling noise in chip-to-chip communication systems. A learning-based approach allows a versatile receiver circuit which not only copes with large channel attenuation but also implements novel crosstalk reduction techniques, to allow single-ended multiple lines transmission, without sacrificing its overall bandwidth for a given area within the interconnect's data-path

    Large scale reconfigurable analog system design enabled through floating-gate transistors

    Get PDF
    This work is concerned with the implementation and implication of non-volatile charge storage on VLSI system design. To that end, the floating-gate pFET (fg-pFET) is considered in the context of large-scale arrays. The programming of the element in an efficient and predictable way is essential to the implementation of these systems, and is thus explored. The overhead of the control circuitry for the fg-pFET, a key scalability issue, is examined. A light-weight, trend-accurate model is absolutely necessary for VLSI system design and simulation, and is also provided. Finally, several reconfigurable and reprogrammable systems that were built are discussed.Ph.D.Committee Chair: Hasler, Paul E.; Committee Member: Anderson, David V.; Committee Member: Ayazi, Farrokh; Committee Member: Degertekin, F. Levent; Committee Member: Hunt, William D

    Data compression techniques applied to high resolution high frame rate video technology

    Get PDF
    An investigation is presented of video data compression applied to microgravity space experiments using High Resolution High Frame Rate Video Technology (HHVT). An extensive survey of methods of video data compression, described in the open literature, was conducted. The survey examines compression methods employing digital computing. The results of the survey are presented. They include a description of each method and assessment of image degradation and video data parameters. An assessment is made of present and near term future technology for implementation of video data compression in high speed imaging system. Results of the assessment are discussed and summarized. The results of a study of a baseline HHVT video system, and approaches for implementation of video data compression, are presented. Case studies of three microgravity experiments are presented and specific compression techniques and implementations are recommended

    A Study on Efficient Designs of Approximate Arithmetic Circuits

    Get PDF
    Approximate computing is a popular field where accuracy is traded with energy. It can benefit applications such as multimedia, mobile computing and machine learning which are inherently error resilient. Error introduced in these applications to a certain degree is beyond human perception. This flexibility can be exploited to design area, delay and power efficient architectures. However, care must be taken on how approximation compromises the correctness of results. This research work aims to provide approximate hardware architectures with error metrics and design metrics analyzed and their effects in image processing applications. Firstly, we study and propose unsigned array multipliers based on probability statistics and with approximate 4-2 compressors, full adders and half adders. This work deals with a new design approach for approximation of multipliers. The partial products of the multiplier are altered to introduce varying probability terms. Logic complexity of approximation is varied for the accumulation of altered partial products based on their probability. The proposed approximation is utilized in two variants of 16-bit multipliers. Synthesis results reveal that two proposed multipliers achieve power savings of 72% and 38% respectively compared to an exact multiplier. They have better precision when compared to existing approximate multipliers. Mean relative error distance (MRED) figures are as low as 7.6% and 0.02% for the proposed approximate multipliers, which are better than the previous state-of-the-art works. Performance of the proposed multipliers is evaluated with geometric mean filtering application, where one of the proposed models achieves the highest peak signal to noise ratio (PSNR). Second, approximation is proposed for signed Booth multiplication. Approximation is introduced in partial product generation and partial product accumulation circuits. In this work, three multipliers (ABM-M1, ABM-M2, and ABM-M3) are proposed in which the modified Booth algorithm is approximated. In all three designs, approximate Booth partial product generators are designed with different variations of approximation. The approximations are performed by reducing the logic complexity of the Booth partial product generator, and the accumulation of partial products is slightly modified to improve circuit performance. Compared to the exact Booth multiplier, ABM-M1 achieves up to 15% reduction in power consumption with an MRED value of 7.9 × 10-4. ABM-M2 has power savings of up to 60% with an MRED of 1.1 × 10-1. ABM-M3 has power savings of up to 50% with an MRED of 3.4 × 10-3. Compared to existing approximate Booth multipliers, the proposed multipliers ABM-M1 and ABM-M3 achieve up to a 41% reduction in power consumption while exhibiting very similar error metrics. Image multiplication and matrix multiplication are used as case studies to illustrate the high performance of the proposed approximate multipliers. Third, distributed arithmetic based sum of products units approximation is analyzed. Sum of products units are key elements in many digital signal processing applications. Three approximate sum of products models which are based on distributed arithmetic are proposed. They are designed for different levels of accuracy. First model of approximate sum of products achieves an improvement up to 64% on area and 70% on power, when compared to conventional unit. Other two models provide an improvement of 32% and 48% on area and 54% and 58% on power, respectively, with a reduced error rate compared to the first model. Third model achieves MRED and normalized mean error distance (NMED) as low as 0.05% and 0.009%. Performance of approximate units is evaluated with a noisy image smoothing application, where the proposed models are capable of achieving higher PSNR than existing state of the art techniques. Fourth, approximation is applied in division architecture. Two approximation models are proposed for restoring divider. In the first design, approximation is performed at circuit level, where approximate divider cells are utilized in place of exact ones by simplifying the logic equations. In the second model, restoring divider is analyzed strategically and number of restoring divider cells are reduced by finding the portions of divisor and dividend with significant information. An approximation factor pp is used in both designs. In model 1, the design with p=8 has a 58% reduction in both area and power consumption compared to exact design, with a Q-MRED of 1.909 × 10-2 and Q-NMED of 0.449 × 10-2. The second model with an approximation factor p=4 has 54% area savings and 62% power savings compared to exact design. The proposed models are found to have better error metrics compared to existing designs, with better performance at similar error values. A change detection image processing application is used for real time assessment of proposed and existing approximate dividers and one of the models achieves a PSNR of 54.27 dB

    Low-complexity Multidimensional DCT Approximations

    Full text link
    In this paper, we introduce low-complexity multidimensional discrete cosine transform (DCT) approximations. Three dimensional DCT (3D DCT) approximations are formalized in terms of high-order tensor theory. The formulation is extended to higher dimensions with arbitrary lengths. Several multiplierless 8×8×88\times 8\times 8 approximate methods are proposed and the computational complexity is discussed for the general multidimensional case. The proposed methods complexity cost was assessed, presenting considerably lower arithmetic operations when compared with the exact 3D DCT. The proposed approximations were embedded into 3D DCT-based video coding scheme and a modified quantization step was introduced. The simulation results showed that the approximate 3D DCT coding methods offer almost identical output visual quality when compared with exact 3D DCT scheme. The proposed 3D approximations were also employed as a tool for visual tracking. The approximate 3D DCT-based proposed system performs similarly to the original exact 3D DCT-based method. In general, the suggested methods showed competitive performance at a considerably lower computational cost.Comment: 28 pages, 5 figures, 5 table
    corecore