
    On The Design Of Low-Complexity High-Speed Arithmetic Circuits In Quantum-Dot Cellular Automata Nanotechnology

    For the last four decades, the implementation of very large-scale integrated systems has largely been based on complementary metal-oxide semiconductor (CMOS) technology. However, this technology has reached its physical limitations. Emerging nanoscale technologies such as quantum-dot cellular automata (QCA), single electron tunneling (SET), and tunneling phase logic (TPL) are major candidates for replacing CMOS. These nanotechnologies use majority and/or minority logic and inverters as circuit primitives. In this dissertation, a comprehensive methodology for majority/minority logic network synthesis is developed. This method can process any arbitrary multi-output Boolean function to find its equivalent optimal majority logic network, optimizing either the number of gates or the number of levels. The method produces several candidate equivalent majority expression networks, and the most optimized one is generated as the final solution. The results obtained for 15 MCNC benchmark circuits show that when the number of majority gates is the first optimization priority, there is an average reduction of 45.3% in the number of gates and 15.1% in the number of levels. They also show that when the first priority is the number of levels, an average reduction of 23.5% in the number of levels and 43.1% in the number of gates is possible, compared to the majority AND/OR mapping method. These results are better than those obtained from the best existing methods. In this dissertation, our approach is to exploit QCA technology because of its capability to implement high-density, very high-speed switching and extremely low-power integrated systems, and because it is more amenable to digital circuit design. In particular, we have developed algorithms for the QCA designs of various single- and multi-operation arithmetic arrays.
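As a small illustration of why majority synthesis can beat a plain AND/OR mapping, the sketch below relies on the standard identities MAJ(a, b, 0) = a AND b and MAJ(a, b, 1) = a OR b. Mapping the carry function ab + bc + ac gate by gate costs five majority gates, while the function is itself a single three-input majority. The helper names are hypothetical, and this is only an exhaustive check of the identities such synthesis builds on, not the dissertation's synthesis algorithm.

```python
from itertools import product


def maj(a: int, b: int, c: int) -> int:
    """Three-input majority gate: 1 when at least two inputs are 1."""
    return 1 if a + b + c >= 2 else 0


def and2(a: int, b: int) -> int:
    # AND realized as a majority gate with one input tied to 0.
    return maj(a, b, 0)


def or2(a: int, b: int) -> int:
    # OR realized as a majority gate with one input tied to 1.
    return maj(a, b, 1)


def carry_and_or_mapping(a: int, b: int, c: int) -> int:
    # Gate-by-gate mapping of ab + bc + ac: 3 ANDs + 2 ORs = 5 majority gates.
    return or2(or2(and2(a, b), and2(b, c)), and2(a, c))


def carry_optimized(a: int, b: int, c: int) -> int:
    # The same function expressed directly as a single majority gate.
    return maj(a, b, c)


AND_OR_GATE_COUNT = 5
OPTIMIZED_GATE_COUNT = 1

if __name__ == "__main__":
    for a, b, c in product((0, 1), repeat=3):
        assert carry_and_or_mapping(a, b, c) == carry_optimized(a, b, c)
    print("equivalent on all 8 inputs; gates:", AND_OR_GATE_COUNT, "->", OPTIMIZED_GATE_COUNT)
```

Real synthesis has to discover such reductions automatically for arbitrary multi-output functions, which is where the reported gate and level savings come from.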
Even though majority/minority gates are the basic units in promising nanotechnologies, an XOR function can be constructed in QCA as a single device. The basic cells of the proposed arrays are developed from the fundamental logic devices in QCA and a single-layer structure of the three-input XOR function. This process leads to QCA arithmetic circuits with better results in terms of cell count, area, and latency compared to their best counterparts. The proposed arrays can be formed in a pipelined manner to perform arithmetic operations for any number of bits, which could be quite valuable for the future design of large-scale QCA circuits.
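At the logic level, a one-bit full adder needs only the two primitives named above: the sum is a three-input XOR and the carry is a three-input majority. The sketch below models that cell and chains n identical copies into a ripple-carry adder of arbitrary width. It ignores QCA clocking zones, cell layout and latency, and the function names are hypothetical rather than taken from the dissertation.

```python
from typing import Tuple


def xor3(a: int, b: int, c: int) -> int:
    """Three-input XOR, which can be built as a single QCA device."""
    return a ^ b ^ c


def maj(a: int, b: int, c: int) -> int:
    """Three-input majority gate, the native QCA logic primitive."""
    return (a & b) | (b & c) | (a & c)


def full_adder_cell(a: int, b: int, cin: int) -> Tuple[int, int]:
    # One array cell: sum from the three-input XOR, carry from one majority gate.
    return xor3(a, b, cin), maj(a, b, cin)


def ripple_add(x: int, y: int, n: int) -> int:
    """Add two n-bit unsigned integers using a chain of n identical cells."""
    carry = 0
    result = 0
    for i in range(n):
        s, carry = full_adder_cell((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result | (carry << n)  # the final carry becomes bit n


if __name__ == "__main__":
    print(ripple_add(11, 6, 4))
```

Because every bit position uses the same cell, the width n is just a loop bound here; in a QCA layout each cell would additionally sit in its own clocking zone, which is what allows the pipelined operation mentioned above.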

    Digital ADCs and ultra-wideband RF circuits for energy constrained wireless applications by Denis Clarke Daly.

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009. Cataloged from PDF version of thesis. Includes bibliographical references (p. 173-183). Ongoing advances in semiconductor technology have enabled a multitude of portable, low-power devices like cellular phones and wireless sensors. Most recently, as transistor device geometries reach the nanometer scale, transistor characteristics have changed so dramatically that many traditional circuits and architectures are no longer optimal and/or feasible. As a solution, much research has focused on developing 'highly digital' circuits and architectures that are tolerant of the increased leakage, variation and degraded voltage headroom associated with advanced CMOS processes. This thesis presents several highly digital, mixed-signal circuits and architectures designed for energy-constrained wireless applications. First, as a case study, a highly digital, voltage-scalable flash ADC is presented. The flash ADC, implemented in 0.18 μm CMOS, leverages redundancy and calibration to achieve robust operation at supply voltages from 0.2 V to 0.9 V. Next, the thesis expands in scope to describe a pulsed, noncoherent ultra-wideband transceiver chipset, implemented in 90 nm CMOS and operating in the 3-to-5 GHz band. The all-digital transmitter employs capacitive combining and pulse shaping in the power amplifier to meet the FCC spectral mask without any off-chip filters. The noncoherent receiver system-on-chip achieves both energy efficiency and high performance by employing simple amplifier and ADC structures combined with extensive digital calibration. Finally, the transceiver chipset is integrated in a complete system for wireless insect flight control. Through the use of a flexible PCB and 3D die stacking, the total weight of the electronics is kept to 1 g, within the carrying capacity of an adult Manduca sexta moth.
Preliminary wireless flight control of a moth in a wind tunnel is demonstrated.
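The flash ADC principle can be sketched in a few lines: 2^N - 1 comparators compare the input against a reference ladder, and counting the comparators that fire converts the resulting thermometer code into an N-bit output. The abstract does not describe how its redundancy and calibration work, so the selection scheme below (several redundant comparators per threshold, keeping the one whose measured offset is smallest) is an assumption for illustration only; all names and the Gaussian offset model are hypothetical.

```python
import random
from typing import List, Tuple


def ideal_thresholds(bits: int, vref: float) -> List[float]:
    """Comparator thresholds k * LSB for k = 1 .. 2**bits - 1."""
    lsb = vref / 2**bits
    return [k * lsb for k in range(1, 2**bits)]


def flash_convert(vin: float, thresholds: List[float]) -> int:
    # Each comparator fires when vin reaches its threshold, producing a
    # thermometer code; counting the ones gives the binary output code and
    # also tolerates isolated "bubbles" caused by comparator offsets.
    thermometer = [1 if vin >= t else 0 for t in thresholds]
    return sum(thermometer)


def calibrate(ideal: List[float], redundancy: int, sigma: float,
              seed: int = 0) -> Tuple[List[float], List[float]]:
    """Hypothetical redundancy-based calibration.

    Each ideal threshold gets `redundancy` comparators whose actual thresholds
    carry a random Gaussian offset (standard deviation `sigma`). Calibration
    keeps the comparator closest to the ideal threshold; the uncalibrated
    baseline simply uses the first comparator of each group.
    """
    rng = random.Random(seed)
    calibrated, uncalibrated = [], []
    for t in ideal:
        actual = [t + rng.gauss(0.0, sigma) for _ in range(redundancy)]
        uncalibrated.append(actual[0])
        calibrated.append(min(actual, key=lambda a: abs(a - t)))
    return calibrated, uncalibrated


if __name__ == "__main__":
    ideal = ideal_thresholds(4, 1.0)
    cal, raw = calibrate(ideal, redundancy=4, sigma=0.02)
    print("worst offset, uncalibrated:", max(abs(r - t) for r, t in zip(raw, ideal)))
    print("worst offset, calibrated:  ", max(abs(c - t) for c, t in zip(cal, ideal)))
```

Since the calibrated choice always includes the uncalibrated comparator as a candidate, its worst-case offset can never be larger; with more redundancy it is typically much smaller.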

    Truncated Binary Multipliers with minimum Mean Square Error: analytical characterization, circuit implementation and applications

    In the wireless multimedia world, DSP systems are ubiquitous. DSP algorithms are computationally intensive and test the limits of battery life in portable devices such as cell phones, hearing aids, MP3 players, digital video recorders and so on. Multiplication and squaring are the main operations in many signal processing algorithms (filtering, convolution, FFT, DCT, Euclidean distance, etc.), hence efficient parallel multipliers are desirable. A full-width digital n×n-bit multiplier computes the 2n-bit output as a weighted sum of partial products. A multiplier whose output is represented on n bits is useful, for example, in DSP datapaths that store the output in the same n-bit registers as the input. Note that truncated multipliers are useful not only for DSP but also for digital, computationally intensive ASICs, where the bit widths at the output of the arithmetic blocks are chosen on the basis of system-level accuracy requirements. Hence 2n bits of precision at the multiplier output are very often more than required. A truncated multiplier is an n×n multiplier with an n-bit output. Since in a truncated multiplier the n less-significant bits of the full-width product are discarded, some of the partial products are removed and replaced by a suitable compensation function, trading off accuracy against hardware cost. Several techniques following this basic idea have been proposed in the literature. The various circuits differ in the choice and implementation of the compensation circuit. The correction techniques proposed in the literature are obtained through exhaustive search. This means that results are only available for small values of n and that the proposed approaches cannot be extended to greater bit widths. Furthermore, an analytical characterization of the error is not possible. In this dissertation, an innovative solution for the design and characterization of truncated multipliers is presented.
The proposed circuits are based on the analytical calculation of the error of the truncated multiplier. This approach yields a multiplier characterized by a minimum mean square error that admits a fast and low-power VLSI implementation. Furthermore, the analytical approach yields closed-form expressions for the mean square error and the maximum absolute error of the proposed truncated multipliers, so the output error is known a priori. The errors are known for every bit width of the multiplier, and for a given bit width it is also possible to decide which correction circuit to use in order to obtain a certain error. This analytical relation between the error and the parameters of the hardware implementation is extremely important for the digital designer, since it makes it possible to select a suitable implementation as a function of the desired accuracy. The proposed truncated multipliers outperform previously proposed ones, providing lower error, lower power dissipation, smaller area and a higher operating frequency. The circuits are also easy to implement and allow an automatic HDL description as a function of bit width and desired error. The complete description of the errors allows these truncated multipliers to be used as building blocks for more complex systems. It will be shown how the proposed multiplier can be used to design low-area FIR filters and an efficient PI temperature controller.
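To make the accuracy/cost trade-off concrete, the sketch below evaluates the simplest member of this family, direct truncation with a constant correction: the n least-significant partial-product columns are dropped and a constant c is added back. For uniformly distributed inputs every partial-product bit x_i·y_j is 1 with probability 1/4, and column k (for k < n) holds k + 1 bits, so the mean discarded value is ((n - 1)·2^n + 1)/4; choosing c equal to that mean minimizes the mean square error among constant corrections. This constant-correction scheme is a stand-in chosen for illustration, not the dissertation's minimum-MSE compensation circuit, and the function names are hypothetical.

```python
from typing import Tuple


def kept_and_discarded(x: int, y: int, n: int) -> Tuple[int, int]:
    """Split the n x n partial-product matrix at column n.

    Bit x_i * y_j has weight 2**(i + j). Columns i + j >= n are kept; the
    n less-significant columns are dropped, as in direct truncation.
    """
    kept = discarded = 0
    for i in range(n):
        for j in range(n):
            bit = ((x >> i) & 1) & ((y >> j) & 1)
            if i + j >= n:
                kept += bit << (i + j)
            else:
                discarded += bit << (i + j)
    return kept, discarded


def mean_discarded(n: int) -> float:
    # Closed form of E[discarded] for uniform inputs: sum over columns k < n
    # of (k + 1) bits, each of weight 2**k and probability 1/4.
    return ((n - 1) * 2**n + 1) / 4


def mse_with_constant(n: int, c: float) -> float:
    """Exhaustive mean square error of (kept + c) versus the exact product."""
    total = 0.0
    for x in range(2**n):
        for y in range(2**n):
            kept, _ = kept_and_discarded(x, y, n)
            total += (x * y - (kept + c)) ** 2
    return total / 4**n


if __name__ == "__main__":
    n = 4
    print("MSE without correction:", mse_with_constant(n, 0.0))
    print("MSE with c = mean:     ", mse_with_constant(n, mean_discarded(n)))
```

A hardware version would quantize c to the available kept columns and round the sum to n output bits; the sketch measures the error before that final step.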

    High sample-rate Givens rotations for recursive least squares

    The design of an application-specific integrated circuit of a parallel array processor is considered for recursive least squares by QR decomposition using Givens rotations, applicable in adaptive filtering and beamforming applications. Emphasis is on high sample-rate operation, which, for this recursive algorithm, means that the time to perform arithmetic operations is critical. The algorithm, architecture and arithmetic are considered in a single integrated design procedure to achieve optimum results. A realisation approach using the standard arithmetic operators add, multiply and divide is adopted. The design of high-throughput operators with low delay is addressed for fixed- and floating-point number formats, and the application of redundant arithmetic is considered. New redundant multiplier architectures are presented, enabling reductions in area of up to 25% whilst maintaining low delay. A technique is presented that enables the use of a conventional tree multiplier in recursive applications, allowing savings in area and delay. Two new divider architectures are presented, showing benefits compared with the radix-2 modified SRT algorithm. Givens rotation algorithms are examined to determine their suitability for VLSI implementation. A novel algorithm based on the Squared Givens Rotation (SGR) algorithm is developed, enabling the sample rate to be increased by a factor of approximately 6 and offering area reductions of up to a factor of 2 over previous approaches. An estimated sample rate of 136 MHz could be achieved using a standard cell approach and 0.35 μm CMOS technology. The enhanced SGR algorithm has been compared with a CORDIC approach and shown to benefit by a factor of 3 in area and over 11 in sample rate. When compared with a recent implementation on a parallel array of general-purpose (GP) DSP chips, it is estimated that a single application-specific chip could offer up to 1,500 times the computation obtained from a single GP DSP chip.
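The basic operation that such an array processor repeats is the Givens-rotation update of the triangular factor in QR-based RLS. The sketch below shows the conventional floating-point form of that update, with a forgetting factor lam and back-substitution for the weights. It is the textbook rotation, which needs a square root per rotation (here inside math.hypot), not the Squared Givens Rotation variant or the enhanced SGR algorithm developed in the thesis; all names are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def givens(a: float, b: float) -> Tuple[float, float]:
    """Return (c, s) such that [c s; -s c] applied to [a; b] gives [r; 0]."""
    r = math.hypot(a, b)
    if r == 0.0:
        return 1.0, 0.0  # nothing to annihilate: identity rotation
    return a / r, b / r


def qr_rls_update(R: Matrix, z: List[float], x: List[float], d: float,
                  lam: float = 1.0) -> None:
    """Fold one observation (x, d) into the upper-triangular R and vector z.

    Old data is weighted by the forgetting factor lam; R and z are updated in place.
    """
    p = len(x)
    x = list(x)
    w = math.sqrt(lam)
    for i in range(p):
        for j in range(i, p):
            R[i][j] *= w
        z[i] *= w
    for i in range(p):
        c, s = givens(R[i][i], x[i])
        # Rotate row i of [R | z] against the new row [x | d], zeroing x[i].
        for j in range(i, p):
            R[i][j], x[j] = c * R[i][j] + s * x[j], -s * R[i][j] + c * x[j]
        z[i], d = c * z[i] + s * d, -s * z[i] + c * d


def back_substitute(R: Matrix, z: List[float]) -> List[float]:
    """Solve R w = z for upper-triangular R."""
    p = len(z)
    w = [0.0] * p
    for i in reversed(range(p)):
        acc = z[i] - sum(R[i][j] * w[j] for j in range(i + 1, p))
        w[i] = acc / R[i][i]
    return w


if __name__ == "__main__":
    w_true = [2.0, -1.0, 0.5]
    R = [[0.0] * 3 for _ in range(3)]
    z = [0.0] * 3
    for k in range(1, 21):
        x = [math.sin(k), math.cos(2 * k), 1.0]
        qr_rls_update(R, z, x, sum(a * b for a, b in zip(x, w_true)), lam=0.98)
    print(back_substitute(R, z))
```

In an array implementation, each diagonal cell computes (c, s) and the off-diagonal cells apply them, so the time per rotation directly sets the achievable sample rate.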

    Hardware/software optimizations for elliptic curve scalar multiplication on hybrid FPGAs

    Elliptic curve cryptography (ECC) offers a viable alternative to Rivest-Shamir-Adleman (RSA) by delivering equivalent security with a smaller key size. This has several advantages, including smaller bandwidth demands, faster key exchange, and lower-latency encryption and decryption. The fundamental operation for ECC is scalar point multiplication, wherein a point P on an elliptic curve defined over a finite field is multiplied by a scalar k. The complexity of this operation requires a hardware implementation to achieve high performance. At the same time, the algorithms involved in scalar point multiplication are constantly evolving, incorporating the latest developments in number theory to improve computation time. These competing needs, high performance and flexibility, have caused previous implementations either to limit their adaptability or to incur performance losses. This thesis explores the use of a hybrid FPGA for scalar point multiplication. A hybrid FPGA contains a general-purpose processor (GPP) in addition to reconfigurable fabric. This allows for a hardware/software co-design with low-latency communication between the GPP and the custom hardware. The elliptic curve operations and finite field inversion are programmed in C code. All other finite field arithmetic is implemented in the FPGA hardware, providing higher performance while retaining flexibility. The resulting implementation achieves speedups of 24 to 55 times over an optimized software implementation executing on a Pentium II workstation. The scalability of the design is investigated in two directions: faster finite field multiplication and greater exploitation of instruction-level parallelism. Increasing the number of parallel arithmetic units beyond two is shown to be less efficient than increasing the speed of the finite field multiplier.
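Scalar point multiplication is usually built from point addition and point doubling, scheduled by the bits of k. The sketch below shows left-to-right double-and-add on a toy curve y^2 = x^3 + 2x + 2 over GF(17), whose 19 points form a cyclic group generated by (5, 1). The abstract does not say which field or curve the thesis uses; a tiny prime field is an assumption made here so the arithmetic can be checked by hand, and it offers no security. The comments mark which operations correspond to the software (inversion, point logic) and hardware (field multiplication) sides of the split described above; the function names are hypothetical.

```python
from typing import Optional, Tuple

Point = Optional[Tuple[int, int]]  # None represents the point at infinity

# Toy curve y^2 = x^3 + A*x + 2 over GF(P_MOD); far too small for real use.
P_MOD, A = 17, 2


def field_inv(v: int) -> int:
    # Inversion via Fermat's little theorem (P_MOD is prime). In the split
    # described above, inversion runs in software on the GPP.
    return pow(v % P_MOD, P_MOD - 2, P_MOD)


def point_add(p: Point, q: Point) -> Point:
    """Affine point addition/doubling; field multiplications are the hardware part."""
    if p is None:
        return q
    if q is None:
        return p
    x1, y1 = p
    x2, y2 = q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None  # p + (-p), including doubling a point with y = 0
    if p == q:
        lam = (3 * x1 * x1 + A) * field_inv(2 * y1) % P_MOD
    else:
        lam = (y2 - y1) * field_inv(x2 - x1) % P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    y3 = (lam * (x1 - x3) - y1) % P_MOD
    return x3, y3


def scalar_mult(k: int, p: Point) -> Point:
    """Left-to-right double-and-add computation of k * p."""
    result: Point = None
    for bit in bin(k)[2:]:
        result = point_add(result, result)  # double
        if bit == "1":
            result = point_add(result, p)  # add
    return result


if __name__ == "__main__":
    G = (5, 1)
    print([scalar_mult(k, G) for k in range(1, 4)])
```

Every point operation here needs one inversion, which is why affine coordinates pair naturally with a fast hardware multiplier and a slower software inverter; projective coordinates trade those inversions for extra multiplications.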

    QASMBench: A Low-level QASM Benchmark Suite for NISQ Evaluation and Simulation

    The rapid development of quantum computing (QC) in the NISQ era urgently demands a low-level benchmark suite and insightful evaluation metrics for characterizing the properties of prototype NISQ devices, the efficiency of QC programming compilers, schedulers and assemblers, and the capability of quantum simulators on a classical computer. In this work, we fill this gap by proposing a low-level, easy-to-use benchmark suite called QASMBench, based on the OpenQASM assembly representation. It consolidates commonly used quantum routines and kernels from a variety of domains, including chemistry, simulation, linear algebra, searching, optimization, arithmetic, machine learning, fault tolerance and cryptography, trading off generality against usability. To analyze these kernels in terms of NISQ device execution, in addition to circuit width and depth, we propose four circuit metrics (gate density, retention lifespan, measurement density, and entanglement variance) to extract more insight into execution efficiency, susceptibility to NISQ error, and the potential gain from machine-specific optimizations. Most of the QASMBench application code can be launched and verified on IBM-Q directly. With the help of q-convert, QASMBench can be evaluated on various platforms and simulation environments. QASMBench is released at: http://github.com/pnnl/QASMBench
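Circuit width and depth, the starting point for the metrics above, can be computed from an OpenQASM listing by placing each gate in the earliest layer where all its qubits are free. The sketch below does this for simple single-register OpenQASM 2.0 and adds a density figure defined as occupied qubit-layer slots divided by width × depth. That density definition is an assumption for illustration and may differ from the paper's gate density; the other proposed metrics are not reproduced, and the function names are hypothetical rather than part of QASMBench.

```python
import re
from typing import Dict, List, Tuple

QUBIT_REF = re.compile(r"q\[(\d+)\]")
SKIP = ("OPENQASM", "include", "qreg", "creg", "barrier", "measure")


def parse_gates(qasm: str) -> List[Tuple[str, List[int]]]:
    """Extract (gate name, qubit indices) from simple single-register OpenQASM 2.0."""
    gates = []
    for raw in qasm.splitlines():
        line = raw.split("//")[0].strip()  # drop comments
        if not line or line.startswith(SKIP):
            continue
        name = line.split()[0].split("(")[0]  # "rz(pi/2)" -> "rz"
        qubits = [int(i) for i in QUBIT_REF.findall(line)]
        if not qubits:
            raise ValueError(f"register-wide gates are not handled: {line}")
        gates.append((name, qubits))
    return gates


def circuit_metrics(qasm: str) -> Dict[str, float]:
    gates = parse_gates(qasm)
    level: Dict[int, int] = {}  # last occupied layer per qubit
    busy_slots = 0
    for _, qubits in gates:
        # ASAP scheduling: the gate goes one layer after its busiest operand.
        layer = max(level.get(q, 0) for q in qubits) + 1
        for q in qubits:
            level[q] = layer
        busy_slots += len(qubits)
    width = max(level) + 1 if level else 0
    depth = max(level.values(), default=0)
    density = busy_slots / (width * depth) if width and depth else 0.0
    return {"width": width, "depth": depth, "gates": len(gates), "density": density}


if __name__ == "__main__":
    ghz = """OPENQASM 2.0;
include "qelib1.inc";
qreg q[3];
creg c[3];
h q[0];
cx q[0],q[1];
cx q[1],q[2];
measure q -> c;
"""
    print(circuit_metrics(ghz))
```

A low density means qubits spend many layers idle, which on NISQ hardware translates into exposure to decoherence without useful work.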

    Software-Defined Radio Technologies for GNSS Receivers: A Tutorial Approach to a Simple Design and Implementation

    The field of satellite navigation has witnessed the advent of a number of new systems and technologies: after the landmark design and development of the Global Positioning System (GPS), a number of new independent Global Navigation Satellite Systems (GNSSs) were or are being developed all over the world, such as Russia's GLONASS, Europe's GALILEO, and China's BEIDOU-2. In this ever-changing context, the availability of reliable and flexible receivers is becoming a priority for a host of research, commercial, civil, and military applications. Here, flexible means both easily upgradeable for future needs and reprogrammable on the fly to adapt to different signal formats. An effective approach to meeting these design goals is the software-defined radio (SDR) paradigm. In the last few years, new processors with high computational power have enabled the development of (fully) software receivers whose performance is comparable to or better than that of conventional hardware devices, while providing all the advantages of a flexible and fully configurable architecture. The aim of this tutorial paper is to survey the general architecture and design rules of a GNSS software receiver, through a comprehensive discussion of some techniques and algorithms typically applied in simple PC-based receiver implementations.
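A typical first algorithm in such a software receiver is acquisition, which finds the code phase of each satellite's spreading code. The sketch below generates the GPS L1 C/A Gold code from its two 10-stage shift registers (G1 with feedback from stages 3 and 10, G2 with feedback from stages 2, 3, 6, 8, 9 and 10, and the PRN-specific G2 phase-selector taps, listed here only for PRN 1 to 4) and then runs a serial code-phase search by circular correlation. It assumes a noiseless baseband signal at one sample per chip with no Doppler offset; a practical receiver also searches carrier frequency and usually computes the correlation with FFTs. The function names are hypothetical and not taken from the paper.

```python
from typing import Dict, List, Tuple

# G2 phase-selector taps (1-based stage numbers) for the first few PRNs.
G2_TAPS: Dict[int, Tuple[int, int]] = {1: (2, 6), 2: (3, 7), 3: (4, 8), 4: (5, 9)}
CODE_LENGTH = 1023


def ca_code(prn: int) -> List[int]:
    """Generate one period of the GPS L1 C/A code as 0/1 chips."""
    s1, s2 = G2_TAPS[prn]
    g1 = [1] * 10  # both registers start in the all-ones state
    g2 = [1] * 10
    chips = []
    for _ in range(CODE_LENGTH):
        # Output chip: G1 stage 10 XOR the two selected G2 stages.
        chips.append(g1[9] ^ g2[s1 - 1] ^ g2[s2 - 1])
        fb1 = g1[2] ^ g1[9]
        fb2 = g2[1] ^ g2[2] ^ g2[5] ^ g2[7] ^ g2[8] ^ g2[9]
        g1 = [fb1] + g1[:9]  # shift right, feedback enters stage 1
        g2 = [fb2] + g2[:9]
    return chips


def code_phase_search(signal: List[float], chips: List[int]) -> Tuple[int, float]:
    """Serial search: circular correlation of the signal with every code shift."""
    n = len(chips)
    ref = [1 - 2 * c for c in chips]  # map 0/1 chips to +1/-1
    best_lag, best_val = 0, float("-inf")
    for lag in range(n):
        val = sum(signal[k] * ref[(k - lag) % n] for k in range(n))
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag, best_val


if __name__ == "__main__":
    code = ca_code(1)
    ref = [1 - 2 * c for c in code]
    received = [ref[(k - 300) % CODE_LENGTH] for k in range(CODE_LENGTH)]
    print(code_phase_search(received, code))
```

The correlation peak equals the code length at the true delay, while Gold-code sidelobes stay far lower, which is what makes the maximum a reliable detection statistic.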