Self-Scaling Evolution of Analog Computation Circuits
Energy and performance improvements of continuous-time analog-based computation for selected applications offer an avenue to continue improving the computational ability of tomorrow's electronic devices at current technology scaling limits. However, analog computation is plagued by the difficulty of designing complex computational circuits, limited programmability, and an inherent lack of accuracy and precision compared to digital implementations. In this thesis, evolutionary-algorithm-based techniques are utilized within a reconfigurable analog fabric to realize an automated method of designing analog-based computational circuits while adapting the functional range to improve performance. A Self-Scaling Genetic Algorithm is proposed to adapt solutions to computationally tractable ranges in hardware-constrained analog reconfigurable fabrics. It operates by utilizing a Particle Swarm Optimization (PSO) algorithm that works synergistically with a Genetic Algorithm (GA) to adaptively scale and translate the functional range of computational circuits composed of high-level or low-level Computational Analog Elements, improving performance and realizing functionality otherwise unobtainable on the intrinsic platform. The technique is demonstrated by evolving square, square-root, cube, and cube-root analog computational circuits on the Cypress PSoC-5LP System-on-Chip. Results indicate that the Self-Scaling Genetic Algorithm improves our error metric 7.18-fold on average, and up to 12.92-fold for computational circuits that produce outputs beyond device range. Results also compare favorably with previous works, which utilized extrinsic evolution of circuits with much greater complexity than was possible on the PSoC-5LP.
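As a rough illustration of the idea (not the thesis implementation), the sketch below pairs a toy genetic algorithm with a small particle-swarm inner loop: the swarm searches for a scale and offset (a, b) that map a candidate's limited-range output a*f(x) + b onto the target function, and the resulting error becomes the candidate's fitness. All functions, ranges, and parameters here are illustrative assumptions.

```python
# Minimal sketch of self-scaling evolution: an inner PSO-like search fits (a, b)
# so a candidate's output a*f(x) + b best matches the target, and the outer GA
# ranks candidates by that scaled error. Everything below is a toy stand-in.
import random
import numpy as np

xs = np.linspace(0.1, 1.0, 32)          # normalized input range of the fabric
target = xs ** 2                         # e.g. evolving a "square" circuit

def candidate_output(genome, x):
    """Stand-in for measuring an evolved circuit; here a small polynomial."""
    return sum(g * x ** i for i, g in enumerate(genome))

def scaled_error(genome):
    """Inner swarm search over (a, b) that rescales the output toward the target."""
    y = np.array([candidate_output(genome, x) for x in xs])
    best = np.inf
    pos = np.random.uniform(-2, 2, size=(10, 2))     # 10 particles over (a, b)
    vel = np.zeros_like(pos)
    pbest, pbest_err = pos.copy(), np.full(10, np.inf)
    gbest = pos[0]
    for _ in range(20):
        for i, (a, b) in enumerate(pos):
            err = np.mean((a * y + b - target) ** 2)
            if err < pbest_err[i]:
                pbest_err[i], pbest[i] = err, pos[i].copy()
            if err < best:
                best, gbest = err, pos[i].copy()
        vel = 0.7 * vel + 1.4 * np.random.rand(10, 1) * (pbest - pos) \
                        + 1.4 * np.random.rand(10, 1) * (gbest - pos)
        pos = pos + vel
    return best

def evolve(pop_size=20, genome_len=4, generations=15):
    pop = [np.random.uniform(-1, 1, genome_len) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=scaled_error)                   # fitness = scaled error
        parents = pop[: pop_size // 2]
        children = []
        for _ in range(pop_size - len(parents)):
            p1, p2 = random.sample(parents, 2)
            cut = random.randrange(genome_len)
            child = np.concatenate([p1[:cut], p2[cut:]])
            child[random.randrange(genome_len)] += np.random.normal(0, 0.1)
            children.append(child)
        pop = parents + children
    return min(pop, key=scaled_error)

print("best scaled error:", scaled_error(evolve()))
```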
Array Architectures and Physical Layer Design for Millimeter-Wave Communications Beyond 5G
Ever-increasing demands on mobile data rates have resulted in the exploration of millimeter-wave (mmW) frequencies for next-generation (5G) wireless networks. Communication at mmW frequencies presents two key challenges. Firstly, high propagation loss requires base stations (BSs) and user equipment (UEs) to use a large number of antennas and narrow beams to close the link with sufficient received signal power. Consequently, communication using narrow beams creates a new challenge in channel estimation and link establishment based on fine angular probing. Current mmW systems use analog phased arrays that can probe only one angle at a time, which results in high latency during link establishment and channel tracking. It is desirable to design low-latency beam training by exploring both physical layer designs and array architectures that could replace current 5G approaches and pave the way to communications in higher mmW bands and the sub-THz region, where larger antenna arrays and wider communication bandwidths can be exploited. To this end, we propose novel signal processing techniques that exploit unique properties of the mmW channel, and we show their advantages over conventional approaches theoretically, in simulation, and in experiments. Secondly, we explore different array architecture designs and analyze their trade-offs between spectral efficiency, power consumption, and area. For a comprehensive comparison, we have developed a methodology for optimal design of system parameters for different array architecture candidates based on a spectral efficiency target, and we use these parameters to estimate array area and power consumption based on circuits reported in the literature. We show that hybrid analog-digital architectures have severe scalability concerns in radio frequency signal distribution as array size and spatial multiplexing levels increase, while fully-digital array architectures have the best performance and power/area trade-offs. The developed approaches are based on cross-disciplinary research that combines innovation in model-based signal processing, machine learning, and radio hardware. This work is the first to apply compressive sensing (CS), a signal processing tool that exploits the sparsity of the mmW channel model, to accelerate beam training in mmW cellular systems. The algorithm is designed to address practical issues, including the requirement of cell discovery and synchronization, which involves estimation of the angular channel together with carrier frequency and timing offsets. We have analyzed the algorithm's performance in 5G-compliant simulation and shown that an order-of-magnitude saving is achieved in initial access latency for the desired channel estimation accuracy. Moreover, we are the first to develop and implement a neural-network-assisted compressive beam alignment to deal with hardware impairments in mmW radios. We have used a 60 GHz mmW testbed to perform experiments and show that the neural network approach enhances the alignment rate compared to CS. To further accelerate beam training, we propose novel frequency-selective probing beams using the true-time-delay (TTD) analog array architecture. Our approach utilizes different subcarriers to scan different directions and achieves single-shot beam alignment, the fastest approach reported to date.
Our comprehensive analysis of different array architectures and exploration of emerging architectures enabled us to develop order-of-magnitude faster and more energy-efficient approaches for initial access and channel estimation in mmW systems.
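A minimal sketch of the compressive beam-training idea (not the proposed algorithms themselves): a handful of pseudo-random probing beams compress the sparse angular channel into a few measurements, and a matching-pursuit-style correlation against an angle dictionary recovers the dominant path. Array size, probing beams, channel, and noise level are assumed for illustration.

```python
# Illustrative compressive beam alignment: measure y = W^H h over a few random
# probing beams W, then pick the dominant angle by correlating against a
# compressed steering-vector dictionary (one matching-pursuit iteration).
import numpy as np

N = 64                                   # antennas at the base station
M = 12                                   # compressive probing measurements (M << N)
grid = np.linspace(-np.pi / 2, np.pi / 2, 181)

def steering(theta):
    return np.exp(1j * np.pi * np.arange(N) * np.sin(theta)) / np.sqrt(N)

A = np.stack([steering(t) for t in grid], axis=1)                 # N x G dictionary
W = np.exp(1j * 2 * np.pi * np.random.rand(N, M)) / np.sqrt(N)    # random probing beams

# single-path channel at an unknown angle with a complex gain
true_theta, gain = 0.35, (0.8 + 0.4j)
h = gain * steering(true_theta)
y = W.conj().T @ h + 0.01 * (np.random.randn(M) + 1j * np.random.randn(M))

Phi = W.conj().T @ A                      # M x G compressed dictionary
idx = np.argmax(np.abs(Phi.conj().T @ y)) # strongest correlation -> dominant path
print("estimated angle:", grid[idx], "true angle:", true_theta)
```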
Hybrid Analog-Digital Co-Processing for Scientific Computation
In the past 10 years computer architecture research has moved to more heterogeneity and less adherence to conventional abstractions. Scientists and engineers hold an unshakable belief that computing holds keys to unlocking humanity's Grand Challenges. Acting on that belief they have looked deeper into computer architecture to find specialized support for their applications. Likewise, computer architects have looked deeper into circuits and devices in search of untapped performance and efficiency. The lines between computer architecture layers---applications, algorithms, architectures, microarchitectures, circuits and devices---have blurred. Against this backdrop, a menagerie of computer architectures is on the horizon, ones that forgo basic assumptions about computer hardware and require new thinking about how such hardware supports problems and algorithms.
This thesis is about revisiting hybrid analog-digital computing in support of diverse modern workloads. Hybrid computing had extensive applications in early computing history, and has been revisited for small-scale applications in embedded systems. But architectural support for using hybrid computing in modern workloads, at scale and with high accuracy solutions, has been lacking.
I demonstrate solving a variety of scientific computing problems, including stochastic ODEs, partial differential equations, linear algebra, and nonlinear systems of equations, as case studies in hybrid computing. I solve these problems on a system of multiple prototype analog accelerator chips built by a team at Columbia University. On that team I made contributions toward programming the chips, building the digital interface, and validating the chips' functionality. The analog accelerator chip is intended for use in conjunction with a conventional digital host computer.
The appeal of and motivation for using an analog accelerator are efficiency and performance, but it comes with limitations in accuracy and problem size that we have to work around.
The first problem is how to solve problems in this unconventional computation model. Scientific computing phrases problems as differential equations and algebraic equations. Differential equations are a continuous view of the world, while algebraic equations are a discrete one. Prior work in analog computing mostly focused on differential equations, with algebraic equations playing only a minor role. The secret to using the analog accelerator to support modern workloads on conventional computers is that these two viewpoints are interchangeable: the algebraic equations that underlie most workloads can be solved as differential equations, and differential equations are naturally solvable in the analog accelerator chip. A hybrid analog-digital computer architecture can therefore focus on solving linear and nonlinear algebra problems to support many workloads.
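A minimal sketch of that interchangeability, under assumed values: the linear algebraic system A x = b is recast as the ODE dx/dt = b - A x, whose steady state is the solution. An analog accelerator would integrate this flow in continuous time; a forward-Euler loop stands in for it here.

```python
# Illustrative only: a linear system A x = b solved as the ODE dx/dt = b - A x.
# The steady state of the flow is the algebraic solution; a digital Euler loop
# stands in for the continuous-time integration an analog chip would perform.
import numpy as np

rng = np.random.default_rng(0)
A = np.eye(4) * 4 + 0.5 * rng.standard_normal((4, 4))   # diagonally dominant -> stable flow
b = rng.standard_normal(4)

x = np.zeros(4)
dt = 0.05
for _ in range(2000):
    x = x + dt * (b - A @ x)              # Euler step of the continuous-time flow

print(np.allclose(x, np.linalg.solve(A, b), atol=1e-6))  # matches the algebraic solution
```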
The second problem is how to get accurate solutions using hybrid analog-digital computing. The reason that the analog computation model gives less accurate solutions is that it gives up representing numbers as digital binary numbers and instead uses the full range of analog voltage and current to represent real numbers. Prior work has established that encoding data in analog signals gives an energy efficiency advantage as long as the analog data precision is limited. While the analog accelerator alone may be useful for energy-constrained applications where inputs and outputs are imprecise, we are more interested in using analog in conjunction with digital for precise solutions. This thesis gives the novel insight that the trick to doing so is to solve nonlinear problems, where low-precision guesses are useful to conventional digital algorithms.
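As a hedged illustration of that insight, the sketch below seeds a standard digital Newton iteration with a coarse, low-precision guess of the kind an analog solver could supply, and refines it to full double precision; the nonlinear system itself is an arbitrary example.

```python
# Sketch of the hybrid idea: a coarse, low-precision guess (as an analog solver
# could provide quickly) seeds a digital Newton iteration, which then converges
# to full double-precision accuracy in a few steps.
import numpy as np

def f(x):   return np.array([x[0] ** 2 + x[1] ** 2 - 4.0, x[0] - x[1] ** 2])
def jac(x): return np.array([[2 * x[0], 2 * x[1]], [1.0, -2 * x[1]]])

x = np.round(np.array([1.2, 1.1]), 1)     # pretend this 1-decimal guess came from analog hardware
for _ in range(6):
    x = x - np.linalg.solve(jac(x), f(x)) # digital Newton refinement
print(x, f(x))                            # residual is near machine precision
```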
The third problem is how to solve large problems using hybrid analog-digital computing. The reason the analog computation model can't handle large problems is that it gives up step-by-step discrete-time operation, instead allowing variables to evolve smoothly in continuous time. To make that happen, the analog accelerator works by chaining hardware for mathematical operations end-to-end. During computation, analog data flows through the hardware with no overheads from control logic and memory accesses. The downside is that the needed hardware size grows with problem size. While scientific computing researchers have long split large problems into smaller subproblems to fit digital computer constraints, this thesis is a first attempt to treat these divide-and-conquer algorithms as an essential tool in using the analog model of computation.
As we enter the post-Moore’s law era of computing, unconventional architectures will offer specialized models of computation that uniquely support specific problem types. Two prominent examples are deep neural networks and quantum computers. Recent trends in computer science research show these unconventional architectures will soon have broad adoption. In this thesis I show that another specialized, unconventional architecture, the analog accelerator, can solve problems in scientific computing. Computer architecture researchers will discover other important models of computation in the future. This thesis is an example of the discovery process, implementation, and evaluation of how an unconventional architecture supports specialized workloads.
MFPA: Mixed-Signal Field Programmable Array for Energy-Aware Compressive Signal Processing
Compressive Sensing (CS) is a signal processing technique which reduces the number of samples taken per frame to decrease energy, storage, and data transmission overheads, as well as reducing the time taken for data acquisition in time-critical applications. The trade-off in such an approach is increased complexity of signal reconstruction. While several algorithms have been developed for CS signal reconstruction, hardware implementation of these algorithms is still an area of active research. Prior work has sought to utilize the parallelism available in reconstruction algorithms to minimize hardware overheads; however, such approaches are limited by the underlying limitations of CMOS technology. Herein, the MFPA (Mixed-signal Field Programmable Array) approach is presented as a hybrid spin-CMOS reconfigurable fabric specifically designed for implementation of CS data sampling and signal reconstruction. The resulting fabric consists of 1) slice-organized analog blocks providing amplifiers, transistors, capacitors, and Magnetic Tunnel Junctions (MTJs), which are configurable to achieve the square/square-root operations required for calculating vector norms, 2) digital functional blocks featuring 6-input clockless lookup tables for computation of the matrix inverse, and 3) an MRAM-based nonvolatile crossbar array for carrying out low-energy matrix-vector multiplication operations. The various functional blocks are connected via a global interconnect and spin-based analog-to-digital converters. Simulation results demonstrate significant energy and area benefits compared to equivalent CMOS digital implementations for each of the functional blocks used: an 80% reduction in energy and a 97% reduction in transistor count for the nonvolatile crossbar array, an 80% standby power reduction and a 25% reduced area footprint for the clockless lookup tables, and roughly a 97% reduction in transistor count for a multiplier built using components from the analog blocks. Moreover, the proposed fabric yields a 77% energy reduction compared to CMOS when used to implement CS reconstruction, in addition to latency improvements.
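For illustration only, the sketch below walks through the generic CS workflow such a fabric targets: acquisition is a single matrix-vector product y = Φx (the kind of operation mapped to a nonvolatile crossbar), and reconstruction here uses plain iterative soft-thresholding, whose inner loop is again dominated by matrix-vector products. Sizes and parameters are assumptions, not values from the paper.

```python
# Illustrative CS sampling and reconstruction. Acquisition is y = Phi @ x, and
# reconstruction uses iterative soft-thresholding (ISTA); both steps are built
# from the matrix-vector products that such hardware is meant to accelerate.
import numpy as np

rng = np.random.default_rng(1)
n, m, k = 256, 64, 5                      # signal length, measurements, sparsity
x = np.zeros(n)
x[rng.choice(n, k, replace=False)] = rng.uniform(1.0, 2.0, k)

Phi = rng.standard_normal((m, n)) / np.sqrt(m)   # random sampling matrix
y = Phi @ x                                       # compressive acquisition

lam = 0.05
step = 1.0 / np.linalg.norm(Phi, 2) ** 2
xhat = np.zeros(n)
for _ in range(500):                              # ISTA iterations
    g = xhat + step * Phi.T @ (y - Phi @ xhat)    # gradient step (mat-vec products)
    xhat = np.sign(g) * np.maximum(np.abs(g) - lam * step, 0.0)  # soft threshold

print("relative error:", np.linalg.norm(xhat - x) / np.linalg.norm(x))
```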
Addressing the Smart Systems Design Challenge: The SMAC Platform
This article presents the concepts, the organization, and the preliminary application results of SMAC, a smart systems co-design platform. The SMAC platform, which has been developed as an Integrated Project (IP) of the 7th ICT Call under Objective 3.2 "Smart components and Smart Systems integration", addresses the challenges of integrating the heterogeneous and conflicting domains that emerge in the design of smart systems. SMAC includes methodologies and EDA tools enabling multi-disciplinary and multi-scale modelling and design, simulation of multi-domain systems, subsystems and components at different levels of abstraction, and system integration and exploration for the optimization of functional and non-functional metrics. The article presents the preliminary results obtained by adopting the SMAC platform for the design of a limb-tracking smart system.
VLSI Design
This book provides some recent advances in the design of nanometer VLSI chips. The selected topics present open problems and challenges, with important topics ranging from design tools, new post-silicon devices, GPU-based parallel computing, and emerging 3D integration to antenna design. The book consists of two parts, with chapters such as: VLSI design for multi-sensor smart systems on a chip, Three-dimensional integrated circuits design for thousand-core processors, Parallel symbolic analysis of large analog circuits on GPU platforms, Algorithms for CAD tools VLSI design, A multilevel memetic algorithm for large SAT-encoded problems, etc.
Thermodynamic Computing
The hardware and software foundations laid in the first half of the 20th century enabled the computing technologies that have transformed the world, but these foundations are now under siege. The current computing paradigm, which is the foundation of much of the standard of living that we now enjoy, faces fundamental limitations that are evident from several perspectives. In terms of hardware, devices have become so small that we are struggling to eliminate the effects of thermodynamic fluctuations, which are unavoidable at the nanometer scale. In terms of software, our ability to imagine and program effective computational abstractions and implementations is clearly challenged in complex domains. In terms of systems, currently five percent of the power generated in the US is used to run computing systems - this astonishing figure is neither ecologically sustainable nor economically scalable. Economically, the cost of building next-generation semiconductor fabrication plants has soared past $10 billion. All of these difficulties - device scaling, software complexity, adaptability, energy consumption, and fabrication economics - indicate that the current computing paradigm has matured and that continued improvements along this path will be limited. If technological progress is to continue and corresponding social and economic benefits are to continue to accrue, computing must become much more capable, energy efficient, and affordable. We propose that progress in computing can continue under a united, physically grounded, computational paradigm centered on thermodynamics. Herein we propose a research agenda to extend these thermodynamic foundations into complex, non-equilibrium, self-organizing systems and apply them holistically to future computing systems that will harness nature's innate computational capacity. We call this type of computing "Thermodynamic Computing" or TC.
A Computing Community Consortium (CCC) workshop report, 36 pages.
Time-domain optimization of amplifiers based on distributed genetic algorithms
Thesis presented in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Electrical and Computer Engineering.
The work presented in this thesis addresses the task of circuit optimization, helping the designer face the high-performance and high-efficiency circuit demands of the market and of technology evolution. A novel framework is introduced, based on time-domain analysis, genetic algorithm optimization, and distributed processing.
The time-domain optimization methodology is based on the step response of the amplifier. The main advantage of this new time-domain methodology is that, when a given settling error is reached within the desired settling time, it is automatically guaranteed that the amplifier has enough open-loop gain (AOL), output swing (OS), slew rate (SR), closed-loop bandwidth, and closed-loop stability. Thus, this simplification of the circuit's evaluation helps the optimization process converge faster. The method used to calculate the step-response expression of the circuit is based on the inverse Laplace transform applied, symbolically, to the transfer function multiplied by 1/s (which represents the unity input step). Furthermore, it may be applied to transfer functions of circuits with an unlimited number of zeros/poles, without approximations, in order to preserve accuracy. Thus, complex circuits with several design/optimization degrees of freedom can also be considered. The expression of the step response, in the proposed methodology, is based on the DC bias operating point of the devices of the circuit. For this, complex and accurate device models (e.g. BSIM3v3) are integrated. During the optimization process, the time-domain evaluation of the amplifier is used by the genetic algorithm in the classification of the genetic individuals. The time-domain evaluator is integrated into the developed optimization platform as an independent library, coded in the C programming language.
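A minimal SymPy sketch of that evaluation step, using an assumed normalized second-order transfer function rather than an actual amplifier: the transfer function is multiplied by 1/s, inverted symbolically, and the resulting step response is checked against a desired settling error within a desired settling time.

```python
# Illustrative step-response evaluation: H(s) * (1/s) is inverted symbolically
# and the settling-error criterion is checked over t >= ts. Values are assumed.
import numpy as np
import sympy as sp

s, t = sp.symbols("s t", positive=True)
wn, zeta = sp.Integer(1), sp.Rational(7, 10)            # normalized 2nd-order example
H = wn**2 / (s**2 + 2 * zeta * wn * s + wn**2)          # stand-in closed-loop amplifier

step_resp = sp.inverse_laplace_transform(H / s, s, t)   # response to the unity input step
step_resp = step_resp.subs(sp.Heaviside(t), 1)          # t > 0, drop the step factor if present
y = sp.lambdify(t, sp.re(step_resp), "numpy")

ts, eps = 8.0, 0.01                                     # desired settling time / settling error
tt = np.linspace(ts, 5 * ts, 400)
print("settled within error:", bool(np.all(np.abs(y(tt) - 1.0) < eps)))
```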
Genetic algorithms have proven to be a good approach for optimization, since they are flexible and independent of the optimization objective. Different levels of abstraction can be optimized, at either the system level or the circuit level. Optimization of any new block is carried out simply by providing additional configuration files (e.g. the chromosome format, in text format) and the circuit library where the fitness value of each individual of the genetic algorithm is computed.
Distributed processing is also employed to address the increasing processing time demanded by complex circuit analysis and by the accurate models of the circuit devices. Communication between remote processing nodes is based on the Message Passing Interface (MPI). It is demonstrated that distributed processing reduces the optimization run-time by more than one order of magnitude.
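A minimal mpi4py sketch of this kind of distribution (illustrative, not the thesis code): rank 0 holds the population, chromosome chunks are scattered to the worker ranks, each rank evaluates its chunk with a stand-in fitness function, and the scores are gathered back for selection.

```python
# Illustrative distributed fitness evaluation with MPI (mpi4py).
# Run with e.g.:  mpiexec -n 4 python distributed_fitness.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

def fitness(chromosome):
    """Stand-in for the costly time-domain circuit evaluation."""
    return float(np.sum((chromosome - 0.5) ** 2))

if rank == 0:
    population = [np.random.rand(8) for _ in range(40)]
    chunks = [population[i::size] for i in range(size)]   # split among ranks
else:
    chunks = None

my_chunk = comm.scatter(chunks, root=0)                   # distribute chromosomes
my_scores = [fitness(c) for c in my_chunk]                # evaluate locally
all_scores = comm.gather(my_scores, root=0)               # collect on rank 0

if rank == 0:
    scores = [s for chunk in all_scores for s in chunk]
    print("best fitness this generation:", min(scores))
```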
Platform assessment is carried out through several examples of two-stage amplifiers, which have been optimized and successfully used, embedded, in larger systems such as data converters. A dedicated example of an inverter-based self-biased two-stage amplifier has been designed, laid out, and fabricated as a stand-alone circuit and experimentally evaluated. The measured results are a direct demonstration of the effectiveness of the proposed time-domain optimization methodology.
Portuguese Foundation for Science and Technology (FCT)