
    Methods for synthesis of multiple-input translinear element networks

    Translinear circuits are circuits in which the exponential relationship between the output current and input voltage of a circuit element is exploited to realize various algebraic or differential equations. This thesis is concerned with a subclass of translinear circuits, in which the basic translinear element, called a multiple-input translinear element (MITE), has an output current that is exponentially related to a weighted sum of its input voltages. MITE networks can be used for the implementation of the same class of functions as traditional translinear circuits. The implementation of algebraic or (algebraic) differential equations using MITEs can be reduced to the implementation of product-of-power-law (POPL) relationships, in which an output is given by the product of inputs raised to different powers. Hence, the synthesis of POPL relationships, and their optimization with respect to the relevant cost functions, is very important in the theory of MITE networks. In this thesis, different constraints on the topology of POPL networks that result in desirable system behavior are explored, and different methods of synthesis, subject to these constraints, are developed. The constraints are usually conditions on certain matrices of the network, which characterize the weights in the relevant MITEs. Some of these constraints are related to the uniqueness of the operating point of the network and the stability of the network. Conditions that satisfy these constraints are developed in this work. The cost functions to be minimized are the number of MITEs and the number of input gates in each MITE. A complete solution to POPL network synthesis is presented here that minimizes the number of MITEs first and then minimizes the number of input gates to each MITE. A procedure for synthesizing POPL relationships optimally when the number of gates is minimal, i.e., 2, has also been developed here for the single-output case.
A MITE structure that produces the maximum number of functions with minimal reconfigurability is developed for use in MITE field-programmable analog arrays. The extension of these constraints to the synthesis of linear filters is also explored, the constraint here being that the filter network should have a unique operating point in the presence of nonidealities. Synthesis examples presented here include nonlinear functions like the arctangent and the Gaussian function, which find application in analog implementations of particle filters. Synthesis of dynamical systems is presented here using the examples of a Lorenz system and a sinusoidal oscillator. The procedures developed here provide a structured way to automate the synthesis of nonlinear algebraic functions and differential equations using MITEs. Ph.D. Committee Chair: Anderson, David; Committee Member: Habetler, Thomas; Committee Member: Hasler, Paul; Committee Member: McClellan, James; Committee Member: Minch, Bradle
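To make the POPL relationship concrete, the following sketch (illustrative only; the constants I_S and U_T and all weights and exponents are assumed values, not taken from the thesis) shows why MITEs are a natural fit: each MITE's output current is exponential in a weighted sum of gate voltages, so logarithms of currents combine linearly, and products of powers fall out directly.

```python
import math

# Illustrative constants (assumed values, not from the thesis):
I_S = 1e-15   # pre-exponential scaling current (A)
U_T = 0.0258  # thermal voltage at room temperature (V)

def mite_current(voltages, weights):
    """Output current of a single MITE: exponential in the
    weighted sum of its input voltages."""
    s = sum(w * v for w, v in zip(weights, voltages))
    return I_S * math.exp(s / U_T)

def popl(inputs, powers):
    """Product-of-power-law (POPL) relationship: y = prod_i x_i**p_i."""
    y = 1.0
    for x, p in zip(inputs, powers):
        y *= x ** p
    return y

# Example POPL: y = x1**2 * x2**(-1).  With x1 = 3 and x2 = 2, y = 4.5.
print(popl([3.0, 2.0], [2, -1]))
```

Since log(I_out) is linear in the input voltages, choosing the weight matrix of a MITE network amounts to choosing the exponents of the POPL relationship, which is why the thesis treats synthesis and optimization of the weight matrices as the central problem.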

    Analog signal processing on a reconfigurable platform

    The Cooperative Analog/Digital Signal Processing (CADSP) research group's approach to signal processing is to see what opportunities lie in adjusting the line between what is traditionally computed in digital and what can be done in analog. By allowing more computation to be done in analog, we can take advantage of its low power, continuous-domain operation, and parallel capabilities. One setback keeping Analog Signal Processing (ASP) from achieving more widespread use, however, is its lack of programmability. The design cycle for a typical analog system often involves several iterations of the fabrication step, which is labor intensive, time consuming, and expensive. These costs in both time and money reduce the likelihood that engineers will consider an analog solution. With CADSP's development of a reconfigurable analog platform, a Field-Programmable Analog Array (FPAA), it has become much more practical for systems to incorporate processing in the analog domain. In this thesis, I present an entire chain of tools that allows one to design simply at the system block level and then compile that design onto analog hardware. This tool chain uses the Simulink design environment and a custom library of blocks to create analog systems. I also present several of these ASP blocks, covering a broad range of functions from matrix computation to interfacing. In addition to these tools and blocks, the most recent FPAA architectures are discussed. These include the latest RASP general-purpose FPAAs as well as an adapted version geared toward high-speed applications. M.S. Committee Chair: Hasler, Paul; Committee Member: Anderson, David; Committee Member: Ghovanloo, Maysa

    On low-power analog implementations of particle filters for target tracking

    We propose a low-power, analog and mixed-mode implementation of particle filters. Low-power analog implementation of nonlinear functions such as the exponential and arctangent functions is done using multiple-input translinear element (MITE) networks. These nonlinear functions are used to calculate the probability densities in the particle filter. A bearings-only tracking problem is simulated to present the proposed low-power implementation of the particle filter algorithm.
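The role of the exponential (Gaussian likelihood) and arctangent (bearing measurement) functions in such a filter can be illustrated in software. The toy bearings-only tracker below is a hypothetical sketch, not the authors' implementation; the particle count, noise levels, and motion model are all assumed.

```python
import math
import random

random.seed(0)

N = 500          # number of particles (assumed)
SIGMA_B = 0.05   # bearing measurement noise std, rad (assumed)
Q = 0.1          # process (random-walk) noise std (assumed)

def bearing(x, y):
    # arctangent measurement model: bearing to the target from the origin
    return math.atan2(y, x)

def likelihood(z, z_hat, sigma):
    # exponential function evaluates the (unnormalized) Gaussian density
    return math.exp(-0.5 * ((z - z_hat) / sigma) ** 2)

# Hidden stationary target; the observer at the origin sees only bearings.
true_x, true_y = 5.0, 3.0
particles = [(random.uniform(0.5, 10.0), random.uniform(0.5, 10.0))
             for _ in range(N)]

for _ in range(30):
    z = bearing(true_x, true_y) + random.gauss(0.0, SIGMA_B)
    # Predict: random-walk motion model
    particles = [(x + random.gauss(0.0, Q), y + random.gauss(0.0, Q))
                 for x, y in particles]
    # Update: weight each particle by its measurement likelihood
    w = [likelihood(z, bearing(x, y), SIGMA_B) for x, y in particles]
    s = sum(w)
    w = [wi / s for wi in w] if s > 0 else [1.0 / N] * N
    # Resample (simple multinomial resampling)
    particles = random.choices(particles, weights=w, k=N)

est_x = sum(x for x, _ in particles) / N
est_y = sum(y for _, y in particles) / N
```

With a stationary observer, bearings alone cannot resolve range, so the particle cloud collapses onto the measured bearing line: the estimated bearing converges to the true one even though the range stays uncertain.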

    Adaptive Log Domain Filters Using Floating Gate Transistors

    In this thesis, an adaptive first-order lowpass log-domain filter and an adaptive second-order log-domain filter are presented with integrated learning rules for model reference estimation. Both systems are implemented using multiple-input floating-gate transistors to realize on-line learning of system parameters. Adaptive dynamical system theory is used to derive robust control laws in a system identification task for the parameters of both a first-order lowpass filter and a second-order tunable filter. The log-domain filters adapt to estimate the parameters of the reference filters accurately and efficiently as the parameters are changed. Simulation results for both the first-order and the second-order adaptive filters are presented, demonstrating that adaptation occurs within milliseconds. Experimental results and mismatch analysis are described for the first-order lowpass filter, demonstrating the success of our adaptive system design using this model-based learning method.
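A discrete-time software analogue of this parameter estimation can be sketched with a simple LMS-style gradient rule; the pole value and learning rate below are hypothetical, and the thesis itself derives its control laws from continuous-time adaptive dynamical system theory rather than from this toy recursion.

```python
import random

random.seed(1)

a_true = 0.9   # pole of the reference first-order lowpass filter (assumed)
a_hat = 0.0    # adaptive estimate of the pole
mu = 0.05      # learning rate (assumed)

y_ref = 0.0
for _ in range(2000):
    u = random.uniform(-1.0, 1.0)                 # excitation input
    y_prev = y_ref
    y_ref = a_true * y_prev + (1.0 - a_true) * u  # reference filter output
    # Rearranged model  y[n] - u[n] = a * (y[n-1] - u[n])  gives a scalar
    # regression on the unknown pole a, updated by an LMS gradient step:
    r = y_prev - u                     # regressor
    e = (y_ref - u) - a_hat * r        # prediction error
    a_hat += mu * e * r                # gradient update

print(round(a_hat, 3))
```

In this noiseless setting the prediction error contracts on every step, so the estimate converges to the reference pole; the hardware version performs the analogous update on floating-gate charge.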

    Synthesis and analysis of nonlinear, analog, ultra low power, Bernoulli cell based CytoMimetic circuits for biocomputation

    A novel class of analog BioElectronics is introduced for the systematic implementation of ultra-low power microelectronic circuits able to compute nonlinear biological dynamics. This class of circuits is termed "CytoMimetic Circuits", a name intended to highlight their actual function, which is mimicking biological responses as observed experimentally. Inspired by the ingenious Bernoulli Cell Formalism (BCF), which was originally formulated for the modular synthesis and analysis of linear, time-invariant, high-dynamic-range, logarithmic filters, a new, modified mathematical framework has been conceived, termed the Nonlinear Bernoulli Cell Formalism (NBCF), which forms the core mathematical framework characterising the operation of CytoMimetic circuits. The proposed nonlinear, transistor-level mathematical formulation exploits the striking similarities between the NBCF and the coupled ordinary differential equations typically appearing in models of naturally encountered biochemical systems. The resulting continuous-time, continuous-value, low-power CytoMimetic electronic circuits simulate cellular and molecular dynamics with good accuracy and were found to be in very good agreement with their biological counterparts. They usually occupy an area of a fraction of a square millimetre, while consuming between hundreds of nanowatts and a few tenths of a microwatt of power. The systematic nature of the NBCF led to the transformation of a wide variety of biochemical reactions into nonlinear log-domain circuits, which span a wide range of different biological model types. Moreover, a detailed analysis of the robustness and performance of the proposed circuit class is also included in this thesis.
The robustness examination has been conducted via post-layout simulations of an indicative CytoMimetic circuit and also by providing fabrication-related variability simulations, obtained by means of analog Monte Carlo statistical analysis, for each of the proposed circuit topologies. Furthermore, a detailed mathematical analysis carefully addressing the effect of process parameters and MOSFET geometric properties upon subthreshold translinear circuits has been conducted for the fundamental translinear blocks of which CytoMimetic topologies are composed. Finally, an interesting sub-category of Neuromorphic circuits, the "Log-Domain Silicon Synapses", is presented, and representative circuits are thoroughly analysed by a novel, generalised BC operator framework. This leads to the conclusion that the BC operator constitutes the heart of such log-domain circuits and therefore allows the establishment of a general class of BC-based silicon synaptic circuits, which includes most of the synaptic circuits implemented so far in the log domain.
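As an indication of the kind of coupled nonlinear ODEs such circuits compute, the sketch below integrates a hypothetical two-species mutual-repression (toggle-switch) model with forward Euler; the model and its rate constants are chosen for illustration and are not taken from the thesis.

```python
# Hypothetical Hill-type mutual-repression model (illustrative only):
#   dx/dt = k / (1 + y**n) - d * x
#   dy/dt = k / (1 + x**n) - d * y
k, d, n = 4.0, 1.0, 2      # synthesis rate, degradation rate, Hill coefficient
dt, steps = 0.01, 5000     # forward-Euler step size and step count

x, y = 2.0, 0.5            # initial concentrations (arbitrary units)
for _ in range(steps):
    dx = k / (1.0 + y ** n) - d * x
    dy = k / (1.0 + x ** n) - d * y
    x += dt * dx
    y += dt * dy

print(round(x, 2), round(y, 2))
```

Starting with x ahead of y, the system settles into the high-x/low-y stable state, the kind of bistable behavior a CytoMimetic realization would have to reproduce in current-mode, log-domain form.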

    Efficient Implementation of Particle Filters in Application-Specific Instruction-Set Processor

    This thesis considers the problem of the implementation of particle filters (PFs) in Application-Specific Instruction-set Processors (ASIPs). Due to the diversity and complexity of PFs, implementing them requires both computational efficiency and design flexibility. ASIP design can offer an interesting degree of design flexibility. Hence, our research focuses on improving the throughput of PFs in this flexible ASIP design environment. A general approach is first proposed to characterize the computational complexity of PFs. Since PFs can be used for a wide variety of applications, we employ two types of blocks, application-specific and algorithm-specific, to distinguish the properties of PFs. In accordance with profiling results, we identify the likelihood processing and resampling processing blocks as the main bottlenecks among the algorithm-specific blocks. We explore the optimization of these two blocks at the algorithmic and architectural levels. The algorithmic level offers the greatest potential for speed and throughput improvements.
Hence, in this work we begin at the algorithm level by analyzing the complexity of the likelihood processing and resampling processing blocks, then proceed with their simplification and modification. We simplify the likelihood processing block by proposing a uniform quantization scheme, the Uniform Quantization Likelihood Evaluation (UQLE). The results show a significant improvement in performance without losing accuracy. The worst case of the UQLE software implementation in fixed-point arithmetic with 32 quantized intervals achieves a 23.7× average speedup over the software implementation of ELE. We also propose two novel resampling algorithms to replace the sequential Systematic Resampling (SR) algorithm in PFs. They are the reformulated SR and Parallel Systematic Resampling (PSR) algorithms. The reformulated SR algorithm combines a group of loops into a single loop to facilitate parallel implementation in an ASIP. The PSR algorithm makes the iterations independent, thus allowing the resampling algorithm to perform its loop iterations in parallel. In addition, our proposed PSR algorithm has lower computational complexity than the SR algorithm. At the architecture level, ASIPs are appealing for the implementation of PFs because they strike a good balance between computational efficiency and design flexibility. They can provide considerable throughput improvement through the inclusion of custom instructions, while retaining the ease of programming of general-purpose processors. Hence, after identifying the bottlenecks of PFs in the algorithm-specific blocks, we describe customized instructions for the UQLE, reformulated SR, and PSR algorithms in an ASIP. These instructions provide significantly higher throughput when compared to a pure software implementation running on a general-purpose processor. The custom instruction implementation of UQLE with 32 intervals achieves a 34× speedup over the worst case of its software implementation, at a cost of 3.75 K additional gates.
An implementation of the reformulated SR algorithm is evaluated with four weights calculated in parallel and eight categories defined by uniformly distributed bounds that are compared simultaneously. It achieves a 23.9× speedup over the sequential SR algorithm in a general-purpose processor. This comes at a cost of only 54 K additional gates. For the PSR algorithm, four custom instructions, when configured to support four weights input in parallel, lead to a 53.4× speedup over the floating-point SR implementation on a general-purpose processor at a cost of 47.3 K additional gates. Finally, we consider the specific application of video tracking and implement a histogram-based PF in an ASIP. We identify the histogram calculation as the main bottleneck in the application-specific blocks. We therefore propose a Parallel Array Histogram Architecture (PAHA) engine for accelerating the histogram calculation in ASIPs. Implementation results show that a 16-way PAHA can achieve a speedup of 43.75× when compared to its software implementation in a general-purpose processor.
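For reference, the sequential systematic resampling (SR) baseline that the thesis restructures can be sketched as follows; this is the standard textbook algorithm, not the reformulated SR or PSR variants, and the stratified offset u0 is passed explicitly to keep the example deterministic (in practice it is drawn uniformly from [0, 1)).

```python
def systematic_resample(weights, u0):
    """Sequential systematic resampling: a single stratified draw u0
    selects n equally spaced positions on the weight CDF, and each
    position maps to the particle index whose CDF segment covers it."""
    n = len(weights)
    positions = [(u0 + i) / n for i in range(n)]
    indices, cum, j = [], weights[0], 0
    for p in positions:
        while cum < p:       # walk forward along the CDF
            j += 1
            cum += weights[j]
        indices.append(j)
    return indices

# Heavier particles are duplicated, lighter ones dropped:
print(systematic_resample([0.5, 0.25, 0.125, 0.125], 0.5))
```

Note the loop-carried dependency on cum and j: each iteration resumes where the last one stopped, which is exactly what makes the sequential algorithm hard to parallelize and what motivates the reformulated SR and PSR algorithms.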

    Potential and Challenges of Analog Reconfigurable Computation in Modern and Future CMOS

    In this work, the feasibility of floating-gate technology for analog computing platforms in a scaled-down general-purpose CMOS technology is considered. When the technology is scaled down, the performance of analog circuits tends to get worse, because the process parameters are optimized for digital transistors and the scaling involves the reduction of supply voltages. Generally, the challenge in analog circuit design is that all salient design metrics, such as power, area, bandwidth, and accuracy, are interrelated. Furthermore, poor flexibility (i.e., lack of reconfigurability, reuse of IP, etc.) can be considered the most severe weakness of analog hardware. On this account, digital calibration schemes are often required for improved performance or yield enhancement, whereas high flexibility/reconfigurability cannot be easily achieved. Here, it is discussed whether it is possible to work around these obstacles by using floating-gate transistors (FGTs), and the problems associated with their practical implementation are analyzed. FGT technology is attractive because it is electrically programmable and also features a charge-based built-in non-volatile memory. Apart from being ideal for canceling circuit non-idealities due to process variations, FGTs can also be used as computational or adaptive elements in analog circuits. The nominal gate oxide thickness in deep sub-micron (DSM) processes is too thin to support robust charge retention, and consequently the FGT becomes leaky. In principle, non-leaky FGTs can be implemented in a scaled-down process without any special masks by using “double”-oxide transistors, intended to provide devices that operate with higher supply voltages than general-purpose devices. However, in practice the technology scaling poses several challenges, which are addressed in this thesis.
To provide a sufficiently wide-ranging survey, six prototype chips of varying complexity were implemented in four different DSM process nodes and investigated from this perspective. The focus is on non-leaky FGTs, but the presented autozeroing floating-gate amplifier (AFGA) demonstrates that leaky FGTs may also find a use. The simplest test structures contain only a few transistors, whereas the most complex experimental chip is an implementation of a spiking neural network (SNN) comprising thousands of active and passive devices. More precisely, it is a fully connected (256 FGT synapses) two-layer SNN in which the adaptive properties of the FGT are exploited. A compact realization of Spike Timing Dependent Plasticity (STDP) within the SNN is one of the key contributions of this thesis. Finally, the considerations in this thesis extend beyond CMOS to emerging nanodevices. To this end, one promising emerging nanoscale circuit element, the memristor, is reviewed and its applicability to analog processing is considered. Furthermore, it is discussed how FGT technology can be used to prototype computation paradigms compatible with these emerging two-terminal nanoscale devices in a mature and widely available CMOS technology.
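The pair-based STDP rule realized on-chip has a standard mathematical form; the sketch below uses that textbook form with illustrative amplitudes and time constant, and is not the floating-gate circuit described in the thesis.

```python
import math

A_PLUS, A_MINUS = 0.02, 0.021  # potentiation/depression amplitudes (assumed)
TAU = 20.0                     # plasticity time constant in ms (assumed)

def stdp_dw(dt_ms):
    """Synaptic weight change for one pre/post spike pair,
    where dt_ms = t_post - t_pre."""
    if dt_ms >= 0.0:  # pre before post: strengthen (causal pairing)
        return A_PLUS * math.exp(-dt_ms / TAU)
    else:             # post before pre: weaken (anti-causal pairing)
        return -A_MINUS * math.exp(dt_ms / TAU)

print(round(stdp_dw(10.0), 4), round(stdp_dw(-10.0), 4))
```

In a floating-gate realization, the exponential dependence comes naturally from subthreshold device physics, and the weight itself is stored as non-volatile charge on the floating gate.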

    ENABLING HARDWARE TECHNOLOGIES FOR AUTONOMY IN TINY ROBOTS: CONTROL, INTEGRATION, ACTUATION

    The last two decades have seen many exciting examples of tiny robots, from a few cm³ down to less than one cm³. Although individually limited, a large group of these robots has the potential to work cooperatively and accomplish complex tasks. Two examples from nature that exhibit this type of cooperation are ant and bee colonies. Such robot collectives have the potential to assist in applications like search and rescue, military scouting, infrastructure and equipment monitoring, nano-manufacture, and possibly medicine. Most of these applications require the high level of autonomy that has been demonstrated by large robotic platforms, such as the iRobot and Honda ASIMO. However, when robot size shrinks down, current approaches to achieving the necessary functions are no longer valid. This work focused on challenges associated with the electronics and fabrication. We addressed three major technical hurdles inherent to current approaches: 1) the difficulty of compact integration; 2) the need for real-time and power-efficient computation; 3) the unavailability of commercial tiny actuators and motion mechanisms. The aim of this work was to provide enabling hardware technologies to achieve autonomy in tiny robots. We proposed a decentralized application-specific integrated circuit (ASIC) in which each component is responsible for its own operation and autonomy to the greatest extent possible. The ASIC consists of electronics modules for the fundamental functions required to fulfill the desired autonomy: actuation, control, power supply, and sensing. The actuators and mechanisms could potentially be post-fabricated directly on the ASIC. This design makes for a modular architecture.
The following components were shown to work in physical implementations or simulations: 1) a tunable motion controller for ultralow-frequency actuation; 2) a nonvolatile memory and programming circuit to achieve automatic and one-time programming; 3) a high-voltage circuit with the highest reported breakdown voltage in standard 0.5 µm CMOS; 4) thermal actuators fabricated using a CMOS-compatible process; 5) a low-power mixed-signal computational architecture for a robotic dynamics simulator; 6) a frequency-boost technique to achieve low jitter in ring oscillators. These contributions will be generally enabling for other systems with strict size and power constraints, such as wireless sensor nodes.

    Low-Power and Programmable Analog Circuitry for Wireless Sensors

    Embedding networks of secure, wirelessly-connected sensors and actuators will help us to conscientiously manage our local and extended environments. One major challenge for this vision is to create networks of wireless sensor devices that provide maximal knowledge of their environment while using only the energy that is available within that environment. In this work, it is argued that the energy constraints in wireless sensor design are best addressed by incorporating analog signal processors. The low power consumption of an analog signal processor allows persistent monitoring of multiple sensors while the device's analog-to-digital converter, microcontroller, and transceiver are all in sleep mode. This dissertation describes the development of analog signal processing integrated circuits for wireless sensor networks. Specific technology problems that are addressed include reconfigurable processing architectures for low-power sensing applications, as well as the development of reprogrammable biasing for analog circuits.