70 research outputs found

    Toward a formal theory for computing machines made out of whatever physics offers: extended version

    Approaching limitations of digital computing technologies have spurred research in neuromorphic and other unconventional approaches to computing. Here we argue that if we want to systematically engineer computing systems that are based on unconventional physical effects, we need guidance from a formal theory that is different from the symbolic-algorithmic theory of today's computer science textbooks. We propose a general strategy for developing such a theory, and within that general view, a specific approach that we call "fluent computing". In contrast to Turing, who modeled computing processes from a top-down perspective as symbolic reasoning, we adopt the scientific paradigm of physics and model physical computing systems bottom-up by formalizing what can ultimately be measured in any physical substrate. This leads to an understanding of computing as the structuring of processes, while classical models of computing systems describe the processing of structures. Comment: 76 pages. This is an extended version of a perspective article with the same title that will appear in Nature Communications soon after this manuscript goes public on arxi

    Ultrafast Microfluidic Immunoassays Towards Real-time Intervention of Cytokine Storms

    Biomarker-guided precision medicine holds great promise to provide personalized therapy with a good understanding of the molecular or cellular data of an individual patient. However, implementing this approach in critical care uniquely faces enormous challenges as it requires obtaining “real-time” data with high sensitivity, reliability, and multiplex capacity near the patient’s bedside in the quickly evolving illness. Current immunodiagnostic platforms generally compromise assay sensitivity and specificity for speed or face significantly increased complexity and cost for highly multiplexed detection with low sample volume. This thesis introduces two novel ultrafast immunoassay platforms: one is a machine learning-based digital molecular counting assay, and the other is a label-free nano-plasmonic sensor integrated with an electrokinetic mixer. Both of them incorporate microfluidic approaches to pave the way for near-real-time interventions of cytokine storms. In the first part of the thesis, we present an innovative concept and the theoretical study that enables ultrafast measurement of multiple protein biomarkers (<1 min assay incubation) with comparable sensitivity to the gold standard ELISA method. The approach, which we term “pre-equilibrium digital enzyme-linked immunosorbent assay” (PEdELISA) incorporates the single-molecular counting of proteins at the early, pre-equilibrium state to achieve the combination of high speed and sensitivity. We experimentally demonstrated the assay’s application in near-real-time monitoring of patients receiving chimeric antigen receptor (CAR) T-cell therapy and for longitudinal serum cytokine measurements in a mouse sepsis model. In the second part, we report the further development of a machine learning-based PEdELISA microarray data analysis approach with a significantly extended multiplex capacity using the spatial-spectral microfluidic encoding technique. 
This unique approach, together with a convolutional neural network-based image analysis algorithm, remarkably reduced errors faced by the highly multiplexed digital immunoassay at low analyte concentrations. As a result, we demonstrated the longitudinal data collection of 14 serum cytokines in human patients receiving CAR-T cell therapy at concentrations < 10 pg/mL with a sample volume < 10 µL and a 5-min assay incubation. In the third part, we demonstrate the clinical application of a machine learning-based digital protein microarray platform for rapid multiplex quantification of cytokines from critically ill COVID-19 patients admitted to the intensive care unit. The platform comprises two low-cost modules: (i) a semi-automated fluidic dispensing module that can be operated inside a biosafety cabinet to minimize the technician's exposure to the virus and (ii) a compact fluorescence optical scanner for potential near-bedside readout. The automated system has achieved high interassay precision (~10% CV) with high sensitivity (<0.4 pg/mL). Our data revealed large subject-to-subject variability in patient responses to anti-inflammatory treatment for COVID-19, reaffirming the need for a personalized strategy guided by rapid cytokine assays. Lastly, an AC electroosmosis-enhanced localized surface plasmon resonance (ACE-LSPR) biosensing device is presented for rapid analysis of cytokine IL-1β among sepsis patients. The ACE-LSPR device is constructed using both bottom-up and top-down sensor fabrication methods, allowing the seamless integration of antibody-conjugated gold nanorod (AuNR) biosensor arrays with microelectrodes on the same microfluidic platform. Applying an AC voltage to microelectrodes while scanning the scattering light intensity variation of the AuNR biosensors results in significantly enhanced biosensing performance. 
The technologies developed have enabled new capabilities with broad application to advance precision medicine of life-threatening acute illnesses in critical care, which potentially will allow the clinical team to make individualized treatment decisions based on a set of time-resolved biomarker signatures. PhD, Mechanical Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/163129/1/yujing_1.pd

    Applications of Emerging Memory in Modern Computer Systems: Storage and Acceleration

    In recent years, heterogeneous architectures have emerged as a promising way to overcome the constraints of homogeneous multi-core architectures, such as supply voltage scaling, off-chip communication bandwidth, and application parallelism. Various forms of accelerators, e.g., GPUs and ASICs, have been extensively studied for their tradeoffs between computation efficiency and adaptivity. But with increasing capacity demands and technology scaling, accelerators also face cost-efficiency limitations due to their use of traditional memory technologies and architecture designs. Emerging memory has become a promising technology that inspires new designs by replacing traditional memory technologies in modern computer systems. In this dissertation, I will first summarize my research on the application of spin-transfer torque random access memory (STT-RAM) in the GPU memory hierarchy; STT-RAM offers a simple cell structure and non-volatility, enabling a much smaller cell area than SRAM and almost zero standby power. Then I will introduce my research on using memristors as the computation component in a neuromorphic computing accelerator, where the similarity between the programmable resistance states of memristors and the variable synaptic strengths of biological synapses simplifies the realization of neural network models. Finally, a dedicated interconnection network design for multicore neuromorphic computing systems will be presented to reduce the significant average latency and power consumption introduced by the NoC in large neuromorphic computing systems

    Double-gate single electron transistor : modeling, design & evaluation of logic architectures

    In the coming years, the microelectronics industry must develop new technology families that can succeed or complement ultimate CMOS technology. Among these emerging "Beyond CMOS" technologies, this research focuses on single-electron transistors (SETs), whose operation is based on the quantization of electric charge, quantum transport, and Coulomb repulsion. SETs must be studied at three levels: device, circuit, and system. These new devices take advantage of the Coulomb blockade phenomenon, which lets electrons pass sequentially, in order to control the carried current very precisely. Indeed, the emergence of the granular nature of electric charge in electron transport by tunneling makes it possible to envisage potential replacements for transistors or memory cells with high integration density and low power consumption. The main objective of this thesis is to explore and evaluate the potential of metallic double-gate single-electron transistors (DG-SETs) for digital logic circuits. Accordingly, the proposed research is divided into three parts: (i) the development of simulation tools, in particular an analytical DG-SET model; (ii) the design of DG-SET-based digital circuits following a "standard cell" approach; and (iii) the exploration of versatile DG-SET-based logic architectures that exploit the device's double gate. An analytical model for metallic DG-SETs operating at room temperature and above is presented. This model is based on physical and geometrical parameters and is implemented in Verilog-A. It can be used to design hybrid SET-CMOS analog or digital circuits. 
Using this tool, we designed, simulated, and evaluated the performance of DG-SET-based logic circuits in order to highlight their use in future ULSI circuits. A library of DG-SET-based logic cells operating at high temperature is presented. Remarkable results were achieved, notably in terms of energy consumption. In addition, logic architectures such as elementary building blocks for computation (ALU, SRAM, etc.) were designed entirely from DG-SETs. The flexibility offered by the DG-SET's second gate made it possible to design a new family of flexible logic circuits based on transmission gates. A reduction in the number of transistors per function and in power consumption was achieved. Finally, Monte Carlo analyses are carried out to determine the robustness of the designed logic circuits with respect to process variations

    BOOLEAN AND BRAIN-INSPIRED COMPUTING USING SPIN-TRANSFER TORQUE DEVICES

    Several completely new approaches to information processing and data storage (based on spintronics, carbon nanotubes, graphene, TFETs, etc.) are emerging to address the time frame beyond the current Complementary Metal-Oxide-Semiconductor (CMOS) roadmap. The high-speed magnetization switching of a nano-magnet due to current-induced spin-transfer torque (STT) has been demonstrated in recent experiments. Such STT devices can be explored for compact, low-power memory and logic design. To truly leverage STT-device-based computing, researchers need to rethink circuits, architectures, and computing models, since STT devices are unlikely to be drop-in replacements for CMOS. The potential of STT-device-based computing will be best realized by considering new computing models that are inherently suited to the characteristics of STT devices, and new applications that are enabled by their unique capabilities, thereby attaining performance that CMOS cannot achieve. The goal of this research is to conduct a synergistic exploration at the architecture, circuit, and device levels for Boolean and brain-inspired computing using nanoscale STT devices. Specifically, we first show that non-volatile STT devices can be used to design configurable Boolean logic blocks. We propose a spin-memristor threshold logic (SMTL) gate design, in which a memristive cross-bar array performs current-mode summation of binary inputs and a low-power current-mode spintronic threshold device carries out the energy-efficient threshold operation. Next, for brain-inspired computing, we have exploited different spin-transfer torque device structures that can implement the hard-limiting and soft-limiting artificial neuron transfer functions, respectively. 
We apply such STT-based neurons (or 'spin-neurons') in various neural network architectures, such as hierarchical temporal memory and feed-forward neural networks, for performing "human-like" cognitive computing, which shows more than two orders of magnitude lower energy consumption compared to state-of-the-art CMOS implementations. Finally, we show that the dynamics of an injection-locked Spin Hall Effect Spin-Torque Oscillator (SHE-STO) cluster can be exploited as a robust multi-dimensional distance metric for associative computing, image/video analysis, etc. Our simulation results show that the proposed system architecture with injection-locked SHE-STOs and the associated CMOS interface circuits can be suitable for robust and energy-efficient associative computing and pattern matching

    Energy efficient hybrid computing systems using spin devices

    Emerging spin-devices like magnetic tunnel junctions (MTJs), spin-valves and domain wall magnets (DWM) have opened new avenues for spin-based logic design. This work explored potential computing applications which can exploit such devices for higher energy-efficiency and performance. The proposed applications involve hybrid design schemes, where charge-based devices supplement the spin-devices, to gain large benefits at the system level. As an example, lateral spin valves (LSV) involve switching of nanomagnets using spin-polarized current injection through a metallic channel such as Cu. Such spin-torque based devices possess several interesting properties that can be exploited for ultra-low power computation. The analog characteristics of spin current facilitate non-Boolean computation like majority evaluation that can be used to model a neuron. The magneto-metallic neurons can operate at an ultra-low terminal voltage of ∼20 mV, thereby resulting in small computation power. Moreover, since nano-magnets inherently act as memory elements, these devices can facilitate integration of logic and memory in interesting ways. The spin-based neurons can be integrated with CMOS and other emerging devices, leading to different classes of neuromorphic/non-Von-Neumann architectures. The spin-based designs involve 'mixed-mode' processing and hence can provide very compact and ultra-low energy solutions for complex computation blocks, both digital and analog. Such low-power, hybrid designs can be suitable for various data processing applications like cognitive computing, associative memory, and current-mode on-chip global interconnects. Simulation results for these applications, based on a device-circuit co-simulation framework, predict more than ∼100x improvement in computation energy as compared to state-of-the-art CMOS design, for optimal spin-device parameters

    Fault and Defect Tolerant Computer Architectures: Reliable Computing With Unreliable Devices

    This research addresses the design of a reliable computer from unreliable device technologies. A system architecture is developed for a fault and defect tolerant (FDT) computer. Trade-offs between different techniques are studied, and yield and hardware cost models are developed. Fault and defect tolerant designs are created for the processor and the cache memory. Simulation results for the content-addressable memory (CAM)-based cache show 90% yield with device failure probabilities of 3 × 10^-6, three orders of magnitude better than non-fault-tolerant caches of the same size. The entire processor achieves 70% yield with device failure probabilities exceeding 10^-6. The required hardware redundancy is approximately 15 times that of a non-fault-tolerant design. While larger than current FT designs, this architecture allows the use of devices much more likely to fail than silicon CMOS. As part of model development, an improved model is derived for NAND Multiplexing. The model is the first accurate model for small and medium amounts of redundancy. Previous models are extended to account for dependence between the inputs and produce more accurate results

    Integer Sparse Distributed Memory and Modular Composite Representation

    Challenging AI applications, such as cognitive architectures, natural language understanding, and visual object recognition share some basic operations including pattern recognition, sequence learning, clustering, and association of related data. Both the representations used and the structure of a system significantly influence which tasks and problems are most readily supported. A memory model and a representation that facilitate these basic tasks would greatly improve the performance of these challenging AI applications. Sparse Distributed Memory (SDM), based on large binary vectors, has several desirable properties (auto-associativity, content addressability, distributed storage, robustness over noisy inputs) that would facilitate the implementation of challenging AI applications. Here I introduce two variations on the original SDM, the Extended SDM and the Integer SDM, that significantly improve these desirable properties, as well as a new form of reduced description representation named MCR. Extended SDM, which uses word vectors of larger size than address vectors, enhances its hetero-associativity, improving the storage of sequences of vectors, as well as of other data structures. A novel sequence learning mechanism is introduced, and several experiments demonstrate the capacity and sequence learning capability of this memory. Integer SDM uses modular integer vectors rather than binary vectors, improving the representation capabilities of the memory and its noise robustness. Several experiments show its capacity and noise robustness. Theoretical analyses of its capacity and fidelity are also presented. A reduced description represents a whole hierarchy using a single high-dimensional vector, which can recover individual items and directly be used for complex calculations and procedures, such as making analogies. Furthermore, the hierarchy can be reconstructed from the single vector. 
Modular Composite Representation (MCR), a new reduced description model for the representation used in challenging AI applications, provides an attractive tradeoff between expressiveness and simplicity of operations. A theoretical analysis of its noise robustness, several experiments, and comparisons with similar models are presented. My implementations of these memories include an object-oriented version using a RAM cache, a version for distributed and multi-threaded execution, and a GPU version for fast vector processing

    Theory and Practice of Computing with Excitable Dynamics

    Reservoir computing (RC) is a promising paradigm for time series processing. In this paradigm, the desired output is computed by combining measurements of an excitable system that responds to time-dependent exogenous stimuli. The excitable system is called a reservoir and measurements of its state are combined using a readout layer to produce a target output. The power of RC is attributed to an emergent short-term memory in dynamical systems and has been analyzed mathematically for both linear and nonlinear dynamical systems. The theory of RC treats only the macroscopic properties of the reservoir, without reference to the underlying medium it is made of. As a result, RC is particularly attractive for building computational devices using emerging technologies whose structure is not exactly controllable, such as self-assembled nanoscale circuits. RC has lacked a formal framework for performance analysis and prediction that goes beyond memory properties. To provide such a framework, here a mathematical theory of memory and information processing in ordered and disordered linear dynamical systems is developed. This theory analyzes the optimal readout layer for a given task. The focus of the theory is a standard model of RC, the echo state network (ESN). An ESN consists of a fixed recurrent neural network that is driven by an external signal. The dynamics of the network is then combined linearly with readout weights to produce the desired output. The readout weights are calculated using linear regression. Using an analysis of regression equations, the readout weights can be calculated using only the statistical properties of the reservoir dynamics, the input signal, and the desired output. The readout layer weights can be calculated from a priori knowledge of the desired function to be computed and the weight matrix of the reservoir. This formulation explicitly depends on the input weights, the reservoir weights, and the statistics of the target function. 
This formulation is used to bound the expected error of the system for a given target function. The effects of input-output correlation and complex network structure in the reservoir on the computational performance of the system have been mathematically characterized. Far from the chaotic regime, ordered linear networks exhibit a homogeneous decay of memory in different dimensions, which keeps the input history coherent. As disorder is introduced in the structure of the network, memory decay becomes inhomogeneous along different dimensions causing decoherence in the input history, and degradation in task-solving performance. Close to the chaotic regime, the ordered systems show loss of temporal information in the input history, and therefore inability to solve tasks. However, by introducing disorder and therefore heterogeneous decay of memory the temporal information of input history is preserved and the task-solving performance is recovered. Thus for systems at the edge of chaos, disordered structure may enhance temporal information processing. Although the current framework only applies to linear systems, in principle it can be used to describe the properties of physical reservoir computing, e.g., photonic RC using short coherence-length light
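
The abstract above describes a fixed linear reservoir driven by an external signal, with readout weights obtained by linear regression. The following is a minimal illustrative sketch of that idea, not the author's code. The ring-shaped reservoir, the decay factor `rho`, the single input node, the one-step delay task, and all function names are assumptions chosen to keep the example small and dependency-free.

```python
import random


def run_reservoir(inputs, n_nodes=10, rho=0.5):
    """Drive a linear ring reservoir x(t) = W x(t-1) + v u(t); return every state.

    Assumption: W is rho times a cyclic shift (an ordered network) and the
    input weights are v = (1, 0, ..., 0).
    """
    state = [0.0] * n_nodes
    states = []
    for u in inputs:
        # Node i receives rho times the previous value of node i-1 (cyclic).
        state = [rho * state[i - 1] for i in range(n_nodes)]
        state[0] += u
        states.append(state)
    return states


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def train_readout(states, targets, ridge=1e-8):
    """Least-squares readout: solve (X^T X + ridge I) w = X^T y."""
    n = len(states[0])
    xtx = [[sum(s[i] * s[j] for s in states) + (ridge if i == j else 0.0)
            for j in range(n)] for i in range(n)]
    xty = [sum(s[i] * y for s, y in zip(states, targets)) for i in range(n)]
    return solve(xtx, xty)


def nmse(states, targets, weights):
    """Mean squared readout error divided by the variance of the targets."""
    preds = [sum(w * s for w, s in zip(weights, row)) for row in states]
    mean = sum(targets) / len(targets)
    var = sum((y - mean) ** 2 for y in targets) / len(targets)
    mse = sum((p - y) ** 2 for p, y in zip(preds, targets)) / len(targets)
    return mse / var


random.seed(0)
u = [random.uniform(-1.0, 1.0) for _ in range(600)]
states = run_reservoir(u)

# Hypothetical task: recall the previous input u(t-1) from the state x(t).
washout = 50  # discard the initial transient
X, y = states[washout:], u[washout - 1:-1]

weights = train_readout(X[:400], y[:400])    # fit on the first 400 steps
test_error = nmse(X[400:], y[400:], weights)  # evaluate on the remainder
print("held-out normalized MSE:", test_error)
```

In this toy setting the second reservoir node holds a scaled copy of the previous input, so the regression should place most of its weight there. Tasks with longer delays or nonlinear targets would be harder for such a small linear reservoir.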

    VLSI Design

    This book presents recent advances in the design of nanometer VLSI chips. The selected topics present open problems and challenges, ranging from design tools, new post-silicon devices, GPU-based parallel computing, and emerging 3D integration to antenna design. The book consists of two parts, with chapters such as: VLSI design for multi-sensor smart systems on a chip, Three-dimensional integrated circuits design for thousand-core processors, Parallel symbolic analysis of large analog circuits on GPU platforms, Algorithms for CAD tools VLSI design, A multilevel memetic algorithm for large SAT-encoded problems, etc