
    Quick and energy-efficient Bayesian computing of binocular disparity using stochastic digital signals

    Reconstruction of the three-dimensional geometry of a visual scene using binocular disparity information is an important issue in computer vision and mobile robotics, which can be formulated as a Bayesian inference problem. However, computation of the full disparity distribution with an advanced Bayesian model is usually an intractable problem, and proves computationally challenging even with a simple model. In this paper, we show how probabilistic hardware using distributed memory and an alternate representation of data as stochastic bitstreams can solve that problem with high performance and energy efficiency. We put forward a way to express discrete probability distributions using stochastic data representations and perform Bayesian fusion using those representations, and show how that approach can be applied to disparity computation. We evaluate the system using a simulated stochastic implementation and discuss possible hardware implementations of such architectures and their potential for sensorimotor processing and robotics. Comment: Preprint of an article submitted for publication in the International Journal of Approximate Reasoning and accepted pending minor revision
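As a rough illustration of the bitstream idea described above (not the authors' implementation), the sketch below encodes each probability as a Bernoulli bitstream and fuses two discrete distributions via a bitwise AND followed by renormalization; the stream length and the toy four-hypothesis distributions are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def to_bitstream(p, n_bits=10_000):
    """Encode a probability p as a stochastic bitstream of length n_bits."""
    return rng.random(n_bits) < p

def fuse(dist_a, dist_b, n_bits=10_000):
    """Fuse two discrete distributions (Bayesian product rule) using bitstreams.

    Multiplication of probabilities is realized by a bitwise AND of
    independent bitstreams; the fused distribution is recovered by
    counting ones and renormalizing.
    """
    counts = []
    for pa, pb in zip(dist_a, dist_b):
        stream = to_bitstream(pa, n_bits) & to_bitstream(pb, n_bits)
        counts.append(stream.sum())
    counts = np.array(counts, dtype=float)
    return counts / counts.sum()

# Toy example: two sensors' likelihoods over 4 disparity hypotheses (assumed values).
print(fuse([0.1, 0.6, 0.2, 0.1], [0.2, 0.5, 0.2, 0.1]))
```

Longer bitstreams reduce the stochastic estimation error at the cost of more clock cycles, which is the performance/energy trade-off the paper exploits.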

    Design of robust ultra-low power platform for in-silicon machine learning

    The rapid development of machine learning plays a key role in enabling next-generation computing systems with enhanced intelligence. Present-day machine learning systems adopt an "intelligence in the cloud" paradigm, resulting in heavy energy cost despite state-of-the-art performance. It is therefore of great interest to design embedded ultra-low power (ULP) platforms with in-silicon machine learning capability. A self-contained ULP platform consists of the energy delivery, sensing and information processing subsystems. This dissertation proposes techniques to design and optimize the ULP platform for in-silicon machine learning by exploring the trade-off between energy efficiency and robustness. This trade-off arises when the information processing functionality is integrated into the energy delivery, sensing, or emerging stochastic fabrics (e.g., CMOS operating in near-threshold voltage or voltage overscaling, and beyond-CMOS devices). This dissertation presents the Compute VRM (C-VRM) to embed information processing into the energy delivery subsystem. The C-VRM employs multiple voltage domain stacking and core swapping to achieve high total system energy efficiency in the near/sub-threshold region. A prototype IC of the C-VRM is implemented in a 1.2 V, 130 nm CMOS process. Measured results indicate that the C-VRM has up to 44.8% savings in system-level energy per operation compared to the conventional system, and an efficiency ranging from 79% to 83% over an output voltage range of 0.52 V to 0.6 V. This dissertation further proposes the Compute Sensor approach to embed information processing into the sensing subsystem. The Compute Sensor eliminates both the traditional sensor-processor interface and the high-SNR/high-energy digital processing by moving feature extraction and classification functions into the analog domain. Simulation results in 65 nm CMOS show that the proposed Compute Sensor can achieve a detection accuracy greater than 94.7% using the Caltech101 dataset, which is within 0.5% of that achieved by an ideal digital implementation. This performance is achieved with 7x to 17x lower energy than the conventional architecture for the same level of accuracy. To further explore the energy-efficiency vs. robustness trade-off, this dissertation explores the use of highly energy-efficient but unreliable stochastic fabrics to implement in-silicon machine learning kernels. In order to perform reliable computation on stochastic fabrics, this dissertation proposes to employ statistical error compensation (SEC) as an effective error compensation technique. This dissertation contributes to the portfolio of SEC by proposing embedded algorithmic noise tolerance (E-ANT) for low-overhead error compensation. E-ANT operates by reusing part of the main block as an estimator, thus embedding the estimator into the main block. System-level simulation results in a commercial 45 nm CMOS process show that E-ANT achieves up to 38% error tolerance and up to 51% energy savings compared with an uncompensated system. This dissertation also contributes to the theoretical understanding of stochastic fabrics by proposing a class of probabilistic error models that can accurately model the hardware errors on stochastic fabrics. The models are validated in a commercial 45 nm CMOS process and employed to evaluate the performance of machine learning kernels in the presence of hardware errors.
Performance prediction of a support vector machine (SVM) based classifier using these models indicates that the probability of detection P_{det} estimated using the proposed model is within 3% for timing errors due to voltage overscaling when the error rate p_η ≤ 80%, within 5% for timing errors due to process variation in the near-threshold voltage (NTV) region (0.3 V-0.7 V), and within 2% for defect errors when the defect rate p_{saf} is between 10^{-3} and 20%, compared with HDL simulation results. Employing the proposed error model and evaluation methodology, this dissertation explores the use of distributed machine learning architectures, i.e., classifier ensembles, to enhance the robustness of in-silicon machine learning kernels. A comparative study of distributed architectures (i.e., random forest (RF)) and centralized architectures (i.e., SVM) is performed in a commercial 45 nm CMOS process. Employing the UCI machine learning repository as input, it is determined that RF-based architectures are significantly more robust than SVM architectures in the presence of timing errors in the NTV region (0.3 V-0.7 V). Additionally, an error-weighted voting technique that incorporates the timing error statistics of the NTV circuit fabric is proposed to further enhance the robustness of RF architectures. Simulation results confirm that the error-weighted voting technique achieves a P_{det} that varies by only 1.4%, which is 12x lower than that of centralized architectures.
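A minimal sketch of the error-weighted voting idea, assuming each tree's timing-error statistics are summarized by a single per-tree reliability estimate; the log-odds weighting and the toy numbers are illustrative assumptions rather than the dissertation's exact formulation.

```python
import numpy as np

def error_weighted_vote(votes, reliabilities):
    """Combine binary votes (+1/-1) from an ensemble of trees.

    Each tree i is weighted by log(r_i / (1 - r_i)), so trees whose
    outputs are more likely to be corrupted by timing errors count less.
    """
    votes = np.asarray(votes, dtype=float)
    r = np.clip(np.asarray(reliabilities, dtype=float), 1e-6, 1 - 1e-6)
    weights = np.log(r / (1.0 - r))
    return 1 if np.dot(weights, votes) >= 0 else -1

# Toy example: 5 trees, three of them running on error-prone NTV logic (assumed statistics).
votes = [+1, +1, -1, -1, -1]
reliabilities = [0.95, 0.95, 0.6, 0.6, 0.55]
print(error_weighted_vote(votes, reliabilities))   # the two reliable trees prevail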

    Memristors -- from In-memory computing, Deep Learning Acceleration, Spiking Neural Networks, to the Future of Neuromorphic and Bio-inspired Computing

    Machine learning, particularly in the form of deep learning, has driven most of the recent fundamental developments in artificial intelligence. Deep learning is based on computational models that are, to a certain extent, bio-inspired, as they rely on networks of connected simple computing units operating in parallel. Deep learning has been successfully applied in areas such as object/pattern recognition, speech and natural language processing, self-driving vehicles, intelligent self-diagnostics tools, autonomous robots, knowledgeable personal assistants, and monitoring. These successes have been mostly supported by three factors: availability of vast amounts of data, continuous growth in computing power, and algorithmic innovations. The approaching demise of Moore's law, and the consequently modest improvements in computing power that can be expected from scaling, raise the question of whether the described progress will be slowed or halted due to hardware limitations. This paper reviews the case for a novel beyond-CMOS hardware technology, memristors, as a potential solution for the implementation of power-efficient in-memory computing, deep learning accelerators, and spiking neural networks. Central themes are the reliance on non-von-Neumann computing architectures and the need for developing tailored learning and inference algorithms. To argue that lessons from biology can be useful in providing directions for further progress in artificial intelligence, we briefly discuss an example based on reservoir computing. We conclude the review by speculating on the big-picture view of future neuromorphic and brain-inspired computing systems. Comment: Keywords: memristor, neuromorphic, AI, deep learning, spiking neural networks, in-memory computing
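For context, in-memory computing with memristor crossbars is usually explained as mapping a weight matrix onto device conductances so that a matrix-vector product is obtained in a single analog step via Ohm's and Kirchhoff's laws; the sketch below is an idealized numerical model with no device non-idealities, and all values are assumptions.

```python
import numpy as np

# Idealized memristor crossbar: weights stored as conductances G (siemens),
# inputs applied as row voltages V, outputs read as column currents I = G^T V.
G = np.array([[1.0e-6, 2.0e-6, 0.5e-6],
              [3.0e-6, 1.5e-6, 2.5e-6]])   # 2 rows x 3 columns (assumed values)
V = np.array([0.2, 0.4])                   # input voltages on the rows

I = G.T @ V        # analog multiply-accumulate, performed "in memory"
print(I)           # column currents encode the matrix-vector product
```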

    Robust and reliable decision-making systems and algorithms

    We investigate robustness and reliability in decision-making systems and algorithms based on the trade-off between cost and performance. We propose two abstract frameworks to investigate robustness and reliability concerns, which critically impact the design and analysis of systems and algorithms built from unreliable components. We consider robustness in online systems and algorithms under the framework of online optimization subject to adversarial perturbations. The framework of online optimization models a rich class of problems from information theory, machine learning, game theory, optimization, and signal processing. This is a repeated-game framework in which, on each round, a player selects an action from a decision set using a randomized strategy, and then Nature reveals a loss function for this action, for which the player incurs a loss. Using a worst-case adversary framework to model the perturbations, we introduce a randomized algorithm that is provably robust even against such adversarial attacks. In particular, we show that this algorithm is Hannan-consistent with respect to a rich class of randomized strategies under mild regularity conditions. We next focus on the reliability of decision-making systems and algorithms based on the problem of fusing several unreliable computational units that perform the same task under cost and fidelity constraints. In particular, we model the relationship between the fidelity of the outcome and the cost of computing it as an additive perturbation. We analyze the performance of repetition-based strategies that distribute cost across several unreliable units and fuse their outcomes. When the cost is a convex function of fidelity, the optimal repetition-based strategy, in terms of minimizing total incurred cost while achieving a target mean-square error (MSE) performance, may fuse several computational units. For concave and linear costs, a single more reliable unit incurs lower cost than a fusion of several lower-cost and less reliable units while achieving the same MSE performance. We show how our results give insight into problems from theoretical neuroscience, circuits, and crowdsourcing. We finally study an application of a partial-information extension of the cost-fidelity framework of this dissertation to a stochastic gradient descent problem in which the underlying cost-fidelity function is assumed to be unknown. We present a generic framework for trading off fidelity and cost in computing stochastic gradients when the costs of acquiring stochastic gradients of different quality are not known a priori. We consider a mini-batch oracle that distributes a limited query budget over a number of stochastic gradients and aggregates them to estimate the true gradient. Since the optimal mini-batch size depends on the unknown cost-fidelity function, we propose an algorithm, EE-Grad, that sequentially explores the performance of mini-batch oracles and exploits the accumulated knowledge to estimate the one achieving the best performance in terms of cost efficiency. We provide performance guarantees for EE-Grad with respect to the optimal mini-batch oracle, and illustrate these results in the case of strongly convex objectives.
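The snippet below is only a schematic epsilon-greedy stand-in for the explore/exploit idea behind EE-Grad, not the algorithm itself; the quadratic objective, the assumed query costs, and the error-per-cost efficiency score are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def noisy_grad(x, batch_size, noise_std=2.0):
    """Stochastic gradient of f(x) = 0.5*x^2; larger batches are less noisy."""
    return x + rng.normal(0.0, noise_std / np.sqrt(batch_size))

batch_sizes = [1, 4, 16]            # candidate mini-batch oracles
cost = {1: 1.0, 4: 3.0, 16: 10.0}   # assumed (unknown-in-practice) query costs
scores = {b: [] for b in batch_sizes}

x, step, eps = 5.0, 0.1, 0.2
for t in range(200):
    if t < len(batch_sizes) or rng.random() < eps:
        b = batch_sizes[t % len(batch_sizes)]                   # explore
    else:
        b = max(batch_sizes, key=lambda k: np.mean(scores[k]))  # exploit
    g = noisy_grad(x, b)
    # Efficiency proxy: gradient error per unit cost (the toy objective's
    # true gradient is x, which a real method would not have access to).
    scores[b].append(-abs(g - x) / cost[b])
    x -= step * g

print(f"x after 200 steps: {x:.3f}")
print({b: round(float(np.mean(s)), 3) for b, s in scores.items()})
```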

    one6G white paper, 6G technology overview: Second Edition, November 2022

    6G is expected to address the demand for mobile networking services in 2030 and beyond. This demand is characterized by a variety of diverse, often conflicting requirements, from technical ones such as extremely high data rates, an unprecedented scale of communicating devices, high coverage, low communication latency, and flexibility of extension, to non-technical ones such as enabling the sustainable growth of society as a whole, e.g., through the energy efficiency of deployed networks. On the one hand, 6G is expected to fulfil all these individual requirements, thus extending the limits set by previous generations of mobile networks (e.g., ten times lower latencies, or a hundred times higher data rates than in 5G). On the other hand, 6G should also enable use cases characterized by combinations of these requirements never seen before (e.g., both extremely high data rates and extremely low communication latency). In this white paper, we give an overview of the key enabling technologies that constitute the pillars for the evolution towards 6G. They include: terahertz frequencies (Section 1), 6G radio access (Section 2), next generation MIMO (Section 3), integrated sensing and communication (Section 4), distributed and federated artificial intelligence (Section 5), intelligent user plane (Section 6) and flexible programmable infrastructures (Section 7). For each enabling technology, we first give the background on how and why the technology is relevant to 6G, backed up by a number of relevant use cases. After that, we describe the technology in detail, outline the key problems and difficulties, and give a comprehensive overview of the state of the art in that technology. 6G is, however, not limited to these seven technologies. They merely represent our current understanding of the technological environment in which 6G is being born. Future versions of this white paper may include other relevant technologies too, as well as discuss how these technologies can be glued together in a coherent system.

    Energy-efficient systems for information transfer and processing

    Machine learning (ML) systems are finding excellent utility in tackling the data deluge of the big data era thanks to the exponential increase in computing power. Current ML systems adopt either centralized cloud computing or distributed edge computing. In both, the challenge of energy efficiency has been drawing increased attention. In cloud computing, data transfer due to inter-chip, inter-board, inter-shelf and inter-rack communications (I/O interface) within data centers is one of the dominant energy costs. This will intensify with the growing demand for increased I/O bandwidth of high-performance computing in data centers. On the other hand, in edge computing, energy efficiency is the primary design challenge, as mobile devices have limited energy, computation and storage resources. This challenge is being exacerbated by the need to embed ML algorithms such as convolutional neural networks (CNNs) for enabling local on-device inference capabilities. In this dissertation, we investigate techniques to address these challenges. To address the energy efficiency challenge in data centers, this dissertation focuses on reducing the energy consumption of the I/O interface. Specifically, in the emerging analog-to-digital converter (ADC)-based multi-Gb/s serial link receivers, the power dissipation is dominated by the ADC. ADCs in serial links employ signal-to-noise-and-distortion ratio (SNDR) and effective number of bits (ENOB) as performance metrics because these are the standard for generic ADC design. This dissertation presents the use of information-based metrics such as bit error rate (BER) to design a BER-optimal ADC (BOA) for serial links. First, theoretical analysis is developed to show when the benefits of a BOA over a conventional uniform ADC (CUA) in a serial link receiver are substantial. Second, a 4 GS/s, 4-bit on-chip ADC in a 90 nm CMOS process is designed and integrated into a 4 Gb/s serial link receiver to verify the aforementioned analysis. Specifically, measured results demonstrate that a 3-bit BOA receiver outperforms a 4-bit CUA receiver at a BER < 10^{-12} and provides 50% power savings in the ADC. In the process, it is demonstrated conclusively that BER, as opposed to ENOB, is the better metric when designing ADCs for serial links. For the problem of resource-constrained computing at the edge, this dissertation tackles the issue of energy-efficient implementation of ML algorithms, particularly CNNs, which have recently gained considerable interest due to their record-breaking performance in many recognition tasks. However, their implementation complexity hinders their deployment on power-constrained embedded platforms. This dissertation develops two techniques for energy-efficient CNN design. The first technique is a predictive CNN (PredictiveNet), which makes use of the high sparsity in well-trained CNNs to bypass a large fraction of power-dominant convolutions at runtime without modifying the CNN structure. Analysis supported by simulations is provided to justify PredictiveNet's effectiveness. When applied to both the MNIST and CIFAR-10 datasets, simulation results show that PredictiveNet achieves 7.2x and 4.4x reductions in the computational and representational costs, respectively, compared with a conventional CNN.
It is further shown that PredictiveNet enables computational and representational cost reductions of 2.5x and 1.7x, respectively, compared to a state-of-the-art CNN, while incurring only 0.02 classification accuracy loss. The second technique is a variation-tolerant CNN architecture capable of operating in the near-threshold voltage (NTV) regime for aggressive energy efficiency. It is well known that NTV computing can achieve up to 10x energy savings but is sensitive to process, voltage, and temperature (PVT) variations, which can lead to timing errors. To leverage the great potential of NTV for energy efficiency, this dissertation develops a new statistical error compensation (SEC) technique referred to as rank-decomposed SEC (RD-SEC). RD-SEC makes use of the inherent redundancy in CNNs to handle timing errors due to NTV computing. When evaluated in CNNs for both the MNIST and CIFAR-10 datasets, simulation results in 45 nm CMOS show that RD-SEC enables robust CNNs operating in the NTV regime. Specifically, the proposed RD-SEC can achieve up to an 11x improvement in variation tolerance and enable up to a 113x reduction in the standard deviation of classification accuracy while incurring marginal degradation in the median classification accuracy.
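A minimal sketch of the convolution-bypass idea behind PredictiveNet, assuming the prediction is made from a coarsely quantized (MSB-like) copy of the weights; the quantization scheme and data are illustrative assumptions, not the dissertation's implementation.

```python
import numpy as np

rng = np.random.default_rng(2)

def relu(x):
    return np.maximum(x, 0.0)

def predictive_dot(x, w, msb_levels=4):
    """ReLU(w . x) with an MSB-based bypass of the fine-grained computation.

    A coarsely quantized copy of the weights predicts the sign of the
    pre-activation; if the prediction is negative, ReLU would output 0,
    so the full-precision dot product is skipped.
    """
    w_coarse = np.round(w * msb_levels) / msb_levels   # cheap MSB-like pass
    if np.dot(w_coarse, x) < 0:
        return 0.0, True            # bypassed: no full-precision work done
    return relu(np.dot(w, x)), False

x = rng.normal(size=64)
skipped = 0
for _ in range(1000):
    w = rng.normal(size=64)
    _, bypassed = predictive_dot(x, w)
    skipped += bypassed
print(f"fraction of full dot products bypassed: {skipped / 1000:.2f}")
```

The higher the post-ReLU sparsity of a layer, the larger the fraction of full-precision multiply-accumulates that such a predictor can skip.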

    Empowering Materials Processing and Performance from Data and AI

    Third-millennium engineering addresses new challenges in materials science and engineering. In particular, advances in materials engineering combined with advances in data acquisition, processing and mining, as well as artificial intelligence, allow for new ways of thinking in designing new materials and products. Additionally, this gives rise to new paradigms for bridging raw material data and processing to the induced properties and performance. The present topical issue is a compilation of contributions on novel ideas and concepts addressing several key challenges using data and artificial intelligence, such as:
    - proposing new techniques for data generation and data mining;
    - proposing new techniques for visualizing, classifying, modeling, extracting knowledge, explaining and certifying data and data-driven models;
    - processing data to create data-driven models from scratch when other models are absent, too complex or too poor for making valuable predictions;
    - processing data to enhance existing physics-based models to improve the quality of their predictions and, at the same time, to enable data to be smarter; and
    - processing data to create data-driven enrichment of existing models when physics-based models exhibit limits, within a hybrid paradigm.

    Dependable Embedded Systems

    This Open Access book introduces readers to many new techniques for enhancing and optimizing reliability in embedded systems, which have emerged particularly within the last five years. The book introduces the most prominent reliability concerns from today’s point of view and roughly recapitulates the progress in the community so far. Unlike other books that focus on a single abstraction level, such as the circuit level or the system level alone, the focus of this book is to address reliability challenges across different levels, starting from the physical level all the way up to the system level (cross-layer approaches). The book aims at demonstrating how new hardware/software co-design solutions can be proposed to effectively mitigate reliability degradation such as transistor aging, process variation, temperature effects, soft errors, etc. The book:
    - provides readers with the latest insights into novel, cross-layer methods and models with respect to the dependability of embedded systems;
    - describes cross-layer approaches that can leverage reliability through techniques that are proactively designed with respect to techniques at other layers; and
    - explains run-time adaptation and concepts/means of self-organization, in order to achieve error resiliency in complex, future many-core systems.

    Multiscale Modeling and Gaussian Process Regression for Applications in Composite Materials

    An ongoing challenge in advanced materials design is the development of accurate multiscale models that consider uncertainty while establishing a link between knowledge or information about constituent materials and overall composite properties. Successful models can accurately predict composite properties, reducing the high financial and labor costs associated with experimental determination and accelerating material innovation. Whereas early pioneers in micromechanics developed simplistic theoretical models to map these relationships, modern advances in computer technology have enabled detailed simulators capable of accurately predicting complex and multiscale phenomena. This work advances domain knowledge via two means: firstly, through the development of high-fidelity, physics-based finite element (FE) models of composite microstructures that incorporate uncertainty in predictions, and secondly, through the development of a novel inverse analysis framework that enables the discovery of unknown or obscure constituent properties using literature data and Gaussian process (GP) surrogate models trained on FE model predictions. This work presents a generalizable approach to modeling a diverse array of composite subtypes, from a simple particulate system to a complex commercial composite. The inverse analysis framework was demonstrated for a thermoplastic composite reinforced by spherical fillers with unknown interphase properties. The framework leverages computer model simulations together with easily obtainable macroscale elastic property measurements to infer interphase properties that are otherwise challenging to measure. The interphase modulus and thickness were determined for six different thermoplastic composites; four were reinforced by micron-scale particles and two by nano-scale particles. An alginate fiber embedded with a helically symmetric arrangement of cellulose nanocrystals (CNCs) was investigated using multiscale FE analysis to quantify microstructural uncertainty and its subsequent effect on macroscopic behavior. The macroscale uniaxial tensile simulation revealed that the microstructure induces internal stresses sufficient to rotate or twist the fiber about its axis. The reduction in axial elastic modulus with increases in CNC spiral angle was quantified in a sensitivity analysis using a GP surrogate modeling approach. A predictive model using GP regression was employed to investigate the link between input features and the mechanical properties of fiberglass-reinforced magnesium oxychloride (MOC) cement boards produced from a commercial process. The model evaluated the effect of formulation, crystalline phase compositions, and process control parameters on various mechanical performance metrics.
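A minimal sketch of the inverse analysis idea, with a closed-form expression standing in for the FE simulator and scikit-learn's GP regressor as the surrogate; the kernel choice, the candidate grid search, and all numbers are assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

# Stand-in for the FE model: maps interphase modulus (GPa) to composite
# modulus (GPa) via an arbitrary smooth, monotone expression.
def fe_model(e_interphase):
    return 3.0 + 0.8 * np.log1p(e_interphase)

# "FE simulations" at a handful of design points.
X_train = np.linspace(0.5, 20.0, 8).reshape(-1, 1)
y_train = fe_model(X_train.ravel())

# GP surrogate trained on the simulated points.
gp = GaussianProcessRegressor(kernel=ConstantKernel() * RBF(), normalize_y=True)
gp.fit(X_train, y_train)

# Inverse analysis: find the interphase modulus whose predicted composite
# modulus best matches a macroscale measurement (assumed value below).
measured_composite_modulus = 4.9
candidates = np.linspace(0.5, 20.0, 400).reshape(-1, 1)
pred = gp.predict(candidates)
best = candidates[np.argmin(np.abs(pred - measured_composite_modulus))]
print(f"inferred interphase modulus: {best[0]:.2f} GPa")
```

The same pattern extends to several unknowns (e.g., modulus and thickness) by training the surrogate on a multidimensional design of experiments and searching or optimizing over that space.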