8,888 research outputs found

    Graduate Catalog of Studies, 2023-2024

    Get PDF

    Approximate Computing Survey, Part I: Terminology and Software & Hardware Approximation Techniques

    Full text link
    The rapid growth of demanding applications in domains applying multimedia processing and machine learning has marked a new era for edge and cloud computing. These applications involve massive data and compute-intensive tasks, and thus, typical computing paradigms in embedded systems and data centers are stressed to meet the worldwide demand for high performance. Concurrently, the landscape of the semiconductor field in the last 15 years has constituted power as a first-class design concern. As a result, the community of computing systems is forced to find alternative design approaches to facilitate high-performance and/or power-efficient computing. Among the examined solutions, Approximate Computing has attracted an ever-increasing interest, with research works applying approximations across the entire traditional computing stack, i.e., at software, hardware, and architectural levels. Over the last decade, there is a plethora of approximation techniques in software (programs, frameworks, compilers, runtimes, languages), hardware (circuits, accelerators), and architectures (processors, memories). The current article is Part I of our comprehensive survey on Approximate Computing, and it reviews its motivation, terminology and principles, as well it classifies and presents the technical details of the state-of-the-art software and hardware approximation techniques.Comment: Under Review at ACM Computing Survey

    Design and Evaluation of Wearable Multimodal RF Sensing System for Vascular Dementia Detection

    Get PDF

    Hardware-aware training for large-scale and diverse deep learning inference workloads using in-memory computing-based accelerators

    Full text link
    Analog in-memory computing (AIMC) -- a promising approach for energy-efficient acceleration of deep learning workloads -- computes matrix-vector multiplications (MVMs) but only approximately, due to nonidealities that often are non-deterministic or nonlinear. This can adversely impact the achievable deep neural network (DNN) inference accuracy as compared to a conventional floating point (FP) implementation. While retraining has previously been suggested to improve robustness, prior work has explored only a few DNN topologies, using disparate and overly simplified AIMC hardware models. Here, we use hardware-aware (HWA) training to systematically examine the accuracy of AIMC for multiple common artificial intelligence (AI) workloads across multiple DNN topologies, and investigate sensitivity and robustness to a broad set of nonidealities. By introducing a new and highly realistic AIMC crossbar-model, we improve significantly on earlier retraining approaches. We show that many large-scale DNNs of various topologies, including convolutional neural networks (CNNs), recurrent neural networks (RNNs), and transformers, can in fact be successfully retrained to show iso-accuracy on AIMC. Our results further suggest that AIMC nonidealities that add noise to the inputs or outputs, not the weights, have the largest impact on DNN accuracy, and that RNNs are particularly robust to all nonidealities.Comment: 35 pages, 7 figures, 5 table

    DOTA: A Dynamically-Operated Photonic Tensor Core for Energy-Efficient Transformer Accelerator

    Full text link
    The wide adoption and significant computing resource consumption of attention-based Transformers, e.g., Vision Transformer and large language models, have driven the demands for efficient hardware accelerators. While electronic accelerators have been commonly used, there is a growing interest in exploring photonics as an alternative technology due to its high energy efficiency and ultra-fast processing speed. Optical neural networks (ONNs) have demonstrated promising results for convolutional neural network (CNN) workloads that only require weight-static linear operations. However, they fail to efficiently support Transformer architectures with attention operations due to the lack of ability to process dynamic full-range tensor multiplication. In this work, we propose a customized high-performance and energy-efficient photonic Transformer accelerator, DOTA. To overcome the fundamental limitation of existing ONNs, we introduce a novel photonic tensor core, consisting of a crossbar array of interference-based optical vector dot-product engines, that supports highly-parallel, dynamic, and full-range matrix-matrix multiplication. Our comprehensive evaluation demonstrates that DOTA achieves a >4x energy and a >10x latency reduction compared to prior photonic accelerators, and delivers over 20x energy reduction and 2 to 3 orders of magnitude lower latency compared to the electronic Transformer accelerator. Our work highlights the immense potential of photonic computing for efficient hardware accelerators, particularly for advanced machine learning workloads.Comment: The short version is accepted by Next-Gen AI System Workshop at MLSys 202

    Adaptive Microarchitectural Optimizations to Improve Performance and Security of Multi-Core Architectures

    Get PDF
    With the current technological barriers, microarchitectural optimizations are increasingly important to ensure performance scalability of computing systems. The shift to multi-core architectures increases the demands on the memory system, and amplifies the role of microarchitectural optimizations in performance improvement. In a multi-core system, microarchitectural resources are usually shared, such as the cache, to maximize utilization but sharing can also lead to contention and lower performance. This can be mitigated through partitioning of shared caches.However, microarchitectural optimizations which were assumed to be fundamentally secure for a long time, can be used in side-channel attacks to exploit secrets, as cryptographic keys. Timing-based side-channels exploit predictable timing variations due to the interaction with microarchitectural optimizations during program execution. Going forward, there is a strong need to be able to leverage microarchitectural optimizations for performance without compromising security. This thesis contributes with three adaptive microarchitectural resource management optimizations to improve security and/or\ua0performance\ua0of multi-core architectures\ua0and a systematization-of-knowledge of timing-based side-channel attacks.\ua0We observe that to achieve high-performance cache partitioning in a multi-core system\ua0three requirements need to be met: i) fine-granularity of partitions, ii) locality-aware placement and iii) frequent changes. These requirements lead to\ua0high overheads for current centralized partitioning solutions, especially as the number of cores in the\ua0system increases. To address this problem, we present an adaptive and scalable cache partitioning solution (DELTA) using a distributed and asynchronous allocation algorithm. The\ua0allocations occur through core-to-core challenges, where applications with larger performance benefit will gain cache capacity. The\ua0solution is implementable in hardware, due to low computational complexity, and can scale to large core counts.According to our analysis, better performance can be achieved by coordination of multiple optimizations for different resources, e.g., off-chip bandwidth and cache, but is challenging due to the increased number of possible allocations which need to be evaluated.\ua0Based on these observations, we present a solution (CBP) for coordinated management of the optimizations: cache partitioning, bandwidth partitioning and prefetching.\ua0Efficient allocations, considering the inter-resource interactions and trade-offs, are achieved using local resource managers to limit the solution space.The continuously growing number of\ua0side-channel attacks leveraging\ua0microarchitectural optimizations prompts us to review attacks and defenses to understand the vulnerabilities of different microarchitectural optimizations. We identify the four root causes of timing-based side-channel attacks: determinism, sharing, access violation\ua0and information flow.\ua0Our key insight is that eliminating any of the exploited root causes, in any of the attack steps, is enough to provide protection.\ua0Based on our framework, we present a systematization of the attacks and defenses on a wide range of microarchitectural optimizations, which highlights their key similarities.\ua0Shared caches are an attractive attack surface for side-channel attacks, while defenses need to be efficient since the cache is crucial for performance.\ua0To address this issue, we present an adaptive and scalable cache partitioning solution (SCALE) for protection against cache side-channel attacks. The solution leverages randomness,\ua0and provides quantifiable and information theoretic security guarantees using differential privacy. The solution closes the performance gap to a state-of-the-art non-secure allocation policy for a mix of secure and non-secure applications

    Spatial-temporal domain charging optimization and charging scenario iteration for EV

    Get PDF
    Environmental problems have become increasingly serious around the world. With lower carbon emissions, Electric Vehicles (EVs) have been utilized on a large scale over the past few years. However, EVs are limited by battery capacity and require frequent charging. Currently, EVs suffer from long charging time and charging congestion. Therefore, EV charging optimization is vital to ensure drivers’ mobility. This study first presents a literature analysis of the current charging modes taxonomy to elucidate the advantages and disadvantages of different charging modes. In specific optimization, under plug-in charging mode, an Urgency First Charging (UFC) scheduling policy is proposed with collaborative optimization of the spatialtemporal domain. The UFC policy allows those EVs with charging urgency to get preempted charging services. As conventional plug-in charging mode is limited by the deployment of Charging Stations (CSs), this study further introduces and optimizes Vehicle-to-Vehicle (V2V) charging. This is aim to maximize the utilization of charging infrastructures and to balance the grid load. This proposed reservation-based V2V charging scheme optimizes pair matching of EVs based on minimized distance. Meanwhile, this V2V scheme allows more EVs get fully charged via minimized waiting time based parking lot allocation. Constrained by shortcomings (rigid location of CSs and slow charging power under V2V converters), a single charging mode can hardly meet a large number of parallel charging requests. Thus, this study further proposes a hybrid charging mode. This mode is to utilize the advantages of plug-in and V2V modes to alleviate the pressure on the grid. Finally, this study addresses the potential problems of EV charging with a view to further optimizing EV charging in subsequent studies

    Toward Fault-Tolerant Applications on Reconfigurable Systems-on-Chip

    Get PDF
    L'abstract è presente nell'allegato / the abstract is in the attachmen

    Sparse-firing regularization methods for spiking neural networks with time-to-first spike coding

    Full text link
    The training of multilayer spiking neural networks (SNNs) using the error backpropagation algorithm has made significant progress in recent years. Among the various training schemes, the error backpropagation method that directly uses the firing time of neurons has attracted considerable attention because it can realize ideal temporal coding. This method uses time-to-first spike (TTFS) coding, in which each neuron fires at most once, and this restriction on the number of firings enables information to be processed at a very low firing frequency. This low firing frequency increases the energy efficiency of information processing in SNNs, which is important not only because of its similarity with information processing in the brain, but also from an engineering point of view. However, only an upper limit has been provided for TTFS-coded SNNs, and the information-processing capability of SNNs at lower firing frequencies has not been fully investigated. In this paper, we propose two spike timing-based sparse-firing (SSR) regularization methods to further reduce the firing frequency of TTFS-coded SNNs. The first is the membrane potential-aware SSR (M-SSR) method, which has been derived as an extreme form of the loss function of the membrane potential value. The second is the firing condition-aware SSR (F-SSR) method, which is a regularization function obtained from the firing conditions. Both methods are characterized by the fact that they only require information about the firing timing and associated weights. The effects of these regularization methods were investigated on the MNIST, Fashion-MNIST, and CIFAR-10 datasets using multilayer perceptron networks and convolutional neural network structures
    • …
    corecore