30 research outputs found

    Long-Term Memory for Cognitive Architectures: A Hardware Approach Using Resistive Devices

    Get PDF
    A cognitive agent capable of reliably performing complex tasks over a long time will acquire a large store of knowledge. To interact with changing circumstances, the agent will need to quickly search and retrieve knowledge relevant to its current context. Real time knowledge search and cognitive processing like this is a challenge for conventional computers, which are not optimised for such tasks. This thesis describes a new content-addressable memory, based on resistive devices, that can perform massively parallel knowledge search in the memory array. The fundamental circuit block that supports this capability is a memory cell that closely couples comparison logic with non-volatile storage. By using resistive devices instead of transistors in both the comparison circuit and storage elements, this cell improves area density by over an order of magnitude compared to state of the art CMOS implementations. The resulting memory does not need power to maintain stored information, and is therefore well suited to cognitive agents with large long-term memories. The memory incorporates activation circuits, which bias the knowledge retrieval process according to past memory access patterns. This is achieved by approximating the widely used base-level activation function using resistive devices to store, maintain and compare activation values. By distributing an instance of this circuit to every row in memory, the activation for all memory objects can be updated in parallel. A test using the word sense disambiguation task shows this circuit-based activation model only incurs a small loss in accuracy compared to exact base-level calculations. A variation of spreading activation can also be achieved in-memory. Memory objects are encoded with high-dimensional vectors that create association between correlated representations. By storing these high-dimensional vectors in the new content-addressable memory, activation can be spread to related objects during search operations. The new memory is scalable, power and area efficient, and performs operations in parallel that are infeasible in real-time for a sequential processor with a conventional memory hierarchy.Thesis (Ph.D.) -- University of Adelaide, School of Electrical and Electronic Engineering, 201

    The 1992 4th NASA SERC Symposium on VLSI Design

    Get PDF
    Papers from the fourth annual NASA Symposium on VLSI Design, co-sponsored by the IEEE, are presented. Each year this symposium is organized by the NASA Space Engineering Research Center (SERC) at the University of Idaho and is held in conjunction with a quarterly meeting of the NASA Data System Technology Working Group (DSTWG). One task of the DSTWG is to develop new electronic technologies that will meet next generation electronic data system needs. The symposium provides insights into developments in VLSI and digital systems which can be used to increase data systems performance. The NASA SERC is proud to offer, at its fourth symposium on VLSI design, presentations by an outstanding set of individuals from national laboratories, the electronics industry, and universities. These speakers share insights into next generation advances that will serve as a basis for future VLSI design

    High performance and energy-efficient instruction cache design and optimisation for ultra-low-power multi-core clusters

    Get PDF
    High Energy efficiency and high performance are the key regiments for Internet of Things (IoT) end-nodes. Exploiting cluster of multiple programmable processors has recently emerged as a suitable solution to address this challenge. However, one of the main bottlenecks for multi-core architectures is the instruction cache. While private caches fall into data replication and wasting area, fully shared caches lack scalability and form a bottleneck for the operating frequency. Hence we propose a hybrid solution where a larger shared cache (L1.5) is shared by multiple cores connected through a low-latency interconnect to small private caches (L1). However, it is still limited by large capacity miss with a small L1. Thus, we propose a sequential prefetch from L1 to L1.5 to improve the performance with little area overhead. Moreover, to cut the critical path for better timing, we optimized the core instruction fetch stage with non-blocking transfer by adopting a 4 x 32-bit ring buffer FIFO and adding a pipeline for the conditional branch. We present a detailed comparison of different instruction cache architectures' performance and energy efficiency recently proposed for Parallel Ultra-Low-Power clusters. On average, when executing a set of real-life IoT applications, our two-level cache improves the performance by up to 20% and loses 7% energy efficiency with respect to the private cache. Compared to a shared cache system, it improves performance by up to 17% and keeps the same energy efficiency. In the end, up to 20% timing (maximum frequency) improvement and software control enable the two-level instruction cache with prefetch adapt to various battery-powered usage cases to balance high performance and energy efficiency

    Cache Attacks and Defenses

    Get PDF
    In the digital age, as our daily lives depend heavily on interconnected computing devices, information security has become a crucial concern. The continuous exchange of data between devices over the Internet exposes our information vulnerable to potential security breaches. Yet, even with measures in place to protect devices, computing equipment inadvertently leaks information through side-channels, which emerge as byproducts of computational activities. One particular source of such side channels is the cache, a vital component of modern processors that enhances computational speed by storing frequently accessed data from random access memory (RAM). Due to their limited capacity, caches often need to be shared among concurrently running applications, resulting in vulnerabilities. Cache side-channel attacks, which exploit such vulnerabilities, have received significant attention due to their ability to stealthily compromise information confidentiality and the challenge in detecting and countering them. Consequently, numerous defense strategies have been proposed to mitigate these attacks. This thesis explores these defense strategies against cache side-channels, assesses their effectiveness, and identifies any potential vulnerabilities that could be used to undermine the effectiveness of these defense strategies. The first contribution of this thesis is a software framework to assess the security of secure cache designs. We show that while most secure caches are protected from eviction-set-based attacks, they are vulnerable to occupancybased attacks, which works just as well as eviction-set-based attacks, and therefore should be taken into account when designing and evaluating secure caches. Our second contribution presents a method that utilizes speculative execution to enable high-resolution attacks on low-resolution timers, a common cache attack countermeasure adopted by web browsers. We demonstrate that our technique not only allows for high-resolution attacks to be performed on low-resolution timers, but is also Turing-complete and is capable of performing robust calculations on cache states. Through this research, we uncover a new attack vector on low-resolution timers. By exposing this vulnerability, we hope to prompt the necessary measures to address the issue and enhance the security of systems in the future. Our third contribution is a survey, paired with experimental assessment of cache side-channel attack detection techniques using hardware performance counters. We show that, despite numerous claims regarding their efficacy, most detection techniques fail to perform proper evaluation of their performance, leaving them vulnerable to more advanced attacks. We identify and outline these shortcomings, and furnish experimental evidence to corroborate our findings. Furthermore, we demonstrate a new attack that is capable of compromising these detection methods. Our aim is to bring attention to these shortcomings and provide insights that can aid in the development of more robust cache side-channel attack detection techniques. This thesis contributes to a deeper comprehension of cache side-channel attacks and their potential effects on information security. Furthermore, it offers valuable insights into the efficacy of existing mitigation approaches and detection methods, while identifying areas for future research and development to better safeguard our computing devices and data from these insidious attacks.Thesis (MPhil) -- University of Adelaide, School of Computer and Mathematical Sciences, 202

    A cross-stack, network-centric architectural design for next-generation datacenters

    Get PDF
    This thesis proposes a full-stack, cross-layer datacenter architecture based on in-network computing and near-memory processing paradigms. The proposed datacenter architecture is built atop two principles: (1) utilizing commodity, off-the-shelf hardware (i.e., processor, DRAM, and network devices) with minimal changes to their architecture, and (2) providing a standard interface to the programmers for using the novel hardware. More specifically, the proposed datacenter architecture enables a smart network adapter to collectively compress/decompress data exchange between distributed DNN training nodes and assist the operating system in performing aggressive processor power management. It also deploys specialized memory modules in the servers, capable of performing general-purpose computation and network connectivity. This thesis unlocks the potentials of hardware and operating system co-design in architecting application-transparent, near-data processing hardware for improving datacenter's performance, energy efficiency, and scalability. We evaluate the proposed datacenter architecture using a combination of full-system simulation, FPGA prototyping, and real-system experiments

    The Fifth NASA Symposium on VLSI Design

    Get PDF
    The fifth annual NASA Symposium on VLSI Design had 13 sessions including Radiation Effects, Architectures, Mixed Signal, Design Techniques, Fault Testing, Synthesis, Signal Processing, and other Featured Presentations. The symposium provides insights into developments in VLSI and digital systems which can be used to increase data systems performance. The presentations share insights into next generation advances that will serve as a basis for future VLSI design

    Bit-Flip Aware Data Structures for Phase Change Memory

    Get PDF
    Big, non-volatile, byte-addressable, low-cost, and fast non-volatile memories like Phase Change Memory are appearing in the marketplace. They have the capability to unify both memory and storage and allow us to rethink the present memory hierarchy. An important draw-back to Phase Change Memory is limited write-endurance. In addition, Phase Change Memory shares with other Non-Volatile Random Access Memories an asym- metry in the energy costs of writes and reads. Best use of Non-Volatile Random Access Memories limits the number of times a Non-Volatile Random Access Memory cell changes contents, called a bit-flip. While the future of main memory is still unknown, we should already start to create data structures for them in order to shape the future era. This thesis investigates the creation of bit-flip aware data structures.The thesis first considers general ways in which a data structure can save bit- flips by smart overwrites and by using the exclusive-or of pointers. It then shows how a simple content dependent encoding can reduce bit-flips for web corpora. It then shows how to build hash based dictionary structures for Linear Hashing and Spiral Storage. Finally, the thesis presents Gray counters, close to bit-flip optimal counters that even enable age- based wear leveling with counters managed by the Non-Volatile Random Access Memories themselves instead of by the Operating Systems

    Programming Languages and Systems

    Get PDF
    This open access book constitutes the proceedings of the 31st European Symposium on Programming, ESOP 2022, which was held during April 5-7, 2022, in Munich, Germany, as part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2022. The 21 regular papers presented in this volume were carefully reviewed and selected from 64 submissions. They deal with fundamental issues in the specification, design, analysis, and implementation of programming languages and systems

    Programming Languages and Systems

    Get PDF
    This open access book constitutes the proceedings of the 31st European Symposium on Programming, ESOP 2022, which was held during April 5-7, 2022, in Munich, Germany, as part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2022. The 21 regular papers presented in this volume were carefully reviewed and selected from 64 submissions. They deal with fundamental issues in the specification, design, analysis, and implementation of programming languages and systems

    An OS-Based Alternative to Full Hardware Coherence on Tiled Chip-Multiprocessors

    Get PDF
    Institute for Computing Systems ArchitectureThe interconnect mechanisms (shared bus or crossbar) used in current chip-multiprocessors (CMPs) are expected to become a bottleneck that prevents these architectures from scaling to a larger number of cores. Tiled CMPs offer better scalability by integrating relatively simple cores with a lightweight point-to-point interconnect. However, such interconnects make snooping impractical and, thus, require alternative solutions to cache coherence. This thesis proposes a novel, cost-effective hardware mechanism to support shared-memory parallel applications that forgoes hardware maintained cache coherence. The proposed mech- anism is based on the key ideas that mapping of lines to physical caches is done at the page level with OS support and that hardware supports remote cache accesses. It allows only some controlled migration and replication of data and provides a sufficient degree of flexibility in the mapping through an extra level of indirection between virtual pages and physical tiles. The proposed tiled CMP architecture is evaluated on the SPLASH-2 scientific benchmarks and ALPBench multimedia benchmarks against one with private caches and a distributed direc- tory cache coherence mechanism. Experimental results show that the performance degradation is as little as 0%, and 16% on average, compared to the cache coherent architecture across all benchmarks for 16 and 32 processors
    corecore