12 research outputs found

    A Self-Organizing Wireless Sensor Network and Distributed Computing Engine for Commodity and Future Palmtop Computers

    The embedded class processors found in commodity palmtop computers continue to become increasingly capable while retaining an energy-efficient footprint. Palmtop computers themselves, including smartphones and tablets, provide a small form factor system integrating wireless communication and non-volatile storage with these energy-efficient processors. Also, various wireless connectivity functions on mobile devices provide new opportunities in designing more flexible, smarter wireless sensor networks (WSNs), and utilizing the computation power in a way we could never imagine before. In this dissertation, I present a WSN concept for current and future generation tablet devices. My contributions include developments at the system level, architecture level, and collaborative design between different layers of the system. At the system level, I developed Ocelot, a grid-like computing environment for palmtop computers in place of traditional workstation or server class machines to compute highly parallel light to medium-weight tasks in an energy efficient manner. Additionally, I developed Lynx, a self-organizing wireless sensor network, which is a further step taken in exploiting the potential of palmtop computers. At the architecture level, to increase the storage capacity of future palmtop computers, I explore the use of a new storage class magnetic memory, Racetrack Memory (RM), throughout the memory hierarchy. Thus, I developed FusedCache, a naturally inclusive, dual-level private cache design for RM that provides fast uniform access at one level, and non-uniform access at the next, which allows RM to be effective as close to the processor as an L1 cache. For higher levels of the memory hierarchy such as the last level cache, I propose a Multilane Racetrack Cache (MRC), an RM last level cache design utilizing lightweight compression combined with independent shifting. 
MRCs allow cache lines mapped to the same Racetrack structure to be accessed in parallel when compressed, mitigating potential shifting stalls in an RM cache. Finally, leveraging the lightweight compression from MRC and the need for efficient communication in Lynx, I present a cross-level design combining memory-level lightweight compression with network-level packet transfer, together with a technique called Source-Aware Layout Reorganization (SALR) to increase the compressibility of sensor data.
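
The non-uniform access and shifting stalls mentioned in the abstract above come from how racetrack memory is read: stored bits must be shifted along the track until the wanted one sits under an access port. The following is only a toy sketch of that cost, not the FusedCache or MRC design. It assumes a single access port and one shift per domain moved, and the `RacetrackTape` class and its fields are hypothetical names introduced for illustration.

```python
class RacetrackTape:
    """Toy model of one racetrack with a single access port (an assumption)."""

    def __init__(self, num_domains):
        self.num_domains = num_domains
        self.position = 0      # index of the domain currently under the port
        self.total_shifts = 0  # running count of shift operations

    def access(self, index):
        """Shift the track until `index` is under the port; return shifts used."""
        if not 0 <= index < self.num_domains:
            raise IndexError("domain index out of range")
        shifts = abs(index - self.position)  # assumed cost: one shift per domain
        self.total_shifts += shifts
        self.position = index
        return shifts


tape = RacetrackTape(num_domains=8)
first = tape.access(5)   # far from the port, so many shifts
second = tape.access(2)  # cost depends on the previous access
print(first, second, tape.total_shifts)
```

In this model the latency of an access depends on the previous access, which is why designs of this kind try to keep frequently used data close to the port or to reduce the number of shifts required.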

    Computing with Spintronics: Circuits and architectures

    This thesis makes the following contributions towards the design of computing platforms with spintronic devices. 1) It explores the use of spintronic memories in the design of a domain-specific processor for an emerging class of data-intensive applications, namely recognition, mining and synthesis (RMS). Two different spintronic memory technologies — Domain Wall Memory (DWM) and STT-MRAM — are utilized to realize the different levels in the memory hierarchy of the domain-specific processor, based on their respective access characteristics. Architectural tradeoffs created by the use of spintronic memories are analyzed. The proposed design achieves 1.5X-4X improvements in energy-delay product compared to a CMOS baseline. 2) It describes the first attempt to use DWM in the cache hierarchy of general-purpose processors. DWM promises unparalleled density by packing several bits of data into each bit-cell. TapeCache, the proposed DWM-based cache architecture, utilizes suitable circuit and architectural optimizations to address two key challenges (i) the high energy and latency requirement of write operations and (ii) the need for shift operations to access the data stored in each DWM bit-cell. At the circuit level, DWM bit-cells that are tailored to the distinct design requirements of different levels in the cache hierarchy are proposed. At the architecture level, TapeCache proposes suitable cache organization and management policies to alleviate the performance impact of shift operations required to access data stored in DWM bit-cells. TapeCache achieves more than 7X improvements in both cache area and energy with virtually identical performance compared to an SRAM-based cache hierarchy. 3) It investigates the design of the on-chip memory hierarchy of general-purpose graphics processing units (GPGPUs)—massively parallel processors that are optimized for data-intensive high-throughput workloads—using DWM. 
STAG, a high-density, energy-efficient Spintronic-Tape Architecture for GPGPU cache hierarchies, is described. STAG utilizes different DWM bit-cells to realize different memory arrays in the GPGPU cache hierarchy. To address the challenge of high access latencies due to shifts, STAG predicts upcoming cache accesses by leveraging unique characteristics of GPGPU architectures and workloads, and prefetches data that are both likely to be accessed and require large numbers of shift operations. STAG achieves 3.3X energy reduction and 12.1% performance improvement over CMOS SRAM under iso-area conditions. 4) While the potential of spintronic devices for memories is widely recognized, their utility in realizing logic is much less clear. The thesis presents Spintastic, a new paradigm that utilizes Stochastic Computing (SC) to realize spintronic logic. In SC, data is encoded in the form of pseudo-random bitstreams, such that the probability of a '1' in a bitstream corresponds to the numerical value that it represents. SC can enable compact, low-complexity logic implementations of various arithmetic functions. Spintastic establishes the synergy between stochastic computing and spin-based logic by demonstrating that they mutually alleviate each other's limitations. On the one hand, various building blocks of SC, which incur significant overheads in CMOS implementations, can be efficiently realized by exploiting the physical characteristics of spin devices. On the other hand, the reduced logic complexity and low logic depth of SC circuits alleviate the shortcomings of spintronic logic. Based on this insight, designs of spin-based stochastic arithmetic circuits, bitstream generators, bitstream permuters and stochastic-to-binary converter circuits are presented. Spintastic achieves 7.1X energy reduction over CMOS implementations for a wide range of benchmarks from the image processing, signal processing, and RMS application domains.
5) In order to evaluate the proposed spintronic designs, the thesis describes various device-to-architecture modeling frameworks. Starting with device models that are calibrated to measurements, the characteristics of spintronic devices are successively abstracted into circuit-level and architectural models, which are incorporated into suitable simulation frameworks. (Abstract shortened by UMI.)
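
The stochastic computing encoding described in item 4 of the abstract above can be illustrated with a short software sketch. This is not the Spintastic design, which is a hardware approach. It only shows the general idea that a value in [0, 1] becomes a bitstream whose fraction of 1s approximates that value, and that a bitwise AND of two independent streams approximates their product. The function names, the stream length, and the use of Python's `random` module as the bitstream source are all assumptions made for illustration.

```python
import random


def encode(value, length, rng):
    """Return a bitstream in which each bit is 1 with probability `value`."""
    return [1 if rng.random() < value else 0 for _ in range(length)]


def decode(bits):
    """Estimate the encoded value as the fraction of 1s in the stream."""
    return sum(bits) / len(bits)


def multiply(a_bits, b_bits):
    """AND two independent streams; the result encodes roughly a * b."""
    return [a & b for a, b in zip(a_bits, b_bits)]


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 100_000             # longer streams give a more precise estimate
a = encode(0.5, n, rng)
b = encode(0.4, n, rng)
product = decode(multiply(a, b))
print(product)  # close to 0.5 * 0.4 = 0.2, up to sampling noise
```

A single AND gate per bit stands in for a full multiplier here, which is the kind of low-complexity logic the abstract refers to. The trade-off is that precision depends on stream length, so the bitstream generators and converters account for much of the cost.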