95 research outputs found

    Studies in Exascale Computer Architecture: Interconnect, Resiliency, and Checkpointing

    Today’s supercomputers are built from state-of-the-art components to extract as much performance as possible and solve the most computationally intensive problems in the world. Building the next generation of exascale supercomputers, however, will require re-architecting many of these components to extract over 50x more performance than the current fastest supercomputer in the United States. To contribute towards this goal, two aspects of the compute node architecture were examined in this thesis: the on-chip interconnect topology and the memory and storage checkpointing platforms. As a first step, a skeleton exascale system was modeled to meet 1 exaflop of performance along with 100 petabytes of main memory. The model revealed that large kilo-core processors would be necessary to meet the exaflop performance goal; existing topologies, however, would not scale to those levels. To address this challenge, we investigated and proposed asymmetric high-radix topologies that decouple local and global communications and use different-radix routers for switching network traffic at each level. The proposed topologies scaled more readily to higher core counts, with better latency and energy consumption than existing topologies. The vast number of components that the model showed would be needed in these exascale systems motivated stronger fault tolerance mechanisms. To address this challenge, we showed that local checkpoints within the compute node can be saved to a hybrid DRAM and SSD platform, allowing them to be written faster without wearing out the SSD or consuming excessive energy. The hybrid checkpointing platform allowed more frequent checkpoints to be made without sacrificing performance. Subsequently, we proposed switching to a DIMM-based SSD to perform the fine-grained I/O operations that are integral to interleaving checkpointing and computation while still providing persistence guarantees. Two more techniques that consolidate and overlap checkpointing were designed to better hide the checkpointing latency to the SSD.
    PhD thesis, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/137096/1/sabeyrat_1.pd
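    To make the two-tier checkpointing idea concrete, here is a minimal Python sketch of a hybrid DRAM/SSD checkpointer: the application blocks only while its state is copied into a DRAM staging buffer, and a background thread drains that buffer to the SSD so computation can resume immediately. The names (HybridCheckpointer, the /mnt/ssd path) and the serialization choice are illustrative assumptions, not details taken from the thesis.

import pickle
import queue
import threading

class HybridCheckpointer:
    """Two-tier checkpointing sketch: fast copy into DRAM, background flush to SSD."""

    def __init__(self, ssd_dir):
        self.ssd_dir = ssd_dir                      # assumed SSD mount point
        self._pending = queue.Queue()               # DRAM staging buffer
        self._drainer = threading.Thread(target=self._drain, daemon=True)
        self._drainer.start()

    def checkpoint(self, step, state):
        # Fast path: serialize into DRAM and return so computation can resume.
        self._pending.put((step, pickle.dumps(state)))

    def _drain(self):
        # Slow path: persist staged checkpoints to the SSD in the background.
        while True:
            step, buf = self._pending.get()
            with open(f"{self.ssd_dir}/ckpt_{step}.bin", "wb") as f:
                f.write(buf)

# Usage: the SSD write of checkpoint N overlaps with computation of step N+1.
if __name__ == "__main__":
    ckpt = HybridCheckpointer("/mnt/ssd")           # hypothetical path
    state = {"iteration": 0, "grid": [0.0] * 1024}
    for step in range(3):
        state["iteration"] = step
        ckpt.checkpoint(step, state)                # returns after the DRAM copy
        # ... continue computing here while the SSD write proceeds ...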

    Architectural Techniques for Multi-Level Cell Phase Change Memory Based Main Memory

    Phase change memory (PCM) has recently emerged as a promising technology to meet the fast-growing demand for large-capacity main memory in modern computing systems. Multi-level cell (MLC) PCM, which stores multiple bits in a single cell, offers high density with low per-byte fabrication cost. However, PCM suffers from long write latency, short cell endurance, limited write throughput and high peak power, which make it challenging to integrate into the memory hierarchy. To address the long write latency, I propose write truncation to reduce the number of write iterations with the assistance of an extra error correction code (ECC). I also propose form switch (FS) to reduce the storage overhead of the ECC; by storing highly compressible lines in single-level cell (SLC) form, FS improves read latency as well. To address the short cell endurance and high peak power, I propose elastic RESET (ER) to construct triple-level cell PCM; by reducing RESET energy, ER significantly reduces peak power and prolongs PCM lifetime. To improve write concurrency, I propose fine-grained write power budgeting (FPB), which observes a global power budget and regulates power across write iterations according to the step-down power demand of each iteration. A global charge pump is also integrated onto the DIMM to boost power for hot PCM chips while staying within the global power budget. To further reduce peak power, I propose intra-write RESET scheduling, which distributes cell RESET initializations across the whole write operation so that the on-chip charge pump size can also be reduced.
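    As a rough illustration of the write-truncation idea (the convergence model, line size and ECC strength below are assumptions, not the paper's parameters): MLC PCM writes repeat program-and-verify iterations until every cell reaches its target level, but a write can be cut short once the remaining mis-programmed cells are few enough for the line's ECC to correct on a later read.

import random

ECC_CORRECTABLE = 4     # assumed ECC strength: correctable cells per line
MAX_ITERATIONS = 8      # assumed worst-case number of program-and-verify steps

def write_line_truncated(num_cells):
    """Return how many iterations a truncated write of one line spends."""
    # Toy model: in each iteration, every still-incorrect cell independently
    # reaches its target resistance level with 60% probability.
    incorrect = set(range(num_cells))
    for iteration in range(1, MAX_ITERATIONS + 1):
        incorrect = {c for c in incorrect if random.random() > 0.6}
        if len(incorrect) <= ECC_CORRECTABLE:
            # Truncate the write: the ECC can absorb the residual errors
            # when the line is later read back.
            return iteration
    return MAX_ITERATIONS

if __name__ == "__main__":
    random.seed(0)
    iters = [write_line_truncated(64) for _ in range(1000)]   # 64-cell lines
    print("average iterations with truncation:", sum(iters) / len(iters))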

    Improving Non-Volatile Memory Lifetime through Temporal Wear-Limiting

    Non-volatile memory technologies provide a low-power, high-density alternative to traditional DRAM main memories, yet all suffer from some degree of limited write endurance. The non-uniformity of write traffic exacerbates this limited endurance, causing write-induced wear to concentrate on a few specific lines. Wear-leveling attempts to mitigate this issue by distributing write-induced wear uniformly across the memory. Orthogonally, wear-limiting attempts to increase memory lifetime by directly reducing wear. In this paper, we present the concept of temporal wear-limiting, in which we exploit the trade-off between write latency and memory lifetime. Using a history of the slack between per-bank write operations, we predict future write latency, allowing for up to a 1.5x memory lifetime improvement. We present two extensions for improving the effectiveness of this history-based mechanism: a method for dynamically determining the optimum history size, and a method for increasing the lifetime improvement through address prediction.
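    A minimal sketch of the history-based mechanism, assuming a fixed per-bank history depth and made-up timings: the controller records the slack observed between consecutive writes to each bank, uses the minimum of that history as a conservative prediction, and stretches the next write's latency up to the predicted slack to reduce wear.

from collections import defaultdict, deque

HISTORY_DEPTH = 8            # assumed per-bank history size
BASE_WRITE_LATENCY = 150     # assumed fastest (highest-wear) write latency, in ns

class SlackPredictor:
    def __init__(self):
        self.last_write_time = {}
        self.history = defaultdict(lambda: deque(maxlen=HISTORY_DEPTH))

    def choose_write_latency(self, bank, now):
        # Record the slack (gap) since the previous write to this bank.
        prev = self.last_write_time.get(bank)
        if prev is not None:
            self.history[bank].append(now - prev)
        self.last_write_time[bank] = now
        if not self.history[bank]:
            return BASE_WRITE_LATENCY
        # Conservative prediction: the smallest slack seen recently.
        predicted_slack = min(self.history[bank])
        # Stretch the write up to the predicted slack, never below the base latency.
        return max(BASE_WRITE_LATENCY, predicted_slack)

if __name__ == "__main__":
    predictor = SlackPredictor()
    for t in (0, 400, 800, 950):                    # write arrivals to bank 0, in ns
        print(f"t={t} ns -> write latency {predictor.choose_write_latency(0, t)} ns")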

    Enhanced Automated Heterogeneous Data Duplication Model Using Parallel Data Compression And Sorting Technique

    A duplicator machine aims to reduce the time taken for duplication or data transfer. Duplication is done by copying each data bit from the source (master) device to the slave devices, including the unused memory region. However, duplicating a 64GB Embedded Multimedia Card (eMMC) memory is very time-consuming, typically taking between 2 and 7 hours. In addition, the speed specification promised by the vendor often differs from what is measured when the product is tested in real life. Moreover, larger data volumes create transmission delays during duplication, which further lengthens the duplication time. This study therefore proposes an enhanced duplication technique that reduces duplication time by adopting data storage and transmission concepts, namely sorting and compression. A parallel technique was also adopted to duplicate to multiple slave devices at once, and the impact of data type and data structure on duplication performance was studied. Four experiments were conducted using the same amount of heterogeneous digital data (i.e. documents, pictures, audio and movies). Overall, the results showed that duplication time differs with data type. The proposed technique reduced time consumption during data duplication by 20% to 50%, depending on the technique used and on whether duplication was local or across devices.
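    The compress-then-duplicate-in-parallel idea can be sketched as follows; the paths, chunk size and use of zlib are placeholders rather than the paper's actual setup. The master image is compressed once, and each slave device is then written by its own worker thread, so the smaller compressed stream is what gets moved around instead of every raw bit.

import zlib
from concurrent.futures import ThreadPoolExecutor

CHUNK = 1 << 20   # 1 MiB read chunks (assumed)

def compress_image(source_path):
    """Compress the master image once, streaming it in chunks."""
    comp = zlib.compressobj(level=6)
    out = bytearray()
    with open(source_path, "rb") as src:
        while chunk := src.read(CHUNK):
            out += comp.compress(chunk)
    out += comp.flush()
    return bytes(out)

def write_slave(compressed, slave_path):
    """Each slave decompresses and writes the image independently."""
    with open(slave_path, "wb") as dst:
        dst.write(zlib.decompress(compressed))
    return slave_path

def duplicate(source_path, slave_paths):
    compressed = compress_image(source_path)        # compress once at the source
    with ThreadPoolExecutor(max_workers=len(slave_paths)) as pool:
        return list(pool.map(lambda p: write_slave(compressed, p), slave_paths))

# Usage (hypothetical device paths):
# duplicate("master.img", ["/dev/sdb", "/dev/sdc", "/dev/sdd"])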

    EDAC software implementation to protect small satellites memory

    Radiation is a well-known problem for satellites in space. It can produce a range of adverse effects in electronic components that lead to errors and failures, so mitigating these effects is especially important for the success of space missions. One technique to increase the reliability of memory chips and reduce transient errors and permanent faults is Error Detection and Correction (EDAC). EDAC codes are characterised by the use of redundancy to detect and correct errors. This final project consists of the implementation of a software EDAC algorithm to protect the main memory of a microcontroller. The implementation requirements and the issues of software EDAC are described, and the test results are discussed.
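    A common building block for software EDAC is a Hamming code. As a hedged illustration (not necessarily the code used in this project), the Python sketch below encodes each data byte into a 12-bit Hamming(12,8) codeword and corrects any single-bit upset on decode.

DATA_POSITIONS = (3, 5, 6, 7, 9, 10, 11, 12)   # data bit slots in a 12-bit codeword
PARITY_POSITIONS = (1, 2, 4, 8)                # Hamming parity bit positions

def hamming_encode(byte):
    """Encode one data byte into a Hamming(12,8) codeword (positions 1..12)."""
    code = [0] * 13
    for i, pos in enumerate(DATA_POSITIONS):
        code[pos] = (byte >> i) & 1
    for p in PARITY_POSITIONS:
        # Parity bit p covers every position whose index has bit p set.
        code[p] = 0
        for pos in range(1, 13):
            if pos != p and pos & p:
                code[p] ^= code[pos]
    return code

def hamming_decode(code):
    """Correct a single-bit error in place and return (data_byte, flipped_position)."""
    syndrome = 0
    for p in PARITY_POSITIONS:
        parity = 0
        for pos in range(1, 13):
            if pos & p:
                parity ^= code[pos]
        if parity:
            syndrome |= p
    if syndrome:                 # the syndrome is the position of the corrupted bit
        code[syndrome] ^= 1
    byte = 0
    for i, pos in enumerate(DATA_POSITIONS):
        byte |= code[pos] << i
    return byte, syndrome

if __name__ == "__main__":
    codeword = hamming_encode(0xA7)
    codeword[6] ^= 1                             # inject a single-bit upset
    data, flipped = hamming_decode(codeword)
    print(hex(data), "corrected at position", flipped)   # 0xa7, position 6

    In a real microcontroller implementation the same logic would typically be written in C and applied periodically to scrub the protected memory region.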

    Reconciliation of essential process parameters for an enhanced predictability of Arctic stratospheric ozone loss and its climate interactions (RECONCILE): activities and results

    The international research project RECONCILE has addressed central questions regarding polar ozone depletion, with the objective of quantifying some of the most relevant yet still uncertain physical and chemical processes and thereby improving prognostic modelling capabilities to realistically predict the response of the ozone layer to climate change. This overview paper outlines the scope and general approach of RECONCILE and summarises observations and modelling in 2010 and 2011 that have generated a dataset that is in many respects unprecedented for studying processes in the Arctic winter stratosphere. Principally, it summarises important outcomes of RECONCILE, including (i) better constraints on, and enhanced consistency of, the set of parameters governing catalytic ozone destruction cycles, (ii) a better understanding of the role of cold binary aerosols in heterogeneous chlorine activation, (iii) an improved scheme of polar stratospheric cloud (PSC) processes that includes heterogeneous nucleation of nitric acid trihydrate (NAT) and ice on non-volatile background aerosol, leading to better model parameterisations with respect to denitrification, and (iv) long transient simulations with a chemistry-climate model (CCM), updated based on the results of RECONCILE, that better reproduce past ozone trends in Antarctica and are deemed to produce more reliable predictions of future ozone trends. The process studies and global simulations conducted in RECONCILE show that, in the Arctic, the uncertainties in the chemical and microphysical processes of ozone depletion are now clearly smaller than the sensitivity to dynamical variability.

    Lightweight Cryptography for Passive RFID Tags
