5 research outputs found

    FaulTM: Fault-tolerance using hardware transactional memory

    Get PDF
    Fault-tolerance has become an essential concern for processor designers due to increasing soft-error rates. In this study, we are motivated by the fact that Transactional Memory (TM) hardware provides an ideal base upon which to build a fault-tolerant system. We show how it is possible to provide low-cost faulttolerance for serial programs by using a minimallymodified Hardware Transactional Memory (HTM) that features lazy conflict detection, lazy data versioning. This scheme, called FaulTM, employs a hybrid hardware-software fault-tolerance technique. On the software side, FaulTM programming model is able to provide the flexibility for programmers to decide between performance and reliability. Our experimental results indicate that FaulTM produces relatively less performance overhead by reducing the number of comparisons and by leveraging already proposed TM hardware. We also conduct experiments which indicate that the baseline FaulTM design has a good error coverage. To the best of our knowledge, this is the first architectural fault-tolerance proposal using Hardware Transactional Memory.Peer ReviewedPostprint (published version

    The Impact of Non-coherent Buffers on Lazy Hardware Transactional Memory Systems

    Get PDF
    Abstract When supported in silicon, transactional memory (TM

    Dynamically filtering thread-local variables in lazy-lazy hardware transactional memory

    No full text
    Abstract—Transactional Memory (TM) is an emerging technology which promises to make parallel programming easier. However, to be efficient, underlying TM system should protect only true shared data and leave thread-local data out of the transaction. This speed-up the commit phase of the transaction which is a bottleneck for a lazily versioned HTM. This paper proposes a scheme in the context of a lazylazy (lazy conflict detection and lazy data versioning) Hardware Transactional Memory (HTM) system to identify dynamically variables which are local to a thread and exclude them from the commitset of the transaction. Our proposal covers sharing of both stack and heap but also filters out local accesses to both of them. We also propose, in the same scheme, to identify local variables for which versioning need not be maintained. For evaluation, we have implemented a lazy-lazy model of HTM in line with the conventional and the scalable version of the TCC in a full system simulator. For operating system, we have modified the Linux kernel. We got an average speed-up of 31 % for the conventional TCC, on applications from the STAMP benchmark suite. For the scalable TCC we got an average speedup of 16%. Also, we found that on average 99 % of the local variables can be safely omitted when recording their old values to handle aborts

    Designs for increasing reliability while reducing energy and increasing lifetime

    Get PDF
    In the last decades, the computing technology experienced tremendous developments. For instance, transistors' feature size shrank to half at every two years as consistently from the first time Moore stated his law. Consequently, number of transistors and core count per chip doubles at each generation. Similarly, petascale systems that have the capability of processing more than one billion calculation per second have been developed. As a matter of fact, exascale systems are predicted to be available at year 2020. However, these developments in computer systems face a reliability wall. For instance, transistor feature sizes are getting so small that it becomes easier for high-energy particles to temporarily flip the state of a memory cell from 1-to-0 or 0-to-1. Also, even if we assume that fault-rate per transistor stays constant with scaling, the increase in total transistor and core count per chip will significantly increase the number of faults for future desktop and exascale systems. Moreover, circuit ageing is exacerbated due to increased manufacturing variability and thermal stresses, therefore, lifetime of processor structures are becoming shorter. On the other side, due to the limited power budget of the computer systems such that mobile devices, it is attractive to scale down the voltage. However, when the voltage level scales to beyond the safe margin especially to the ultra-low level, the error rate increases drastically. Nevertheless, new memory technologies such as NAND flashes present only limited amount of nominal lifetime, and when they exceed this lifetime, they can not guarantee storing of the data correctly leading to data retention problems. Due to these issues, reliability became a first-class design constraint for contemporary computing in addition to power and performance. Moreover, reliability even plays increasingly important role when computer systems process sensitive and life-critical information such as health records, financial information, power regulation, transportation, etc. In this thesis, we present several different reliability designs for detecting and correcting errors occurring in processor pipelines, L1 caches and non-volatile NAND flash memories due to various reasons. We design reliability solutions in order to serve three main purposes. Our first goal is to improve the reliability of computer systems by detecting and correcting random and non-predictable errors such as bit flips or ageing errors. Second, we aim to reduce the energy consumption of the computer systems by allowing them to operate reliably at ultra-low voltage level. Third, we target to increase the lifetime of new memory technologies by implementing efficient and low-cost reliability schemes
    corecore