35 research outputs found

    z/OS Internet Integration

    Get PDF

    High Availability and Scalability of Mainframe Environments using System z and z/OS as example

    Get PDF
    Mainframe computers are the backbone of industrial and commercial computing, hosting the most relevant and critical data of businesses. One of the most important mainframe environments is IBM System z with the operating system z/OS. This book introduces mainframe technology of System z and z/OS with respect to high availability and scalability. It highlights their presence on different levels within the hardware and software stack to satisfy the needs for large IT organizations

    A survey of emerging architectural techniques for improving cache energy consumption

    Get PDF
    The search goes on for another ground breaking phenomenon to reduce the ever-increasing disparity between the CPU performance and storage. There are encouraging breakthroughs in enhancing CPU performance through fabrication technologies and changes in chip designs but not as much luck has been struck with regards to the computer storage resulting in material negative system performance. A lot of research effort has been put on finding techniques that can improve the energy efficiency of cache architectures. This work is a survey of energy saving techniques which are grouped on whether they save the dynamic energy, leakage energy or both. Needless to mention, the aim of this work is to compile a quick reference guide of energy saving techniques from 2013 to 2016 for engineers, researchers and students

    Beyond the socket: NUMA-aware GPUs

    Get PDF
    GPUs achieve high throughput and power efficiency by employing many small single instruction multiple thread (SIMT) cores. To minimize scheduling logic and performance variance they utilize a uniform memory system and leverage strong data parallelism exposed via the programming model. With Moore's law slowing, for GPUs to continue scaling performance (which largely depends on SIMT core count) they are likely to embrace multi-socket designs where transistors are more readily available. However when moving to such designs, maintaining the illusion of a uniform memory system is increasingly difficult. In this work we investigate multi-socket non-uniform memory access (NUMA) GPU designs and show that significant changes are needed to both the GPU interconnect and cache architectures to achieve performance scalability. We show that application phase effects can be exploited allowing GPU sockets to dynamically optimize their individual interconnect and cache policies, minimizing the impact of NUMA effects. Our NUMA-aware GPU outperforms a single GPU by 1.5Ă—, 2.3Ă—, and 3.2Ă— while achieving 89%, 84%, and 76% of theoretical application scalability in 2, 4, and 8 sockets designs respectively. Implementable today, NUMA-aware multi-socket GPUs may be a promising candidate for scaling GPU performance beyond a single socket.We would like to thank anonymous reviewers and Steve Keckler for their help in improving this paper. The first author is supported by the Ministry of Economy and Competitiveness of Spain (TIN2012-34557, TIN2015-65316-P, and BES-2013-063925)Peer ReviewedPostprint (published version

    Boosting performance of transactional memory through transactional read tracking and set associative locks

    Get PDF
    Multi-core processors have become so prevalent in server, desktop, and even embedded systems that they are considered the norm for modem computing systems. The trend is likely toward many-core processors with many more than just 2, 4, or 8 cores per CPU. To benefit from the increasing number of cores per chip, application developers have to develop parallel programs [1]. Traditional lock-based programming is too difficult and error prone for most of programmers and is the domain of experts. Deadlock, race, and other synchronization bugs are some of the challenges of lock-based programming. To make parallel programming mainstream, it is necessary to adapt parallel programming by the majority of programmers and not just experts, and thus simplifying parallel programming has become an important challenge. Transactional Memory (TM) is a promising programming model for managing concurrent accesses to the shared memory locations. Transactional memory allows a programmer to specify a section of a code to be "'transactional", and the underlying system guarantees atomic execution of the code. This simplifies parallel programming and reduces the possibility of synchronization bugs. This thesis develops several software- and hardware-based techniques to improve performance of existing transactional memory systems. The first technique is Transactional Read Tracking (TRT). TRT is a software-based approach that employs a locking mechanism for transactional read and write operations. The performance of TRT depends on memory access patterns of applications. In some cases, TRT falls behind the baseline scheme. To further improve performance of TRT, we introduce two hybrid methods that dynamically switches between TRT and the baseline scheme based on applications’ behavior. The second optimization technique is Set Associative Lock (SAL). Memory locations are mapped to a lock table in order to synchronize accesses to the shared memory locations. Direct mapped lock tables usually result in collision which leads to false aborts. In SAL, we increase associativity of the lock table to reduce false abort. While SAL improves performance in most of the applications, in some cases, it increases execution time due to overhead of lock tables in software. To cope with this problem, we propose Hardware-SAL (HW-SAL) which moves the set associative lock table to the hardware. As such, true power of set associativity will be harnessed without sacrificing performance

    Stretching the capacity of Hardware Transactional Memory in IBM POWER architectures

    Full text link
    The hardware transactional memory (HTM) implementations in commercially available processors are significantly hindered by their tight capacity constraints. In practice, this renders current HTMs unsuitable to many real-world workloads of in-memory databases. This paper proposes SI-HTM, which stretches the capacity bounds of the underlying HTM, thus opening HTM to a much broader class of applications. SI-HTM leverages the HTM implementation of the IBM POWER architecture with a software layer to offer a single-version implementation of Snapshot Isolation. When compared to HTM- and software-based concurrency control alternatives, SI-HTM exhibits improved scalability, achieving speedups of up to 300% relatively to HTM on in-memory database benchmarks

    Spring 2015 Vol. 14 No. 1

    Get PDF
    corecore