324 research outputs found

    Energy-efficient and high-performance lock speculation hardware for embedded multicore systems

    Full text link
    Embedded systems are becoming increasingly common in everyday life and like their general-purpose counterparts, they have shifted towards shared memory multicore architectures. However, they are much more resource constrained, and as they often run on batteries, energy efficiency becomes critically important. In such systems, achieving high concurrency is a key demand for delivering satisfactory performance at low energy cost. In order to achieve this high concurrency, consistency across the shared memory hierarchy must be accomplished in a cost-effective manner in terms of performance, energy, and implementation complexity. In this article, we propose Embedded-Spec, a hardware solution for supporting transparent lock speculation, without the requirement for special supporting instructions. Using this approach, we evaluate the energy consumption and performance of a suite of benchmarks, exploring a range of contention management and retry policies. We conclude that for resource-constrained platforms, lock speculation can provide real benefits in terms of improved concurrency and energy efficiency, as long as the underlying hardware support is carefully configured.This work is supported in part by NSF under Grants CCF-0903384, CCF-0903295, CNS-1319495, and CNS-1319095 as well the Semiconductor Research Corporation under grant number 1983.001. (CCF-0903384 - NSF; CCF-0903295 - NSF; CNS-1319495 - NSF; CNS-1319095 - NSF; 1983.001 - Semiconductor Research Corporation

    Analyzing the Impact of Concurrency on Scaling Machine Learning Programs Using TensorFlow

    Get PDF
    In recent times, computer scientists and technology companies have quickly begun to realize that machine learning and creating computer software that is capable of reasoning for itself (at least in theory). What was once only considered science fiction lore is now becoming a reality in front of our very eyes. With this type of computational capability at our disposal, we are left with the question of how best to use it and where to start in creating models that can help us best utilize it. TensorFlow is an open source software library used in machine learning developed and released by Google. It was created by the company in order to help them meet their expanding needs to train systems that can build and detect neural networks for pattern recognition that could be used in their services. It was first released by the Google Brain Team in November 2015 and, at the time of the preparation of this research, the project is still being heavily developed by programmers and researchers both inside of Google and around the world. Thus, it is very possible that some future releases of the software package could remove and/or replace some current capabilities. The point of this thesis is to examine how machine learning programs written with TensorFlow that do not scale well (such as large-scale neural networks) can be made more scalable by using concurrency and distribution of computation among threads. To do this, we will be using lock elision using conditional variables and locking mechanisms (such as semaphores) to allow for smooth distribution of resources to be used by the architecture. We present the trial runs and results of the added implementations and where the results fell short of optimistic expectation. Although TensorFlow is still a work in progress, we will also address where this framework was insufficient in addressing the needs of programmers attempting to write scalable code and whether this type of implementation is sustainable

    Tailoring Transactional Memory to Real-World Applications

    Get PDF
    Transactional Memory (TM) promises to provide a scalable mechanism for synchronizationin concurrent programs, and to offer ease-of-use benefits to programmers. Since multiprocessorarchitectures have dominated CPU design, exploiting parallelism in program

    An Adaptive Middleware for Improved Computational Performance

    Get PDF

    Applying HTM to an OLTP System: No Free Lunch

    Get PDF
    Transactional memory is a promising way for implementing efficient synchronization mechanisms for multicore processors. Intel's introduction of hardware transactional memory (HTM) into their Haswell line of processors marks an important step toward mainstream availability of transactional memory. Transaction processing systems require execution of dozens of critical sections to insure isolation among threads, which makes them one of the target applications for exploiting HTM. In this study, we quantify the opportunities and limitations of directly applying HTM to an existing OLTP system that uses fine-grained synchronization. Our target is Shore-MT, a modern multithreaded transactional storage manager that uses a variety of fine-grained synchronization mechanisms to provide scalability on multicore processors. We find that HTM can improve performance of the TATP workload by 13-17% when applied judiciously. However, attempting to replace all synchronization reduces performance compared to the baseline case due to high percentage of aborts caused by the limitations of the current HTM implementation. Copyright 2015 ACM
    • …
    corecore