27 research outputs found

    Adaptive Transaction Scheduling for Transactional Memory Systems

    Transactional memory systems are expected to enable parallel programming at lower programming complexity, while delivering improved performance over traditional lock-based systems. Nonetheless, we observed that there are situations where transactional memory systems could actually perform worse, and that these situations will become dominant in future workloads as more and larger-scale transactional memory systems become available. Transactional memory systems can outperform locks only when the executing workloads contain sufficient parallelism. When the workload lacks inherent parallelism, blindly launching excessive transactions can degrade performance. To quantitatively demonstrate the issues, we introduce the concept of effective transactions in this paper. We show that the effectiveness of a transaction is closely related to a dynamic quantity we call contention intensity. By limiting the contention intensity below the desired level, we can significantly increase transaction effectiveness. Increased effectiveness directly increases the overall performance of a transactional memory system. Based on our study, we implemented a transaction scheduler which not only guarantees that hardware transactional memory systems perform better than locks, but also significantly improves performance for both hardware and software transactional memory systems.
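
    The idea of keeping contention intensity below a desired level can be illustrated with a minimal sketch. The abstract does not give a formula, so this assumes contention intensity is tracked as an exponential moving average of abort outcomes; the class name `ContentionThrottle` and the parameters `alpha` and `threshold` are hypothetical and are not taken from the paper.

```python
class ContentionThrottle:
    """Hypothetical throttle (not the paper's scheduler): tracks a moving
    average of abort outcomes and signals when to stop launching
    transactions concurrently."""

    def __init__(self, alpha=0.5, threshold=0.5):
        self.alpha = alpha          # weight given to history (assumed value)
        self.threshold = threshold  # desired contention level (assumed value)
        self.intensity = 0.0        # current contention-intensity estimate

    def record(self, aborted):
        # Each finished attempt contributes 1.0 if it aborted, 0.0 if it committed.
        sample = 1.0 if aborted else 0.0
        self.intensity = self.alpha * self.intensity + (1 - self.alpha) * sample

    def should_serialize(self):
        # Above the threshold, new transactions would be queued rather than
        # launched in parallel.
        return self.intensity > self.threshold
```

    In this sketch a caller would invoke `record` after every commit or abort and check `should_serialize` before starting a new transaction; how a real scheduler queues or dispatches the deferred transactions is left open here.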

    Thermal-aware 3D Microarchitectural Floorplanning

    Next-generation deep-submicron processor design will need to take many performance-limiting factors into consideration. Flip-flops are inserted to prevent global wire delay from becoming nonlinear, enabling deeper pipelines and higher clock frequency. The move to 3D ICs will also likely be used to further shorten wirelength. This will cause thermal issues to become a major bottleneck to performance improvement. In this paper, we propose a floorplanning algorithm which takes into consideration both thermal issues and profile-weighted wirelength using mathematical programming. Our profile-driven objective improves performance by 20% over the wirelength-driven objective, while the thermal-driven objective improves temperature by 24% on average over the profile-driven case.

    Multiplatform Analysis of 12 Cancer Types Reveals Molecular Classification within and across Tissues of Origin

    Recent genomic analyses of pathologically-defined tumor types identify “within-a-tissue” disease subtypes. However, the extent to which genomic signatures are shared across tissues is still unclear. We performed an integrative analysis using five genome-wide platforms and one proteomic platform on 3,527 specimens from 12 cancer types, revealing a unified classification into 11 major subtypes. Five subtypes were nearly identical to their tissue-of-origin counterparts, but several distinct cancer types were found to converge into common subtypes. Lung squamous, head & neck, and a subset of bladder cancers coalesced into one subtype typified by TP53 alterations, TP63 amplifications, and high expression of immune and proliferation pathway genes. Of note, bladder cancers split into three pan-cancer subtypes. The multi-platform classification, while correlated with tissue-of-origin, provides independent information for predicting clinical outcomes. All datasets are available for data-mining from a unified resource to support further biological discoveries and insights into novel therapeutic strategies.

    Riociguat treatment in patients with chronic thromboembolic pulmonary hypertension: Final safety data from the EXPERT registry

    Objective: The soluble guanylate cyclase stimulator riociguat is approved for the treatment of adult patients with pulmonary arterial hypertension (PAH) and inoperable or persistent/recurrent chronic thromboembolic pulmonary hypertension (CTEPH) following Phase

    Improving energy and performance of data cache architectures by exploiting memory reference characteristics.

    Minimizing power, increasing performance, and delivering effective memory bandwidth are today's primary microprocessor design goals for the embedded, high-end, and multimedia workstation markets. In this dissertation, I will discuss three major data cache architecture design optimization techniques, each of which exploits the data memory reference characteristics of applications written in high-level languages. Through a better understanding of memory reference behavior, we can design a system that executes at higher performance while consuming less energy and delivering more effective memory bandwidth. The first part of this dissertation presents an in-depth characterization of data memory references, including analysis of semantic region accesses and the behavior of data stores. This analysis leads to a new organization of the data cache hierarchy called Region-based Cachelets. Region-based Cachelets are capable of improving the memory performance of embedded applications while significantly reducing dynamic energy consumption, resulting in a 50% to 70% improvement in energy-delay product efficiency. Following this, I will discuss a new cache-like structure, the Stack Value File (SVF), which boosts the performance of general-purpose applications by routing stack data references to a separate storage structure optimized for the unique characteristics of the stack reference substream. By utilizing a custom structure for stack references, we are able to increase memory-level parallelism, reduce memory latency, and reduce off-chip memory activity. Performance can be improved by 24% by implementing an 8KB SVF for a processor with a dual-ported L1 cache. Finally, I will address memory bandwidth issues by proposing a new write policy called Eager Writeback, which can effectively improve overall system performance by shifting the writing of dirty cache lines from on-demand to times when the memory bus is less congested. It lessens the criticality of on-demand misses and improves performance by 6% to 16% for the 3D graphics geometry pipeline.
    Ph.D.; Applied Sciences; Computer science; Electrical engineering; University of Michigan, Horace H. Rackham School of Graduate Studies; http://deepblue.lib.umich.edu/bitstream/2027.42/127956/2/3029368.pd

    Toward autonomous computing platforms: system-wide hardware/software performance monitoring and adaptation

    Issued as final report. Cornell University

    Chameleon: Virtualizing Idle Acceleration Cores of A Heterogeneous Multi-Core Processor for Caching and Prefetching

    Heterogeneous multi-core processors have emerged as an energy- and area-efficient architectural solution to improving performance for domain-specific applications such as those with a plethora of data-level parallelism. These processors typically contain a large number of small, compute-centric cores for acceleration while keeping one or two high-performance ILP cores on the die to guarantee single-thread performance. Although a major portion of the transistors are occupied by the acceleration cores, these resources will sit idle when running unparallelized legacy codes or the sequential parts of an application. To address this under-utilization issue, in this paper, we introduce Chameleon, a flexible heterogeneous multi-core architecture to virtualize these resources for enhancing memory performance when running sequential programs. The Chameleon architecture can dynamically virtualize the idle acceleration cores into a last-level cache, a data prefetcher, or a hybrid between these two techniques. In addition, Chameleon can operate in an adaptive mode which dynamically configures the acceleration cores between the hybrid mode and the prefetch-only mode by monitoring the effectiveness of the Chameleon caching scheme. In our evaluation using the SPEC2006 benchmark suite, different levels of performance improvements were achieved in different modes for different applications. In the case of the adaptive mode, Chameleon improves the performance of SPECint06 and SPECfp06 by 33% and 22% on average, respectively. When considering only memory-intensive applications, Chameleon improves the system performance by 53% and 33%.

    Choice Predictor for Free

    Reducing energy consumption has become the first priority in designing microprocessors for all market segments including embedded, mobile, and high performance processors. The trend of state-of-the-art branch predictor designs such as a hybrid predictor continues to feature more and larger prediction tables, thereby exacerbating the energy consumption. In this paper, we present two novel profile-guided static prediction techniques, Static Correlation Choice (SCC) prediction and Static Choice (SC) prediction, for alleviating the energy consumption without compromising performance. Using our techniques, the hardware choice predictor of a hybrid predictor can be completely eliminated from the processor and replaced with our off-line profiling schemes. Our simulation results show an average 40% power reduction compared to several hybrid predictors. In addition, an average 27% die area can be saved in the branch predictor hardware for other performance features.

    Supporting Cache Coherence in Heterogeneous Multiprocessor Systems

    In embedded system-on-a-chip (SoC) applications, the need for integrating heterogeneous processors in a single chip is increasing. An important issue in integrating heterogeneous processors is how to maintain the coherence of data caches. In this paper, we propose a hardware/software methodology to make caches coherent in heterogeneous multiprocessor platforms with shared memory. Our approach works with any combination of processors that support any invalidation-based protocol. As shown in our simulations, up to 38% speedup can be achieved with a 13-cycle miss penalty at the expense of simple hardware, compared to a pure software solution. Speedup can be improved even further as the miss penalty increases. In addition, our approach provides embedded system programmers with a transparent view of shared data, removing the burden of software synchronization.

    Towards the Issues in Architectural Support for Protection of Software Execution

    Recently, there has been growing interest in the research community in employing tamper-resistant processors for software protection. Many of these proposed systems rely on a specially tailored secure processor to prevent 1) illegal software duplication, 2) unauthorized software modification, and 3) unauthorized software reverse engineering. Most of these works primarily focus on feasibility demonstration and design details rather than trying to elucidate many fundamental issues that are either "elusive" or "confusing" to architecture researchers. Furthermore, many proposed systems have been built on assumptions whose security implications have not been well studied or understood. Instead of proposing yet another new secure architecture model, in this paper, we try to answer some of these fundamental questions with respect to using hardware-based cryptography for protecting software execution. Those issues include: 1) Is hardware cryptography necessary? 2) Is a single per-process cryptographic key enough to provide the flexibility, inter-operability, and compatibility required by today's complex software systems? 3) Is OTP (one-time pad) in combination with "lazy" authentication secure enough to protect software confidentiality? 4) Is there a way to protect software integrity using fewer hardware resources? Finally, the paper defines the difference between off-line and on-line attacks and presents a very low overhead security enhancement technique that can improve protection of software integrity against on-line attacks by several orders of magnitude.