21 research outputs found

    LASER: Light, Accurate Sharing dEtection and Repair

    Contention for shared memory, in the forms of true sharing and false sharing, is a challenging performance bug to discover and to repair. Understanding cache contention requires global knowledge of the program's actual sharing behavior, and contention can even arise invisibly in the program due to the opaque decisions of the memory allocator. Previous schemes have focused only on false sharing, and impose significant performance penalties or require non-trivial alterations to the operating system or runtime system environment. This paper presents the Light, Accurate Sharing dEtection and Repair (LASER) system, which leverages new performance counter capabilities available on Intel's Haswell architecture that identify the source of expensive cache coherence events. Using records of these events generated by the hardware, we build a system for online contention detection and repair that operates with low performance overhead and does not require any invasive program, compiler or operating system changes. Our experiments show that LASER imposes just 2% average runtime overhead on the Phoenix, Parsec and Splash2x benchmarks. LASER can automatically improve the performance of programs by up to 19% on commodity hardware.
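
    The distinction between true and false sharing can be made concrete with a small address calculation. The sketch below is illustrative only and is not LASER's mechanism, which works from hardware event records. It assumes a 64-byte cache line, and the names `cache_line`, `classify_sharing` and `padded_offsets` are hypothetical helpers introduced here.

```python
from typing import List

# Assumed line size; common on x86 hardware, not a figure from the abstract.
CACHE_LINE_BYTES = 64


def cache_line(address: int, line_bytes: int = CACHE_LINE_BYTES) -> int:
    """Return the index of the cache line that contains a byte address."""
    return address // line_bytes


def classify_sharing(addr_a: int, size_a: int, addr_b: int, size_b: int,
                     line_bytes: int = CACHE_LINE_BYTES) -> str:
    """Classify two accesses by different threads (at least one a write).

    'true'  : the byte ranges overlap, so the threads touch the same data
    'false' : the ranges are disjoint but touch a common cache line
    'none'  : the ranges touch no common cache line
    """
    lines_a = set(range(cache_line(addr_a, line_bytes),
                        cache_line(addr_a + size_a - 1, line_bytes) + 1))
    lines_b = set(range(cache_line(addr_b, line_bytes),
                        cache_line(addr_b + size_b - 1, line_bytes) + 1))
    if not lines_a & lines_b:
        return "none"
    overlap = addr_a < addr_b + size_b and addr_b < addr_a + size_a
    return "true" if overlap else "false"


def padded_offsets(count: int, line_bytes: int = CACHE_LINE_BYTES) -> List[int]:
    """Give each per-thread slot its own cache line (a common manual repair)."""
    return [i * line_bytes for i in range(count)]


# Two adjacent 8-byte counters updated by different threads share line 0.
print(classify_sharing(0, 8, 8, 8))
# After padding, the same counters sit on different lines.
offsets = padded_offsets(2)
print(classify_sharing(offsets[0], 8, offsets[1], 8))
```

    Padding trades memory for isolation, which is one reason an allocator's placement decisions can introduce or remove contention without any change to the source code.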

    Achieving sustainable quality in maternity services – using audit of incontinence and dyspareunia to identify shortfalls in meeting standards

    BACKGROUND: Some complications of childbirth (for example, faecal incontinence) are a source of social embarrassment for women, and are often underreported. Therefore, it was felt important to determine levels of complications (against established standards) and to consider obstetric measures aimed at reducing them. METHODS: Clinical information was collected on 1036 primiparous women delivering at North and South Staffordshire Acute and Community Trusts over a 5-month period in 1997. A questionnaire was sent to 970 women which included self-assessment of levels of incontinence and dyspareunia prior to pregnancy, at 6 weeks post delivery and 9 to 14 months post delivery. RESULTS: The response rate was 48% (470/970). Relatively high levels of obstetric interventions were found. In addition, the rates of instrumental deliveries differed between the two hospitals. The highest rates of postnatal symptoms had occurred at 6 weeks, but for many women problems were still present at the time of the survey. At 9–14 months high rates of dyspareunia (29%, 102/347) and urinary incontinence (35%, 133/382) were reported. Seventeen women (4%) complained of faecal incontinence at this time. Similar rates of urinary incontinence and dyspareunia were seen regardless of mode of delivery. CONCLUSION: Further work should be undertaken to reduce obstetric interventions, especially instrumental deliveries. Improvements in a number of areas of care should be undertaken, including improved patient information, improved professional communication and improved professional recognition and management of third-degree tears. It is likely that these measures would lead to a reduction in incontinence and dyspareunia after childbirth.

    Inadequate prenatal care and its association with adverse pregnancy outcomes: A comparison of indices

    Background: The objectives of this study were to determine rates of prenatal care utilization in Winnipeg, Manitoba, Canada from 1991 to 2000; to compare two indices of prenatal care utilization in identifying the proportion of the population receiving inadequate prenatal care; to determine the association between inadequate prenatal care and adverse pregnancy outcomes (preterm birth, low birth weight [LBW], and small-for-gestational age [SGA]), using each of the indices; and, to assess whether or not, and to what extent, gestational age modifies this association. Methods: We conducted a population-based study of women having a hospital-based singleton live birth from 1991 to 2000 (N = 80,989). Data sources consisted of a linked mother-baby database and a physician claims file maintained by Manitoba Health. Rates of inadequate prenatal care were calculated using two indices, the R-GINDEX and the APNCU. Logistic regression analysis was used to determine the association between inadequate prenatal care and adverse pregnancy outcomes. Stratified analysis was then used to determine whether the association between inadequate prenatal care and LBW or SGA differed by gestational age. Results: Rates of inadequate/no prenatal care ranged from 8.3% using APNCU to 8.9% using R-GINDEX. The association between inadequate prenatal care and preterm birth and LBW varied depending on the index used, with adjusted odds ratios (AOR) ranging from 1.0 to 1.3. In contrast, both indices revealed the same strength of association of inadequate prenatal care with SGA (AOR 1.4). Both indices demonstrated heterogeneity (non-uniformity) across gestational age strata, indicating the presence of effect modification by gestational age. Conclusion: Selection of a prenatal care utilization index requires careful consideration of its methodological underpinnings and limitations. The two indices compared in this study revealed different patterns of utilization of prenatal care, and should not be used interchangeably. Use of these indices to study the association between utilization of prenatal care and pregnancy outcomes affected by the duration of pregnancy should be approached cautiously.

    Programming Abstractions for Data Locality

    The goal of the workshop and this report is to identify common themes and standardize concepts for locality-preserving abstractions for exascale programming models. Current software tools are built on the premise that computing is the most expensive component, but we are rapidly moving to an era in which computing is cheap and massively parallel while data movement dominates energy and performance costs. In order to respond to exascale systems (the next generation of high performance computing systems), the scientific computing community needs to refactor its applications to align with the emerging data-centric paradigm. Our applications must be evolved to express information about data locality. Unfortunately, current programming environments offer few ways to do so. They ignore the incurred cost of communication and simply rely on the hardware cache coherency to virtualize data movement. With the increasing importance of task-level parallelism on future systems, task models have to support constructs that express data locality and affinity. At the system level, communication libraries implicitly assume all the processing elements are equidistant to each other. In order to take advantage of emerging technologies, application developers need a set of programming abstractions to describe data locality for the new computing ecosystem. The new programming paradigm should be more data-centric and allow programmers to describe how to decompose data and how to lay it out in memory. Fortunately, there are many emerging concepts such as constructs for tiling, data layout, array views, task and thread affinity, and topology-aware communication libraries for managing data locality. There is an opportunity to identify commonalities in strategy to enable us to combine the best of these concepts to develop a comprehensive approach to expressing and managing data locality on exascale programming systems. These programming model abstractions can expose crucial information about data locality to the compiler and runtime system to enable performance-portable code. The research question is to identify the right level of abstraction, which includes techniques that range from template libraries all the way to completely new languages to achieve this goal.
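
    Tiling, one of the constructs named in the abstract, can be shown with a small matrix multiplication. This is a minimal sketch that only shows the change in iteration order; the function names and the `tile` parameter are assumptions made for illustration, and pure-Python lists will not show the cache benefit that a compiled, contiguous array layout would.

```python
def matmul_naive(a, b):
    """Reference triple loop: C = A x B for lists of lists."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                c[i][j] += a[i][k] * b[k][j]
    return c


def matmul_tiled(a, b, tile=2):
    """Same arithmetic, visited tile by tile so each block is reused while 'hot'."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    # The outer three loops walk over tile origins.
    for ii in range(0, n, tile):
        for kk in range(0, m, tile):
            for jj in range(0, p, tile):
                # The inner three loops stay inside one tile; min() handles
                # edge tiles when a dimension is not a multiple of `tile`.
                for i in range(ii, min(ii + tile, n)):
                    for k in range(kk, min(kk + tile, m)):
                        a_ik = a[i][k]
                        for j in range(jj, min(jj + tile, p)):
                            c[i][j] += a_ik * b[k][j]
    return c


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(matmul_tiled(a, b) == matmul_naive(a, b))
```

    The tile size is the kind of locality information the abstract argues should be expressible through a programming abstraction, so that a compiler or runtime can pick it per machine instead of leaving it hard-coded by the application developer.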

    Node Labeling

    This document describes a scheme for labeling a program dependence graph (PDG) or control flow graph (CFG) in order to codify the hierarchical control dependence structure of a procedure. The first section describes the scheme and its features. Section 2 provides some applications for the labels. Section 3 proves the correctness of the label relations described in Subsection 2.1. Section 4 shows statistics about labels and program structure for several SPEC benchmarks.

    Data Flow Terminology and Representations

    This document provides a brief tutorial on the representation of data dependences between individual instructions and of the flow of data through the program.
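
    As a rough illustration of data dependences between individual instructions, the sketch below classifies the dependence between an earlier and a later instruction from their read and write sets. The flow/anti/output vocabulary is the standard textbook one and may differ from the tutorial's own terminology, which is not visible in this listing. The `Instr` type and `dependences` function are hypothetical, and the sketch ignores intervening redefinitions between the two instructions.

```python
from typing import FrozenSet, List, NamedTuple, Tuple


class Instr(NamedTuple):
    """One instruction, reduced to the variables it writes and reads."""
    name: str
    writes: FrozenSet[str]
    reads: FrozenSet[str]


def dependences(first: Instr, second: Instr) -> List[Tuple[str, str]]:
    """List (kind, variable) dependences from an earlier to a later instruction.

    flow   : first writes a variable that second reads  (read after write)
    anti   : first reads a variable that second writes  (write after read)
    output : both write the same variable               (write after write)
    """
    found: List[Tuple[str, str]] = []
    for var in sorted(first.writes & second.reads):
        found.append(("flow", var))
    for var in sorted(first.reads & second.writes):
        found.append(("anti", var))
    for var in sorted(first.writes & second.writes):
        found.append(("output", var))
    return found


i1 = Instr("i1", frozenset({"a"}), frozenset({"b", "c"}))  # a = b + c
i2 = Instr("i2", frozenset({"b"}), frozenset({"a"}))       # b = a * 2
print(dependences(i1, i2))
```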

    Automatic Partitioning of Signal Processing Programs for Symmetric Multiprocessors

    Symmetric multiprocessor systems are increasingly common, not only as servers, but as a vehicle for executing a single application in parallel in order to reduce its execution latency. This paper presents PEDIGREE, a compilation tool that employs a new partitioning heuristic based on the program dependence graph (PDG). PEDIGREE creates overlapping inter-dependent threads, each executing on a subset of the SMP's processors that matches the thread's available parallelism. A unified framework is used to build threads from procedures, loop nests, loop iterations, and smaller constructs. PEDIGREE does not require any parallel language support; it is a post-compilation tool that reads in object code. The SDIO Signal and Data Processing Benchmark Suite has been selected as an example of realtime, latency-sensitive code. Its coarse-grained data flow parallelism is exploited by PEDIGREE to achieve speedups of 1.56x/2.11x (mean/max) and 1.61x/2.60x on two and four processors, respectively. There..

    Compiler Support for Low-Cost Synchronization Among Threads

    Traditional compilation techniques for synchronization have targeted architectures with relatively high synchronization overhead, and have been used to synchronize loops or different processes at a coarse granularity. Processors will soon be available that have multiple, tightly-coupled instruction streams on a single chip, and these processors will support the exploitation of finer-grained parallelism among threads of a single process, e.g. through simultaneous multithreading. This paper proposes synchronization techniques for such architectures. These techniques are unique in their support for architectures with extremely low synchronization overhead and for overlapping different loop nests. The proposed techniques account for underlying communication costs, and make trade-offs between reducing synchronization overhead and maximizing parallelism. These techniques have been implemented in the Pedigree post-pass compiler. 1. Introduction Multiprocessors have traditionally been phys..