290 research outputs found

    Contrasting Climate Ensembles: A Model-based Visualization Approach for Analyzing Extreme Events

    Get PDF
    AbstractThe use of increasingly sophisticated means to simulate and observe natural phenomena has led to the production of larger and more complex data. As the size and complexity of this data increases, the task of data analysis becomes more challeng- ing. Determining complex relationships among variables requires new algorithm development. Addressing the challenge of handling large data necessitates that algorithm implementations target high performance computing platforms. In this work we present a technique that allows a user to study the interactions among multiple variables in the same spatial extents as the underlying data. The technique is implemented in an existing parallel analysis and visualization framework in order that it be applicable to the largest datasets. The foundation of our approach is to classify data points via inclusion in, or distance to, multivariate representations of relationships among a subset of the variables of a dataset. We abstract the space in which inclusion is calculated and through various space transformations we alleviate the necessity to consider variables’ scales and distributions when making comparisons. We apply this approach to the problem of highlighting variations in climate model ensembles

    Dynamically Trading Frequency for Complexity in a GALS Microprocessor

    Get PDF
    Microprocessors are traditionally designed to provide “best overall” performance across a wide range of applications and operating environments. Several groups have proposed hardware techniques that save energy by “downsizing” hardware resources that are underutilized by the current application phase. Others have proposed a different energy-saving approach: dividing the processor into domains and dynamically changing the clock frequency and voltage within each domain during phases when the full domain frequency is not required. What has not been studied to date is how to exploit the adaptive nature of these approaches to improve performance rather than to save energy. In this paper, we describe an adaptive globally asynchronous, locally synchronous (GALS) microprocessor with a fixed global voltage and four independently clocked domains. Each domain is streamlined with modest hardware structures for very high clock frequency. Key structures can then be upsized on demand to exploit more distant parallelism, improve branch prediction, or increase cache capacity. Although doing so requires decreasing the associated domain frequency, other domain frequencies are unaffected. Our approach, therefore, is to maximize the throughput of each domain by finding the proper balance between the number of clock periods, and the clock frequency, for each application phase. To achieve this objective, we use novel hardware-based control techniques that accurately and efficiently capture the performance of all possible cache and queue configurations within a single interval, without having to resort to exhaustive online exploration or expensive offline profiling. Measuring across a broad suite of application benchmarks, we find that configuring our adaptive GALS processor just once per application yields 17.6% better performance, on average, than that of the “best overall” fully synchronous design. By adapting automatically to application phases, we can increase this advantage to more than 20%

    Improving Application Performance by Dynamically Balancing Speed and Complexity in a GALS Microprocessor

    Get PDF
    Microprocessors are traditionally designed to provide “best overall” performance across a wide range of applications and operating environments. Several groups have proposed hardware techniques that save energy by “downsizing” hardware resources that are underutilized by particular applications. We explore the converse: “upsizing” hardware resources in order to improve performance relative to an aggressively clocked baseline processor. Our proposal depends critically on the ability to change frequencies independently in separate domains of a globally asynchronous, locally synchronous (GALS) microprocessor. We use a variant of our multiple clock domain (MCD) processor, with four independently clocked domains. Each domain is streamlined with modest hardware structures for very high clock frequency. Key structures can then be upsized on demand to exploit more distant parallelism, improve branch prediction, or increase cache capacity. Although doing so requires decreasing the associated domain frequency, other domain frequencies are unaffected. Measuring across a broad suite of application benchmarks, we find that configuring just once per application increases performance by an average of 17.6% compared to the best fully synchronous design. When adapting to application phases, performance improves by over 20%

    Profile-based Dynamic Voltage and Frequency Scaling for a Multiple Clock Domain Microprocessor

    Get PDF
    A Multiple Clock Domain (MCD) processor addresses the challenges of clock distribution and power dissipation by dividing a chip into several (coarse-grained) clock domains, allowing frequency and voltage to be reduced in domains that are not currently on the application’s critical path. Given a reconfiguration mechanism capable of choosing appropriate times and values for voltage/frequency scaling, an MCD processor has the potential to achieve significant energy savings with low performance degradation. Early work on MCD processors evaluated the potential for energy savings by manually inserting reconfiguration instructions into applications, or by employing an oracle driven by off-line analysis of (identical) prior program runs. Subsequent work developed a hardware-based on-line mechanism that averages 75–85% of the energy-delay improvement achieved via off-line analysis. In this paper we consider the automatic insertion of reconfiguration instructions into applications, using profiledriven binary rewriting. Profile-based reconfiguration introduces the need for “training runs” prior to production use of a given application, but avoids the hardware complexity of on-line reconfiguration. It also has the potential to yield significantly greater energy savings. Experimental results (training on small data sets and then running on larger, alternative data sets) indicate that the profile-driven approach is more stable than hardware-based reconfiguration, and yields virtually all of the energy-delay improvement achieved via off-line analysis

    Malignant melanoma of the rectum: a case report

    Get PDF
    <p>Abstract</p> <p>Introduction</p> <p>Anorectal melanoma represents an unusual but important presentation of rectal malignancy. There have only been a few cases reported and the optimum management for this condition is still undecided, however, prompt diagnosis is essential. We have outlined current treatment options.</p> <p>Case presentation</p> <p>We report a case of malignant melanoma of the rectum in a 55-year-old Caucasian man presenting as an emergency with rectal bleeding. Biopsies were taken of the fleshy mass found on digital examination, which confirmed malignant melanoma. No distant metastases were found. He underwent an abdominoperineal resection. We report the surgical management of this rare and aggressive malignancy.</p> <p>Conclusion</p> <p>Treatment options for this condition are divergent. Surgical management varies from wide local excision to abdominoperineal resection. Clinical awareness in both medical and surgical clinics is required for prompt diagnosis and treatment.</p

    Hiding Synchronization Delays in a GALS Processor Microarchitecture

    Get PDF
    We analyze an Alpha 21264-like Globally–Asynchronous, Locally–Synchronous (GALS) processor organized as a Multiple Clock Domain (MCD) microarchitecture and identify the architectural features of the processor that influence the limited performance degradation measured. We show that the out-oforder superscalar execution features of a processor, which allow traditional instruction execution latency to be hidden, are the same features that reduce the performance degradation impact of the synchronization costs of an MCD processor. In the case of our Alpha 21264-like processor, up to 94% of the MCD synchronization delays are hidden and do not impact overall performance. In addition, we show that by adding out-of-order superscalar execution capabilities to a simpler microarchitecture, such as an Intel StrongARM-like processor, as much as 62% of the performance degradation caused by synchronization delays can be eliminated

    Dynamic Frequency and Voltage Control for a Multiple Clock Domain Microarchitecture

    Get PDF
    We describe the design, analysis, and performance of an on-line algorithm to dynamically control the frequency/voltage of a Multiple Clock Domain (MCD) microarchitecture. The MCD microarchitecture allows the frequency/voltage of micrprocessor regions to be adjusted independently and dynamically, allowing enery savings when the frequency of some regions can be reduced without significantly impacting performance. Our algorithm achieves on average a 19.0% reduction in Energy Per Instriction (EPI), a 3.2% increase in Cycles Per Instruction (CPI), a 16.7% improvement in Energy–Delay Product, and a Power Savings to Performance Degradation ratio of 4.6. Traditional frequency/voltage scaling techniques which apply reductions globally to a fully synchronous processor achieve a Power Savings to Performance Degradation ratio of only 2–3. Our Energy–Delay Product improvement is 85.5% of what has been achieved using an off–line algorithm. These results were achieved using a broad range of applications from the MediaBench, Olden, and Spec2000 benchmark suites using an algorithm we show to require minimal hardware resources

    Dynamic Frequency and Voltage Scaling for a Multiple-Clock-Domain Microprocessor

    Get PDF
    Multiple clock domains is one solution to the increasing problem of propagating the clock signal across increasingly larger and faster chips. The ability to independently scale frequency and voltage in each domain creates a powerful means of reducing power dissipation
    • …
    corecore