Hardware-only stream prediction + cache prefetching + dynamic access ordering
The speed gap between processors and the memory system is becoming the performance bottleneck for many applications, and computations with strided access patterns are among those that suffer most. The vectors used in such applications lack temporal and often spatial locality, and are usually too large to cache. In spite of their poor cache behavior, these access patterns have the advantage of being predictable, which can be exploited to improve the efficiency of the memory subsystem. As a promising technique for relieving the memory system bottleneck, prefetching has been studied in its various forms, and so has dynamic memory scheduling. This study builds on these results, combining a stride-based reference prediction table, a mechanism that prefetches L2 cache lines, and a memory controller that dynamically schedules accesses to a Direct Rambus memory subsystem. We find that such a system delivers impressive speedups for scientific applications with regular access patterns (reducing execution time by almost a factor of two) without negatively affecting the performance of non-streaming programs.
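As a rough illustration of the prediction side of such a design, the sketch below models a stride-based reference prediction table that confirms a stride after seeing it twice for the same load PC and then emits prefetch addresses a few cache lines ahead. The table size, confirmation policy, and prefetch degree are illustrative assumptions, not the paper's hardware parameters.

```python
# Sketch of a stride-based reference prediction table (RPT).
# Assumptions (not from the paper): 64-entry table indexed by load PC,
# 64-byte cache lines, prefetch issued only after the same stride is seen twice.

class RPTEntry:
    def __init__(self, last_addr):
        self.last_addr = last_addr
        self.stride = 0
        self.confirmed = False

class StridePrefetcher:
    def __init__(self, entries=64, line_size=64, prefetch_degree=2):
        self.entries = entries
        self.line_size = line_size
        self.degree = prefetch_degree
        self.table = {}

    def access(self, pc, addr):
        """Record a load at (pc, addr); return cache-line addresses to prefetch."""
        key = pc % self.entries
        entry = self.table.get(key)
        if entry is None:
            self.table[key] = RPTEntry(addr)
            return []
        stride = addr - entry.last_addr
        if stride != 0 and stride == entry.stride:
            entry.confirmed = True          # same stride seen twice: stream confirmed
        else:
            entry.confirmed = False
            entry.stride = stride
        entry.last_addr = addr
        if not entry.confirmed:
            return []
        # Prefetch the next few cache lines along the detected stride.
        return [((addr + (i + 1) * entry.stride) // self.line_size) * self.line_size
                for i in range(self.degree)]

# Example: a vector sweep with a 128-byte stride becomes predictable after two misses.
pf = StridePrefetcher()
for i in range(6):
    lines = pf.access(pc=0x400123, addr=0x1000 + 128 * i)
    print(hex(0x1000 + 128 * i), [hex(a) for a in lines])
```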
Characterizing and Subsetting Big Data Workloads
Big data benchmark suites must include a diversity of data and workloads to
be useful in fairly evaluating big data systems and architectures. However,
using truly comprehensive benchmarks poses great challenges for the
architecture community. First, we need to thoroughly understand the behaviors
of a variety of workloads. Second, our usual simulation-based research methods
become prohibitively expensive for big data. As big data is an emerging field,
more and more software stacks are being proposed to facilitate the development
of big data applications, which aggravates these challenges. In this paper, we
first use Principal Component Analysis (PCA) to identify the most important
characteristics from 45 metrics to characterize big data workloads from
BigDataBench, a comprehensive big data benchmark suite. Second, we apply a
clustering technique to the principal components obtained from the PCA to
investigate the similarity among big data workloads, and we verify the
importance of including different software stacks for big data benchmarking.
Third, we select seven representative big data workloads by removing redundant
ones and release the BigDataBench simulation version, which is publicly
available from http://prof.ict.ac.cn/BigDataBench/simulatorversion/.
Comment: 11 pages, 6 figures, 2014 IEEE International Symposium on Workload Characterization
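The subsetting flow described above can be pictured with a short scikit-learn sketch: standardize the metric matrix, keep the leading principal components, cluster workloads in PC space, and take the workload nearest each centroid as the representative. The metric values, variance threshold, and cluster count below are placeholders, not the paper's measured data.

```python
# Sketch of the PCA + clustering subsetting flow (illustrative values only).
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Placeholder: rows = workloads, columns = the 45 microarchitectural metrics.
metrics = rng.random((30, 45))

# 1. Standardize, then keep enough principal components to explain ~90% variance.
X = StandardScaler().fit_transform(metrics)
pcs = PCA(n_components=0.90).fit_transform(X)

# 2. Cluster workloads in PC space; k=7 mirrors the seven representatives chosen.
k = 7
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(pcs)

# 3. Per cluster, pick the workload closest to the centroid as the representative.
representatives = []
for c in range(k):
    members = np.flatnonzero(km.labels_ == c)
    dists = np.linalg.norm(pcs[members] - km.cluster_centers_[c], axis=1)
    representatives.append(int(members[np.argmin(dists)]))
print("representative workload indices:", sorted(representatives))
```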
Heated aquatic microcosms for climate change experiments
Ponds and shallow lakes are likely to be strongly affected by climate change, and by increases in environmental temperature in particular. Hydrological regimes and nutrient cycling may be altered, plant and animal communities may undergo changes in both composition and dynamics, and long-term, difficult-to-reverse switches between alternative stable equilibria may occur. A thorough understanding of the potential effects of increased temperature on ponds and shallow lakes is desirable because these ecosystems are of immense importance throughout the world as sources of drinking water, and for their amenity and conservation value. This understanding can only come through experimental studies in which the effects of different temperature regimes are compared. This paper reports design details and operating characteristics of a recently constructed experimental facility consisting of 48 aquatic microcosms which mimic the pond and shallow lake environment. Thirty-two of the microcosms can be heated and regulated to simulate climate change scenarios, including those predicted for the UK. The authors also summarise the current and future experimental uses of the microcosms.
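A minimal sketch of the kind of tracking control such a facility implies, where a heated microcosm follows an unheated reference plus a fixed warming offset. The offset, dead band, and sensor/heater interfaces are hypothetical and not taken from the paper.

```python
# Sketch: hold a heated microcosm at (ambient reference + scenario offset) degrees C.
# The 3 C offset, 0.2 C hysteresis band, and I/O stubs are assumptions.

SCENARIO_OFFSET_C = 3.0   # e.g. a warming scenario of +3 C above the reference pond
HYSTERESIS_C = 0.2        # dead band to avoid rapid heater switching

def control_step(reference_temp_c, heated_temp_c, heater_on):
    """Return the new heater state for one control step."""
    target = reference_temp_c + SCENARIO_OFFSET_C
    if heated_temp_c < target - HYSTERESIS_C:
        return True
    if heated_temp_c > target + HYSTERESIS_C:
        return False
    return heater_on  # inside the dead band: keep the current state

# Example: reference pond at 15.0 C, heated tank currently at 17.5 C -> keep heating.
print(control_step(reference_temp_c=15.0, heated_temp_c=17.5, heater_on=True))
```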
An approach to resource-aware coscheduling for CMPs
We develop real-time scheduling techniques for improving performance and energy for multiprogrammed workloads that scale nonuniformly with increasing thread counts. Multithreaded programs generally deliver higher throughput than single-threaded programs on chip multiprocessors, but performance gains from increasing threads decrease when there is contention for shared resources. We use analytic metrics to derive local search heuristics for creating efficient multiprogrammed, multithreaded workload schedules. Programs are allocated fewer cores than requested, and scheduled to space-share the CMP to improve global throughput. Our holistic approach attempts to co-schedule programs that complement each other with respect to shared resource consumption. We find that application co-scheduling for performance and energy in a resource-aware manner achieves better results than solely targeting total throughput or concurrently co-scheduling all programs. Our schedulers improve overall energy delay (E*D) by a factor of 1.5 over time-multiplexed gang scheduling.
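To make the co-scheduling idea concrete, the sketch below pairs programs so that co-runners' estimated shared-resource pressure complements rather than compounds. The pressure values and the exhaustive pairing search are illustrative stand-ins for the paper's analytic metrics and local-search heuristics.

```python
# Sketch: pair programs onto a space-shared CMP so that co-runners' estimated
# shared-cache/bandwidth pressure (normalized to 0..1) complements each other.
# The pressure values and the pairwise cost model are illustrative assumptions.

programs = {
    "lu": 0.9, "fft": 0.8, "blackscholes": 0.2, "swaptions": 0.3,
}

def pair_cost(a, b):
    """Penalize pairs whose combined pressure oversubscribes the shared resources."""
    return max(0.0, programs[a] + programs[b] - 1.0)

def best_pairing(names):
    """Exhaustive pairing search (fine for a handful of programs)."""
    if not names:
        return [], 0.0
    best, best_cost = None, float("inf")
    first = names[0]
    for partner in names[1:]:
        rest = [n for n in names if n not in (first, partner)]
        tail_pairs, tail_cost = best_pairing(rest)
        cost = pair_cost(first, partner) + tail_cost
        if cost < best_cost:
            best, best_cost = [(first, partner)] + tail_pairs, cost
    return best, best_cost

# Heavy programs get paired with light ones rather than with each other.
pairs, cost = best_pairing(list(programs))
print(pairs, "total contention cost:", round(cost, 2))
```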
Main memory in HPC: do we need more, or could we live with less?
An important aspect of High-Performance Computing (HPC) system design is the choice of main memory capacity. This choice becomes increasingly important now that 3D-stacked memories are entering the market. Compared with conventional Dual In-line Memory Modules (DIMMs), 3D memory chiplets provide better performance and energy efficiency but lower memory capacities. Therefore, the adoption of 3D-stacked memories in the HPC domain depends on whether we can find use cases that require much less memory than is available now.
This study analyzes the memory capacity requirements of important HPC benchmarks and applications. We find that the High-Performance Conjugate Gradients (HPCG) benchmark could be an important success story for 3D-stacked memories in HPC, but High-Performance Linpack (HPL) is likely to be constrained by 3D memory capacity. The study also emphasizes that the analysis of memory footprints of production HPC applications is complex and that it requires an understanding of application scalability and target category, i.e., whether the users target capability or capacity computing. The results show that most of the HPC applications under study have per-core memory footprints in the range of hundreds of megabytes, but we also detect applications and use cases that require gigabytes per core. Overall, the study identifies the HPC applications and use cases with memory footprints that could be provided by 3D-stacked memory chiplets, making a first step toward adoption of this novel technology in the HPC domain.
This work was supported by the Collaboration Agreement between Samsung Electronics Co., Ltd. and BSC, by the Spanish Government through the Severo Ochoa programme (SEV-2015-0493), by the Spanish Ministry of Science and Technology through the TIN2015-65316-P project, and by the Generalitat de Catalunya (contracts 2014-SGR-1051 and 2014-SGR-1272). This work has also received funding from the European Union's Horizon 2020 research and innovation programme under the ExaNoDe project (grant agreement No 671578). Darko Zivanovic holds the Severo Ochoa grant (SVP-2014-068501) of the Ministry of Economy and Competitiveness of Spain. The authors thank Harald Servat from BSC and Vladimir Marjanović from High Performance Computing Center Stuttgart for their technical support.
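A back-of-the-envelope check in the spirit of this analysis: given a per-core footprint and an assumed 3D-stacked node capacity, does an application fit? The capacity, core count, and footprint figures below are placeholders rather than the study's measurements.

```python
# Sketch: check whether per-core memory footprints fit in a node's 3D-stacked
# memory.  The capacity, core count, and footprints below are placeholders.

NODE_3D_CAPACITY_GB = 32.0   # assumed 3D-stacked capacity per node
CORES_PER_NODE = 64          # assumed cores per node

def fits_in_3d_memory(per_core_footprint_mb):
    """Return (fits, per-core budget in MB) for the assumed node configuration."""
    budget_mb = NODE_3D_CAPACITY_GB * 1024.0 / CORES_PER_NODE
    return per_core_footprint_mb <= budget_mb, budget_mb

for app, footprint_mb in [("hpcg-like run", 300), ("hpl-like run", 2048)]:
    ok, budget = fits_in_3d_memory(footprint_mb)
    print(f"{app}: {footprint_mb} MB/core vs {budget:.0f} MB/core budget -> "
          f"{'fits' if ok else 'exceeds 3D capacity'}")
```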
Code density concerns for new architectures
Reducing a program's instruction count can improve cache behavior and bandwidth utilization, lower power consumption, and increase overall performance. Nonetheless, code density is an often overlooked feature in studying processor architectures. We hand-optimize an assembly language embedded benchmark for size on 21 different instruction set architectures, finding up to a factor of three difference in code sizes from ISA alone. We find that the architectural features that contribute most heavily to code density are instruction length, number of registers, availability of a zero register, bit-width, hardware divide units, number of instruction operands, and the availability of unaligned loads and stores. We extend our results to investigate operating system, compiler, and system library effects on code density. We find that the executable starting address, executable format, and system call interface all affect program size. While ISA effects are important, the efficiency of the entire system stack must be taken into account when developing a new dense instruction set architecture.
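One simple way to reproduce this kind of comparison is to read the .text section size of the same benchmark built for several targets. The sketch below assumes GNU binutils' `size` is installed; the binary names are hypothetical cross-compiled builds, not artifacts from the paper.

```python
# Sketch: compare .text (code) section sizes of the same program built for
# different targets.  Binary paths are placeholders; assumes GNU binutils `size`.
import subprocess

binaries = ["bench.x86_64", "bench.armv7", "bench.riscv64"]  # hypothetical builds

def text_size(path):
    """Parse `size -A` (SysV format) output and return the .text size in bytes."""
    out = subprocess.run(["size", "-A", path], check=True,
                         capture_output=True, text=True).stdout
    for line in out.splitlines():
        fields = line.split()
        if fields and fields[0] == ".text":
            return int(fields[1])
    raise ValueError(f"no .text section reported for {path}")

sizes = {b: text_size(b) for b in binaries}
smallest = min(sizes, key=sizes.get)
for b, s in sizes.items():
    print(f"{b}: {s} bytes ({s / sizes[smallest]:.2f}x vs {smallest})")
```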