Inter-program Optimizations for Disk Energy Reduction
Compiler support for power and energy management has been shown to be effective in reducing overall power dissipation and energy consumption of programs, for instance through compiler-directed resource hibernation and dynamic frequency and voltage scaling. The multi-programming model with virtual memory presents a virtualized view of the machine, so compilers typically take single programs as input, without knowledge of other programs that may run at the same time on the target machine. This work investigates the benefits of optimizing sets of programs with the goal of reducing overall disk energy.
The two key ideas are to synchronize the disk accesses across a group of programs thereby allowing longer disk idle periods, and to utilize execution context knowledge to allocate maximal buffer sizes. The compiler inserts runtime system calls for profiling the application and disk, uses execution context in allocating buffers, and synchronizes disk accesses with an inverse barrier policy. Data prefetching has been added to mitigate the overhead of synchronization.
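The synchronization idea can be illustrated with a toy timing model (all numbers here are illustrative, not from the paper): when the disk accesses of two programs are clustered together rather than interleaved, the longest idle period, during which the disk could hibernate, grows correspondingly.

```python
# Toy model: two programs each issue a disk access every `period` seconds.
# Unsynchronized, their accesses interleave and fragment disk idle time;
# synchronized (inverse-barrier-style), accesses cluster and idle gaps
# double. Purely illustrative, not the paper's implementation.

def idle_gaps(access_times, horizon):
    """Lengths of the idle intervals between consecutive disk accesses."""
    gaps = []
    prev = 0.0
    for t in sorted(access_times):
        if t > prev:
            gaps.append(t - prev)
        prev = t
    if horizon > prev:
        gaps.append(horizon - prev)
    return gaps

period, horizon = 10.0, 40.0
# Program A accesses at 0, 10, 20, 30; program B is offset by 5 seconds.
unsync = [i * period for i in range(4)] + [i * period + 5 for i in range(4)]
# Synchronized: B's accesses are shifted to coincide with A's.
sync = [i * period for i in range(4)] * 2

print(max(idle_gaps(unsync, horizon)))  # longest idle gap: 5.0 s
print(max(idle_gaps(sync, horizon)))    # longest idle gap: 10.0 s
```

Longer contiguous idle gaps matter because the disk can only hibernate when the gap exceeds its spin-down/spin-up break-even time; fragmenting the same total idle time into short gaps can make hibernation unprofitable.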
Experimental results are based on three streaming applications and their subsets. The experiments show that inter-program optimizations can yield significant disk energy savings over individually optimized programs: applying the most aggressive inter-program optimizations results in energy savings of up to 49%, and 34% on average.
A Multilevel Introspective Dynamic Optimization System For Holistic Power-Aware Computing
Power consumption is rapidly becoming the dominant limiting factor for
further improvements in computer design. Curiously, this applies both
at the "high end" of workstations and servers and the "low end" of
handheld devices and embedded computers. At the high-end, the
challenge lies in dealing with exponentially growing power
densities. At the low-end, there is a demand to make mobile devices
more powerful and longer lasting, but battery technology is not
improving at the same
rate that power consumption is rising. Traditional power-management
research is fragmented; techniques are being developed at specific
levels, without fully exploring their synergy with other levels.
Most software techniques target either operating systems or
compilers but do not explore the interaction between the two
layers. These techniques also have not fully explored the potential
of virtual machines for power management.
In contrast, we are developing
a system that integrates information from multiple levels of software
and hardware, connecting these levels through a communication
channel. At the heart of this
system are a virtual machine that compiles and dynamically profiles
code, and an optimizer that reoptimizes
all code, including that of applications and the virtual machine itself.
We believe this introspective, holistic approach
enables more informed power-management decisions.
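The feedback loop this abstract describes, a runtime that profiles code and reoptimizes what turns out to be hot, can be sketched in a few lines. Everything below (the class, the threshold, the function names) is an illustrative assumption, not the paper's system; a real virtual machine would recompile native code rather than swap Python callables.

```python
# Minimal sketch of profile-driven reoptimization: the runtime counts
# invocations per function and, once a hot threshold is crossed, installs
# an optimized version. Threshold and names are illustrative only.

HOT_THRESHOLD = 3

class ProfilingVM:
    def __init__(self):
        self.counts = {}      # name -> invocation count (the "profile")
        self.code = {}        # name -> currently installed implementation
        self.optimized = {}   # name -> optimized implementation

    def register(self, name, baseline, optimized):
        self.code[name] = baseline
        self.optimized[name] = optimized
        self.counts[name] = 0

    def call(self, name, *args):
        self.counts[name] += 1
        if self.counts[name] == HOT_THRESHOLD:
            # Hot code detected: reoptimize by swapping implementations.
            self.code[name] = self.optimized[name]
        return self.code[name](*args)

vm = ProfilingVM()
vm.register("square",
            baseline=lambda x: sum(x for _ in range(x)),  # naive O(x)
            optimized=lambda x: x * x)                    # constant time
for _ in range(5):
    vm.call("square", 4)
print(vm.code["square"](4))  # the optimized path is now installed
```

The holistic twist in the abstract is that the optimizer's scope includes the virtual machine's own code, not just the application's, which a sketch like this only hints at.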
05141 Abstracts Collection -- Power-aware Computing Systems
From 03.04.05 to 08.04.05, the Dagstuhl Seminar 05141 ``Power-aware Computing Systems'' was held in the International Conference and Research Center (IBFI), Schloss Dagstuhl.
During the seminar, several participants presented their current
research and ongoing work, and discussed open problems. Abstracts of
the presentations given during the seminar as well as abstracts of
seminar results and ideas are collected in this paper. The first section
describes the seminar topics and goals.
Links to extended abstracts or full papers are provided, if available.
Many-Task Computing and Blue Waters
This report discusses many-task computing (MTC) generically and in the
context of the proposed Blue Waters system, which is planned to be the largest
NSF-funded supercomputer when it begins production use in 2012. The aim of this
report is to inform the BW project about MTC, including understanding aspects
of MTC applications that can be used to characterize the domain and
understanding the implications of these aspects to middleware and policies.
Many MTC applications do not neatly fit the stereotypes of high-performance
computing (HPC) or high-throughput computing (HTC) applications. Like HTC
applications, by definition MTC applications are structured as graphs of
discrete tasks, with explicit input and output dependencies forming the graph
edges. However, MTC applications have significant features that distinguish
them from typical HTC applications. In particular, different engineering
constraints for hardware and software must be met in order to support these
applications. HTC applications have traditionally run on platforms such as
grids and clusters, through either workflow systems or parallel programming
systems. MTC applications, in contrast, will often demand a short time to
solution, may be communication intensive or data intensive, and may comprise
very short tasks. Therefore, hardware and software for MTC must be engineered
to support the additional communication and I/O and must minimize task dispatch
overheads. The hardware of large-scale HPC systems, with its high degree of
parallelism and support for intensive communication, is well suited for MTC
applications. However, HPC systems often lack a dynamic resource-provisioning
feature, are not ideal for task communication via the file system, and have an
I/O system that is not optimized for MTC-style applications. Hence, additional
software support is likely to be required to gain full benefit from the HPC
hardware.
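The defining structure the report attributes to MTC applications, graphs of discrete tasks whose input/output dependencies form the edges, can be sketched as a minimal dependency-driven dispatcher. The task names and the example graph are illustrative only; a real MTC middleware would hand ready tasks to distributed workers and minimize per-task dispatch overhead.

```python
# Minimal sketch of MTC-style dispatch: tasks form a DAG via explicit
# dependencies; a task becomes ready as soon as all its inputs exist.
from collections import deque

def dispatch(tasks, deps):
    """tasks: iterable of task names; deps: {task: set of prerequisites}.
    Returns one valid execution order (a topological order)."""
    indegree = {t: len(deps.get(t, ())) for t in tasks}
    children = {t: [] for t in tasks}
    for t, prereqs in deps.items():
        for p in prereqs:
            children[p].append(t)
    ready = deque(t for t, d in indegree.items() if d == 0)
    order = []
    while ready:
        t = ready.popleft()
        order.append(t)  # in a real system: hand t to an idle worker here
        for c in children[t]:
            indegree[c] -= 1
            if indegree[c] == 0:
                ready.append(c)
    return order

# Illustrative fan-out/fan-in graph: split -> (map1, map2) -> join
order = dispatch(["split", "map1", "map2", "join"],
                 {"map1": {"split"}, "map2": {"split"},
                  "join": {"map1", "map2"}})
print(order)
```

The report's engineering concerns (short tasks, intensive I/O) show up precisely in the dispatch step marked above: if tasks run for milliseconds, the cost of selecting and launching each one must be far smaller still.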
An Efficient Cell List Implementation for Monte Carlo Simulation on GPUs
Maximizing the performance potential of the modern day GPU architecture
requires judicious utilization of available parallel resources. Although
dramatic reductions can often be obtained through straightforward mappings,
further performance improvements often require algorithmic redesigns to more
closely exploit the target architecture. In this paper, we focus on efficient
molecular simulations for the GPU and propose a novel cell list algorithm that
better utilizes its parallel resources. Our goal is an efficient GPU
implementation of large-scale Monte Carlo simulations for the grand canonical
ensemble. This is a particularly challenging application because there is
inherently less computation and parallelism than in similar applications with
molecular dynamics. Consistent with the results of prior researchers, our
simulation results show traditional cell list implementations for Monte Carlo
simulations of molecular systems offer effectively no performance improvement
for small systems [5, 14], even when ported to the GPU. However, for larger
systems, the cell list implementation offers significant gains in performance.
Furthermore, our novel cell list approach results in better performance for all
problem sizes when compared with other GPU implementations with or without cell
lists.
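For readers unfamiliar with the baseline the paper improves on, here is a sketch of the traditional cell list scheme (the textbook CPU version in 2-D, not the paper's GPU variant): particles are binned into cells no smaller than the interaction cutoff, so each particle's interaction partners must lie in its own or an adjacent cell, reducing neighbor search from O(N) per particle to a constant number of cells.

```python
# Sketch of a basic 2-D cell list with a periodic square box.
# Cell side >= cutoff, so neighbors of particle i lie in i's cell or
# one of the 8 adjacent cells. Illustrative, not the paper's algorithm.
from collections import defaultdict

def build_cell_list(positions, box, cutoff):
    ncells = max(1, int(box // cutoff))   # cells per side
    side = box / ncells
    cells = defaultdict(list)
    for i, (x, y) in enumerate(positions):
        cells[(int(x // side) % ncells, int(y // side) % ncells)].append(i)
    return cells, ncells

def neighbors(i, positions, cells, ncells, box, cutoff):
    """Indices of particles within `cutoff` of particle i."""
    x, y = positions[i]
    side = box / ncells
    cx, cy = int(x // side) % ncells, int(y // side) % ncells
    result = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for j in cells.get(((cx + dx) % ncells, (cy + dy) % ncells), ()):
                if j == i:
                    continue
                rx = positions[j][0] - x
                ry = positions[j][1] - y
                # minimum-image convention for the periodic box
                rx -= box * round(rx / box)
                ry -= box * round(ry / box)
                if rx * rx + ry * ry < cutoff * cutoff:
                    result.append(j)
    return result

positions = [(0.5, 0.5), (1.0, 0.5), (5.0, 5.0)]
cells, ncells = build_cell_list(positions, box=10.0, cutoff=2.0)
print(neighbors(0, positions, cells, ncells, 10.0, 2.0))  # → [1]
```

The paper's observation that cell lists pay off only for larger systems follows from this structure: when the whole system fits in a handful of cells, the bookkeeping costs as much as the brute-force O(N) scan it replaces.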
Measuring and Managing Answer Quality for Online Data-Intensive Services
Online data-intensive services parallelize query execution across distributed
software components. Interactive response time is a priority, so online query
executions return answers without waiting for slow running components to
finish. However, data from these slow components could lead to better answers.
We propose Ubora, an approach to measure the effect of slow running components
on the quality of answers. Ubora randomly samples online queries and executes
them twice. The first execution elides data from slow components and provides
fast online answers; the second execution waits for all components to complete.
Ubora uses memoization to speed up mature executions by replaying network
messages exchanged between components. Our systems-level implementation works
for a wide range of platforms, including Hadoop/Yarn, Apache Lucene, the
EasyRec Recommendation Engine, and the OpenEphyra question answering system.
Ubora computes answer quality much faster than competing approaches that do not
use memoization. With Ubora, we show that answer quality can and should be used
to guide online admission control. Our adaptive controller processed 37% more
queries than a competing controller guided by the rate of timeouts.
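The core measurement idea, compare a deadline-bounded online answer against a mature answer assembled by replaying recorded component responses, can be sketched as follows. All component names, latencies, and the overlap-based quality metric below are illustrative assumptions, not Ubora's actual implementation.

```python
# Sketch of the two-execution idea: the online path elides components
# that miss the deadline; the mature path replays every recorded
# (memoized) component response; quality is the overlap between them.

def online_answer(component_results, latencies, deadline):
    """Merge results only from components that finish before the deadline."""
    answer = set()
    for name, results in component_results.items():
        if latencies[name] <= deadline:
            answer |= results
    return answer

def full_answer(component_results):
    """Mature execution: replay every recorded response, fast or slow."""
    answer = set()
    for results in component_results.values():
        answer |= results
    return answer

def quality(online, full):
    """Fraction of the mature answer that the online answer retained."""
    return len(online & full) / len(full) if full else 1.0

recorded = {"shard_a": {1, 2, 3}, "shard_b": {4, 5}, "shard_c": {6}}
lat = {"shard_a": 0.05, "shard_b": 0.40, "shard_c": 0.08}

fast = online_answer(recorded, lat, deadline=0.1)   # shard_b elided
print(quality(fast, full_answer(recorded)))         # 4 of 6 results retained
```

Memoization is what makes the mature execution cheap: instead of recomputing every component, the system replays the network messages it already captured during the online run.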