6,333 research outputs found
Recommended from our members
Statement of Work for Studies in BlueGene/L Scalability and Reconfigurability
As referenced in the subcontract, the work included three major goals: (1) study the performance of an ASCI application, (2) study tradeoffs in using the second CPU in coprocessor mode to optimize use of the L3 scratchpad memory for performing vector-like gather/scatter and streamlining operations, and (3) perform simulator studies of hardware phase detection and identification. We made some modifications to the work contract. Work involving the integration of a cache-conscious data placement algorithm to improve cache utilization on BlueGene/L has been added and work involving the L3 scratchpad memory has been eliminated. This was explained in the previous milestones. In this milestone, we continue to focus on the last goal by modifying a cycle-accurate simulator, sim-alpha [4]. As premise to hardware phase detection and identification, we need to have an infrastructure for testing various cache-conscious data placement methods. For this milestone, we discuss the completed framework that handles cache-conscious placement optimizations, which includes profiling data accesses and handling remapped addresses. We will also introduce an algorithm (ccdp profiling tool) that we implemented for assigning remapped addresses for a given code. Our performance results show that by using our ccdp profiling tool, we achieve reduced miss rates and an improved overall simulation performance. For our test cases, we use four applications from the SPEC CPU 2000 suite [2]. In our past milestones, we studied research that involves implementing cache-conscious data placement techniques. By becoming more familiar with previous research, we can make better decisions on designing our cache-conscious profiling tool. It is important to have a firm understanding of the existing techniques that have proven to be efficient at improving memory performance, since our tool will produce trace files as input to our enhanced simulator framework
Optimal Content Placement for En-Route Web Caching
This paper studies the optimal placement of web files for en-route web caching. It is shown that existing placement policies are all solving restricted partial problems of the file placement problem, and therefore give only sub-optimal solutions. A dynamic programming algorithm of low complexity which computes the optimal solution is presented. It is shown both analytically and experimentally that the file-placement solution output by our algorithm outperforms existing en-route caching policies. The optimal placement of web files can be implemented with a reasonable level of cache coordination and management overhead for en-route caching; and importantly, it can be achieved with or without using data prefetching
Forecasting the cost of processing multi-join queries via hashing for main-memory databases (Extended version)
Database management systems (DBMSs) carefully optimize complex multi-join
queries to avoid expensive disk I/O. As servers today feature tens or hundreds
of gigabytes of RAM, a significant fraction of many analytic databases becomes
memory-resident. Even after careful tuning for an in-memory environment, a
linear disk I/O model such as the one implemented in PostgreSQL may make query
response time predictions that are up to 2X slower than the optimal multi-join
query plan over memory-resident data. This paper introduces a memory I/O cost
model to identify good evaluation strategies for complex query plans with
multiple hash-based equi-joins over memory-resident data. The proposed cost
model is carefully validated for accuracy using three different systems,
including an Amazon EC2 instance, to control for hardware-specific differences.
Prior work in parallel query evaluation has advocated right-deep and bushy
trees for multi-join queries due to their greater parallelization and
pipelining potential. A surprising finding is that the conventional wisdom from
shared-nothing disk-based systems does not directly apply to the modern
shared-everything memory hierarchy. As corroborated by our model, the
performance gap between the optimal left-deep and right-deep query plan can
grow to about 10X as the number of joins in the query increases.Comment: 15 pages, 8 figures, extended version of the paper to appear in
SoCC'1
An Energy-conscious Transport Protocol for Multi-hop Wireless Networks
We present a transport protocol whose goal is to reduce power consumption without compromising delivery requirements of applications. To meet its goal of energy efficiency, our transport protocol (1) contains mechanisms to balance end-to-end vs. local retransmissions; (2) minimizes acknowledgment traffic using receiver regulated rate-based flow control combined with selected acknowledgements and in-network caching of packets; and (3) aggressively seeks to avoid any congestion-based packet loss. Within a recently developed ultra low-power multi-hop wireless network system, extensive simulations and experimental results demonstrate that our transport protocol meets its goal of preserving the energy efficiency of the underlying network.Defense Advanced Research Projects Agency (NBCHC050053
- …