Design and Implementation of MPICH2 over InfiniBand with RDMA Support
For several years, MPI has been the de facto standard for writing parallel
applications. One of the most popular MPI implementations is MPICH. Its
successor, MPICH2, features a completely new design that provides more
performance and flexibility. To ensure portability, it has a hierarchical
structure based on which porting can be done at different levels. In this
paper, we present our experiences designing and implementing MPICH2 over
InfiniBand. Because of its high performance and open standard, InfiniBand is
gaining popularity in the area of high-performance computing. Our study focuses
on optimizing the performance of MPI-1 functions in MPICH2. One of our
objectives is to exploit Remote Direct Memory Access (RDMA) in InfiniBand to
achieve high performance. We have based our design on the RDMA Channel
interface provided by MPICH2, which encapsulates architecture-dependent
communication functionalities into a very small set of functions. Starting with
a basic design, we apply different optimizations and also propose a
zero-copy-based design. We characterize the impact of our optimizations and
designs using microbenchmarks. We have also performed an application-level
evaluation using the NAS Parallel Benchmarks. Our optimized MPICH2
implementation achieves 7.6 μs latency and 857 MB/s bandwidth, which are
close to the raw performance of the underlying InfiniBand layer. Our study
shows that the RDMA Channel interface in MPICH2 provides a simple, yet
powerful, abstraction that enables implementations with high performance by
exploiting RDMA operations in InfiniBand. To the best of our knowledge, this is
the first high-performance design and implementation of MPICH2 on InfiniBand
using RDMA support.
Comment: 12 pages, 17 figures
International Mechanical Engineering Congress and Exposition
ABSTRACT This paper presents our recent investigation into the impact of 3D haptic-augmented learning tools on Dynamics, a basic course in most engineering education programs. Dynamics is considered one of the most difficult and nonintuitive courses that engineering students encounter during their undergraduate studies because it combines basic Newtonian physics with mathematical concepts such as vector algebra, geometry, trigonometry, and calculus, and applies them to dynamical systems. Recent advances in virtual reality and robotics enable the human tactual system to be stimulated in a controlled manner through three-dimensional (3D) force-feedback devices, also known as haptic interfaces. In this study, 3D haptic-augmented learning tools are created and used to complement the course materials in the Dynamics course. Experiments are conducted with a group of Mechanical Engineering students in the Dynamics class. The assessment results show that the innovative learning tools: 1) allow the students to interact with virtual objects with force feedback and better understand the abstract concepts by investigating the dynamic responses; 2) stimulate the students' interest in understanding the fundamental physics theories
Understanding Storage System Problems and Diagnosing Them Through Log Analysis
99 p. Thesis (Ph.D.)--University of Illinois at Urbana-Champaign, 2009. Nowadays, over 90% of newly produced information is stored on hard disk drives. The explosion of data is making storage systems a strategic investment priority in the enterprise world. The revenue generated by the storage system industry has steadily increased from 18.4 billion in 2007. As a key component of enterprise systems, reliable storage systems are critical. However, despite the effort put into building robust storage systems, storage system problems are common because the size and complexity of these systems have grown to an unprecedented level. Unfortunately, many aspects of storage system problems are still not well understood, and most previous studies focus on only one component: disk drives. To better understand storage system problems, we analyzed the failure characteristics of the core part of a storage system, the storage subsystem, which contains disks and all components providing connectivity and usage of disks to the entire storage system. More specifically, we analyzed the storage system logs collected from about 39,000 storage systems commercially deployed at various customer sites. The data set covers a period of 44 months and includes about 1,800,000 disks hosted in about 155,000 storage shelf enclosures. Our study reveals many interesting findings, providing useful guidelines for designing reliable storage systems. Some of the major findings include: (1) In addition to disk failures, which contribute to 20--55% of storage subsystem failures, other components such as physical interconnects and protocol stacks also account for significant percentages of storage subsystem failures. (2) Each individual storage subsystem failure type, and storage subsystem failure as a whole, exhibits strong self-correlation. In addition, these failures exhibit bursty patterns.
(3) Storage subsystems configured with dual-path interconnects experience 30--40% lower failure rates than those with a single interconnect. (4) Spanning the disks of a RAID group across multiple shelves provides a more resilient solution for storage subsystems than keeping them within a single shelf. Having found that storage subsystem problems extend far beyond disk failures, we broaden the scope of the study to various storage system problems and examine the characteristics of storage system problem troubleshooting along several dimensions. Using a large set (636,108) of real-world customer problem cases reported from 100,000 commercially deployed storage systems over the last two years, our analysis shows that while some problems are either benign or resolved automatically, many others can take hours or days of manual diagnosis to fix. For modern storage systems, hardware failures and misconfigurations dominate customer cases, but software failures take longer to resolve. Interestingly, a relatively significant percentage of cases arise because customers lack sufficient knowledge about the system. We also evaluate the potential of using storage system logs to resolve these problems. Our analysis shows that a failure message alone is a poor indicator of root cause, and that combining failure messages with multiple log events can improve problem root cause prediction by a factor of three. One key finding is that storage system logs contain useful information for narrowing down the root cause, but they are challenging to analyze manually because they are noisy and the useful log events are often separated by hundreds of irrelevant log events. Motivated by this finding, we designed and implemented an automatic tool, called Log Analyzer, to improve the problem troubleshooting process.
By applying statistical analysis techniques, the Log Analyzer can automatically infer the dependency relationships between log events and identify the key log events that capture the essential system states related to storage system problems. By combining a classic unsupervised classification technique, hierarchical clustering, with event ranking techniques, the Log Analyzer can also identify recurrent storage system problems based on similar log patterns, so that previous diagnosis efforts can be systematically retrieved and leveraged. We train the Log Analyzer with 18,878 week-long storage system logs and evaluate it with 164 real-world problem cases. The evaluation indicates that the Log Analyzer can effectively reduce the number of log events to 3.4%. For most of the 16 real-world problem cases manually annotated with 1--3 key log events, the Log Analyzer accurately ranked the key log events within the top 3 without a priori knowledge of how important the events are. For the other 148 problem cases with diagnosis and root cause information, the Log Analyzer effectively grouped problem cases with the same root cause together with 63--93% accuracy, significantly outperforming three alternative solutions, which achieve only 30--46% accuracy.
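The idea of grouping recurrent problem cases by similar log patterns can be illustrated with a minimal sketch of agglomerative (hierarchical) clustering. This is not the thesis's Log Analyzer: it assumes each case is reduced to a set of event names, uses Jaccard distance with average linkage, and stops merging at a hypothetical threshold. The event names, case ids, and function names are all invented for illustration.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Distance between two sets of log event names (0 = identical, 1 = disjoint)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def average_linkage(c1, c2, cases):
    """Mean pairwise distance between the members of two clusters."""
    dists = [jaccard_distance(cases[i], cases[j]) for i in c1 for j in c2]
    return sum(dists) / len(dists)


def cluster_cases(cases, threshold=0.5):
    """Merge the two closest clusters repeatedly until the closest pair
    is farther apart than `threshold` (a hypothetical cut-off)."""
    clusters = [[cid] for cid in sorted(cases)]
    while len(clusters) > 1:
        (i, j), dist = min(
            (((i, j), average_linkage(clusters[i], clusters[j], cases))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if dist > threshold:
            break
        merged = sorted(clusters[i] + clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(clusters)


# Hypothetical problem cases, each summarized by the event names seen in its log.
example_cases = {
    "case-1": {"scsi_timeout", "path_failover", "disk_offline"},
    "case-2": {"scsi_timeout", "path_failover"},
    "case-3": {"raid_rebuild", "checksum_error"},
    "case-4": {"raid_rebuild", "checksum_error", "disk_offline"},
}

if __name__ == "__main__":
    for group in cluster_cases(example_cases):
        print(group)
```

With these invented cases, the two interconnect-like cases end up in one group and the two RAID-like cases in another, so a diagnosis recorded for one member of a group could be retrieved for the others.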
Managing Energy-Performance Tradeoffs for Multithreaded Applications on Multiprocessor Architectures
ABSTRACT In modern computers, non-performance metrics such as energy consumption have become increasingly important, requiring tradeoffs with performance. Recent work has proposed performance-guaranteed energy management, but it is designed specifically for sequential applications and cannot be applied to the large class of multithreaded applications running on high-end computers and data servers. To address this problem, this paper makes the first attempt to provide performance-guaranteed energy management for multithreaded applications on multiprocessor architectures. We first conduct a comprehensive study of the effects of energy adaptation on thread synchronization and show that a multithreaded application suffers not only from local slowdowns due to energy adaptation, but also from significant slowdowns propagated from other threads because of synchronization. Based on these findings, we design thre
DMA-Aware Memory Energy Management
As increasingly larger memories are used to bridge the widening gap between processor and disk speeds, main memory energy consumption is becoming increasingly dominant. Even though much prior research has been conducted on memory energy management, no study has focused on data servers, where main memory is predominantly accessed by DMAs instead of processors. In this paper, we study DMA-aware techniques for memory energy management in data servers. We first characterize the effect of DMA accesses on memory energy and show that, due to the mismatch between memory and I/O bus bandwidths, significant energy is wasted when memory is idle but still active during DMA transfers. To reduce this waste, we propose two novel performance-directed energy management techniques that maximize the utilization of memory devices by increasing the level of concurrency between multiple DMA transfers from different I/O buses to the same memory device. We evaluate our techniques using a detailed trace-driven simulator, and storage and database server traces. The results show that our techniques can effectively minimize the amount of idle energy waste during DMA transfers and, consequently, conserve up to 38.6% more memory energy than previous approaches while providing similar performance.
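The bandwidth-mismatch argument can be made concrete with a back-of-the-envelope model. This is an illustrative simplification, not the paper's simulator. It assumes a memory device stays in its active state for the whole duration of a DMA transfer and that each I/O bus delivers data at a fixed rate. The function name and the bandwidth figures are hypothetical.

```python
def idle_fraction(mem_bw, bus_bw, concurrent_transfers=1):
    """Fraction of active time in which the memory device is powered but idle.

    mem_bw: peak bandwidth of the memory device (any unit, e.g. MB/s).
    bus_bw: bandwidth of one I/O bus, in the same unit.
    concurrent_transfers: DMA transfers from different buses directed
        at the same memory device at the same time.
    """
    if mem_bw <= 0 or bus_bw <= 0 or concurrent_transfers < 1:
        raise ValueError("bandwidths must be positive and transfers >= 1")
    # The device cannot be busier than 100%, so cap the utilization at 1.
    utilization = min(1.0, concurrent_transfers * bus_bw / mem_bw)
    return 1.0 - utilization


if __name__ == "__main__":
    # Hypothetical figures: a 1600 MB/s memory device fed by 400 MB/s buses.
    for k in (1, 2, 3, 4):
        print(k, idle_fraction(1600, 400, k))
```

Under these assumed numbers, a single transfer leaves the device idle for three quarters of its active time, and each additional concurrent transfer directed at the same device reduces that waste. This is the intuition behind raising concurrency across I/O buses.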
Histogram cube: towards lightweight interactive spatiotemporal aggregation of big earth observation data
ABSTRACT In the era of Earth Observation (EO) big data, interactive spatiotemporal aggregation analysis is a critical tool for exploring geographic patterns. However, existing methods are inefficient and complex. Their interactive performance greatly depends on large-scale computing resources, especially data cube infrastructure. In this study, from a green computing perspective, we propose a lightweight data cube model based on the preaggregation concept, in which the frequency histogram of EO data is employed as a specific measure. The cube space was divided into lattice pyramids by the Google S2 grid system, and histogram statistics of the EO data were injected into in-memory cuboids. Therefore, exploratory aggregation analysis of EO datasets could be rapidly converted into multidimensional-view query processes. We implemented the prototype system on a local PC and conducted a case study of global vegetation index aggregation. The experiments showed that the proposed model is smaller, faster, and consumes less energy than ArcGIS Pro and XCube, and facilitates green computing strategies involving a cube infrastructure. Because the prototype runs in standalone mode, a larger dataset results in a longer cube-building time with indexing latency. The efficiency of the approach comes at the expense of accuracy, and the inherent uncertainties are examined in this paper.
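Why a frequency histogram works as a preaggregated measure can be shown with a small sketch: histograms of separate cells can be merged by adding bin counts, and summary statistics can then be approximated from the merged histogram without revisiting the raw values. This is a minimal illustration, not the paper's implementation. It assumes integer-scaled values and a fixed bin width, and it uses plain lists as cells rather than real S2 cell ids. All names (`BIN_WIDTH`, `cell_histogram`, `merge_histograms`, `approx_mean`) are hypothetical.

```python
from collections import Counter

BIN_WIDTH = 10  # hypothetical bin width, in the data's integer units


def cell_histogram(values):
    """Preaggregate the raw values of one grid cell into bin counts."""
    return Counter(v // BIN_WIDTH for v in values)


def merge_histograms(histograms):
    """Aggregate several cells by summing their bin counts."""
    total = Counter()
    for hist in histograms:
        total.update(hist)
    return total


def approx_mean(hist):
    """Approximate the mean by treating every value as its bin midpoint."""
    count = sum(hist.values())
    if count == 0:
        raise ValueError("empty histogram")
    weighted = sum((b * BIN_WIDTH + BIN_WIDTH / 2) * c for b, c in hist.items())
    return weighted / count


if __name__ == "__main__":
    cell_a = [12, 18, 25]   # hypothetical scaled index values in one cell
    cell_b = [31, 47]       # and in a neighbouring cell
    merged = merge_histograms([cell_histogram(cell_a), cell_histogram(cell_b)])
    exact = sum(cell_a + cell_b) / len(cell_a + cell_b)
    print(approx_mean(merged), exact)
```

The approximate mean differs from the exact one by at most half a bin width, which illustrates the efficiency-for-accuracy tradeoff that the abstract mentions: coarser bins give smaller histograms and larger errors.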
An Extensible Choices System Interface
Traditional OS system interfaces, like the POSIX standard, lack the level of support and ease of programming needed to develop large parallel programs. In this paper, we first describe our vision for an extensible object-oriented system interface designed to cope with the challenges of highly parallel systems. Ease of maintenance and portability are provided through language-independent interface specification and automatic code generation. We then detail our implementation based on the Choices OS and show the effectiveness of our design by presenting a comparison between the POSIX thread implementation in Linux and our new system interface.