34 research outputs found

    ATCOM: Automatically tuned collective communication system for SMP clusters.

    Conventional implementations of collective communications are built on point-to-point communications, and their optimizations have focused on the efficiency of those communication algorithms. However, point-to-point communications are not the optimal choice for modern clusters of SMPs because of their two-level communication structure. In recent years, a few research efforts have investigated efficient collective communications for SMP clusters. This dissertation focuses on platform-independent algorithms and implementations in this area. There are two main approaches to implementing efficient collective communications for clusters of SMPs: using shared memory operations for intra-node communications, and overlapping inter-node and intra-node communications. The former fully utilizes the hardware-based shared memory of an SMP, and the latter takes advantage of the inherent hierarchy of communications within a cluster of SMPs. Previous studies focused on SMP clusters from certain vendors, and the methods they proposed are not portable to other systems. Because performance optimization is complicated and development is time-consuming, self-tuning, platform-independent implementations are highly desirable. As demonstrated in this dissertation, such an implementation can significantly outperform other portable point-to-point based implementations and some platform-specific implementations. The dissertation describes in detail the architecture of the platform-independent implementation. There are four system components: shared memory-based collective communications, overlapping mechanisms for inter-node and intra-node communications, a prediction-based tuning module, and a micro-benchmark based tuning module. Each component is designed with automatic tuning in mind.

    Tackling component interoperability in quantum chemistry software

    The Common Component Architecture (CCA) offers an environment that allows scientific packages to interact with each other dynamically through components. Conceptually, a computation can be constructed with plug-and-play components from any componentized scientific package; however, providing such plug-and-play components requires more than componentizing the functions or subroutines of interest, especially for large-scale scientific packages with a long development history. In this paper, we present our efforts to construct components for integral evaluation, a fundamental sub-problem of quantum chemistry computations, that conform to the CCA specification. The goal is to enable fine-grained interoperability between three quantum chemistry packages, GAMESS, NWChem, and MPQC, via CCA integral components. The structures of these packages are quite different and require different approaches to constructing and exploiting CCA components. We focus on one of the three packages, GAMESS, delineating the structure of its integral computation and then our approaches to its component development. We then use GAMESS as the driver to interoperate with integral components from another package, MPQC, and discuss possible solutions to the interoperability problems along with preliminary results.

    A tunable collective communication framework on a cluster of SMPs

    In this paper we investigate a tunable MPI collective communications library on a cluster of SMPs. Most tunable collective communications libraries select optimal algorithms for inter-node communication on a given platform. We add another layer for intra-node communications, composed of several tunable shared memory operations. We explore the advantages of our approach and discuss when to use it and when to switch to another approach on the shared memory layer. Experimental results indicate that collective communications designed with this approach and properly tuned can outperform vendor implementations.

    Mixed mode matrix multiplication

    In modern clustering environments where the memory hierarchy has many layers (distributed memory, shared memory layer, cache, ...), an important question is how to fully utilize all available resources and identify the dominant layer in a given computation. When algorithms on all layers are combined, which method gets the best performance out of all the available resources? The mixed mode programming model, which uses thread programming on the shared memory layer and message passing on the distributed memory layer, is a method many researchers use to exploit the memory resources. In this paper, we take an algorithmic approach that uses matrix multiplication as a tool to show how cache algorithms affect the performance of both shared memory and distributed memory algorithms. We show that with a good underlying cache algorithm, overall performance is stable. When the underlying cache algorithm is bad, superlinear speedup may occur, and increasing the number of threads may also improve performance. 1. Memory Hierarchies in the modern clustering environments Figure 1 shows the memory hierarchy that exists in most nodes of modern clustering environments. Globally, man

    Suitability Assessment of Soil and Water Conservation Outdoor Classrooms as Field Trip Sites for Elementary Schools

    In the current nine-year integrated curriculum for elementary schools, field trips are part of the curriculum and instruction. Their purposes are to broaden students' knowledge, enrich learning experiences, integrate learning outcomes, and deepen students' understanding of Taiwan. This study therefore took the soil and water conservation outdoor classrooms in central Taiwan as its study sites and, from the viewpoint of school teachers, evaluated the classrooms' facilities and made recommendations on field trip sites. The evaluation parameters were drawn from the 2012 SWCB (Soil and Water Conservation Bureau) project for enhancing and transforming the soil and water conservation outdoor classrooms, combined with the environmental education indicators of the nine-year integrated curriculum. The results showed that Dongshi Forest Garden is the most suitable site for elementary school field trips because its teaching facilities are more complete overall and it has been certified as an environmental education site. Dahu Sifen is suited to visits to its agronomic facilities, with the advantage of displaying a range of soil and water conservation facilities for hillside orchards. The exhibits at Caotun Feng Shui Ping are best suited to displays of soil and water conservation plants and various types of vegetation engineering. The indicator analysis also showed that developing a set of vivid, complete, story-driven interactive interpretive materials is an important future goal for the outdoor classrooms.