1,676 research outputs found

    Tackling Exascale Software Challenges in Molecular Dynamics Simulations with GROMACS

    GROMACS is a widely used package for biomolecular simulation, and over the last two decades it has evolved from small-scale efficiency to advanced heterogeneous acceleration and multi-level parallelism targeting some of the largest supercomputers in the world. Here, we describe some of the ways we have been able to realize this through the use of parallelization on all levels, combined with a constant focus on absolute performance. Release 4.6 of GROMACS uses SIMD acceleration on a wide range of architectures, GPU offloading acceleration, and both OpenMP and MPI parallelism within and between nodes, respectively. The recent work on acceleration made it necessary to revisit the fundamental algorithms of molecular simulation, including the concept of neighborsearching, and we discuss the present and future challenges we see for exascale simulation, in particular a very fine-grained task parallelism. We also discuss the software management, code peer review and continuous integration testing required for a project of this complexity. Comment: EASC 2014 conference proceedings

    Adaptive and secured resource management in distributed and Internet systems

    The effectiveness of computer system resource management has always been determined by two major factors: (1) workload demands and management objectives, and (2) advances in computer technology. These two factors change dynamically, and resource management systems must adapt to the changes in a timely manner. This dissertation attempts to address several important and related resource management issues.
    We first study memory system utilization in centralized servers by improving the memory performance of sorting algorithms, which provides a fundamental understanding of memory system organizations and their performance optimizations for data-intensive workloads. To reduce different types of cache misses, we restructure the mergesort and quicksort algorithms by integrating tiling, padding, and buffering techniques and by repartitioning the data set. Our study shows substantial performance improvements from our new methods.
    We have further extended the work to improve load sharing for utilizing global memory resources in distributed systems. Aiming at reducing the memory resource contention caused by page faults and I/O activities, we have developed and examined load sharing policies that consider effective usage of global memory in addition to CPU load balancing in both homogeneous and heterogeneous clusters.
    Extending our research from clusters to Internet systems, we have further investigated memory and storage utilization in Web caching systems. We have proposed several novel management schemes to restructure and decentralize the existing caching system by exploiting data locality at different levels of the global memory hierarchy and by effectively sharing data objects among the clients and their proxy caches.
    Data integrity and communication anonymity issues arise from our decentralized Web caching system design, and they are also security concerns for general peer-to-peer systems. We propose an integrity protocol to ensure data integrity, and several protocols to achieve mutual communication anonymity between an information requester and a provider.
    The potential impact and contributions of this dissertation are briefly stated as follows: (1) The two major research topics identified in this dissertation are fundamentally important for the growth and development of information technology, and will continue to be demanding topics for the long term. (2) Our proposed cache-effective sorting methods bridge a serious gap between the analytical complexity of algorithms and their execution complexity in practice, a gap caused by the increasingly deep memory hierarchy in computer systems. This approach can also be used to improve memory performance at different levels of the memory hierarchy, such as I/O and file systems. (3) Our load sharing principle of giving high priority to requests for data accesses in memory and I/O adapts in a timely way to technology changes and responds effectively to the increasing demand of data-intensive applications. (4) Our proposed decentralized Web caching framework and its resource management schemes present a comprehensive case study for examining the P2P model. Our results and experiences can be used for related and further studies in distributed computing. (5) The proposed data integrity and communication anonymity protocols address limits and weaknesses of existing ones, and lay a solid foundation for us to continue our work in this important area.
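
The tiling idea mentioned in the abstract above can be illustrated with a minimal sketch. It is not the dissertation's algorithm: it only shows the general two-phase structure of sorting fixed-size tiles first and then merging the sorted runs. The `tile` parameter and the use of a multiway merge are assumptions made for illustration, and Python does not expose cache behavior, so no performance claim is implied.

```python
import heapq


def tiled_mergesort(data, tile=1024):
    """Sort `data` in two phases: per-tile sorts, then a multiway merge.

    `tile` is a hypothetical parameter; in a compiled language it would be
    chosen so that one tile fits in the cache level being targeted.
    """
    # Phase 1: sort each tile independently, so each sort touches only a
    # small, contiguous working set.
    runs = [sorted(data[i:i + tile]) for i in range(0, len(data), tile)]
    # Phase 2: merge all sorted runs in one streaming pass.
    return list(heapq.merge(*runs))


if __name__ == "__main__":
    print(tiled_mergesort([5, 3, 1, 4, 2], tile=2))
```

Padding and buffering, which the abstract also names, concern memory layout and are not represented in this sketch.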

    High performance computing of explicit schemes for electrofusion jointing process based on message-passing paradigm

    The research focused on a heterogeneous cluster of workstations comprising a number of CPUs on single and shared architecture platforms. The problems under consideration involved one-dimensional parabolic equations. The thermal process of electrofusion jointing was also discussed. Numerical schemes of explicit type, such as the AGE, Brian, and Charlies methods, were employed. The parallelization of these methods was based on the domain decomposition technique. Some parallel performance measurements for these methods were also addressed. The temperature profile of the one-dimensional radial model of the electrofusion process was also given.
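
As a rough illustration of domain decomposition for an explicit scheme on a one-dimensional parabolic equation, the sketch below uses the classic forward-time, centered-space (FTCS) update as a stand-in. It is not one of the methods named in the abstract, and the message passing is simulated serially: each subdomain reads one ghost value from each neighbor, which is what a message-passing implementation would exchange. All function names are hypothetical.

```python
def ftcs_step(u, r):
    """One explicit FTCS step for u_t = alpha * u_xx on a uniform grid.

    r = alpha * dt / dx**2 (the scheme is stable for r <= 0.5).
    The two end points are treated as fixed boundary values.
    """
    new = u[:]
    for i in range(1, len(u) - 1):
        new[i] = u[i] + r * (u[i - 1] - 2.0 * u[i] + u[i + 1])
    return new


def decomposed_step(u, r, parts):
    """The same step, computed subdomain by subdomain with ghost cells."""
    n = len(u)
    bounds = [n * p // parts for p in range(parts + 1)]
    new = u[:]
    for p in range(parts):
        lo, hi = bounds[p], bounds[p + 1]
        # Extend the owned range by one ghost cell per side; in a real
        # message-passing code these values would arrive from neighbors.
        glo, ghi = max(lo - 1, 0), min(hi + 1, n)
        stepped = ftcs_step(u[glo:ghi], r)
        # Copy back only the owned points; global boundary points stay fixed.
        for i in range(max(lo, 1), min(hi, n - 1)):
            new[i] = stepped[i - glo]
    return new


if __name__ == "__main__":
    u0 = [0.0] * 17
    u0[8] = 1.0  # initial heat pulse in the middle of the rod
    print(decomposed_step(u0, 0.25, 4) == ftcs_step(u0, 0.25))
```

Because every owned point is updated from the same three values with the same arithmetic, the decomposed result matches the single-domain result exactly.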

    Performance Analysis of IO Intensive Task Allocation Strategies for Heterogeneous Web Servers

    The current rate of growth of the World Wide Web has led to an explosion in Internet traffic for many popular websites. To counter the resulting decline in quality of service for a site's customers, an efficient approach is to use a heterogeneous cluster of nodes, each of which replicates the entire site data. In a centralized system, a master node load-balances the user requests and allocates them to the appropriate node. A web application that mainly provides file sharing services handles tasks that are largely retrieval based and hence more IO intensive. To address the allocation problem for these tasks, several IO-aware policies have been designed and compared with respect to certain standard performance metrics. The study shows that considering the IO nature of tasks yields significantly better results than other existing algorithms.
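
A minimal sketch of what an IO-aware allocation policy on a master node might look like is given below. It is not one of the policies evaluated in the paper: the `Node` fields, the throughput figures, and the greedy "earliest estimated finish" rule are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    io_rate: float           # hypothetical disk throughput in MB/s
    pending_mb: float = 0.0  # data already queued on this node, in MB


def pick_node(nodes, size_mb):
    """Choose the node with the earliest estimated finish time.

    Estimated finish = (queued data + new request) / IO throughput,
    so faster disks absorb proportionally more work.
    """
    return min(nodes, key=lambda n: (n.pending_mb + size_mb) / n.io_rate)


def allocate(nodes, requests_mb):
    """Assign each request in arrival order and return the node names."""
    plan = []
    for size in requests_mb:
        node = pick_node(nodes, size)
        node.pending_mb += size
        plan.append(node.name)
    return plan


if __name__ == "__main__":
    cluster = [Node("fast", 100.0), Node("slow", 50.0)]
    print(allocate(cluster, [100.0, 100.0, 100.0]))
```

In this sketch a node with twice the throughput receives roughly twice the data, which a policy that only counts requests per node would not achieve on a heterogeneous cluster.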

    Vcluster: A Portable Virtual Computing Library For Cluster Computing

    Message passing has been the dominant parallel programming model in cluster computing, and libraries like Message Passing Interface (MPI) and Parallel Virtual Machine (PVM) have proven their novelty and efficiency through numerous applications in diverse areas. However, as clusters of Symmetric Multi-Processor (SMP) and heterogeneous machines become popular, conventional message passing models must be adapted to support this new kind of cluster efficiently. In addition, the Java programming language, with features like object-oriented architecture, platform-independent bytecode, and native support for multithreading, is an alternative language for cluster computing. This research presents a new parallel programming model and a library called VCluster that implements this model on top of a Java Virtual Machine (JVM). The programming model is based on virtual migrating threads to support clusters of heterogeneous SMP machines efficiently. VCluster is implemented in 100% Java, utilizing the portability of Java to address the problems of heterogeneous machines. VCluster virtualizes computational and communication resources such as threads, computation states, and communication channels across multiple separate JVMs, which makes a mobile thread possible. With virtual migrating threads, it is feasible to balance the load of computing resources dynamically. Several large-scale parallel applications have been developed using VCluster to compare the performance and usage of VCluster with other libraries. The results of the experiments show that VCluster makes it easier to develop multithreaded parallel applications than conventional libraries like MPI. At the same time, the performance of VCluster is comparable to that of MPICH, a widely used MPI library, combined with popular threading libraries like POSIX Threads and OpenMP. In the next phase of our work, we implemented thread groups and thread migration to demonstrate the feasibility of dynamic load balancing in VCluster. We carried out experiments to show that the load can be dynamically balanced in VCluster, resulting in better performance. Thread groups also make it possible to implement collective communication functions between threads, which have proved useful in process-based libraries.