5 research outputs found

    Performance Analysis And Optimal Utilization Of Inter-Process Communications On A Commodity Cluster

    Classical science is based on theory, observation and physical experimentation. Contemporary science is characterized by theory, observation, experimentation and numerical simulation. With hardware and software we can simulate many phenomena, which saves time, money and physical resources. Simulating a given phenomenon requires a great deal of computing power, and high performance computers answer that need. High performance computers consist of numerous processors working on the same task in parallel. In the past, high performance computers were very expensive and affordable to only a few institutions. Since the Message Passing Interface (MPI) library was ported to the PC platform, commodity clusters can be built from inexpensive PCs and afforded by any researcher. Many performance analyses have been conducted on high-end supercomputers. None has been done on commodity clusters. In this thesis, experiments for six major MPI communication functions were performed on eight different cluster configurations. Performance analyses were then conducted on the results. Based on the results, methods for optimal utilization of inter-process communications on commodity clusters were proposed.

    Efficient Parallel All-Pairs Computation Framework: using Computation - Communication Overlap

    The advent of parallel computing systems gave users huge computation power to process huge workloads efficiently. Most recent data-intensive applications require parallel computing power to complete their jobs efficiently. To facilitate efficient computing, a simplified abstraction of parallel computing systems is needed. We propose one such parallel computation abstraction, designed to solve All-Pairs problems, which fit the needs of several data-intensive applications. All-Pairs problems require each data element to be paired with every other data element. This framework aims to address the recurring problems of scalability, distributing equal workload to all nodes, and reducing memory footprint. Our framework reduces the memory footprint of All-Pairs problems by reducing the memory requirement from N/sqrt(P) to 3N/P. A bioinformatics application is implemented to demonstrate the framework's scalability (up to 512 cores), redundancy management, and speedup (superlinear speedup).
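
    The two memory bounds quoted in the abstract can be compared numerically. The sketch below is only an illustration: it assumes N is the number of data elements and P the number of processes (the abstract does not define either symbol), and the function names are hypothetical rather than part of the framework.

```python
import math


def footprint_sqrt(n: int, p: int) -> float:
    """Per-process memory under the N/sqrt(P) bound (N, P as assumed above)."""
    return n / math.sqrt(p)


def footprint_linear(n: int, p: int) -> float:
    """Per-process memory under the 3N/P bound (N, P as assumed above)."""
    return 3 * n / p


if __name__ == "__main__":
    n = 1_000_000  # hypothetical element count, not taken from the paper
    for p in (4, 9, 64, 512):
        print(p, footprint_sqrt(n, p), footprint_linear(n, p))
```

    Under these assumptions 3N/P is the smaller of the two whenever sqrt(P) > 3, that is, for P > 9; the two bounds coincide at P = 9.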

    HDOT — An approach towards productive programming of hybrid applications

    bulk synchronous parallel (BSP) communication model can hinder performance increases. This is due to the complexity of handling load imbalances, reducing the serialisation imposed by blocking communication patterns, overlapping communication with computation and, finally, dealing with increasing memory overheads. The MPI specification provides advanced features such as non-blocking calls or shared memory to mitigate some of these factors. However, applying these features efficiently usually requires significant changes to the application structure. Task parallel programming models are being developed as a means of mitigating the abovementioned issues without requiring extensive changes to the application code. In this work, we present a methodology to develop hybrid applications based on tasks called hierarchical domain over-decomposition with tasking (HDOT). This methodology overcomes most of the issues found in MPI-only and traditional hybrid MPI+OpenMP applications. Moreover, by emphasising the reuse of data partition schemes from process-level and applying them to task-level, it enables a natural coexistence between MPI and shared-memory programming models. The proposed methodology shows promising results in terms of programmability and performance measured on a set of applications. This work has been developed with the support of the European Union H2020 program through the INTERTWinE project (agreement number 671602); the Severo Ochoa Program awarded by the Spanish Government (SEV-2015-0493); the Generalitat de Catalunya (contract 2017-SGR-1414); and the Spanish Ministry of Science and Innovation (TIN2015-65316-P, Computación de Altas Prestaciones VII). The authors gratefully acknowledge Dr. Arnaud Mura, CNRS researcher at Institut PPRIME in France, for the numerical tool CREAMS. Finally, the manuscript has greatly benefited from the precise comments of the reviewers. Peer Reviewed. Postprint (author's final draft)

    High-Stiffness, Lock-and-Key Heat-Reversible Locator-Snap Systems for the Design for Disassembly.

    The use of joints that can disengage with minimum labor, part damage, and material contamination is critical to ensure effective service, part reuse, and material recycling. This dissertation develops a general computational method for designing lock-and-key heat-reversible locator-snap systems that satisfy the aforementioned requirements. The lock-and-key concept is like a security code that allows easy disassembly when the right procedure is followed. It is realized by double-latching snaps that require force within a certain range to disengage, and multiple snaps that require heating multiple locations at different temperatures to disengage. During disassembly, thermal expansion constrained by locators and the temperature gradient along the wall thickness are exploited to realize the deformation required to release the snaps. A generic optimization problem is posed to find the orientations, numbers, and locations of locators and snaps, and the numbers, locations, and sizes of heating areas, which realize the release of snaps with minimum heating and maximum stiffness, while satisfying motion and structural requirements. Screw Theory is utilized to pre-calculate the set of feasible orientations of locators and snaps that are examined during optimization. A Multi-Objective Genetic Algorithm (MOGA) is used for solving the posed generic optimization problem. A parallel version, using a manager-worker scheme with active load balancing, is developed to solve the generic optimization problem efficiently. The proposed algorithm selects between two parallelization schemes based on the average objective function evaluation time and either divides the population evenly over all processors or sends small patches of the population to the idle workers. The proposed heat-reversible locator-snap systems are applied to different case studies ranging from automotive bodies to consumer electronics.
    The first case study deals with joining internal frames and external panels in automotive bodies. Next, the proposed locator-snap systems are applied to a T-shaped DVD player enclosure, an enclosure model with complex mating line geometry, and a flat panel TV enclosure. In the latter, the developed parallel genetic algorithm is used and its performance is analyzed. In all case studies, the resulting Pareto-optimal solutions yield alternative designs with different trade-offs between the design objectives while satisfying all the constraints.
    Ph.D. Mechanical Engineering and Scientific Computing. University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/58479/1/mshalaby_1.pd
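
    To make the notion of Pareto-optimal trade-offs in this abstract concrete, the sketch below filters a set of candidate designs down to the non-dominated ones. It is a generic dominance filter, not the dissertation's MOGA. Both objectives are treated as minimized, so "maximum stiffness" is assumed to be encoded as negative stiffness; the function names and the sample values are hypothetical.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a is no worse than b in every objective and strictly
    better in at least one (all objectives are minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    # Hypothetical (heating, -stiffness) pairs, not data from the dissertation.
    designs = [(1.0, -2.0), (2.0, -5.0), (3.0, -4.0), (4.0, -9.0)]
    print(pareto_front(designs))
```

    The quadratic scan is adequate for small populations; each design that survives represents a different trade-off between the two objectives.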