2 research outputs found

    PERFORMANCE ANALYSIS AND FITNESS OF GPGPU AND MULTICORE ARCHITECTURES FOR SCIENTIFIC APPLICATIONS

    Recent trends in computing architecture development have focused on exploiting task- and data-level parallelism in applications. Major hardware vendors are experimenting with novel parallel architectures, such as the Many Integrated Core (MIC) from Intel, which integrates 50 or more x86 cores on a single chip, and the Accelerated Processing Unit from AMD, which integrates a multicore x86 processor with a graphics processing unit (GPU); many other initiatives from other hardware vendors are underway. Various types of architectures are therefore available to developers for accelerating an application, and a performance model that predicts the suitability of an architecture for accelerating an application would be very helpful prior to implementation. Thus, in this research, a Fitness model that ranks the potential performance of accelerators for an application is proposed. The Fitness model is then extended using statistical multiple regression to model both the runtime performance of accelerators and the impact of programming models on accelerator performance with a high degree of accuracy. We have validated both performance models for all the case studies; the error rate of these models, calculated using the experimental performance data, is tolerable in the high-performance computing field.

    To develop and validate the two performance models, we have also analyzed the performance of several multicore CPU and GPGPU architectures and the corresponding programming models using multiple case studies. The first case study is a matrix-matrix multiplication algorithm; by varying the matrix size from small to very large, the performance of the multicore and GPGPU architectures is studied. The second case study is a biological spiking neural network (SNN) implemented with four neuron models whose varying requirements for communication and computation make them useful for performance analysis of the hardware platforms. We report and analyze the performance variation of four popular accelerators (Intel Xeon, AMD Opteron, Nvidia Fermi, and IBM PS3) and four advanced CPU architectures (Intel 32-core, AMD 32-core, IBM 16-core, and SUN 32-core) with problem-size (matrix and network size) scaling, available optimization techniques, and execution configuration. This thorough analysis provides insight into how the performance of an accelerator is affected by problem size, optimization techniques, and accelerator configuration.

    We have analyzed the performance impact of four popular multicore parallel programming models, POSIX threads, Open Multi-Processing (OpenMP), Open Computing Language (OpenCL), and Concurrency Runtime, on an Intel i7 multicore architecture, and of two GPGPU programming models, Compute Unified Device Architecture (CUDA) and OpenCL, on an NVIDIA GPGPU. In this broad study, conducted over a wide range of application complexity, multiple optimizations, and varying problem sizes, it was found that the programming models for the x86 processor cannot be ranked by achievable performance across all applications, whereas the programming models for the GPGPU can be ranked conclusively. We have also ranked all six programming models qualitatively and quantitatively in terms of their perceived programming effort.
    The results and analysis in this research, supported by the proposed performance models, indicate that for a given hardware system the best performance for an application is obtained with a proper match of programming model and architecture.
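    For illustration only, the shared-memory side of the first case study can be sketched as a naive OpenMP matrix-matrix multiplication in C; this is a minimal example of the kind of kernel whose problem size is scaled in the study, not the thesis's actual code, and the function name matmul and row-major layout are assumptions.

        /* Minimal sketch (assumed, not the thesis code): naive dense
         * matrix-matrix multiplication C = A * B, parallelized with
         * OpenMP so the outer loops are split across CPU cores. */
        #include <omp.h>

        void matmul(const double *A, const double *B, double *C, int n)
        {
            #pragma omp parallel for collapse(2)
            for (int i = 0; i < n; i++) {
                for (int j = 0; j < n; j++) {
                    double sum = 0.0;
                    for (int k = 0; k < n; k++)
                        sum += A[i * n + k] * B[k * n + j];
                    C[i * n + j] = sum;   /* each (i, j) cell is independent */
                }
            }
        }

    Scaling n from small to very large in such a kernel is what exposes the architecture- and optimization-dependent behavior the Fitness and regression models are built to predict.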

    Metalanguage For High-performance Computing On Hybrid Architectures

    In high-performance computing, hybrid systems are defined as architectures in which shared-memory and distributed-memory systems coexist. To exploit most of the potential of such systems, programmers usually need more than one programming model simultaneously. For distributed-memory systems, the master/worker model with message exchange is commonly used, and MPI is the most widely used programming library. For shared-memory systems, the fork/join model, as used by the PThreads and OpenMP application programming interfaces, is the de facto standard. In this paper, I propose a metalanguage that combines both programming models. The metalanguage has annotated statements that specify which parts of the code run on shared-memory systems and which run on distributed-memory systems. A metacompiler translates the metalanguage and generates C code with OpenMP pragmas, PThreads, and MPI function calls. As a result, I show that programs written in the metalanguage are cleaner and more understandable, making it easier to write high-performance computing programs.
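    To make the combination of models concrete, the following is a minimal hand-written hybrid C sketch of the style of code the metacompiler is said to emit; the array-sum workload, the problem size N, and the variable names are assumptions for illustration, not taken from the paper.

        /* Minimal sketch (assumed example, not the paper's metalanguage or
         * its generated output): MPI distributes work across ranks
         * (distributed memory, master/worker style), and OpenMP forks
         * threads within each rank (shared memory, fork/join style). */
        #include <mpi.h>
        #include <omp.h>
        #include <stdio.h>

        int main(int argc, char **argv)
        {
            MPI_Init(&argc, &argv);
            int rank, size;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);

            const long N = 1000000;                     /* assumed problem size */
            long chunk = N / size;
            long lo = rank * chunk;
            long hi = (rank == size - 1) ? N : lo + chunk;

            /* Shared-memory part: OpenMP threads sum this rank's slice. */
            double local = 0.0;
            #pragma omp parallel for reduction(+:local)
            for (long i = lo; i < hi; i++)
                local += (double)i;

            /* Distributed-memory part: partial sums are reduced to rank 0. */
            double total = 0.0;
            MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
            if (rank == 0)
                printf("sum = %f\n", total);

            MPI_Finalize();
            return 0;
        }

    The metalanguage described in the paper is intended to replace this kind of explicit boilerplate with annotated statements, with the metacompiler producing the MPI calls and OpenMP pragmas automatically.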