11 research outputs found

    Algorithms and Tools for (Distributed) Heterogeneous Computing: A Prospective Report

    We discuss algorithms and tools to help program and use metacomputing resources in the forthcoming years. Metacomputing with highly distributed heterogeneous environments stands to become a major, if not dominant, method to implement all kinds of parallel applications. In this report, we survey some general aspects of metacomputing (hardware, system and administration issues, as well as the application field). Next, we identify some algorithmic issues and software challenges that must be solved to efficiently program and/or transparently use such platforms: data decomposition techniques for cluster computing, granularity issues for metacomputing, scheduling and load-balancing methods, and programming models. We illustrate each of these issues and challenges through the analysis of several case studies: Cluster ScaLAPACK, AppLeS, Globus, Legion, Albatross and Netsolve. We conclude this report with some final remarks and recommendations.

    Execution Replay of Parallel Programs

    Debugging MIMD programs is often a delicate job: they can behave differently in successive executions, so cyclic debugging is not applicable. To make it available to parallel programmers, we propose execution replay (full and partial) for our multi-threaded execution model, the Communicating Active Components (CAC). CACs have been defined to implement Parallel Object-Oriented Languages. This work is part of the PVC/BOX project, whose goal is a full parallel object-oriented environment. The execution replay mechanism is the basic building block for usable parallel debugging tools. 1 Introduction. In recent years, hardware components have made great progress. Very powerful architectures, especially multicomputers made of many processors, are now available. Programming these machines is often a delicate job, even for expert programmers. The need for efficient and powerful tools to support debugging is very important. Advanced tools exist for sequential programm..
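The record-then-replay idea in this abstract can be illustrated with a minimal sketch. It is not the CAC mechanism itself: the function `run`, its parameters, and the single-mailbox model are hypothetical, and nondeterministic message arrival is simulated with a seeded random generator. The first run logs the order in which messages are delivered; the second run follows that log, so it reproduces the same behavior.

```python
import random


def run(senders, rng=None, replay_log=None):
    """Deliver one message per sender; return (delivery order, final state).

    Record mode (replay_log is None): arrival order is picked by rng.
    Replay mode: arrival order is forced to follow replay_log.
    """
    pending = set(senders)
    order = []
    state = ""
    while pending:
        if replay_log is not None:
            sender = replay_log[len(order)]        # replay: follow the log
        else:
            sender = rng.choice(sorted(pending))   # record: arbitrary order
        pending.remove(sender)
        order.append(sender)
        state += sender                            # state depends on the order
    return order, state


# Record an execution, then replay it from the recorded log.
log, recorded_state = run(["a", "b", "c"], rng=random.Random(1))
replayed_order, replayed_state = run(["a", "b", "c"], replay_log=log)
```

Because only the delivery order is logged, the replayed run reaches the same state without storing message contents, which is what makes cyclic debugging possible again.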

    [Courtrai92a] L. Courtrai, J.F. Roos, J.M. Geib, J.F. Mehaut

    Our research team works on the design and implementation of Object-Oriented Parallel Languages for Distributed Architectures (multicomputers or workstation networks). This study is based on the Cac (Communicating Active Component). A Cac is an entity that includes a mailbox, a local context, and an activity running a behavioral function. A distributed application is described by a set of Cacs communicating by message passing. Before execution, the code of the behavioral functions is distributed over several nodes. During execution, the Cacs of the application run concurrently on the network nodes. Parallel Object Languages may be implemented with Cacs, so the execution of a parallel object program produces real parallelism and a natural distribution of objects over the network. We will present the Cac implementation realized on a set of SparcStations using SunOS, in particular its lightweight processes (Cac activities) and sockets. Cac environment is easily realizable on a modern distribute..
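The three parts of a Cac named in this abstract (mailbox, local context, activity running a behavioral function) can be sketched as follows. This is an illustration only: the class `Cac`, its methods, and the `accumulate` behavior are hypothetical names, and a Python thread with an in-process queue stands in for the SunOS lightweight processes and sockets used by the authors.

```python
import queue
import threading


class Cac:
    """Hypothetical Cac: a mailbox, a local context, and one activity."""

    def __init__(self, name, behavior):
        self.name = name
        self.mailbox = queue.Queue()    # incoming messages
        self.context = {}               # local state, private to this Cac
        self._behavior = behavior       # behavioral function run by the activity
        self._activity = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._activity.start()

    def send(self, message):
        """Message passing: other components only ever touch the mailbox."""
        self.mailbox.put(message)

    def join(self):
        self._activity.join()

    def _run(self):
        while True:
            message = self.mailbox.get()
            if message is None:         # sentinel: stop the activity
                break
            self._behavior(self, message)


def accumulate(cac, message):
    """Example behavioral function: sum the integers received."""
    cac.context["total"] = cac.context.get("total", 0) + message


def run_demo():
    counter = Cac("counter", accumulate)
    counter.start()
    for value in (1, 2, 3):
        counter.send(value)
    counter.send(None)
    counter.join()
    return counter.context["total"]
```

Since the context is modified only by the Cac's own activity, no locking is needed around it; all interaction goes through the mailbox.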

    On The Energy Efficiency And Performance Of Irregular Application Executions On Multicore, Numa And Manycore Platforms

    Until the last decade, performance of HPC architectures has been almost exclusively quantified by their processing power. However, energy efficiency has recently come to be considered as important as raw performance and has become a critical aspect of the development of scalable systems. These strict energy constraints guided the development of a new class of so-called light-weight manycore processors. This study evaluates the computing and energy performance of two well-known irregular NP-hard problems, the Traveling-Salesman Problem (TSP) and K-Means clustering, and a numerical seismic wave propagation simulation kernel, Ondes3D, on multicore, NUMA, and manycore platforms. First, we concentrate on the nontrivial task of adapting these applications to a manycore, specifically the novel MPPA-256 manycore processor. Then, we analyze their performance and energy consumption on those different machines. Our results show that applications able to fully use the resources of a manycore can have better performance and may consume from 3.8× to 13× less energy when compared to low-power and general-purpose multicore processors, respectively.

    Evaluating SEU fault-injection on parallel applications implemented on multicore processors

    The widespread use of multicore processors in computing systems, and the imperative necessity of exploiting massive parallelism to improve performance and dependability, make it mandatory to evaluate the impact of SEUs on parallel applications running on multicore processors. This paper presents a method and preliminary results of SEU fault-injection campaigns performed on parallel applications implemented on a quad-core processor. Two representative benchmark applications were considered for the analysis.

    Preliminary results of SEU Fault Injection on Multicore processors in AMP mode

    The current technological challenge for computing systems is to use multicore processors in order to ensure reliability, improve performance, and reduce power consumption. This paper presents a method and preliminary results of SEU fault-injection campaigns performed on multicore systems in asymmetric multi-processing mode. The target used for this purpose was a quad-core processor. This work aims to validate the efficiency of the TMR fault-tolerance method and to show the weakest variables of a given application running on multicore systems.
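To make the two terms in this abstract concrete, the sketch below models an SEU as a single bit flip in a stored value and TMR (triple modular redundancy) as a bitwise majority vote over three copies. It is a generic textbook illustration, not the paper's injection method: `flip_bit` and `tmr_vote` are hypothetical helper names, and real campaigns inject faults into processor registers or memory rather than Python integers.

```python
def flip_bit(value, bit):
    """Model an SEU: invert one bit of an integer value."""
    return value ^ (1 << bit)


def tmr_vote(a, b, c):
    """Bitwise majority vote: each output bit is the value held by at least two copies."""
    return (a & b) | (a & c) | (b & c)


original = 0b10110010
faulty = flip_bit(original, 3)                  # single upset in one copy

masked = tmr_vote(original, faulty, original)   # one corrupted copy is outvoted
defeated = tmr_vote(faulty, faulty, original)   # two identical upsets win the vote
```

The second vote shows the limit of the scheme: TMR masks a fault in one copy, but an identical upset in two copies propagates to the output.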