145,174 research outputs found

    Playing with Derivation Modes and Halting Conditions

    Get PDF
    In the area of P systems, besides the standard maximally parallel derivation mode, many other derivation modes have been investigated, too. In this paper, many variants of hierarchical P systems and tissue P systems using different derivation modes are considered and the effects of using di erent derivation modes, especially the maximally parallel derivation modes and the maximally parallel set derivation modes, on the generative and accepting power are illustrated. Moreover, an overview on some control mechanisms used for (tissue) P systems is given. Furthermore, besides the standard total halting mode, we also consider different halting conditions such as unconditional halting and partial halting and explain how the use of different halting modes may considerably change the computing power of P systems and tissue P systems

    Hierarchical approach for deriving a reproducible unblocked LU factorization

    Full text link
    [EN] We propose a reproducible variant of the unblocked LU factorization for graphics processor units (GPUs). For this purpose, we build upon Level-1/2 BLAS kernels that deliver correctly-rounded and reproducible results for the dot (inner) product, vector scaling, and the matrix-vector product. In addition, we draw a strategy to enhance the accuracy of the triangular solve via iterative refinement. Following a bottom-up approach, we finally construct a reproducible unblocked implementation of the LU factorization for GPUs, which accommodates partial pivoting for stability and can be eventually integrated in a high performance and stable algorithm for the (blocked) LU factorization.The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The simulations were performed on resources provided by the Swed-ish National Infrastructure for Computing (SNIC) at PDC Centre for High Performance Computing (PDC-HPC). This work was also granted access to the HPC resources of The Institute for Scientific Computing and Simulation financed by Region Ile-de-France and the project Equip@Meso (reference ANR-10-EQPX-29-01) overseen by the French National Agency for Research (ANR) as part of the Investissements d Avenir pro-gram. This work was also partly supported by the FastRelax (ANR-14-CE25-0018-01) project of ANR.Iakymchuk, R.; Graillat, S.; Defour, D.; Quintana-Orti, ES. (2019). Hierarchical approach for deriving a reproducible unblocked LU factorization. International Journal of High Performance Computing Applications. 33(5):791-803. https://doi.org/10.1177/1094342019832968S791803335Arteaga, A., Fuhrer, O., & Hoefler, T. (2014). Designing Bit-Reproducible Portable High-Performance Applications. 2014 IEEE 28th International Parallel and Distributed Processing Symposium. doi:10.1109/ipdps.2014.127Bientinesi, P., Quintana-Ortí, E. S., & Geijn, R. A. van de. (2005). Representing linear algebra algorithms in code: the FLAME application program interfaces. ACM Transactions on Mathematical Software, 31(1), 27-59. doi:10.1145/1055531.1055533Chohra, C., Langlois, P., & Parello, D. (2016). Efficiency of Reproducible Level 1 BLAS. Lecture Notes in Computer Science, 99-108. doi:10.1007/978-3-319-31769-4_8Collange, S., Defour, D., Graillat, S., & Iakymchuk, R. (2015). Numerical reproducibility for the parallel reduction on multi- and many-core architectures. Parallel Computing, 49, 83-97. doi:10.1016/j.parco.2015.09.001Demmel, J., & Hong Diep Nguyen. (2013). Fast Reproducible Floating-Point Summation. 2013 IEEE 21st Symposium on Computer Arithmetic. doi:10.1109/arith.2013.9Demmel, J., & Nguyen, H. D. (2015). Parallel Reproducible Summation. IEEE Transactions on Computers, 64(7), 2060-2070. doi:10.1109/tc.2014.2345391Dongarra, J. J., Du Croz, J., Hammarling, S., & Duff, I. S. (1990). A set of level 3 basic linear algebra subprograms. ACM Transactions on Mathematical Software, 16(1), 1-17. doi:10.1145/77626.79170Dongarra, J., Hittinger, J., Bell, J., Chacon, L., Falgout, R., Heroux, M., … Wild, S. (2014). Applied Mathematics Research for Exascale Computing. doi:10.2172/1149042Fousse, L., Hanrot, G., Lefèvre, V., Pélissier, P., & Zimmermann, P. (2007). MPFR. ACM Transactions on Mathematical Software, 33(2), 13. doi:10.1145/1236463.1236468Haidar, A., Dong, T., Luszczek, P., Tomov, S., & Dongarra, J. (2015). Batched matrix computations on hardware accelerators based on GPUs. The International Journal of High Performance Computing Applications, 29(2), 193-208. doi:10.1177/1094342014567546Hida, Y., Li, X. S., & Bailey, D. H. (s. f.). Algorithms for quad-double precision floating point arithmetic. Proceedings 15th IEEE Symposium on Computer Arithmetic. ARITH-15 2001. doi:10.1109/arith.2001.930115Higham, N. J. (2002). Accuracy and Stability of Numerical Algorithms. doi:10.1137/1.9780898718027Iakymchuk, R., Defour, D., Collange, S., & Graillat, S. (2015). Reproducible Triangular Solvers for High-Performance Computing. 2015 12th International Conference on Information Technology - New Generations. doi:10.1109/itng.2015.63Iakymchuk, R., Defour, D., Collange, S., & Graillat, S. (2016). Reproducible and Accurate Matrix Multiplication. Lecture Notes in Computer Science, 126-137. doi:10.1007/978-3-319-31769-4_11Kulisch, U., & Snyder, V. (2010). The exact dot product as basic tool for long interval arithmetic. Computing, 91(3), 307-313. doi:10.1007/s00607-010-0127-7Li, X. S., Demmel, J. W., Bailey, D. H., Henry, G., Hida, Y., Iskandar, J., … Yoo, D. J. (2002). Design, implementation and testing of extended and mixed precision BLAS. ACM Transactions on Mathematical Software, 28(2), 152-205. doi:10.1145/567806.567808Muller, J.-M., Brisebarre, N., de Dinechin, F., Jeannerod, C.-P., Lefèvre, V., Melquiond, G., … Torres, S. (2010). Handbook of Floating-Point Arithmetic. doi:10.1007/978-0-8176-4705-6Ogita, T., Rump, S. M., & Oishi, S. (2005). Accurate Sum and Dot Product. SIAM Journal on Scientific Computing, 26(6), 1955-1988. doi:10.1137/030601818Ortega, J. . (1988). The ijk forms of factorization methods I. Vector computers. Parallel Computing, 7(2), 135-147. doi:10.1016/0167-8191(88)90035-xRump, S. M. (2009). Ultimately Fast Accurate Summation. SIAM Journal on Scientific Computing, 31(5), 3466-3502. doi:10.1137/080738490Skeel, R. D. (1979). Scaling for Numerical Stability in Gaussian Elimination. Journal of the ACM, 26(3), 494-526. doi:10.1145/322139.322148Zhu, Y.-K., & Hayes, W. B. (2010). Algorithm 908. ACM Transactions on Mathematical Software, 37(3), 1-13. doi:10.1145/1824801.182481

    Genet: A Quickly Scalable Fat-Tree Overlay for Personal Volunteer Computing using WebRTC

    Full text link
    WebRTC enables browsers to exchange data directly but the number of possible concurrent connections to a single source is limited. We overcome the limitation by organizing participants in a fat-tree overlay: when the maximum number of connections of a tree node is reached, the new participants connect to the node's children. Our design quickly scales when a large number of participants join in a short amount of time, by relying on a novel scheme that only requires local information to route connection messages: the destination is derived from the hash value of the combined identifiers of the message's source and of the node that is holding the message. The scheme provides deterministic routing of a sequence of connection messages from a single source and probabilistic balancing of newer connections among the leaves. We show that this design puts at least 83% of nodes at the same depth as a deterministic algorithm, can connect a thousand browser windows in 21-55 seconds in a local network, and can be deployed for volunteer computing to tap into 320 cores in less than 30 seconds on a local network to increase the total throughput on the Collatz application by two orders of magnitude compared to a single core

    Pando: Personal Volunteer Computing in Browsers

    Full text link
    The large penetration and continued growth in ownership of personal electronic devices represents a freely available and largely untapped source of computing power. To leverage those, we present Pando, a new volunteer computing tool based on a declarative concurrent programming model and implemented using JavaScript, WebRTC, and WebSockets. This tool enables a dynamically varying number of failure-prone personal devices contributed by volunteers to parallelize the application of a function on a stream of values, by using the devices' browsers. We show that Pando can provide throughput improvements compared to a single personal device, on a variety of compute-bound applications including animation rendering and image processing. We also show the flexibility of our approach by deploying Pando on personal devices connected over a local network, on Grid5000, a French-wide computing grid in a virtual private network, and seven PlanetLab nodes distributed in a wide area network over Europe.Comment: 14 pages, 12 figures, 2 table

    Metropolis Sampling

    Full text link
    Monte Carlo (MC) sampling methods are widely applied in Bayesian inference, system simulation and optimization problems. The Markov Chain Monte Carlo (MCMC) algorithms are a well-known class of MC methods which generate a Markov chain with the desired invariant distribution. In this document, we focus on the Metropolis-Hastings (MH) sampler, which can be considered as the atom of the MCMC techniques, introducing the basic notions and different properties. We describe in details all the elements involved in the MH algorithm and the most relevant variants. Several improvements and recent extensions proposed in the literature are also briefly discussed, providing a quick but exhaustive overview of the current Metropolis-based sampling's world.Comment: Wiley StatsRef-Statistics Reference Online, 201

    Generalized Communicating P Systems Working in Fair Sequential Model

    Get PDF
    In this article we consider a new derivation mode for generalized communicating P systems (GCPS) corresponding to the functioning of population protocols (PP) and based on the sequential derivation mode and a fairness condition. We show that PP can be seen as a particular variant of GCPS. We also consider a particular stochastic evolution satisfying the fairness condition and obtain that it corresponds to the run of a Gillespie's SSA. This permits to further describe the dynamics of GCPS by a system of ODEs when the population size goes to the infinity.Comment: Presented at MeCBIC 201
    • …
    corecore