    Parallel computing 91

    Gaussian Artmap: A Neural Network for Fast Incremental Learning of Noisy Multidimensional Maps

    A new neural network architecture for incremental supervised learning of analog multidimensional maps is introduced. The architecture, called Gaussian ARTMAP, is a synthesis of a Gaussian classifier and an Adaptive Resonance Theory (ART) neural network, achieved by defining the ART choice function as the discriminant function of a Gaussian classifier with separable distributions, and the ART match function as the same function with the a priori probabilities of the distributions discounted. Gaussian ARTMAP retains the attractive parallel computing and fast learning properties of fuzzy ARTMAP, yet it learns a more efficient internal representation of a mapping and is more resistant to noise than fuzzy ARTMAP on a number of benchmark databases. Several simulations demonstrate that Gaussian ARTMAP consistently obtains a better trade-off of classification rate to number of categories than fuzzy ARTMAP. Results on a vowel classification problem also demonstrate that Gaussian ARTMAP outperforms many other classifiers.
    Supported by the National Science Foundation (IRI 90-00530); Office of Naval Research (N00014-92-J-4015, 40014-91-J-4100
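The relationship between the choice and match functions described in this abstract can be sketched in a few lines of Python. This is only an illustration of the abstract's wording, not the paper's exact formulas: the log-domain form, the estimate of the prior as `count / total`, and all names (`log_match`, `log_choice`, `best_category`, the toy `categories`) are assumptions made for the example.

```python
import math

def log_match(x, mean, std):
    """Log density of a separable (per-dimension) Gaussian at x, constants dropped."""
    return sum(-0.5 * ((xi - m) / s) ** 2 - math.log(s)
               for xi, m, s in zip(x, mean, std))

def log_choice(x, mean, std, count, total):
    """Gaussian discriminant: the match value plus the log a priori probability.

    The prior is estimated as count / total, which is an assumption of this sketch.
    """
    return log_match(x, mean, std) + math.log(count / total)

# Two hypothetical categories with diagonal covariance and training counts.
categories = [
    {"mean": [0.0, 0.0], "std": [1.0, 1.0], "count": 30},
    {"mean": [3.0, 3.0], "std": [1.0, 1.0], "count": 10},
]
total = sum(c["count"] for c in categories)

def best_category(x):
    """Index of the category with the largest choice value for input x."""
    scores = [log_choice(x, c["mean"], c["std"], c["count"], total)
              for c in categories]
    return max(range(len(scores)), key=scores.__getitem__)
```

In this sketch the match value differs from the choice value only by the log prior, which is the "discounting" the abstract mentions.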

    Adaptive thread scheduling techniques for improving scalability of software transactional memory

    Software transactional memory (STM) enhances both ease of use and concurrency, and is considered state-of-the-art for scaling parallel applications on modern multi-core hardware. However, there are situations where STM performs even worse than traditional locks. At hotspots, where most threads contend for a few pieces of shared data, running transactionally results in excessive conflicts and aborts that degrade performance. We present a new adaptive thread scheduler design that manages concurrency when the system is about to enter or leave a hotspot. The scheduler controls the number of threads spawning new transactions according to the live commit throughput. We implemented two feedback-control policies, called Throttle and Probe, to realize this adaptive scheduling. Performance evaluation with the STAMP benchmarks shows that enabling Throttle and Probe yields best-case speedups of 87.5% and 108.7%, respectively.
    The 10th IASTED International Conference on Parallel and Distributed Computing and Networks (PDCN 2011), Innsbruck, Austria, 15-17 February 2011. In Proceedings of the 10th IASTED-PDCN, 2011, p. 91-9
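The idea of steering the number of transaction-spawning threads by commit throughput can be illustrated with a small feedback loop. The sketch below is a generic hill-climbing controller written for this listing; it is not the paper's Throttle or Probe policy, and the class name, its fields, and the one-thread step size are all hypothetical.

```python
class ConcurrencyController:
    """Hypothetical hill-climbing controller, not the paper's Throttle or Probe."""

    def __init__(self, max_threads):
        self.max_threads = max_threads
        self.allowed = max_threads    # threads currently allowed to start transactions
        self.step = -1                # direction of the next adjustment
        self.last_throughput = None   # commits per second in the previous interval

    def update(self, commits, interval_s):
        """Feed one measurement interval and return the new thread limit."""
        throughput = commits / interval_s
        if self.last_throughput is not None and throughput < self.last_throughput:
            # The previous adjustment lowered throughput, so reverse direction.
            self.step = -self.step
        self.last_throughput = throughput
        # Move one thread in the current direction, staying within [1, max_threads].
        self.allowed = min(self.max_threads, max(1, self.allowed + self.step))
        return self.allowed
```

A caller would invoke `update` once per sampling interval with the number of commits observed, then let only `allowed` threads begin new transactions until the next sample.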

    Neural Dynamics of Motion Grouping: From Aperture Ambiguity to Object Speed and Direction

    A neural network model of visual motion perception and speed discrimination is developed to simulate data concerning the conditions under which components of moving stimuli cohere or not into a global direction of motion, as in barberpole and plaid patterns (both Type 1 and Type 2). The model also simulates how the perceived speed of lines moving in a prescribed direction depends upon their orientation, length, duration, and contrast. Motion direction and speed both emerge as part of an interactive motion grouping or segmentation process. The model proposes a solution to the global aperture problem by showing how information from feature tracking points, namely locations from which unambiguous motion directions can be computed, can propagate to ambiguous motion direction points and capture the motion signals there. The model does this without computing intersections of constraints or parallel Fourier and non-Fourier pathways. Instead, the model uses orientationally-unselective cell responses to activate directionally-tuned transient cells. These transient cells, in turn, activate spatially short-range filters and competitive mechanisms over multiple spatial scales to generate speed-tuned and directionally-tuned cells. Spatially long-range filters and top-down feedback from grouping cells are then used to track motion of featural points and to select and propagate correct motion directions to ambiguous motion points. Top-down grouping can also prime the system to attend to a particular motion direction. The model thereby links low-level automatic motion processing with attention-based motion processing. Homologs of model mechanisms have been used in models of other brain systems to simulate data about visual grouping, figure-ground separation, and speech perception. Earlier versions of the model have simulated data about short-range and long-range apparent motion, second-order motion, and the effects of parvocellular and magnocellular LGN lesions on motion perception.
    Supported by the Office of Naval Research (N00014-920J-4015, N00014-91-J-4100, N00014-95-1-0657, N00014-95-1-0409, N00014-91-J-0597); Air Force Office of Scientific Research (F4620-92-J-0225, F49620-92-J-0499); National Science Foundation (IRI-90-00530

    NAS technical summaries. Numerical aerodynamic simulation program, March 1992 - February 1993

    NASA created the Numerical Aerodynamic Simulation (NAS) Program in 1987 to focus resources on solving critical problems in aeroscience and related disciplines by utilizing the power of the most advanced supercomputers available. The NAS Program provides scientists with the necessary computing power to solve today's most demanding computational fluid dynamics problems and serves as a pathfinder in integrating leading-edge supercomputing technologies, thus benefitting other supercomputer centers in government and industry. The 1992-93 operational year concluded with 399 high-speed processor projects and 91 parallel projects representing NASA, the Department of Defense, other government agencies, private industry, and universities. This document provides a glimpse of some of the significant scientific results for the year.

    Computing subdominant unstable modes of turbulent plasma with a parallel Jacobi-Davidson eigensolver

    In the numerical solution of large-scale eigenvalue problems, Davidson-type methods are an increasingly popular alternative to Krylov eigensolvers. The main motivation is to avoid the expensive factorizations that are often needed by Krylov solvers when the problem is generalized or interior eigenvalues are desired. In Davidson-type methods, the factorization is replaced by iterative linear solvers that can be accelerated by a smart preconditioner. Jacobi-Davidson is one of the most effective variants. However, parallel implementations of this method are not widely available, particularly for non-symmetric problems. We present a parallel implementation that has been included in SLEPc, the Scalable Library for Eigenvalue Problem Computations, and test it in the context of a highly scalable plasma turbulence simulation code. We analyze its parallel efficiency and compare it with a Krylov-Schur eigensolver. © 2011 John Wiley and Sons, Ltd.
    The authors are indebted to Florian Merz for providing us with the test cases and for his useful suggestions. The authors acknowledge the computer resources provided by the Barcelona Supercomputing Center (BSC). This work was supported by the Spanish Ministerio de Ciencia e Innovacion under project TIN2009-07519.
    Romero Alcalde, E.; Román Moltó, JE. (2011). Computing subdominant unstable modes of turbulent plasma with a parallel Jacobi-Davidson eigensolver. Concurrency and Computation: Practice and Experience. 23:2179-2191. https://doi.org/10.1002/cpe.1740

    The symmetric-Toeplitz linear system problem in parallel

    Many algorithms exist that exploit the special structure of Toeplitz matrices for solving linear systems. Nevertheless, these algorithms are difficult to parallelize because of their low computational cost and the strong dependencies among the operations involved, which produce a high communication cost. The parallel algorithm presented in this paper is founded on transforming the Toeplitz matrix into another structured matrix, called Cauchy-like. The particular properties of Cauchy-like matrices are exploited to obtain two levels of parallelism that make it possible to greatly reduce the execution time. The experimental results were obtained on a cluster of PCs.
    Supported by Spanish MCYT and FEDER under Grant TIC 2003-08238-C02-02.
    Alonso-Jordá, P.; Vidal Maciá, AM. (2005). The symmetric-Toeplitz linear system problem in parallel. Computational Science -- ICCS 2005, Pt 1, Proceedings. 3514:220-228. https://doi.org/10.1007/11428831_28

    Hierarchical approach for deriving a reproducible unblocked LU factorization

    We propose a reproducible variant of the unblocked LU factorization for graphics processing units (GPUs). For this purpose, we build upon Level-1/2 BLAS kernels that deliver correctly-rounded and reproducible results for the dot (inner) product, vector scaling, and the matrix-vector product. In addition, we outline a strategy to enhance the accuracy of the triangular solve via iterative refinement. Following a bottom-up approach, we finally construct a reproducible unblocked implementation of the LU factorization for GPUs, which accommodates partial pivoting for stability and can eventually be integrated in a high performance and stable algorithm for the (blocked) LU factorization.
    The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The simulations were performed on resources provided by the Swedish National Infrastructure for Computing (SNIC) at PDC Centre for High Performance Computing (PDC-HPC). This work was also granted access to the HPC resources of The Institute for Scientific Computing and Simulation financed by Region Ile-de-France and the project Equip@Meso (reference ANR-10-EQPX-29-01) overseen by the French National Agency for Research (ANR) as part of the Investissements d'Avenir program. This work was also partly supported by the FastRelax (ANR-14-CE25-0018-01) project of ANR.
    Iakymchuk, R.; Graillat, S.; Defour, D.; Quintana-Orti, ES. (2019). Hierarchical approach for deriving a reproducible unblocked LU factorization. International Journal of High Performance Computing Applications. 33(5):791-803. https://doi.org/10.1177/1094342019832968
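The bottom-up construction this abstract describes, an unblocked LU factorization with partial pivoting built on a correctly-rounded dot product, can be illustrated on the CPU. The sketch below is not the authors' GPU implementation: it assumes exact rational arithmetic (`fractions.Fraction`) as a slow stand-in for their reproducible BLAS kernels, and the names `exact_dot` and `lu_unblocked` are hypothetical.

```python
from fractions import Fraction

def exact_dot(xs, ys):
    """Dot product accumulated exactly and rounded once at the end.

    Because only the final conversion rounds, the result does not depend on
    the order in which the terms are summed.
    """
    return float(sum(Fraction(x) * Fraction(y) for x, y in zip(xs, ys)))

def lu_unblocked(a):
    """Factor P*A = L*U; returns (lu, perm) with L (unit lower) and U packed in lu."""
    n = len(a)
    lu = [row[:] for row in a]
    perm = list(range(n))
    for k in range(n):
        # Update column k on and below the diagonal: a_ik - sum_r l_ir * u_rk.
        for i in range(k, n):
            lu[i][k] = exact_dot([1.0] + [-v for v in lu[i][:k]],
                                 [lu[i][k]] + [lu[r][k] for r in range(k)])
        # Partial pivoting: move the largest entry of column k to the diagonal.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        lu[k], lu[p] = lu[p], lu[k]
        perm[k], perm[p] = perm[p], perm[k]
        # Update row k of U to the right of the diagonal.
        for j in range(k + 1, n):
            lu[k][j] = exact_dot([1.0] + [-v for v in lu[k][:k]],
                                 [lu[k][j]] + [lu[r][j] for r in range(k)])
        # Scale column k below the diagonal to obtain the multipliers of L.
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
    return lu, perm
```

Row `i` of the factored matrix corresponds to row `perm[i]` of the input. Each entry update is expressed as a single dot product, so every update incurs exactly one rounding in this sketch.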