    Iso-array rewriting P systems with context-free iso-array rules

    The P system is a highly distributed and parallel theoretical computing model proposed in the area of membrane computing. Ceterchi et al. (2003) introduced array rewriting P systems by extending the notion of string rewriting P systems to arrays. Theoretical models for picture generation using context-free iso-array grammar rules and puzzle iso-array grammar rules were introduced by Kalyani et al. (2004, 2006), and iso-array rewriting P systems for iso-picture languages were studied by Annadurai et al. (2008). In this paper we consider context-free iso-array rules and context-free puzzle iso-array rules in iso-array rewriting P systems and examine the generative power of these P systems.
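    To make the rewriting mechanism concrete, here is a minimal, illustrative Python sketch of one rewriting step in a single membrane of a string-rewriting P system, the notion that the iso-array models above extend to two dimensions. The rule format and names are invented for the example and are not the paper's iso-array formalism.

        # Toy sketch of one step in a single membrane of a string-rewriting
        # P system: every string object to which some context-free rule
        # applies is rewritten in the same step (parallelism across objects).
        def rewrite_step(strings, rules):
            out = []
            for s in strings:
                for lhs, rhs in rules:
                    if lhs in s:
                        out.append(s.replace(lhs, rhs, 1))  # apply one rule
                        break
                else:
                    out.append(s)  # no applicable rule: object is unchanged
            return out

        # Example: the rule A -> aAb generates a^n A b^n step by step.
        strings = ["A"]
        for _ in range(3):
            strings = rewrite_step(strings, [("A", "aAb")])
        print(strings)  # ['aaaAbbb']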

    Development of Cluster Computing – A Review

    This paper presents an in-depth review of work on cluster computing. The works surveyed are:
    - Cluster Computing: A Mobile Code Approach, by R. B. Patel and Manpreet Singh (2006);
    - Performance Evaluation of Parallel Applications Using Message Passing Interface in a Network of Workstations of Different Computing Powers, by Rajkumar Sharma, Priyesh Kanungo and Manohar Chandwani (2011);
    - On the Performance of MPI-OpenMP on a 12-node Multi-core Cluster, by Abdelgadir Tageldin, Al-Sakib Khan Pathan and Mohiuddin Ahmed (2011);
    - Dynamic Load Balancing in Parallel Processing on Non-Homogeneous Clusters, by Armando E. De Giusti, Marcelo R. Naiouf, Laura C. De Giusti and Franco Chichizola (2005);
    - Performance Evaluation of Computation Intensive Tasks in Grid, by P. Raghu and K. Sriram (2011);
    - Automatic Distribution of Vision-Tasks on Computing Clusters, by Thomas Muller, Binh An Tran and Alois Knoll (2011);
    - Terminology and Taxonomy of Parallel Computing Architecture, by Amardeep Singh, Satinder Pal Singh, Vandana and Sukhnandan Kaur (2011);
    - Research of Distributed Algorithm based on Parallel Computer Cluster System, by Xu He-li and Liu Yan (2010);
    - Cluster Computing Using Orders Based Transparent Parallelizing, by Vitaliy D. Pavlenko and Victor V. Burdejnyj (2007);
    - VCE: A New Personated Virtual Cluster Engine for Cluster Computing, by Mohsen Sharifi, Masoud Hassani, Ehsan Mousavi Khaneghah and Seyedeh Leili Mirtaheri (2008).
    Keywords: Cluster computing, Cluster Architectures, Dynamic and Static Load Balancing, Distributed Systems, Homogeneous and Non-Homogeneous Processors, Multicore clusters, Parallel computing, Parallel Computer Vision, Task parallelism, Terminology and taxonomy, Virtualization, Virtual Cluster

    A Quasi-Random Approach to Matrix Spectral Analysis

    Inspired by quantum computing algorithms for linear algebra problems [HHL, Ta-Shma], we study how a classical simulation of this type of "phase estimation algorithm" performs when applied to the eigenproblem of Hermitian matrices. The result is a completely new, efficient and stable parallel algorithm to compute an approximate spectral decomposition of any Hermitian matrix. The algorithm can be implemented by Boolean circuits in $O(\log^2 n)$ parallel time with a total cost of $O(n^{\omega+1})$ Boolean operations. This Boolean complexity matches the best known rigorous $O(\log^2 n)$ parallel-time algorithms, but unlike those algorithms ours is (logarithmically) stable, so further improvements may lead to practical implementations. All previous efficient and rigorous approaches to the eigenproblem use randomization to avoid ill-conditioned instances, as we do too. Our algorithm makes further use of randomization in a completely new way, taking random powers of a unitary matrix to randomize the phases of its eigenvalues. Proving that a tiny Gaussian perturbation and a random polynomial power are sufficient to ensure almost pairwise independence of the phases $\pmod{2\pi}$ is the main technical contribution of this work. This randomization enables us, given a Hermitian matrix with well-separated eigenvalues, to sample a random eigenvalue and produce an approximate eigenvector in $O(\log^2 n)$ parallel time and $O(n^\omega)$ Boolean complexity. We conjecture that further improvements of our method can provide a stable solution to the full approximate spectral decomposition problem with complexity similar (up to a logarithmic factor) to that of sampling a single eigenvector. Comment: Replacing previous version: the parallel algorithm runs in total complexity $n^{\omega+1}$, not $n^\omega$; however, the depth of the implementing circuit is $\log^2(n)$, hence comparable to the fastest known eigendecomposition algorithms.
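    The phase-randomization idea can be sketched classically in a few lines of NumPy: perturb the Hermitian matrix by a tiny Gaussian, exponentiate it to a unitary $U = e^{iA}$, and raise $U$ to a random power so the eigenvalue phases $k\lambda \pmod{2\pi}$ are scrambled. This is a dense toy with assumed sizes, not the paper's $O(\log^2 n)$-depth Boolean-circuit construction.

        # Toy classical sketch of the phase randomization described above:
        # a tiny Gaussian perturbation plus a random power of U = exp(iA)
        # scrambles the eigenvalue phases modulo 2*pi.
        import numpy as np

        rng = np.random.default_rng(0)
        n = 6
        A = rng.standard_normal((n, n))
        A = (A + A.T) / 2                          # Hermitian test matrix
        E = rng.standard_normal((n, n))
        A += 1e-8 * (E + E.T) / 2                  # tiny Gaussian perturbation

        w, V = np.linalg.eigh(A)
        U = V @ np.diag(np.exp(1j * w)) @ V.conj().T   # U = exp(iA)

        k = int(rng.integers(1, 10**6))            # random (polynomial) power
        Uk = np.linalg.matrix_power(U, k)          # via repeated squaring
        phases = np.sort(np.angle(np.linalg.eigvals(Uk)))
        print(phases)                              # k*w mod 2*pi, scrambled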

    Revisiting Matrix Product on Master-Worker Platforms

    Get PDF
    This paper aims at designing efficient parallel matrix-product algorithms for heterogeneous master-worker platforms. While matrix product is well understood for homogeneous 2D arrays of processors (e.g., Cannon's algorithm and the ScaLAPACK outer-product algorithm), three key hypotheses render our work original and innovative:
    - Centralized data. We assume that all matrix files originate from, and must be returned to, the master.
    - Heterogeneous star-shaped platforms. We target fully heterogeneous platforms, where computational resources have different computing powers.
    - Limited memory. Because we investigate the parallelization of large problems, we cannot assume that full matrix panels can be stored in the worker memories and re-used for subsequent updates (as in ScaLAPACK).
    We have devised efficient algorithms for resource selection (deciding which workers to enroll) and communication ordering (both for input and result messages), and we report a set of numerical experiments on various platforms at the École Normale Supérieure de Lyon and the University of Tennessee. We point out, however, that in this first version of the report the experiments are limited to homogeneous platforms.
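    As a concrete illustration of this setting, here is a minimal Python sketch assuming a master that owns all matrix blocks and workers of different speeds, each receiving a share of the output blocks proportional to its speed. The block size, speeds, and greedy allocation rule are invented for the example; they are not the paper's resource-selection or communication-ordering policies.

        # Toy master-worker matrix product: the master owns A, B and C;
        # workers hold only a few b-by-b blocks at a time, so blocks are
        # streamed (re-sent) rather than cached, as in the limited-memory
        # hypothesis above.
        import numpy as np

        def master_worker_matmul(A, B, b, worker_speeds):
            n = A.shape[0]
            q = n // b                              # blocks per dimension
            C = np.zeros((n, n))
            tasks = [(i, j) for i in range(q) for j in range(q)]
            total, cursor = sum(worker_speeds), 0
            for w, speed in enumerate(worker_speeds):
                if w == len(worker_speeds) - 1:
                    share = len(tasks) - cursor     # last worker takes the rest
                else:
                    share = round(len(tasks) * speed / total)
                for (i, j) in tasks[cursor:cursor + share]:
                    for k in range(q):              # stream A and B blocks
                        C[i*b:(i+1)*b, j*b:(j+1)*b] += (
                            A[i*b:(i+1)*b, k*b:(k+1)*b]
                            @ B[k*b:(k+1)*b, j*b:(j+1)*b])
                cursor += share
            return C

        A, B = np.random.rand(8, 8), np.random.rand(8, 8)
        C = master_worker_matmul(A, B, b=4, worker_speeds=[1.0, 3.0])
        assert np.allclose(C, A @ B)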

    Minimizing Communication in Linear Algebra

    In 1981 Hong and Kung proved a lower bound on the amount of communication needed to perform dense matrix multiplication using the conventional $O(n^3)$ algorithm, where the input matrices are too large to fit in the small, fast memory. In 2004 Irony, Toledo and Tiskin gave a new proof of this result and extended it to the parallel case. In both cases the lower bound may be expressed as $\Omega(\#\text{arithmetic operations}/\sqrt{M})$, where $M$ is the size of the fast memory (or local memory in the parallel case). Here we generalize these results to a much wider variety of algorithms, including LU factorization, Cholesky factorization, $LDL^T$ factorization, QR factorization, and algorithms for eigenvalues and singular values, i.e., essentially all direct methods of linear algebra. The proof works for dense or sparse matrices, and for sequential or parallel algorithms. In addition to lower bounds on the amount of data moved (bandwidth) we get lower bounds on the number of messages required to move it (latency). We illustrate how to extend our lower bound technique to compositions of linear algebra operations (like computing powers of a matrix), to decide whether it is enough to call a sequence of simpler optimal algorithms (like matrix multiplication) to minimize communication, or whether we can do better; we give examples of both. We also show how to extend our lower bounds to certain graph-theoretic problems. We point out recently designed algorithms for dense LU, Cholesky, QR, eigenvalue and SVD problems that attain these lower bounds; implementations of LU and QR show large speedups over conventional linear algebra algorithms in standard libraries such as LAPACK and ScaLAPACK. Many open problems remain. Comment: 27 pages, 2 tables
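    As a worked instance of the bound: conventional matrix multiplication performs $2n^3$ arithmetic operations, so $\Omega(n^3/\sqrt{M})$ words must move between slow and fast memory. Classical blocking attains this: with block size $b \approx \sqrt{M/3}$, each triple of blocks loaded ($\approx M$ words) performs $2b^3$ flops, i.e., $\Theta(\sqrt{M})$ flops per word moved. The Python sketch below counts the idealized traffic; the sizes and the counting model are assumptions made for the illustration.

        # Blocked matrix multiply with block size b ~ sqrt(M/3), so three
        # b-by-b blocks fit in fast memory; counts idealized slow-to-fast
        # memory traffic and compares it to the Omega(n^3/sqrt(M)) bound.
        import numpy as np

        def blocked_matmul(A, B, M):
            n = A.shape[0]
            b = max(1, int((M / 3) ** 0.5))
            C = np.zeros((n, n))
            words_moved = 0
            for i in range(0, n, b):
                for j in range(0, n, b):
                    for k in range(0, n, b):
                        words_moved += 3 * b * b    # load A, B, C blocks
                        C[i:i+b, j:j+b] += A[i:i+b, k:k+b] @ B[k:k+b, j:j+b]
            return C, words_moved

        n, M = 64, 768                              # toy sizes for the example
        A, B = np.random.rand(n, n), np.random.rand(n, n)
        C, moved = blocked_matmul(A, B, M)
        assert np.allclose(C, A @ B)
        print(moved, "words moved; lower bound ~", int(2 * n**3 / M ** 0.5))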