Iso-array rewriting P systems with context-free iso-array rules

Author: Bhuvaneswari K.
Kalyani T.
Nagar A. K.
Thamburaj R.
Thomas D. G.
Publication venue: 'Brno University of Technology'
Publication date: 01/01/2014

The P system is a highly distributed and parallel theoretical computing model proposed in the area of membrane computing. Ceterchi et al. (2003) initially proposed array rewriting P systems by extending the notion of string rewriting P systems to arrays. A theoretical model for picture generation using context-free iso-array grammar rules and puzzle iso-array grammar rules was introduced by Kalyani et al. (2004, 2006). Iso-array rewriting P systems for iso-picture languages have also been studied by Annadurai et al. (2008). In this paper we consider context-free iso-array rules and context-free puzzle iso-array rules in iso-array rewriting P systems and examine the generative power of these P systems.

CiteSeerX

Digital library of Brno University of Technology

Development of Cluster Computing – A Review

Author: Ahmad Fahad
Publication venue: The International Institute for Science, Technology and Education (IISTE)
Publication date: 31/01/2015

This paper reviews the following works on cluster computing in depth: Cluster Computing: A Mobile Code Approach by R.B. Patel and Manpreet Singh (2006); Performance Evaluation of Parallel Applications Using Message Passing Interface In Network of Workstations Of Different Computing Powers by Rajkumar Sharma, Priyesh Kanungo and Manohar Chandwani (2011); On the Performance of MPI-OpenMP on a 12 nodes Multi-core Cluster by Abdelgadir Tageldin, Al-Sakib Khan Pathan and Mohiuddin Ahmed (2011); Dynamic Load Balancing in Parallel Processing on Non-Homogeneous Clusters by Armando E. De Giusti, Marcelo R. Naiouf, Laura C. De Giusti and Franco Chichizola (2005); Performance Evaluation of Computation Intensive Tasks in Grid by P. Raghu and K. Sriram (2011); Automatic Distribution of Vision-Tasks on Computing Clusters by Thomas Muller, Binh An Tran and Alois Knoll (2011); Terminology And Taxonomy Parallel Computing Architecture by Amardeep Singh, Satinder Pal Singh, Vandana and Sukhnandan Kaur (2011); Research of Distributed Algorithm based on Parallel Computer Cluster System by Xu He-li and Liu Yan (2010); Cluster Computing Using Orders Based Transparent Parallelizing by Vitaliy D. Pavlenko and Victor V. Burdejnyj (2007); and VCE: A New Personated Virtual Cluster Engine for Cluster Computing by Mohsen Sharifi, Masoud Hassani, Ehsan Mousavi Khaneghah and Seyedeh Leili Mirtaheri (2008).
Keywords: Cluster computing, Cluster Architectures, Dynamic and Static Load Balancing, Distributed Systems, Homogeneous and Non-Homogeneous Processors, Multicore clusters, Parallel computing, Parallel Computer Vision, Task parallelism, Terminology and taxonomy, Virtualization, Virtual Cluster

International Institute for Science, Technology and Education (IISTE): E-Journals

A Quasi-Random Approach to Matrix Spectral Analysis

Author: Ben-Or Michael
Eldar Lior
Publication date: 06/04/2017

Inspired by the quantum computing algorithms for Linear Algebra problems [HHL,TaShma] we study how the simulation on a classical computer of this type of "Phase Estimation algorithms" performs when we apply it to solve the Eigen-Problem of Hermitian matrices. The result is a completely new, efficient and stable, parallel algorithm to compute an approximate spectral decomposition of any Hermitian matrix. The algorithm can be implemented by Boolean circuits in $O(\log^2 n)$ parallel time with a total cost of $O(n^{\omega+1})$ Boolean operations. This Boolean complexity matches the best known rigorous $O(\log^2 n)$ parallel time algorithms, but unlike those algorithms our algorithm is (logarithmically) stable, so further improvements may lead to practical implementations. All previous efficient and rigorous approaches to solving the Eigen-Problem use randomization to avoid ill-conditioning, as do we. Our algorithm makes further use of randomization in a completely new way, taking random powers of a unitary matrix to randomize the phases of its eigenvalues. Proving that a tiny Gaussian perturbation and a random polynomial power are sufficient to ensure almost pairwise independence of the phases $\pmod{2\pi}$ is the main technical contribution of this work. This randomization enables us, given a Hermitian matrix with well separated eigenvalues, to sample a random eigenvalue and produce an approximate eigenvector in $O(\log^2 n)$ parallel time and $O(n^\omega)$ Boolean complexity. We conjecture that further improvements of our method can provide a stable solution to the full approximate spectral decomposition problem with complexity similar to the complexity (up to a logarithmic factor) of sampling a single eigenvector.

Comment: Replacing previous version: parallel algorithm runs in total complexity $n^{\omega+1}$ and not $n^{\omega}$. However, the depth of the implementing circuit is $\log^2(n)$: hence comparable to the fastest eigen-decomposition algorithms known.
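
The effect of taking a power of a unitary matrix on its eigenvalue phases can be illustrated with a small sketch. If $U$ has an eigenvalue $e^{i\theta}$, then $U^k$ has the eigenvalue $e^{ik\theta}$, so each phase is multiplied by $k$ and reduced modulo $2\pi$. The code below works directly on a list of phases rather than on matrices. The function names, the sample phases and the range of $k$ are illustrative assumptions, and the Gaussian perturbation step is omitted; this is not the authors' algorithm.

```python
import math
import random

TWO_PI = 2.0 * math.pi


def powered_phases(phases, k):
    """Eigenvalue phases of U**k, given the eigenvalue phases of a unitary U.

    An eigenvalue exp(i*theta) of U becomes exp(i*k*theta) for U**k,
    so each phase is scaled by k and reduced modulo 2*pi.
    """
    return [(k * theta) % TWO_PI for theta in phases]


def min_circular_gap(phases):
    """Smallest distance between neighbouring phases, measured around the circle."""
    ordered = sorted(p % TWO_PI for p in phases)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    gaps.append(TWO_PI - ordered[-1] + ordered[0])  # wrap-around gap
    return min(gaps)


if __name__ == "__main__":
    rng = random.Random(0)                # fixed seed, for repeatability only
    phases = [0.100, 0.101, 0.102, 2.5]   # hypothetical, tightly clustered phases
    k = rng.randrange(1, 10_000)          # hypothetical range for the random power
    print("power k:", k)
    print("min gap before:", min_circular_gap(phases))
    print("min gap after: ", min_circular_gap(powered_phases(phases, k)))
```

A single random power does not guarantee well separated phases; the abstract's claim concerns almost pairwise independence over the random choices, which this sketch does not attempt to verify.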

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Revisiting Matrix Product on Master-Worker Platforms

Author: Dongarra Jack
Laboratoire de l'informatique du parallélisme
Pineau Jean-François
Robert Yves
Shi Zhiao
Vivien Frédéric
Publication date: 01/01/2006

This paper is aimed at designing efficient parallel matrix-product algorithms for heterogeneous master-worker platforms. While matrix-product is well-understood for homogeneous 2D-arrays of processors (e.g., Cannon algorithm and ScaLAPACK outer product algorithm), there are three key hypotheses that render our work original and innovative:
- Centralized data. We assume that all matrix files originate from, and must be returned to, the master.
- Heterogeneous star-shaped platforms. We target fully heterogeneous platforms, where computational resources have different computing powers.
- Limited memory. Because we investigate the parallelization of large problems, we cannot assume that full matrix panels can be stored in the worker memories and re-used for subsequent updates (as in ScaLAPACK).
We have devised efficient algorithms for resource selection (deciding which workers to enroll) and communication ordering (both for input and result messages), and we report a set of numerical experiments on various platforms at Ecole Normale Superieure de Lyon and the University of Tennessee. However, we point out that in this first version of the report, experiments are limited to homogeneous platforms.

arXiv.org e-Print Archive

HAL-ENS-LYON

CiteSeerX

Crossref

INRIA a CCSD electronic archive server

Libre Acces aux Rapports Scientifiques et Techniques

The University of Manchester - Institutional Repository

Hal-Diderot

Minimizing Communication in Linear Algebra

Author: Blackford L. S.
Grey Ballard
James Demmel
Oded Schwartz
Olga Holtz
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 01/01/2009

In 1981 Hong and Kung proved a lower bound on the amount of communication needed to perform dense matrix multiplication using the conventional $O(n^3)$ algorithm, where the input matrices were too large to fit in the small, fast memory. In 2004 Irony, Toledo and Tiskin gave a new proof of this result and extended it to the parallel case. In both cases the lower bound may be expressed as $\Omega(\#\text{arithmetic operations} / \sqrt{M})$, where $M$ is the size of the fast memory (or local memory in the parallel case). Here we generalize these results to a much wider variety of algorithms, including LU factorization, Cholesky factorization, $LDL^T$ factorization, QR factorization, algorithms for eigenvalues and singular values, i.e., essentially all direct methods of linear algebra. The proof works for dense or sparse matrices, and for sequential or parallel algorithms. In addition to lower bounds on the amount of data moved (bandwidth) we get lower bounds on the number of messages required to move it (latency). We illustrate how to extend our lower bound technique to compositions of linear algebra operations (like computing powers of a matrix), to decide whether it is enough to call a sequence of simpler optimal algorithms (like matrix multiplication) to minimize communication, or if we can do better. We give examples of both. We also show how to extend our lower bounds to certain graph theoretic problems. We point out recently designed algorithms for dense LU, Cholesky, QR, eigenvalue and the SVD problems that attain these lower bounds; implementations of LU and QR show large speedups over conventional linear algebra algorithms in standard libraries like LAPACK and ScaLAPACK. Many open problems remain.

Comment: 27 pages, 2 tables
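
To get a feel for the scale of the bound $\Omega(\#\text{arithmetic operations} / \sqrt{M})$, the following sketch evaluates it with all constant factors dropped. The function names are hypothetical. The message-count estimate additionally assumes that a single message carries at most $M$ words, so the word bound is divided by $M$; treat both numbers as orders of magnitude, not exact limits.

```python
import math


def bandwidth_lower_bound(num_ops, fast_memory_words):
    """Order-of-magnitude lower bound on words moved: #ops / sqrt(M)."""
    return num_ops / math.sqrt(fast_memory_words)


def latency_lower_bound(num_ops, fast_memory_words):
    """Order-of-magnitude lower bound on messages sent.

    Assumption: one message holds at most M words, so the word bound
    is divided by M, giving #ops / M**1.5.
    """
    return bandwidth_lower_bound(num_ops, fast_memory_words) / fast_memory_words


if __name__ == "__main__":
    n = 1024              # matrix dimension (illustrative)
    M = 2 ** 16           # fast memory size in words (illustrative)
    ops = n ** 3          # conventional matrix multiplication, constants dropped
    print("words moved, at least on the order of:", bandwidth_lower_bound(ops, M))
    print("messages, at least on the order of:   ", latency_lower_bound(ops, M))
```

Quadrupling the fast memory only halves the bound on words moved, which is one way to read why the abstract emphasizes algorithms that attain the bound over simply adding memory.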

arXiv.org e-Print Archive

CiteSeerX

Crossref