Iso-array rewriting P systems with context-free iso-array rules
P systems are a highly distributed and parallel theoretical computing model proposed in the area of membrane computing. Ceterchi et al. (2003) initially proposed array rewriting P systems by extending the notion of string rewriting P systems to arrays. Theoretical models for picture generation using context-free iso-array grammar rules and puzzle iso-array grammar rules were introduced by Kalyani et al. (2004, 2006), and iso-array rewriting P systems for iso-picture languages have been studied by Annadurai et al. (2008). In this paper we consider context-free iso-array rules and context-free puzzle iso-array rules in iso-array rewriting P systems and examine the generative power of these P systems.
Development of Cluster Computing: A Review
This paper presents an in-depth, detailed review of "Cluster Computing", covering the following works: Cluster Computing: A Mobile Code Approach by R. B. Patel and Manpreet Singh (2006); Performance Evaluation of Parallel Applications Using Message Passing Interface In Network of Workstations Of Different Computing Powers by Rajkumar Sharma, Priyesh Kanungo and Manohar Chandwani (2011); On the Performance of MPI-OpenMP on a 12 nodes Multi-core Cluster by Abdelgadir Tageldin, Al-Sakib Khan Pathan and Mohiuddin Ahmed (2011); Dynamic Load Balancing in Parallel Processing on Non-Homogeneous Clusters by Armando E. De Giusti, Marcelo R. Naiouf, Laura C. De Giusti and Franco Chichizola (2005); Performance Evaluation of Computation Intensive Tasks in Grid by P. Raghu and K. Sriram (2011); Automatic Distribution of Vision-Tasks on Computing Clusters by Thomas Muller, Binh An Tran and Alois Knoll (2011); Terminology And Taxonomy Parallel Computing Architecture by Amardeep Singh, Satinder Pal Singh, Vandana and Sukhnandan Kaur (2011); Research of Distributed Algorithm based on Parallel Computer Cluster System by Xu He-li and Liu Yan (2010); Cluster Computing Using Orders Based Transparent Parallelizing by Vitaliy D. Pavlenko and Victor V. Burdejnyj (2007); and VCE: A New Personated Virtual Cluster Engine for Cluster Computing by Mohsen Sharifi, Masoud Hassani, Ehsan Mousavi Khaneghah and Seyedeh Leili Mirtaheri (2008).
Keywords: Cluster computing, Cluster Architectures, Dynamic and Static Load Balancing, Distributed Systems, Homogeneous and Non-Homogeneous Processors, Multicore clusters, Parallel computing, Parallel Computer Vision, Task parallelism, Terminology and taxonomy, Virtualization, Virtual Cluster
A Quasi-Random Approach to Matrix Spectral Analysis
Inspired by quantum computing algorithms for Linear Algebra problems
[HHL, TaShma], we study how a classical-computer simulation of this type
of "Phase Estimation algorithm" performs when applied to the
Eigen-Problem of Hermitian matrices. The result is a completely new, efficient
and stable, parallel algorithm to compute an approximate spectral decomposition
of any Hermitian matrix. The algorithm can be implemented by Boolean circuits
in parallel time with a total cost of Boolean
operations. This Boolean complexity matches the best known rigorous parallel time algorithms, but unlike those algorithms our algorithm is
(logarithmically) stable, so further improvements may lead to practical
implementations.
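The "Phase Estimation" idea mentioned above can be illustrated on a classical computer. Averaging powers of a unitary U with phase weights e^{-ikθ} damps every eigencomponent whose eigenphase is far from θ. The sketch below is a generic filter of this kind and is not the authors' algorithm. The helper name `phase_filter`, the window length `K` and the use of NumPy are all assumptions made for illustration.

```python
import numpy as np


def phase_filter(u: np.ndarray, x: np.ndarray, theta: float, K: int) -> np.ndarray:
    """Hypothetical helper: return y = (1/K) * sum_{k=0}^{K-1} e^{-ik*theta} U^k x.

    If x = sum_j c_j q_j with U q_j = e^{i*phi_j} q_j, then
    y = sum_j c_j * D_K(phi_j - theta) q_j, where D_K(d) = (1/K) * sum_k e^{ik*d}
    equals 1 at d = 0 and has modulus at most 1 / (K * |sin(d/2)|) otherwise,
    so components whose phases are far from theta shrink.
    """
    y = np.zeros(x.shape, dtype=complex)
    v = x.astype(complex)  # v holds U^k x; only matrix-vector products are used
    for k in range(K):
        y += np.exp(-1j * k * theta) * v
        v = u @ v
    return y / K


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Random unitary Q from a QR factorization; U = Q diag(e^{i*phi}) Q^H.
    q, _ = np.linalg.qr(rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3)))
    phi = np.array([0.5, 2.0, 4.0])
    u = (q * np.exp(1j * phi)) @ q.conj().T
    x = q.sum(axis=1)  # equal weight on every eigenvector
    y = phase_filter(u, x, theta=2.0, K=200)
    print(np.abs(q.conj().T @ y))  # weight left on each eigenvector
```

With K = 200, the target eigenvector (phase 2.0) keeps weight 1. The bound 1/(K·|sin(d/2)|) guarantees that the other two weights fall below 0.01.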
All previous efficient and rigorous approaches to solving the Eigen-Problem use
randomization to avoid bad conditioning, as we do too. Our algorithm makes further
use of randomization in a completely new way, taking random powers of a unitary
matrix to randomize the phases of its eigenvalues. Proving that a tiny Gaussian
perturbation and a random polynomial power are sufficient to ensure almost
pairwise independence of the phases is the main technical
contribution of this work. This randomization enables us, given a Hermitian
matrix with well separated eigenvalues, to sample a random eigenvalue and
produce an approximate eigenvector in parallel time and
Boolean complexity. We conjecture that further improvements of
our method can provide a stable solution to the full approximate spectral
decomposition problem with complexity similar to the complexity (up to a
logarithmic factor) of sampling a single eigenvector.
Comment: Replacing previous version: parallel algorithm runs in total
complexity and not . However, the depth of the
implementing circuit is : hence comparable to the fastest
eigen-decomposition algorithms known
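The randomization device in this abstract rests on a simple identity. If U has eigenpairs (e^{iλ_j}, v_j), then U^k has eigenpairs (e^{ikλ_j}, v_j). A random exponent k therefore reshuffles the phases while leaving the eigenvectors intact. The NumPy sketch below checks this identity after adding a tiny Gaussian perturbation. Several choices here are assumptions for illustration and are not taken from the paper: U = exp(iH), the perturbation size `eps`, the exponent range `kmax`, and all helper names. The paper's contribution is proving near pairwise independence of the resulting phases, which no numerical check like this establishes.

```python
import numpy as np


def gaussian_hermitian(n: int, rng: np.random.Generator) -> np.ndarray:
    """Random Hermitian matrix with Gaussian entries (GUE-like, unnormalized)."""
    a = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (a + a.conj().T) / 2


def unitary_from_hermitian(h: np.ndarray):
    """U = exp(iH), built from eigh purely so the sketch can check the identity."""
    lam, v = np.linalg.eigh(h)
    return (v * np.exp(1j * lam)) @ v.conj().T, lam, v


def random_power(u: np.ndarray, rng: np.random.Generator, kmax: int):
    """Raise U to a uniformly random integer power k in [1, kmax]."""
    k = int(rng.integers(1, kmax + 1))
    return np.linalg.matrix_power(u, k), k


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, eps = 4, 1e-6
    h = gaussian_hermitian(n, rng)
    h = h + eps * gaussian_hermitian(n, rng)  # tiny Gaussian perturbation
    u, lam, v = unitary_from_hermitian(h)
    uk, k = random_power(u, rng, kmax=1000)
    # U^k keeps the eigenvectors of H; its eigenphases become k * lambda_j mod 2*pi.
    print(k, np.mod(k * lam, 2 * np.pi))
```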
Revisiting Matrix Product on Master-Worker Platforms
This paper is aimed at designing efficient parallel matrix-product algorithms
for heterogeneous master-worker platforms. While matrix-product is
well-understood for homogeneous 2D-arrays of processors (e.g., Cannon algorithm
and ScaLAPACK outer product algorithm), there are three key hypotheses that
render our work original and innovative:
- Centralized data. We assume that all matrix files originate from, and must
be returned to, the master.
- Heterogeneous star-shaped platforms. We target fully heterogeneous
platforms, where computational resources have different computing powers.
- Limited memory. Because we investigate the parallelization of large
problems, we cannot assume that full matrix panels can be stored in the worker
memories and re-used for subsequent updates (as in ScaLAPACK).
We have devised efficient algorithms for resource selection (deciding which
workers to enroll) and communication ordering (both for input and result
messages), and we report a set of numerical experiments on various platforms at
Ecole Normale Superieure de Lyon and the University of Tennessee. However, we
point out that in this first version of the report, experiments are limited to
homogeneous platforms.
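Below is a minimal simulation of the setting described above. The master owns all of A, B and C, workers run at heterogeneous speeds, and each worker holds only three b×b blocks at a time. It uses plain demand-driven (greedy list) scheduling of C blocks. That scheduler is a swapped-in baseline, not the paper's resource-selection or communication-ordering algorithm. The function name and the `speeds` model are assumptions, as is the decision to ignore link bandwidth.

```python
import heapq
import numpy as np


def master_worker_matmul(a, b, block, speeds):
    """Hypothetical sketch: compute C = A @ B on a simulated star platform.

    The master keeps A, B and C (centralized data) and hands out one C block at a
    time to whichever worker becomes free first. speeds[w] is worker w's relative
    compute speed. Communication time is ignored in this simplified model.
    """
    n = a.shape[0]
    assert n % block == 0, "this sketch assumes the block size divides n"
    nb = n // block
    c = np.zeros((n, n))
    flops_per_task = 2.0 * block * block * n       # cost of one b x b block of C
    free = [(0.0, w) for w in range(len(speeds))]  # (time worker is free, worker id)
    heapq.heapify(free)
    owner = {}
    makespan = 0.0
    for i in range(nb):
        for j in range(nb):
            t, w = heapq.heappop(free)             # earliest-available worker
            rows = slice(i * block, (i + 1) * block)
            cols = slice(j * block, (j + 1) * block)
            acc = np.zeros((block, block))         # C_ij lives on the worker
            for k in range(nb):
                ks = slice(k * block, (k + 1) * block)
                # Limited memory: only A_ik, B_kj and C_ij are resident at once.
                acc += a[rows, ks] @ b[ks, cols]
            c[rows, cols] = acc                    # result returned to the master
            done = t + flops_per_task / speeds[w]
            owner[(i, j)] = w
            makespan = max(makespan, done)
            heapq.heappush(free, (done, w))
    return c, owner, makespan
```

Consider an 8×8 product with 2×2 blocks (16 tasks) and speeds [1, 2, 4]. The fastest worker ends up with 8 blocks and the slowest with 3. Greedy assignment balances the load, but it does not decide which workers are worth enrolling or how to order input and result messages. Those are the questions the paper addresses.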
Minimizing Communication in Linear Algebra
In 1981 Hong and Kung proved a lower bound on the amount of communication
needed to perform dense matrix multiplication using the conventional
algorithm, where the input matrices were too large to fit in the small, fast
memory. In 2004 Irony, Toledo and Tiskin gave a new proof of this result and
extended it to the parallel case. In both cases the lower bound may be
expressed as $\Omega(\#\text{arithmetic operations} / M^{1/2})$, where $M$ is the size
of the fast memory (or local memory in the parallel case). Here we generalize
these results to a much wider variety of algorithms, including LU
factorization, Cholesky factorization, $LDL^T$ factorization, QR factorization,
algorithms for eigenvalues and singular values, i.e., essentially all direct
methods of linear algebra. The proof works for dense or sparse matrices, and
for sequential or parallel algorithms. In addition to lower bounds on the
amount of data moved (bandwidth) we get lower bounds on the number of messages
required to move it (latency). We illustrate how to extend our lower bound
technique to compositions of linear algebra operations (like computing powers
of a matrix), to decide whether it is enough to call a sequence of simpler
optimal algorithms (like matrix multiplication) to minimize communication, or
if we can do better. We give examples of both. We also show how to extend our
lower bounds to certain graph theoretic problems.
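For classical matrix multiplication, ordinary blocking already attains the bound. With a fast memory of M words holding three b×b blocks, b ≈ √(M/3). The traffic is then 2n³/b + 2n² words, which is O(n³/√M) and matches Ω(#arithmetic operations / M^{1/2}) up to a constant. The sketch below counts words moved in a two-level memory model. The model (each block load or store costs its size in words) and the helper name are assumptions made for illustration.

```python
import math
import numpy as np


def blocked_matmul_traffic(a, b, m_fast):
    """Hypothetical helper: blocked C = A @ B that counts words moved between
    slow and fast memory, assuming fast memory holds m_fast words (three blocks)."""
    n = a.shape[0]
    bs = max(1, math.isqrt(m_fast // 3))  # block size with 3 * bs^2 <= m_fast
    c = np.zeros((n, n))
    words = 0
    for i in range(0, n, bs):
        for j in range(0, n, bs):
            cij = c[i:i + bs, j:j + bs].copy()  # load C block into fast memory
            words += cij.size
            for k in range(0, n, bs):
                aik = a[i:i + bs, k:k + bs]
                bkj = b[k:k + bs, j:j + bs]
                words += aik.size + bkj.size    # load one A block and one B block
                cij += aik @ bkj
            c[i:i + bs, j:j + bs] = cij         # store C block back to slow memory
            words += cij.size
    return c, words, bs
```

For n = 12 and M = 48, the block size is 4 and the count is 2·12³/4 + 2·12² = 1152 words. Shrinking the block size inflates the 2n³/b term, which is why b is tied to the fast-memory size M.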
We point out recently designed algorithms for dense LU, Cholesky, QR,
eigenvalue and the SVD problems that attain these lower bounds; implementations
of LU and QR show large speedups over conventional linear algebra algorithms in
standard libraries like LAPACK and ScaLAPACK. Many open problems remain.
Comment: 27 pages, 2 tables