51,844 research outputs found
Multihop clustering algorithm for load balancing in wireless sensor networks
The paper presents a new cluster-based routing algorithm that exploits the redundancy properties of sensor networks to address the traditional problems of load balancing and energy efficiency in WSNs. The algorithm identifies nodes whose sensing area is fully covered by their neighbours and marks them as temporary cluster heads. It then forms two layers of multi-hop communication: a bottom layer for intra-cluster communication and a top layer for inter-cluster communication involving the temporary cluster heads. Performance studies indicate that the proposed algorithm effectively solves the load-balancing problem and is also more energy-efficient than LEACH and its enhanced version
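The redundancy test described above (a node whose sensing area is already covered by its neighbours may be marked as a temporary cluster head) can be sketched as follows. This is a minimal illustrative sketch, not the paper's method: the function name, the Monte Carlo sampling of the sensing disc, and the assumption of a single shared sensing range are all ours.

```python
import math
import random

def covered_by_neighbours(node, neighbours, sensing_range, samples=200):
    """Illustrative sketch (not the paper's algorithm): decide whether a
    node's sensing disc is covered by its neighbours' discs by sampling
    random points inside the node's disc. If every sampled point lies
    inside at least one neighbour's disc, the node is redundant and could
    be marked as a temporary cluster head."""
    for _ in range(samples):
        # Draw a uniformly random point inside the node's sensing disc.
        r = sensing_range * math.sqrt(random.random())
        a = random.uniform(0.0, 2.0 * math.pi)
        p = (node[0] + r * math.cos(a), node[1] + r * math.sin(a))
        # Uncovered sample found: the node is not redundant.
        if not any(math.dist(p, n) <= sensing_range for n in neighbours):
            return False
    return True
```

For example, three neighbours placed at distance 0.5 around a node with sensing range 1.0 cover its whole disc, so the node would qualify as a temporary cluster head under this test.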
Hydra: A Parallel Adaptive Grid Code
We describe the first parallel implementation of an adaptive
particle-particle, particle-mesh code with smoothed particle hydrodynamics.
Parallelisation of the serial code, ``Hydra'', is achieved by using CRAFT, a
Cray proprietary language which allows rapid implementation of a serial code on
a parallel machine by allowing global addressing of distributed memory.
The collisionless variant of the code has already completed several 16.8
million particle cosmological simulations on a 128 processor Cray T3D whilst
the full hydrodynamic code has completed several 4.2 million particle combined
gas and dark matter runs. The efficiency of the code now allows parameter-space
explorations to be performed routinely using particles of each species.
A complete run including gas cooling, from high redshift to the present epoch
requires approximately 10 hours on 64 processors.
In this paper we present implementation details and results of the
performance and scalability of the CRAFT version of Hydra under varying degrees
of particle clustering.

Comment: 23 pages, LaTeX plus encapsulated figure
Dynamic load balancing in parallel KD-tree k-means
One of the most influential and popular data mining methods is the k-means algorithm for cluster analysis.
Techniques for improving the efficiency of k-means have been explored largely in two main directions. First, the amount of computation can be significantly reduced by adopting geometrical constraints and an efficient data structure, notably a multidimensional binary search tree (KD-tree); these techniques reduce the number of distance computations the algorithm performs at each iteration. The second direction is parallel processing, where data and computation loads are distributed over many processing nodes. However, little work has been done to provide a parallel formulation of the efficient sequential techniques based on KD-trees. Such approaches are expected to have an irregular distribution of computation load and can suffer from load imbalance, an issue that has so far limited the adoption of these efficient k-means variants in parallel computing environments. In this work, we provide a parallel formulation of the KD-tree based k-means algorithm for distributed-memory systems and address its load-balancing issue. Three solutions have been developed and tested: two approaches are based on a static partitioning of the data set, and a third incorporates a dynamic load-balancing policy
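One plausible shape for a dynamic load-balancing policy in this setting is to reassign KD-tree leaves to processes each iteration using per-leaf costs measured in the previous iteration. The sketch below uses a greedy longest-processing-time heuristic; the heuristic, the function name, and the cost model are our assumptions, not the paper's actual policy.

```python
import heapq

def rebalance(leaf_costs, n_workers):
    """Illustrative sketch: assign KD-tree leaves to workers so that
    per-worker cost is as even as possible. leaf_costs maps a leaf id to
    the distance-computation cost measured in the previous iteration."""
    # Min-heap of (accumulated cost, worker id).
    heap = [(0.0, w) for w in range(n_workers)]
    heapq.heapify(heap)
    assignment = {}
    # Place the most expensive leaves first (LPT heuristic), always onto
    # the currently least-loaded worker.
    for leaf, cost in sorted(leaf_costs.items(), key=lambda kv: -kv[1]):
        total, w = heapq.heappop(heap)
        assignment[leaf] = w
        heapq.heappush(heap, (total + cost, w))
    return assignment
```

Re-running this between k-means iterations adapts the partition as centroids, and hence per-leaf pruning behaviour, change.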
GreeM : Massively Parallel TreePM Code for Large Cosmological N-body Simulations
In this paper, we describe the implementation and performance of GreeM, a
massively parallel TreePM code for large-scale cosmological N-body simulations.
GreeM uses a recursive multi-section algorithm for domain decomposition. The
sizes of the domains are adjusted so that the total calculation time of the
force becomes the same for all processes. The loss of performance due to
non-optimal load balancing is around 4%, even for more than 10^3 CPU cores.
GreeM runs efficiently on PC clusters and massively-parallel computers such as
a Cray XT4. The measured calculation speed on Cray XT4 is 5 \times 10^4
particles per second per CPU core, for the case of an opening angle of
\theta=0.5, if the number of particles per CPU core is larger than 10^6.

Comment: 13 pages, 11 figures, accepted by PAS
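The core idea of cost-weighted recursive multi-section decomposition can be sketched in simplified form: recursively split the particle set along alternating axes so that each side receives roughly equal total calculation cost. This is an illustrative sketch under our own assumptions (bisection-style splits, explicit per-particle costs), not GreeM's implementation.

```python
def multisection(particles, costs, n_domains, axis=0):
    """Illustrative sketch of recursive multi-section domain
    decomposition: split particles along `axis` so each side gets a share
    of the total cost proportional to its number of domains, then recurse
    on the next axis until n_domains pieces remain."""
    if n_domains == 1:
        return [particles]
    order = sorted(range(len(particles)), key=lambda i: particles[i][axis])
    half = n_domains // 2
    # Cost target for the left side, proportional to its domain count.
    target = sum(costs[i] for i in order) * half / n_domains
    acc, cut = 0.0, 0
    for k, i in enumerate(order):
        acc += costs[i]
        cut = k + 1
        if acc >= target:
            break
    left = [particles[i] for i in order[:cut]]
    right = [particles[i] for i in order[cut:]]
    lc = [costs[i] for i in order[:cut]]
    rc = [costs[i] for i in order[cut:]]
    nxt = (axis + 1) % len(particles[0])
    return (multisection(left, lc, half, nxt) +
            multisection(right, rc, n_domains - half, nxt))
```

In a real code the per-particle costs would come from measured force-calculation times, so domain boundaries track the actual load rather than the particle count.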
Mixing multi-core CPUs and GPUs for scientific simulation software
Recent technological and economic developments have led to widespread availability of
multi-core CPUs and specialist accelerator processors such as graphical processing units
(GPUs). The accelerated computational performance possible from these devices can be very
high for some applications paradigms. Software languages and systems such as NVIDIA's
CUDA and Khronos consortium's open compute language (OpenCL) support a number of
individual parallel application programming paradigms. To scale up the performance of some
complex systems simulations, a hybrid of multi-core CPUs for coarse-grained parallelism and
very many-core GPUs for data parallelism is necessary. We describe our use of hybrid applications using threading approaches and multi-core CPUs to control independent GPU devices.
We present speed-up data and discuss multi-threading software issues for the applications-level programmer and offer some suggested areas for language development and integration
between coarse-grained and fine-grained multi-thread systems. We discuss results from three
common simulation algorithmic areas including: partial differential equations; graph cluster
metric calculations and random number generation. We report on programming experiences
and selected performance for these algorithms on: single and multiple GPUs; multi-core CPUs;
a CellBE; and using OpenCL. We discuss programmer usability issues and the outlook and
trends in multi-core programming for scientific applications developers
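The pattern of dedicating one host CPU thread to each independent accelerator can be sketched generically. In the sketch below, `launch_kernel` is a placeholder for a real per-device CUDA/OpenCL call; the names and the queue-based work distribution are illustrative assumptions, not the authors' code.

```python
import queue
import threading

def run_on_device(device_id, work_q, results):
    """One host thread drives one device, pulling chunks from a shared
    queue until it is empty. launch_kernel stands in for a real
    device-bound CUDA/OpenCL invocation."""
    def launch_kernel(chunk):  # placeholder for the actual device call
        return sum(x * x for x in chunk)
    while True:
        try:
            idx, chunk = work_q.get_nowait()
        except queue.Empty:
            return
        # Each result slot is written by exactly one thread.
        results[idx] = launch_kernel(chunk)

def hybrid_map(chunks, n_devices):
    """Coarse-grained parallelism: n_devices host threads, each
    controlling an independent device over its share of the chunks."""
    work_q = queue.Queue()
    for item in enumerate(chunks):
        work_q.put(item)
    results = [None] * len(chunks)
    threads = [threading.Thread(target=run_on_device,
                                args=(d, work_q, results))
               for d in range(n_devices)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

The shared queue gives a simple form of dynamic load balancing across devices: a thread whose device finishes its chunk early immediately pulls the next one.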