69,213 research outputs found
State-of-the-Art in Parallel Computing with R
R is a mature open-source programming language for statistical computing and graphics. Many areas of statistical research are experiencing rapid growth in the size of data sets. Methodological advances drive increased use of simulations. A common approach is to use parallel computing. This paper presents an overview of techniques for parallel computing with R on computer clusters, on multi-core systems, and in grid computing. It reviews sixteen different packages, comparing them on their state of development, the parallel technology used, as well as on usability, acceptance, and performance. Two packages (snow, Rmpi) stand out as particularly useful for general use on computer clusters. Packages for grid computing are still in development, with only one package currently available to the end user. For multi-core systems four different packages exist, but a number of issues pose challenges to early adopters. The paper concludes with ideas for further developments in high performance computing with R. Example code is available in the appendix
Fuzzy C-Mean And Genetic Algorithms Based Scheduling For Independent Jobs In Computational Grid
The concept of Grid computing is becoming the most important research area in the high performance computing. Under this concept, the jobs scheduling in Grid computing has more complicated problems to discover a diversity of available resources, select the appropriate applications and map to suitable resources. However, the major problem is the optimal job scheduling, which Grid nodes need to allocate the appropriate resources for each job. In this paper, we combine Fuzzy C-Mean and Genetic Algorithms which are popular algorithms, the Grid can be used for scheduling. Our model presents the method of the jobs classifications based mainly on Fuzzy C-Mean algorithm and mapping the jobs to the appropriate resources based mainly on Genetic algorithm. In the experiments, we used the workload historical information and put it into our simulator. We get the better result when compared to the traditional algorithms for scheduling policies. Finally, the paper also discusses approach of the jobs classifications and the optimization engine in Grid scheduling
Analysis of I/O Performance on an Amazon EC2 Cluster Compute and High I/O Platform
“This is a post-peer-review, pre-copyedit version of an article published in Journal of Grid Computing. The final authenticated version is available online at: https://doi.org/10.1007/s10723-013-9250-y[Abstract] Cloud computing is currently being explored by the scientific community to assess its suitability for High Performance Computing (HPC) environments. In this novel paradigm, compute and storage resources, as well as applications, can be dynamically provisioned on a pay-per-use basis. This paper presents a thorough evaluation of the I/O storage subsystem using the Amazon EC2 Cluster Compute platform and the recent High I/O instance type, to determine its suitability for I/O-intensive applications. The evaluation has been carried out at different layers using representative benchmarks in order to evaluate the low-level cloud storage devices available in Amazon EC2, ephemeral disks and Elastic Block Store (EBS) volumes, both on local and distributed file systems. In addition, several I/O interfaces (POSIX, MPI-IO and HDF5) commonly used by scientific workloads have also been assessed. Furthermore, the scalability of a representative parallel I/O code has also been analyzed at the application level, taking into account both performance and cost metrics. The analysis of the experimental results has shown that available cloud storage devices can have different performance characteristics and usage constraints. Our comprehensive evaluation can help scientists to increase significantly (up to several times) the performance of I/O-intensive applications in Amazon EC2 cloud. An example of optimal configuration that can maximize I/O performance in this cloud is the use of a RAID 0 of 2 ephemeral disks, TCP with 9,000 bytes MTU, NFS async and MPI-IO on the High I/O instance type, which provides ephemeral disks backed by Solid State Drive (SSD) technology.Ministerio de Ciencia e Innovación; TIN2010-16735Ministerio de Educación; AP2010-4348Galicia. Consellería de Cultura, Educación e Ordenación Universitaria; ref. 2010/
Libra: An Economy driven Job Scheduling System for Clusters
Clusters of computers have emerged as mainstream parallel and distributed
platforms for high-performance, high-throughput and high-availability
computing. To enable effective resource management on clusters, numerous
cluster managements systems and schedulers have been designed. However, their
focus has essentially been on maximizing CPU performance, but not on improving
the value of utility delivered to the user and quality of services. This paper
presents a new computational economy driven scheduling system called Libra,
which has been designed to support allocation of resources based on the users?
quality of service (QoS) requirements. It is intended to work as an add-on to
the existing queuing and resource management system. The first version has been
implemented as a plugin scheduler to the PBS (Portable Batch System) system.
The scheduler offers market-based economy driven service for managing batch
jobs on clusters by scheduling CPU time according to user utility as determined
by their budget and deadline rather than system performance considerations. The
Libra scheduler ensures that both these constraints are met within an O(n)
run-time. The Libra scheduler has been simulated using the GridSim toolkit to
carry out a detailed performance analysis. Results show that the deadline and
budget based proportional resource allocation strategy improves the utility of
the system and user satisfaction as compared to system-centric scheduling
strategies.Comment: 13 page
Cluster-Based Load Balancing Algorithms for Grids
E-science applications may require huge amounts of data and high processing
power where grid infrastructures are very suitable for meeting these
requirements. The load distribution in a grid may vary leading to the
bottlenecks and overloaded sites. We describe a hierarchical dynamic load
balancing protocol for Grids. The Grid consists of clusters and each cluster is
represented by a coordinator. Each coordinator first attempts to balance the
load in its cluster and if this fails, communicates with the other coordinators
to perform transfer or reception of load. This process is repeated
periodically. We analyze the correctness, performance and scalability of the
proposed protocol and show from the simulation results that our algorithm
balances the load by decreasing the number of high loaded nodes in a grid
environment.Comment: 17 pages, 11 figures; International Journal of Computer Networks,
volume3, number 5, 201
- …