State-of-the-Art in Parallel Computing with R
R is a mature open-source programming language for statistical computing and graphics. Many areas of statistical research are experiencing rapid growth in the size of data sets. Methodological advances drive increased use of simulations. A common approach is to use parallel computing. This paper presents an overview of techniques for parallel computing with R on computer clusters, on multi-core systems, and in grid computing. It reviews sixteen different packages, comparing them on their state of development, the parallel technology used, as well as on usability, acceptance, and performance. Two packages (snow, Rmpi) stand out as particularly useful for general use on computer clusters. Packages for grid computing are still in development, with only one package currently available to the end user. For multi-core systems four different packages exist, but a number of issues pose challenges to early adopters. The paper concludes with ideas for further developments in high performance computing with R. Example code is available in the appendix.
A First Step Towards Automatically Building Network Representations
To fully harness Grids, users or middleware must have some knowledge of the
topology of the platform interconnection network. As such knowledge is usually
not available, one must use tools that automatically build a topological
network model through some measurements. In this article, we define a
methodology to assess the quality of these network model building tools, and we
apply this methodology to representatives of the main classes of model builders
and to two new algorithms. We show that none of the main existing techniques
builds models that enable accurate prediction of the running time of simple
application kernels for actual platforms. However, some of the new algorithms we
propose give excellent results in a wide range of situations.
A Taxonomy of Workflow Management Systems for Grid Computing
With the advent of Grid and application technologies, scientists and
engineers are building more and more complex applications to manage and process
large data sets, and execute scientific experiments on distributed resources.
Such application scenarios require means for composing and executing complex
workflows. Therefore, many efforts have been made towards the development of
workflow management systems for Grid computing. In this paper, we propose a
taxonomy that characterizes and classifies various approaches for building and
executing workflows on Grids. We also survey several representative Grid
workflow systems developed by various projects world-wide to demonstrate the
comprehensiveness of the taxonomy. The taxonomy not only highlights the design
and engineering similarities and differences of state-of-the-art Grid
workflow systems, but also identifies the areas that need further research.
Comment: 29 pages, 15 figures
A high resolution coupled hydrologic–hydraulic model (HiResFlood-UCI) for flash flood modeling
HiResFlood-UCI was developed by coupling the NWS's hydrologic model (HL-RDHM) with the hydraulic model (BreZo) for flash flood modeling at decameter resolutions. The coupled model uses HL-RDHM as a rainfall-runoff generator and replaces the routing scheme of HL-RDHM with the 2D hydraulic model (BreZo) in order to predict localized flood depths and velocities. A semi-automated technique of unstructured mesh generation was developed to cluster an adequate density of computational cells along river channels such that numerical errors are negligible compared with other sources of error, while ensuring that computational costs of the hydraulic model are kept to a bare minimum. HiResFlood-UCI was implemented for a watershed (ELDO2) in the DMIP2 experiment domain in Oklahoma. Using synthetic precipitation input, the model was tested for various components including HL-RDHM parameters (a priori versus calibrated), channel and floodplain Manning n values, DEM resolution (10 m versus 30 m) and computation mesh resolution (10 m+ versus 30 m+). Simulations with calibrated versus a priori parameters of HL-RDHM show that HiResFlood-UCI produces reasonable results with the a priori parameters from NWS. Sensitivities to hydraulic model resistance parameters, mesh resolution and DEM resolution are also identified, pointing to the importance of model calibration and validation for accurate prediction of localized flood intensities. HiResFlood-UCI performance was examined using 6 measured precipitation events as model input for model calibration and validation of the streamflow at the outlet. The Nash–Sutcliffe Efficiency (NSE) obtained ranges from 0.588 to 0.905. The model was also validated for the flooded map using USGS observed water level at an interior point. The predicted flood stage error is 0.82 m or less, based on a comparison to measured stage. 
Validation of stage and discharge predictions builds confidence in model predictions of flood extent and localized velocities, which are fundamental to reliable flash flood warning.
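For readers unfamiliar with the Nash–Sutcliffe Efficiency (NSE) quoted in the abstract above, the following is a minimal sketch of its conventional definition, NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2). The function name and the sample values are illustrative assumptions and are not taken from the HiResFlood-UCI paper.

```python
from typing import Sequence


def nash_sutcliffe_efficiency(observed: Sequence[float],
                              simulated: Sequence[float]) -> float:
    """Return the NSE of a simulated series against observations.

    1.0 is a perfect match; 0.0 means the simulation is no better than
    always predicting the mean of the observations.
    """
    if len(observed) != len(simulated) or len(observed) == 0:
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    # Sum of squared errors between observation and simulation.
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    # Sum of squared deviations of the observations from their mean.
    variance = sum((o - mean_obs) ** 2 for o in observed)
    if variance == 0:
        raise ValueError("NSE is undefined for constant observations")
    return 1.0 - residual / variance


# Hypothetical streamflow values, for illustration only.
perfect = nash_sutcliffe_efficiency([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
mean_only = nash_sutcliffe_efficiency([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
```

Under this definition, the reported range of 0.588 to 0.905 means the simulated streamflow explains a substantially larger share of the observed variance than the observed mean alone would.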
Transferring big data across the globe
Transmitting data via the Internet is a routine and common task for users today. The amount of data being transmitted by the average user has dramatically increased over the past few years. Transferring a gigabyte of data in an entire day was once normal; however, users are now transmitting multiple gigabytes in a single hour. With the influx of big data and massive scientific data sets that are measured in tens of petabytes, a user has the propensity to transfer even larger amounts of data. When transferring data sets of this magnitude on public or shared networks, the performance of all workloads in the system will be impacted.
This dissertation addresses the issues and challenges inherent in transferring big data over shared networks. A survey of current transfer techniques is provided and these techniques are evaluated in simulated, experimental and live environments. The main contribution of this dissertation is the development of a new, nice model for big data transfers, which is based on a store-and-forward methodology instead of an end-to-end approach. This nice model ensures that big data transfers only occur when there is idle bandwidth that can be repurposed for these large transfers. The nice model improves overall performance and significantly reduces the transmission time for big data transfers. The model allows for efficient transfers regardless of time zone differences or variations in bandwidth between sender and receiver. Nice is the first model that addresses the challenges of transferring big data across the globe.
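The idle-bandwidth rule described in the abstract above can be illustrated with a small sketch. This is not the dissertation's algorithm: the function name, the per-interval capacity and foreground-load inputs, and the greedy policy are all assumptions made for illustration, and the store-and-forward staging between intermediate hosts is not modelled.

```python
from typing import List


def schedule_nice_transfer(capacity: List[float],
                           foreground: List[float],
                           total: float) -> List[float]:
    """Return the bulk data sent in each interval using only idle bandwidth.

    capacity[i]   -- data the link can carry in interval i (hypothetical units)
    foreground[i] -- data the regular workloads send in interval i
    total         -- size of the bulk transfer to complete
    """
    if len(capacity) != len(foreground):
        raise ValueError("capacity and foreground must have the same length")
    sent: List[float] = []
    remaining = total
    for cap, load in zip(capacity, foreground):
        # Bandwidth left over once the regular workloads are served.
        idle = max(cap - load, 0.0)
        # Never send more than what is idle or what is still outstanding.
        chunk = min(idle, remaining)
        sent.append(chunk)
        remaining -= chunk
    return sent


# Hypothetical link: busy in the first interval, partly idle, then fully idle.
plan = schedule_nice_transfer([10.0, 10.0, 10.0], [10.0, 4.0, 0.0], 12.0)
```

In this sketch the bulk transfer waits during the fully loaded interval and proceeds only as capacity frees up, so the foreground workloads are never displaced.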
Resource and Application Models for Advanced Grid Schedulers
As Grid computing becomes an inevitable part of the future, managing, scheduling and monitoring dynamic, heterogeneous resources will present new challenges. Solutions will have to be agile and adaptive, support self-organization and autonomous management, while maintaining optimal resource utilisation. Presented in this paper are basic principles and architectural concepts for efficient resource allocation in heterogeneous Grid environments.