CAWL: A Cache-aware Write Performance Model of Linux Systems
The performance of data intensive applications is often dominated by their
input/output (I/O) operations, but the I/O stack of systems is complex and
depends heavily on system-specific settings and hardware components. This
situation makes generic performance optimisation challenging and costly for
developers as they would have to run their application on a large variety of
systems to evaluate their improvements. Simulation frameworks can help
reduce the experimental overhead, but they typically model I/O
rather coarsely, which leads to significant inaccuracies in performance
predictions. We propose a more accurate model of the write performance of
Linux-based systems that takes different I/O methods and levels (via system
calls, library calls, direct or indirect, etc.), the page cache, background
writing, and the I/O throttling capabilities of the Linux kernel into account.
For example, in a random I/O scenario, our model reduces the prediction error
relative to real measurements of the simulated workload from 67 % with a
standard I/O model included in SimGrid down to 10 %. In other scenarios the
differences are even more pronounced. Comment: 22 pages, 9 figures, 1 table
Autonomic Management of Large Clusters and Their Integration into the Grid
We present a framework for the co-ordinated, autonomic management of multiple clusters in a compute center and their integration into a Grid environment. Site autonomy and the automation of administrative tasks are prime aspects of this framework. The system behavior is continuously monitored in a steering cycle, and appropriate actions are taken to resolve any problems. All presented components have been implemented in the course of the EU project DataGrid: the Lemon monitoring components, the FT fault-tolerance mechanism, the quattor system for software installation and configuration, the RMS job and resource management system, and the Gridification scheme that integrates clusters into the Grid.
C3Grid as a Tool for Data Management in Climate Research
C3Grid uses centralized data management to administer the data holdings of distributed archives. In a collaborative workspace, users can work with the data regardless of where it is stored. With its help, a bridge is also built to the data nodes of the Earth System Grid, which hold the CMIP5/IPCC AR5 data.
On the Cost of Reliability in Large Data Grids
Global grid environments provide not only massive aggregated computing power but also an unprecedented amount of distributed storage space. Unfortunately, dynamic changes caused by component failures, local decisions, and irregular data updates make it difficult to use this capacity efficiently. In this paper, we address the problem of improving data availability in the presence of unreliable components. We present an analytical model for determining an optimal combination of distributed replica catalogs, catalog sizes, and replica servers. Empirical simulation results confirm the accuracy of our theoretical analysis. Our model captures the characteristics of highly dynamic environments like peer-to-peer networks, but it can also be applied to more centralized, less dynamic grid environments like the European DataGrid.
Executing and observing CFD applications on the Grid
We present the FlowGrid system, which allows Computational Fluid Dynamics (CFD) simulations to be executed in Grid environments. Using this system, users can observe the progress of their simulation online by looking at intermediate results, which are visualized in the graphical user interface. Several Grid centers across Europe currently use and validate the system with their CFD computations and build a ‘CFD Virtual Organization’ to share their resources and balance their processing load. We first describe the overall FlowGrid architecture, highlight its special features, and present the system along a typical job execution. The Grid infrastructure, i.e. FlowServe, is presented in detail, a description of the accounting system is given, and experiences with the FlowGrid testbed are provided. Finally, we provide evidence that the results can be used as a generic CFD Grid service. © 2004 Elsevier B.V. All rights reserved.
P2P Routing of Range Queries in Skewed Multidimensional Data Sets
We present a middleware to store multidimensional data sets on Internet-scale distributed systems and to efficiently perform range queries on them. Our structured overlay network SONAR (Structured Overlay Network with Arbitrary Range queries) puts keys which are adjacent in the key space on logically adjacent nodes in the overlay and is thereby able to process multidimensional range queries with a single logarithmic data lookup and local forwarding. The specified ranges may have arbitrary shapes like rectangles, circles, spheres or polygons. Empirical results demonstrate the routing performance of SONAR on several data sets, ranging from real-world data to artificially constructed worst-case distributions. We study the quality of SONAR’s routing structure, which is based on local knowledge only, and measure the indegree of the overlay nodes to find potential hot spots in the overlay. We show that SONAR’s routing table is self-adjusting, even under extreme situations, always keeping at most ⌈log N⌉ routing entries. Key words: structured overlays, range queries, routing, multidimensional data set
Grid-Enabled Computational Fluid Dynamics using FlowGrid
We present an architecture for Computational Fluid Dynamics (CFD) applications that we developed for the FlowGrid project. FlowGrid revolutionizes the way CFD simulations are set up, executed and monitored. In this project, several Grid centers across Europe develop and validate their software for Grid-based CFD computations. The 'CFD Virtual Organization' of FlowGrid provides industrial end users with easy and flexible access to CFD resources.