Enabling Adaptive Grid Scheduling and Resource Management
Wider adoption of the Grid concept has led to an increasing number of federated computational, storage and visualisation resources being available to scientists and researchers. The distributed and heterogeneous nature of these resources renders most legacy cluster monitoring and management approaches inappropriate, and poses new challenges for workflow scheduling on such systems. Effective monitoring of resource utilisation and highly granular yet adaptive measurements are prerequisites for a more efficient Grid scheduler. We present a suite of measurement applications able to monitor per-process resource utilisation, and a customisable tool for emulating observed utilisation models. We also outline our future work on a predictive and probabilistic Grid scheduler. The research is undertaken as part of the UK e-Science EPSRC-sponsored project SO-GRM (Self-Organising Grid Resource Management) in cooperation with BT.
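Per-process utilisation monitoring usually reduces to sampling a process's cumulative CPU time at two instants and dividing the difference by the elapsed wall-clock time. The sketch below illustrates only that arithmetic; the `Sample` type and the function names are hypothetical and are not taken from the measurement suite described above.

```python
import time
from dataclasses import dataclass


@dataclass
class Sample:
    wall_s: float  # wall-clock timestamp, seconds
    cpu_s: float   # cumulative CPU time (user + system) used by the process, seconds


def cpu_utilisation(prev: Sample, curr: Sample, n_cpus: int = 1) -> float:
    """Fraction of the host's CPU capacity used by one process between two samples."""
    wall = curr.wall_s - prev.wall_s
    if wall <= 0:
        raise ValueError("samples must be strictly increasing in wall-clock time")
    return (curr.cpu_s - prev.cpu_s) / (wall * n_cpus)


def measure_self(busy_s: float = 0.1) -> float:
    """Sample the current process before and after a short busy loop."""
    start = Sample(time.monotonic(), time.process_time())
    deadline = start.wall_s + busy_s
    while time.monotonic() < deadline:
        pass  # spin so the process accrues CPU time
    end = Sample(time.monotonic(), time.process_time())
    return cpu_utilisation(start, end)


if __name__ == "__main__":
    print(f"utilisation over busy loop: {measure_self():.2f}")
```

To monitor some other process on Linux, the cumulative CPU time could instead be read from the `utime` and `stime` fields of `/proc/<pid>/stat`, which are reported in clock ticks (convert with `os.sysconf("SC_CLK_TCK")`).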
Probabilistic grid scheduling based on job statistics and monitoring information
This transfer thesis presents a novel, probabilistic approach to scheduling applications on computational Grids based on their historical behaviour, the current state of the Grid, and predictions of the future execution times and resource utilisation of such applications. The work lays a foundation for enabling a more intuitive, user-friendly and effective scheduling technique termed deadline scheduling.
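As one way to picture what a probabilistic deadline scheduler might compute, the sketch below estimates, for each candidate resource, the empirical probability that the expected queue wait plus a historically observed runtime fits within the user's deadline, and then picks the resource with the highest probability. This is an illustrative assumption, not the thesis's algorithm; the function names, the per-resource runtime histories and the queue-wait estimates are all hypothetical inputs.

```python
from typing import Dict, Sequence, Tuple


def p_meets_deadline(history: Sequence[float], queue_wait_s: float, deadline_s: float) -> float:
    """Empirical probability that wait + runtime <= deadline, using past runtimes."""
    if not history:
        return 0.0  # no evidence for this resource: treat as unlikely to meet the deadline
    hits = sum(1 for runtime in history if queue_wait_s + runtime <= deadline_s)
    return hits / len(history)


def choose_resource(
    histories: Dict[str, Sequence[float]],
    waits: Dict[str, float],
    deadline_s: float,
) -> Tuple[str, float]:
    """Return the resource most likely to finish the job by the deadline, and that probability."""
    def score(name: str) -> float:
        return p_meets_deadline(histories[name], waits.get(name, 0.0), deadline_s)

    best = max(histories, key=score)  # ties resolve to the first resource listed
    return best, score(best)


if __name__ == "__main__":
    runtimes = {"a": [100, 200, 300, 400], "b": [250, 260, 270, 280]}
    print(choose_resource(runtimes, {"a": 0.0, "b": 50.0}, deadline_s=330))
```

With these made-up numbers, resource `b` meets a 330 s deadline in every recorded run once its 50 s queue wait is included, while `a` does so in three of four, so `b` is chosen.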
Initial work has established the motivation and requirements for a more efficient Grid scheduler, able to adaptively handle the dynamic nature of Grid resources and of the submitted workload. Preliminary scheduler research identified the need for detailed, process-level monitoring of Grid resources, and for a tool to simulate the non-deterministic behaviour and statistical properties of Grid applications.
A simulation tool, GridLoader, has been developed to model application loads similar to those of a number of typical Grid applications. GridLoader can simulate CPU utilisation, memory allocation and network transfers according to limits set through command-line parameters or a configuration file. Its particular strength is in meeting the set resource utilisation targets in a probabilistic manner, thus creating a dynamic environment suitable for testing the scheduler's adaptability and its prediction algorithm.
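One simple way to hit a CPU utilisation target probabilistically is a randomised duty cycle: time is split into short slices, and each slice is independently spent busy-spinning with probability equal to the target, or sleeping otherwise. The load fluctuates from moment to moment, but its long-run mean converges on the target. The sketch below assumes this technique purely for illustration; it is not GridLoader's code, the function names and parameters are hypothetical, and memory and network loads are omitted.

```python
import random
import time
from typing import List, Optional


def duty_schedule(target: float, n_slices: int, seed: Optional[int] = None) -> List[bool]:
    """Bernoulli busy/idle schedule; True means 'busy' for that slice."""
    if not 0.0 <= target <= 1.0:
        raise ValueError("target utilisation must be in [0, 1]")
    rng = random.Random(seed)
    return [rng.random() < target for _ in range(n_slices)]


def run_cpu_load(target: float, duration_s: float, slice_s: float = 0.01,
                 seed: Optional[int] = None) -> float:
    """Generate CPU load for roughly duration_s seconds; return the busy fraction achieved."""
    n = max(1, int(duration_s / slice_s))
    busy_slices = 0
    for busy in duty_schedule(target, n, seed):
        end = time.monotonic() + slice_s
        if busy:
            busy_slices += 1
            while time.monotonic() < end:
                pass  # spin to consume CPU for this slice
        else:
            time.sleep(slice_s)  # yield the CPU for this slice
    return busy_slices / n


if __name__ == "__main__":
    print(f"busy fraction: {run_cpu_load(0.3, duration_s=1.0, seed=1):.2f}")
```

Because slices are drawn independently, two runs with the same target produce different load traces. That variability is the kind of dynamic behaviour a scheduler's prediction algorithm would need to cope with.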
To enable highly granular monitoring of Grid applications, a monitoring framework based on the Ganglia Toolkit was developed and tested. The suite is able to collect resource usage information for individual Grid applications, integrate it into the standard XML-based information flow, provide visualisation through a Web portal, and export data in a format suitable for off-line analysis.
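The two data paths mentioned here, feeding per-process readings into an XML information flow and exporting them for off-line analysis, can be sketched with the Python standard library. The XML shape below is loosely modelled on the `HOST`/`METRIC` elements of Ganglia's XML output, but the `proc_<pid>_<metric>` naming scheme, the attribute set and the CSV columns are assumptions made for this example rather than the framework's actual format.

```python
import csv
import io
import xml.etree.ElementTree as ET
from typing import Dict

# Hypothetical input: pid -> {metric name -> value} for one host.
Samples = Dict[int, Dict[str, float]]


def process_metrics_xml(host: str, samples: Samples) -> str:
    """Encode per-process readings as Ganglia-style METRIC elements under a HOST element."""
    root = ET.Element("HOST", NAME=host)
    for pid, metrics in sorted(samples.items()):
        for name, value in sorted(metrics.items()):
            ET.SubElement(root, "METRIC", NAME=f"proc_{pid}_{name}", VAL=str(value), TYPE="double")
    return ET.tostring(root, encoding="unicode")


def process_metrics_csv(host: str, samples: Samples) -> str:
    """Flatten the same readings into one CSV row per (process, metric) for off-line analysis."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["host", "pid", "metric", "value"])
    for pid, metrics in sorted(samples.items()):
        for name, value in sorted(metrics.items()):
            writer.writerow([host, pid, name, value])
    return buf.getvalue()


if __name__ == "__main__":
    data = {101: {"cpu_pct": 12.5, "rss_kb": 2048.0}}
    print(process_metrics_xml("node01", data))
    print(process_metrics_csv("node01", data))
```

Encoding the process ID in the metric name lets per-process values travel through tooling that only understands host-level metrics. The catch is that metric names then change as processes come and go.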
The thesis also presents an initial investigation of the utilisation of the University College London Central Computing Cluster facility, which runs Sun Grid Engine middleware. The feasibility of basic prediction concepts based on historical information and process metadata has been successfully established, and possible scheduling improvements using such predictions have been identified.
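A basic form of metadata-driven prediction groups past jobs by attributes such as the submitting user and the job name, then summarises their runtimes. The sketch assumes that records of the form (user, job name, runtime in seconds) have already been extracted from the cluster's accounting data. It returns the mean runtime and a nearest-rank quantile, and falls back to the user's overall history when a job name has not been seen before. The record layout, the fallback rule and the 0.9 quantile are illustrative choices, not results from the thesis.

```python
import math
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Optional, Tuple

Record = Tuple[str, str, float]  # hypothetical: (user, job_name, runtime_s)


def build_history(records: Iterable[Record]):
    """Index past runtimes by (user, job_name) and by user alone."""
    by_job: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    by_user: Dict[str, List[float]] = defaultdict(list)
    for user, job, runtime in records:
        by_job[(user, job)].append(runtime)
        by_user[user].append(runtime)
    return by_job, by_user


def nearest_rank_quantile(values: List[float], q: float) -> float:
    """Smallest value v such that at least a fraction q of the values are <= v."""
    ordered = sorted(values)
    return ordered[max(0, math.ceil(q * len(ordered)) - 1)]


def predict_runtime(by_job, by_user, user: str, job: str,
                    q: float = 0.9) -> Optional[Tuple[float, float]]:
    """(mean, q-quantile) of matching past runtimes, or None if there is no history."""
    history = by_job.get((user, job)) or by_user.get(user)  # fall back to the user's jobs
    if not history:
        return None
    return mean(history), nearest_rank_quantile(history, q)


if __name__ == "__main__":
    past = [("alice", "blast", 100.0), ("alice", "blast", 200.0),
            ("alice", "blast", 300.0), ("alice", "blast", 400.0),
            ("alice", "render", 1000.0), ("bob", "sim", 50.0)]
    jobs, users = build_history(past)
    print(predict_runtime(jobs, users, "alice", "blast"))
```

A high quantile gives the scheduler a conservative estimate, while the mean gives a typical one. The gap between them already gives a rough sense of how uncertain a prediction is.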
The thesis is structured as follows: Section 1 introduces Grid computing and its major concepts; Section 2 presents open research issues and the specific focus of the author's research; Section 3 surveys the related literature, schedulers, monitoring tools and simulation packages; Section 4 presents the platform for the author's work, the Self-Organising Grid Resource Management project; Sections 5 and 6 give detailed accounts of the monitoring framework and the simulation tool developed; Section 7 presents the initial data analysis; and Section 8.4 concludes the thesis, followed by appendices and references.
Számítóháló alkalmazások teljesítményanalízise és optimalizációja = Performance analysis and optimisation of grid applications
We investigated novel approaches to performance analysis and optimization for the efficient execution of grid applications, especially workflows. We took the special requirements of grid performance analysis into consideration when developing Mercury, a grid monitoring infrastructure. GRM, a performance monitor for parallel applications, has been integrated with both R-GMA, a relational grid information system, and Mercury. We developed versions of the Pulse and Prove visualisation tools that support grid performance analysis. We wrote a comprehensive state-of-the-art survey of grid performance tools. We designed a novel workflow abstraction layer for P-GRADE, together with the associated P-GRADE portal, through which users can draft and execute workflow applications on the grid via a web browser. The portal supports multiple grid implementations and provides monitoring capabilities for performance analysis. We tested the integration of the portal with grid resource brokers and added a degree of fault tolerance, enabling it to recover from failed runs. Optimization may require migrating parts of an application to different resources, and therefore requires support for checkpointing. We enhanced the checkpointing facilities of P-GRADE and coupled them to the Condor job scheduler. We also extended the system with a load-balancing module that can migrate processes according to load as part of the optimization.
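A load balancer that migrates processes according to load must decide which processes to move and where to put them. The sketch below shows one greedy heuristic, assumed here for illustration and not taken from the P-GRADE module. While the gap between the most and least loaded hosts exceeds a threshold, it moves the process on the busiest host whose load is closest to half that gap. It only considers processes with a positive load smaller than the gap, so every move strictly reduces the sum of squared host loads, which guarantees that the loop terminates. The function name, the `host -> {process: load}` input format and the threshold are hypothetical. Actually carrying out a move would rely on a checkpoint/restart mechanism such as the one described above.

```python
from typing import Dict, List, Tuple

Placement = Dict[str, Dict[str, float]]  # host -> {process_id: load}


def plan_migrations(hosts: Placement, threshold: float = 0.2) -> List[Tuple[str, str, str]]:
    """Greedy migration plan as (process_id, source_host, destination_host) moves."""
    placement = {h: dict(procs) for h, procs in hosts.items()}  # do not mutate the input
    moves: List[Tuple[str, str, str]] = []

    def load(host: str) -> float:
        return sum(placement[host].values())

    while True:
        src = max(placement, key=load)
        dst = min(placement, key=load)
        gap = load(src) - load(dst)
        if gap <= threshold:
            break  # imbalance is tolerable
        # Only moves with 0 < load < gap reduce the imbalance.
        candidates = [(pid, l) for pid, l in placement[src].items() if 0 < l < gap]
        if not candidates:
            break
        # Prefer the process whose move brings the two hosts closest to equal.
        pid, l = min(candidates, key=lambda c: abs(gap / 2 - c[1]))
        del placement[src][pid]
        placement[dst][pid] = l
        moves.append((pid, src, dst))
    return moves


if __name__ == "__main__":
    cluster = {"n1": {"p1": 0.6, "p2": 0.5, "p3": 0.4}, "n2": {}}
    print(plan_migrations(cluster))
```

The threshold prevents the balancer from migrating processes back and forth over small imbalances, since each migration carries checkpointing and transfer costs.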
Self-organising management of Grid environments
This paper presents basic concepts, architectural principles and algorithms for efficient resource and security management in cluster computing environments and the Grid. The work presented in this paper is funded by BTExacT and the EPSRC project SO-GRM (GR/S21939).