18 research outputs found

    Distributed Shared Memory in a Grid Environment

    Számítóháló alkalmazások teljesítményanalízise és optimalizációja = Performance analysis and optimisation of grid applications

    We investigated novel approaches to performance analysis and optimisation for the efficient execution of grid applications, especially workflows. We developed Mercury, a grid monitoring infrastructure, taking into account the special requirements of grid performance analysis. GRM, a performance monitor for parallel applications, has been integrated with R-GMA, a relational grid information system, and with Mercury. We developed versions of the Pulse and Prove visualisation tools that support grid performance analysis, and wrote a comprehensive state-of-the-art survey of grid performance analysis tools. We designed a workflow abstraction layer for the P-GRADE system and, in connection with it, the P-GRADE portal, through which users can edit and execute workflow applications on the grid from a web browser. The portal supports multiple grid implementations and can collect information for performance analysis. We tested the integration of the portal with grid resource brokers and prepared it to recover from failed runs. Optimising execution may require migrating parts of the application to different resources, which in turn requires checkpointing support. We therefore enhanced the checkpointing facilities of P-GRADE applications and coupled them to the Condor job scheduler. We also extended the system with a load balancer module that can migrate processes depending on load.

    Probabilistic grid scheduling based on job statistics and monitoring information

    This transfer thesis presents a novel, probabilistic approach to scheduling applications on computational Grids based on their historical behaviour, the current state of the Grid, and predictions of the future execution times and resource utilisation of such applications. The work lays a foundation for enabling a more intuitive, user-friendly and effective scheduling technique termed deadline scheduling. Initial work has established the motivation and requirements for a more efficient Grid scheduler, able to adaptively handle the dynamic nature of Grid resources and the submitted workload. Preliminary scheduler research identified the need for detailed monitoring of Grid resources at the process level, and for a tool to simulate the non-deterministic behaviour and statistical properties of Grid applications. A simulation tool, GridLoader, has been developed to enable modelling of application loads similar to a number of typical Grid applications. GridLoader is able to simulate CPU utilisation, memory allocation and network transfers according to limits set through command line parameters or a configuration file. Its specific strength is in achieving set resource utilisation targets in a probabilistic manner, thus creating a dynamic environment suitable for testing the scheduler’s adaptability and its prediction algorithm. To enable highly granular monitoring of Grid applications, a monitoring framework based on the Ganglia Toolkit was developed and tested. The suite is able to collect resource usage information of individual Grid applications, integrate it into a standard XML-based information flow, provide visualisation through a Web portal, and export data into a format suitable for off-line analysis. The thesis also presents an initial investigation of the utilisation of the University College London Central Computing Cluster facility running Sun Grid Engine middleware. The feasibility of basic prediction concepts based on historical information and process meta-data has been successfully established, and possible scheduling improvements using such predictions have been identified. The thesis is structured as follows: Section 1 introduces Grid computing and its major concepts; Section 2 presents open research issues and the specific focus of the author’s research; Section 3 gives a survey of the related literature, schedulers, monitoring tools and simulation packages; Section 4 presents the platform for the author’s work, the Self-Organising Grid Resource management project; Sections 5 and 6 give detailed accounts of the monitoring framework and simulation tool developed; Section 7 presents the initial data analysis, while Section 8.4 concludes the thesis with appendices and references.

    Grid Environment for On-line Application Monitoring and Performance Analysis

    Dimensionerings- en werkverdelingsalgoritmen voor lambda grids = Dimensioning and workload distribution algorithms for lambda grids

    Grids consist of a collection of computing and storage elements that may be geographically dispersed but whose combined capacity one wishes to exploit. To that end, these elements must be connected by a network. Because many scientific applications use a Grid, and these applications typically process large amounts of data, the network must be able to transport such large data flows reliably. Optical transport networks are very well suited to this. Grids that use such a network are called lambda Grids. This thesis describes a framework for the design and dimensioning of optical networks for lambda Grids. It also discusses how workload can be distributed over a Grid once it has been dimensioned. A large part of the results was obtained by simulation, using a purpose-built Grid simulation package that focuses specifically on network and Grid elements. The design of this simulator and the associated implementation choices are therefore explained in detail in this work.

    Autonomous grid scheduling using probabilistic job runtime scheduling

    Computational Grids are evolving into a global, service-oriented architecture, a universal platform for delivering future computational services to a range of applications of varying complexity and resource requirements. The thesis focuses on developing a new scheduling model for general-purpose utility clusters based on the concept of user-requested job completion deadlines. In such a system, a user would be able to request each job to finish by a certain deadline, and possibly to a certain monetary cost. Implementing deadline scheduling is dependent on the ability to predict the execution time of each queued job, and on an adaptive scheduling algorithm able to use those predictions to maximise deadline adherence. The thesis proposes novel solutions to these two problems and documents their implementation in a largely autonomous and self-managing way. The starting point of the work is an extensive analysis of a representative Grid workload, revealing consistent workflow patterns, usage cycles and correlations between the execution times of jobs and their properties commonly collected by the Grid middleware for accounting purposes. An automated approach is proposed to identify these dependencies and use them to partition the highly variable workload into subsets of more consistent and predictable behaviour. A range of time-series forecasting models, applied in this context for the first time, were used to model the job execution times as a function of their historical behaviour and associated properties. Based on the resulting predictions of job runtimes, a novel scheduling algorithm is able to estimate the latest job start time necessary to meet the requested deadline and sort the queue accordingly to minimise the amount of deadline overrun. The proposed approach was tested using the actual job trace collected from a production Grid facility. The best performing execution time predictor (the auto-regressive moving average method), coupled to workload partitioning based on three simultaneous job properties, returned a median absolute percentage error centroid of only 4.75%. This level of prediction accuracy enabled the proposed deadline scheduling method to reduce the average deadline overrun time ten-fold compared to the benchmark batch scheduler. Overall, the thesis demonstrates that deadline scheduling of computational jobs on the Grid is achievable using statistical forecasting of job execution times based on historical information. The proposed approach is easily implementable, substantially self-managing and better matched to the human workflow, making it well suited for implementation in the utility Grids of the future.
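
The queue-ordering idea in this abstract (estimate each job's latest feasible start time from its predicted runtime, then sort by it) can be illustrated with a minimal sketch. This is not the thesis's algorithm: the `Job` fields, the single-resource back-to-back execution model and the `total_overrun` helper are assumptions made purely for illustration, and the predicted runtimes are taken as given rather than produced by a forecasting model.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    deadline: float           # requested completion time, seconds from now (assumed unit)
    predicted_runtime: float  # forecast execution time in seconds (assumed given)


def latest_start(job: Job) -> float:
    """Latest start time that still meets the deadline if the forecast holds."""
    return job.deadline - job.predicted_runtime


def order_queue(jobs: list[Job]) -> list[Job]:
    """Sort the queue so jobs with the earliest latest-start time run first."""
    return sorted(jobs, key=latest_start)


def total_overrun(ordered: list[Job], now: float = 0.0) -> float:
    """Sum of deadline overruns when jobs run back to back on one resource."""
    clock = now
    overrun = 0.0
    for job in ordered:
        clock += job.predicted_runtime
        overrun += max(0.0, clock - job.deadline)
    return overrun


# Hypothetical three-job queue in submission order.
queue = [Job("a", 100, 30), Job("b", 50, 40), Job("c", 90, 10)]
ordered = order_queue(queue)
print([job.name for job in ordered])
print(total_overrun(queue), total_overrun(ordered))
```

In this toy queue, submission order leaves job `b` past its deadline, while ordering by latest start time meets all three deadlines. A real scheduler would also have to handle multiple resources, forecast error and jobs arriving over time.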

    Desarrollo de un workflow genérico para el modelado de problemas de barrido paramétrico en sistemas distribuidos = Development of a generic workflow for modelling parameter sweep problems in distributed systems

    This work presents the development and experimental validation of a generic workflow model applicable to any parameter sweep problem: the Parameter Sweep Scientific Workflow (PSWF) model. As part of it, a model for the monitoring and management of scientific workflows on distributed systems is designed and tested. This model, Star Superscalar Status (SsTAT), is applicable to the Star Superscalar (StarSs) family of programming models. PSWF and SsTAT can be used by the scientific community as a reference for solving problems with the parameter sweep strategy. As an integral part of the work, the treatment of the parameter sweep problem is formalized, yielding a general solution in the form of the PSNSS (Parameter Sweep Nested Summation Symbol) algorithm, in both its original sequential version and a concurrent version. Both versions are implemented and validated, showing their applicability to all automatable phases of the PSWF lifecycle. Load testing shows that large-scale parameter sweep problems can be addressed efficiently with the proposed approach. In addition, the generic SsTAT monitoring and management model is instantiated for a Grid environment, producing an operational implementation based on GRIDSs called GSTAT (GRID Superscalar Status). A series of tests on a real Grid of heterogeneous computers shows that GSTAT performs its functions properly even in such a demanding environment. As a practical case, the proposed model is applied to determining molecular potential energy hypersurfaces; for this purpose, a specific workflow instance called PSHYP (Parameter Sweep Hypersurfaces) is created.
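
For readers unfamiliar with the parameter sweep strategy, the following minimal sketch enumerates every combination of parameter values and evaluates each one, first sequentially and then concurrently. It is not the PSNSS algorithm, whose details are not given in the abstract; it is a generic Cartesian-product enumeration. The parameter names `r` and `theta` and the `evaluate` function are hypothetical stand-ins for a real computation such as a molecular energy calculation.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def sweep_points(space):
    """Yield one dict per combination of parameter values (nested-loop order)."""
    names = list(space)
    for values in product(*(space[name] for name in names)):
        yield dict(zip(names, values))


def evaluate(point):
    """Hypothetical stand-in for the real per-point computation."""
    return point["r"] ** 2 + point["theta"]


def run_sequential(space):
    """Evaluate every sweep point one after another."""
    return [evaluate(point) for point in sweep_points(space)]


def run_concurrent(space, workers=4):
    """Evaluate sweep points in a thread pool; map() keeps the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(evaluate, sweep_points(space)))


# Hypothetical 2 x 3 parameter space, giving six sweep points.
space = {"r": [1.0, 2.0], "theta": [0.0, 0.5, 1.0]}
print(run_sequential(space))
print(run_concurrent(space) == run_sequential(space))
```

A thread pool is used here only to keep the sketch self-contained. On a Grid, each sweep point would more plausibly be submitted as an independent job.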

    Autonomous grid scheduling using probabilistic job runtime forecasting.

    Computational Grids are evolving into a global, service-oriented architecture, a universal platform for delivering future computational services to a range of applications of varying complexity and resource requirements. The thesis focuses on developing a new scheduling model for general-purpose utility clusters based on the concept of user-requested job completion deadlines. In such a system, a user would be able to request each job to finish by a certain deadline, and possibly to a certain monetary cost. Implementing deadline scheduling is dependent on the ability to predict the execution time of each queued job, and on an adaptive scheduling algorithm able to use those predictions to maximise deadline adherence. The thesis proposes novel solutions to these two problems and documents their implementation in a largely autonomous and self-managing way. The starting point of the work is an extensive analysis of a representative Grid workload, revealing consistent workflow patterns, usage cycles and correlations between the execution times of jobs and their properties commonly collected by the Grid middleware for accounting purposes. An automated approach is proposed to identify these dependencies and use them to partition the highly variable workload into subsets of more consistent and predictable behaviour. A range of time-series forecasting models, applied in this context for the first time, were used to model the job execution times as a function of their historical behaviour and associated properties. Based on the resulting predictions of job runtimes, a novel scheduling algorithm is able to estimate the latest job start time necessary to meet the requested deadline and sort the queue accordingly to minimise the amount of deadline overrun. The proposed approach was tested using the actual job trace collected from a production Grid facility. The best performing execution time predictor (the auto-regressive moving average method), coupled to workload partitioning based on three simultaneous job properties, returned a median absolute percentage error centroid of only 4.75%. This level of prediction accuracy enabled the proposed deadline scheduling method to reduce the average deadline overrun time ten-fold compared to the benchmark batch scheduler. Overall, the thesis demonstrates that deadline scheduling of computational jobs on the Grid is achievable using statistical forecasting of job execution times based on historical information. The proposed approach is easily implementable, substantially self-managing and better matched to the human workflow, making it well suited for implementation in the utility Grids of the future.