    A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing

    Data Grids have been adopted as the platform for scientific communities that need to share, access, transport, process and manage large data collections distributed worldwide. They combine high-end computing technologies with high-performance networking and wide-area storage management techniques. In this paper, we discuss the key concepts behind Data Grids and compare them with other data sharing and distribution paradigms such as content delivery networks, peer-to-peer networks and distributed databases. We then provide comprehensive taxonomies that cover various aspects of architecture, data transportation, data replication and resource allocation and scheduling. Finally, we map the proposed taxonomy to various Data Grid systems, not only to validate the taxonomy but also to identify areas for future exploration. Through this taxonomy, we aim to categorise existing systems to better understand their goals and methodology, which helps evaluate their applicability to similar problems. The taxonomy also provides a "gap analysis" of the area, through which researchers can identify new issues for investigation. Finally, we hope that the proposed taxonomy and mapping give new practitioners an easy way into this complex area of research. (Comment: 46 pages, 16 figures, Technical Report)
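    The "gap analysis" the authors describe can be pictured as a simple mapping exercise: attach each surveyed system to the taxonomy leaves it covers and report the leaves that remain uncovered. The sketch below is a hypothetical Python illustration of that idea; the category and system names are placeholders, not the paper's actual taxonomy.

```python
# Illustrative sketch of a taxonomy "gap analysis": map systems onto taxonomy
# categories and report the categories that no surveyed system covers.
# Category and system names are placeholders, not the paper's taxonomy.

taxonomy = {
    "architecture": {"monadic", "hierarchical", "federated", "hybrid"},
    "data_transport": {"ftp-based", "overlay", "striped"},
    "replication": {"static", "dynamic"},
    "scheduling": {"data-aware", "compute-only"},
}

# Taxonomy leaves each surveyed system is mapped to (hypothetical data).
system_map = {
    "SystemA": {("architecture", "hierarchical"), ("replication", "dynamic")},
    "SystemB": {("data_transport", "ftp-based"), ("scheduling", "data-aware")},
}

covered = {leaf for leaves in system_map.values() for leaf in leaves}
gaps = {
    (branch, leaf)
    for branch, leaves in taxonomy.items()
    for leaf in leaves
    if (branch, leaf) not in covered
}

for branch, leaf in sorted(gaps):
    print(f"no surveyed system covers {branch}/{leaf}")
```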

    Probabilistic grid scheduling based on job statistics and monitoring information

    This transfer thesis presents a novel, probabilistic approach to scheduling applications on computational Grids based on their historical behaviour, the current state of the Grid, and predictions of the future execution times and resource utilisation of such applications. The work lays a foundation for a more intuitive, user-friendly and effective scheduling technique termed deadline scheduling. Initial work established the motivation and requirements for a more efficient Grid scheduler, able to adaptively handle the dynamic nature of Grid resources and the submitted workload. Preliminary scheduler research identified the need for detailed monitoring of Grid resources at the process level, and for a tool to simulate the non-deterministic behaviour and statistical properties of Grid applications. A simulation tool, GridLoader, has been developed to enable modelling of application loads similar to a number of typical Grid applications. GridLoader is able to simulate CPU utilisation, memory allocation and network transfers according to limits set through command-line parameters or a configuration file. Its specific strength is in achieving set resource utilisation targets in a probabilistic manner, thus creating a dynamic environment suitable for testing the scheduler's adaptability and its prediction algorithm. To enable highly granular monitoring of Grid applications, a monitoring framework based on the Ganglia Toolkit was developed and tested. The suite is able to collect resource usage information for individual Grid applications, integrate it into a standard XML-based information flow, provide visualisation through a Web portal, and export data into a format suitable for off-line analysis. The thesis also presents an initial investigation of the utilisation of the University College London Central Computing Cluster facility running Sun Grid Engine middleware. The feasibility of basic prediction concepts based on historical information and process meta-data has been established, and possible scheduling improvements using such predictions have been identified. The thesis is structured as follows: Section 1 introduces Grid computing and its major concepts; Section 2 presents open research issues and the specific focus of the author's research; Section 3 gives a survey of the related literature, schedulers, monitoring tools and simulation packages; Section 4 presents the platform for the author's work, the Self-Organising Grid Resource management project; Sections 5 and 6 give detailed accounts of the monitoring framework and simulation tool developed; Section 7 presents the initial data analysis, while Section 8.4 concludes the thesis with appendices and references.
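    The probabilistic load generation GridLoader is described as performing can be illustrated with a small sketch: alternate busy and idle time slots, choosing each slot's state at random so that the long-run CPU utilisation converges on a configured target. The Python below is a minimal, hypothetical illustration of that technique; the command-line interface and parameter names are assumptions, not GridLoader's actual options.

```python
# Minimal sketch of a probabilistic CPU-load generator in the spirit of the
# GridLoader tool described above. The interface (target utilisation and run
# time passed on the command line) is an assumption, not GridLoader's own.
import argparse
import random
import time


def burn(seconds: float) -> None:
    """Keep the CPU busy for roughly `seconds`."""
    end = time.perf_counter() + seconds
    while time.perf_counter() < end:
        pass  # busy loop


def run(target_util: float, duration: float, slot: float = 0.1) -> None:
    """Alternate busy/idle slots; each slot is busy with probability
    target_util, so long-run utilisation converges on the target
    in a probabilistic way."""
    end = time.time() + duration
    while time.time() < end:
        if random.random() < target_util:
            burn(slot)
        else:
            time.sleep(slot)


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="probabilistic CPU load sketch")
    parser.add_argument("--target-util", type=float, default=0.5)
    parser.add_argument("--duration", type=float, default=10.0)
    args = parser.parse_args()
    run(args.target_util, args.duration)
```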

    PFS: A Productivity Forecasting System for Desktop Computers to Improve Grid Applications Performance in Enterprise Desktop Grid

    An Enterprise Desktop Grid (EDG) is a low-cost platform that gathers desktop computers spread over different institutions. This platform uses desktop computers' idle time to run Grid applications. We argue that computers in these environments have a predictable productivity that affects a Grid application's execution time. In this paper, we propose a system called PFS for computer productivity forecasting that improves Grid application performance. We simulated 157,500 applications and compared the performance achieved by our proposal against two recent strategies. Our experiments show that a Grid scheduler based on PFS runs applications faster than schedulers based on the other selection strategies.
    Authors: Salinas, Sergio Ariel (Universidad Nacional de Cuyo, Argentina); Garcia Garino, Carlos Gabriel (Universidad Nacional de Cuyo, Argentina); Zunino Suarez, Alejandro Octavio (Consejo Nacional de Investigaciones Científicas y Técnicas, Centro Científico Tecnológico Tandil, Instituto Superior de Ingenieria del Software, Argentina)
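    The selection idea behind a productivity-forecasting scheduler can be illustrated with a toy example: keep a per-machine record of observed productivity, forecast the next interval (here with an exponential moving average, chosen purely for illustration), and dispatch work to the machine with the best forecast. This is a hedged sketch of the general technique, not the PFS forecasting model.

```python
# Toy illustration of productivity-based host selection, in the spirit of the
# PFS idea above. The exponential-moving-average forecast and the sample data
# are assumptions for illustration, not the PFS model.
from collections import defaultdict


class ProductivityForecaster:
    def __init__(self, alpha: float = 0.3):
        self.alpha = alpha                         # smoothing factor
        self.forecast = defaultdict(lambda: 0.5)   # prior: 50% productive

    def observe(self, host: str, productivity: float) -> None:
        """Record the observed productivity (0..1) of `host` for the last interval."""
        previous = self.forecast[host]
        self.forecast[host] = self.alpha * productivity + (1 - self.alpha) * previous

    def pick_host(self, hosts):
        """Choose the host expected to be most productive in the next interval."""
        return max(hosts, key=lambda h: self.forecast[h])


# Hypothetical usage with made-up observations:
pfs = ProductivityForecaster()
for host, p in [("desk01", 0.9), ("desk02", 0.4), ("desk01", 0.8), ("desk02", 0.6)]:
    pfs.observe(host, p)
print(pfs.pick_host(["desk01", "desk02"]))   # -> "desk01"
```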

    Big Data in Critical Infrastructures Security Monitoring: Challenges and Opportunities

    Critical Infrastructures (CIs), such as smart power grids, transport systems, and financial infrastructures, are increasingly vulnerable to cyber threats due to the adoption of commodity computing facilities. Despite the use of several monitoring tools, recent attacks have proven that current defensive mechanisms for CIs are not effective enough against the most advanced threats. In this paper we explore the idea of a framework that leverages multiple data sources to improve the protection capabilities of CIs. Challenges and opportunities are discussed along three main research directions: i) use of distinct and heterogeneous data sources, ii) monitoring with adaptive granularity, and iii) attack modeling and runtime combination of multiple data analysis techniques. (Comment: EDCC-2014, BIG4CIP-201)

    NetJobs: A new approach to network monitoring for the Grid using Grid jobs

    With grid computing, far-flung and disparate IT resources act as a single "virtual datacenter". Grid computing interfaces heterogeneous IT resources so that they are available when and where they are needed, and allows applications to be provisioned and capacity to be allocated among research and business groups that are geographically and organizationally dispersed. Building a high-availability Grid is held as the next goal to achieve: protecting against computer failures and site failures to avoid resource downtime and to honor Service Level Agreements. Network monitoring has a key role in this challenge. This work concerns the design and prototype implementation of a new approach to network monitoring for the Grid based on the use of scheduled Grid jobs, and was carried out within the Network Support task (SA2) of the Enabling Grids for E-sciencE (EGEE) project. The thesis is organized as follows.
    Chapter 1, Grid Computing: from the origins of Grid computing to the latest projects; the conceptual framework and the main features characterizing many kinds of popular grids are presented.
    Chapter 2, The EGEE and EGI projects: describes the Enabling Grids for E-sciencE (EGEE) project and the European Grid Infrastructure (EGI). The EGEE project (2004-2010) was the flagship Grid infrastructure project of the EU. The third and last two-year phase of the project (started on 1 May 2008) was financed with a total budget of around 47 million euro, with a further estimated 50 million euro worth of computing resources contributed by the partners, and a total manpower of 9,000 person-months, of which over 4,500 person-months were contributed by the partners from their own funding sources. At its close, EGEE represented a worldwide infrastructure of approximately 200,000 CPU cores, collaboratively hosted by more than 300 centres around the world, and by the end of the project around 13 million jobs were executed on the EGEE grid each month. The new organization, EGI.eu, was then created to continue the coordination and evolution of the European Grid Infrastructure (EGI) based on the EGEE Grid.
    Chapter 3, gLite Middleware: gives an overview of the gLite Grid middleware. gLite is the middleware stack for grid computing used by the EGEE and EGI projects within a very large variety of scientific domains. Born from the collaborative efforts of more than 80 people in 12 different academic and industrial research centers as part of the EGEE project, gLite provides a complete set of services for building a production grid infrastructure and a framework for building grid applications that tap into the power of distributed computing and storage resources across the Internet. The gLite services are currently adopted by more than 250 computing centres and used by more than 15,000 researchers in Europe and around the world.
    Chapter 4, Network Activity in EGEE/EGI: Grid infrastructures are distributed by nature, involving many sites, normally in different administrative domains. Individual sites are connected by a network, which is therefore a critical part of the whole Grid infrastructure; without the network there is no Grid. Monitoring is a key component for the successful operation of any infrastructure, helping in the discovery and diagnosis of any problems that may arise, and network monitoring contributes to the day-to-day operation of the Grid by helping to answer specific questions from users and site administrators. This chapter discusses the effort invested by EGEE and EGI in the Grid network domain.
    Chapter 5, Grid Network Monitoring based on Grid Jobs: NetJobs is a prototype of a lightweight solution for Grid network monitoring. A job-based approach has been used to prove the feasibility of this non-intrusive solution. It is currently configured to monitor eight production sites spread from Italy to France, but the method could be applied to the vast majority of Grid sites. The prototype provides coherent RTT, MTU, hop-count and TCP achievable-bandwidth tests.
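    The job-based approach boils down to submitting an ordinary Grid job whose payload runs standard network probes on the worker node and returns the measurements with the job output. The Python below is a hypothetical probe payload in that spirit, using ping and traceroute; it is not the NetJobs code, and the target hosts and output format are assumptions.

```python
# Hypothetical probe payload in the spirit of the job-based approach above: an
# ordinary job that runs standard network tools on the worker node and prints
# the measurements so they travel back with the job output. Not the NetJobs code.
import re
import subprocess
from typing import Optional


def rtt_ms(host: str) -> Optional[float]:
    """Average RTT to `host` over 4 ICMP echoes, or None if unreachable."""
    out = subprocess.run(["ping", "-c", "4", host],
                         capture_output=True, text=True).stdout
    match = re.search(r"= [\d.]+/([\d.]+)/", out)   # min/avg/max summary line
    return float(match.group(1)) if match else None


def hop_count(host: str) -> Optional[int]:
    """Number of hops reported by traceroute, or None on failure."""
    out = subprocess.run(["traceroute", "-n", host],
                         capture_output=True, text=True).stdout
    hops = [line for line in out.splitlines()[1:] if line.strip()]
    return len(hops) or None


if __name__ == "__main__":
    # Placeholder targets; a real probe job would receive the site list as input.
    for target in ["ce.example-site-a.org", "ce.example-site-b.org"]:
        print(target, "rtt_ms=", rtt_ms(target), "hops=", hop_count(target))
```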