
    Failure-awareness and dynamic adaptation in data scheduling

    Over the years, scientific applications have become more complex and more data intensive. Large scale simulations and scientific experiments in areas such as physics, biology, astronomy and earth sciences, in particular, demand highly distributed resources to satisfy excessive computational requirements. Increasing data requirements and the distributed nature of the resources have made I/O the major bottleneck for end-to-end application performance. Existing systems fail to address issues such as reliability, scalability, and efficiency in dealing with wide area data access, retrieval and processing. In this study, we explore data-intensive distributed computing and study the challenges of data placement in distributed environments. After analyzing different application scenarios, we develop new data scheduling methodologies and identify the key attributes for reliability, adaptability and performance optimization of distributed data placement tasks. Inspired by techniques used in microprocessor and operating system architectures, we extend and adapt some of the known low-level data handling and optimization techniques to distributed computing. Two major contributions of this work are (i) a failure-aware data placement paradigm for increased fault tolerance, and (ii) adaptive scheduling of data placement tasks for improved end-to-end performance. The failure-aware data placement includes early error detection, error classification, and the use of this information in scheduling decisions for the prevention of, and recovery from, possible future errors. The adaptive scheduling approach includes dynamically tuning data transfer parameters over wide area networks for efficient utilization of available network capacity and optimized end-to-end data transfer performance.
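
    The error-classification step can be pictured as a small decision layer in the scheduler. The Python sketch below shows one plausible shape of that logic; the error codes, the classify/schedule_retry helpers and the back-off policy are illustrative assumptions, not the system's actual interface.

        # Hypothetical sketch of failure-aware scheduling: classify a transfer
        # error early, then react differently to transient and permanent faults.
        TRANSIENT = {"timeout", "connection_reset", "server_busy"}
        PERMANENT = {"no_such_file", "permission_denied", "quota_exceeded"}

        def classify(error_code):
            """Map a low-level transfer error to a recovery class."""
            if error_code in TRANSIENT:
                return "retry"         # likely to succeed later
            if error_code in PERMANENT:
                return "abort"         # retrying cannot help; report to user
            return "reschedule"        # unknown: try an alternate replica or route

        def schedule_retry(task, error_code, max_retries=5):
            action = classify(error_code)
            if action == "retry" and task["retries"] < max_retries:
                task["retries"] += 1
                task["delay_s"] = 2 ** task["retries"]   # exponential back-off
                return "requeue"
            if action == "reschedule":
                return "alternate_source"                # pick another replica
            return "fail"

        task = {"src": "gsiftp://siteA/data.bin", "dst": "file:///scratch/", "retries": 0}
        print(schedule_retry(task, "timeout"))           # -> requeue

    Classifying before retrying is what lets the scheduler prevent predictable repeat failures instead of blindly resubmitting every failed transfer.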

    Grid Approach to Satellite Monitoring Systems Integration

    This paper highlights the challenges of integrating satellite monitoring systems, in particular on a Grid platform, and reviews possible solutions to these problems. We describe integration issues at two levels: the data integration level and the task management level (job submission in Grid terms). We show an example of the described technologies in the integration of the monitoring systems of Ukraine (National Space Agency of Ukraine, NASU) and Russia (Space Research Institute RAS, IKI RAN). Another example concerns the development of an InterGrid infrastructure that integrates several regional and national Grid systems: the Ukrainian Academician Grid (with its satellite data processing Grid segment) and the RSGS Grid (Chinese Academy of Sciences).

    A framework for evolving grid computing systems.

    Grid computing was born in the 1990s, when researchers were looking for a way to share expensive computing resources and experimental equipment. Grid computing is becoming increasingly popular because it promotes the sharing of distributed resources that may be heterogeneous in nature, and it enables scientists and engineering professionals to solve large scale computing problems. In reality, there are already huge numbers of grid computing facilities distributed around the world, each one created to serve a particular group of scientists, such as weather forecasters, or a group of users, such as stock markets. However, the need to extend the functionalities of current grid systems lends itself to the consideration of grid evolution. This allows many disjoint grids to be combined into a single powerful grid that can operate as one vast computational resource, and allows grid environments to be flexible, to change and to evolve. The rationale for grid evolution is the current rapid and increasing advance of both software and hardware. Evolution means adding or removing capabilities; this research defines grid evolution as adding new functions and/or equipment and removing unusable resources that affect the performance of some nodes. This thesis presents a new technique for grid evolution, allowing it to be seamless and to operate at run time. Within grid computing, evolution is an integration of software and hardware and can be of two distinct types, external and internal. Internal evolution occurs inside the grid boundary, by migrating special resources such as application software from node to node inside the grid, while external evolution occurs between grids. This thesis develops a framework for grid evolution that insulates users from the complexities of grids. The framework has at its core a resource broker together with a grid monitor, to cope with internal and external evolution, advance reservation, fault tolerance, monitoring of the grid environment, increased resource utilisation and high availability of grid resources. Grid evolution is triggered when the grid receives a job whose requirements do not exist on the required node. If the grid has all the requirements scattered across its nodes, internal evolution ensues and the grid migrates the required resources to the required node to satisfy the job's requirements; if the grid does not have these resources, external evolution enables the grid either to collect them from other grids (permanent evolution) or to send the job to other grids for execution (just-in-time evolution). Finally, a simulation tool called EVOSim has been designed, developed and tested. It is written in Oracle 10g and has been used to create four grids, each with a different setup, including different nodes, application software, data and policies. Experiments were conducted by submitting jobs to the grids at run time, then comparing the results and analysing the performance of grids that use the evolution approach against those that do not. The results of these experiments demonstrate that these features significantly improve the performance of grid environments and provide excellent scheduling results, with a reduced number of rejected jobs.
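
    The evolution decision described above reduces to a short policy: run locally if possible, migrate resources inside the grid otherwise, and fall back to other grids as a last resort. The Python sketch below illustrates that flow; the Grid class, its methods and the job/node structures are hypothetical simplifications for illustration, not EVOSim's actual design (EVOSim itself is implemented in Oracle 10g).

        # Hypothetical model of the internal/external evolution decision.
        class Grid:
            def __init__(self, name, resources):
                self.name = name
                self.resources = set(resources)  # everything available somewhere in the grid
            def migrate(self, items, to):
                to["resources"] |= items         # internal evolution: move items to a node
            def acquire(self, items, frm):
                self.resources |= items          # permanent external evolution
            def submit(self, job):
                return f"forwarded to {self.name}"  # just-in-time external evolution

        def handle_job(job, node, grid, neighbours):
            missing = set(job["requires"]) - node["resources"]
            if not missing:
                return "run locally"
            if missing <= grid.resources:                  # internal evolution
                grid.migrate(missing, to=node)
                return "run after internal evolution"
            for other in neighbours:                       # external evolution
                if missing <= other.resources:
                    if job.get("recurring"):
                        grid.acquire(missing, frm=other)   # permanent evolution
                        return "run after permanent evolution"
                    return other.submit(job)               # just-in-time evolution
            return "reject"

        g = Grid("A", {"matlab", "blast"})
        node = {"resources": {"blast"}}
        print(handle_job({"requires": ["matlab"]}, node, g, []))  # -> run after internal evolution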

    IQ-Services: Network-Aware Middleware for Interactive Large-Data Applications

    IQ-Services are application-specific, resource-aware code modules executed by data transport middleware. They constitute a 'thin' layer between application components and the underlying computational and communication resources that implements the data manipulations necessary to permit wide-area collaborations to proceed smoothly despite dynamic resource variations. IQ-Services interact with the application and resource layers via dynamic performance attributes, and end-to-end implementations of such attributes also permit clients to interact with data providers. Joint middleware/resource and provider/consumer interactions implement a cooperative approach to data management for the large-data applications targeted by our research. Experimental results in this paper demonstrate substantial performance improvements attained by coordinating network-level adaptations with service-level adaptations of the data being transported, and by permitting end users to dynamically deploy and use application-specific services that manipulate data in ways suited to their current needs.
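
    Concretely, an IQ-Service can be thought of as a data-transforming callback parameterized by dynamic performance attributes. The Python sketch below shows one such module that halves the fidelity of a sample stream when a measured-bandwidth attribute falls below a threshold; the attribute name, threshold and downsampling rule are illustrative assumptions rather than the middleware's real API.

        # Hypothetical service-level adaptation driven by a network attribute.
        def bandwidth_mbps():
            """Stand-in for a dynamically updated performance attribute;
            a real deployment would feed this from network monitoring."""
            return 45.0

        def iq_filter(samples, attrs):
            """Drop every other sample when the link is congested, so an
            interactive receiver keeps its rate at reduced fidelity."""
            if attrs["bandwidth_mbps"] < 100.0:
                return samples[::2]      # service-level adaptation
            return samples

        payload = list(range(10_000))
        adapted = iq_filter(payload, {"bandwidth_mbps": bandwidth_mbps()})
        print(len(payload), "->", len(adapted))   # 10000 -> 5000 under congestion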

    A Grid-Enabled Infrastructure for Resource Sharing, E-Learning, Searching and Distributed Repository Among Universities

    In recent years, service-based approaches to sharing data among repositories and to online learning have risen to prominence because of their potential to meet requirements in the area of high performance computing. Developing education-based grid services while assuring high availability, reliability and scalability is demanding in web service architectures. Grid computing, on the other hand, provides flexibility in aggregating distributed CPU, memory, storage and data, and supports sharing among large numbers of distributed resources, giving education applications the potential to share knowledge beyond what is attainable on any single system. However, the literature shows that the potential of grid resources for educational purposes is not yet being utilized. In this paper, an education-based grid framework architecture is developed that provides a promising platform for sharing geographically dispersed learning content among universities. It allows students, faculty and researchers to share and gain knowledge in their areas of interest by using e-learning, searching and distributed repository services among universities, from anywhere and at any time. Globus Toolkit 5.2.5 (GTK) is used as the grid middleware, providing resource access, discovery and management, data movement, security, and so forth. Furthermore, this work uses OGSA-DAI, which provides database access and operations. The resulting infrastructure enables users to discover education services and interact with them through the grid portal.
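
    At its simplest, a portal interaction of the kind described reduces to discovering registered services and calling their endpoints. The Python sketch below illustrates this against a hypothetical HTTP registry; the URL, record fields and service kinds are invented for illustration, and a real deployment would go through Globus Toolkit security and OGSA-DAI data services instead.

        # Hypothetical discovery of education services via a portal registry.
        import json
        from urllib.request import urlopen

        REGISTRY = "http://portal.example.edu/services"   # hypothetical registry URL

        def discover(kind):
            """Return records of registered services of a given kind
            (e.g. 'e-learning', 'search', 'repository')."""
            with urlopen(f"{REGISTRY}?kind={kind}") as resp:
                return json.load(resp)

        for svc in discover("e-learning"):
            print(svc["university"], svc["endpoint"])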

    Architectural approaches to a science network software-defined exchange

    To interconnect research facilities across wide geographic areas, network operators deploy science networks, also referred to as Research and Education (R&E) networks. These networks allow experimenters to establish dedicated circuits between research facilities for transferring large amounts of data, using advance reservation systems. Intercontinental dedicated circuits typically require coordination between multiple administrative domains, which need to reach agreement on a suitable advance reservation. To enhance the provisioning capabilities of multi-domain advance reservations, we propose an architecture for end-to-end service orchestration in multi-domain science networks that leverages software-defined networking (SDN) and software-defined exchanges (SDX) to provide multi-path, multi-domain advance reservations. Our simulations show that this orchestration architecture increases the reservation success rate. We evaluate our solution using GridFTP, one of the most popular tools for data transfers in the scientific community. Additionally, we propose an interface that domain scientists can use to request science network services from our orchestration framework. Furthermore, we propose a federated auditing framework (FAS) that allows an SDX to verify whether the configurations requested by a user are correctly enforced by participating SDN domains, whether those configurations are correctly removed after their expiration time, and whether any configurations exist that perform non-requested actions. We also propose an architecture for advance reservation access control using SDN and tokens.
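
    At the core of multi-domain advance reservation is an admission check: a request is granted only if some candidate path can carry the requested bandwidth on every link for the whole time window. The Python sketch below illustrates that check together with a first-fit multi-path policy; the data model (per-link reservation lists, a uniform 100 Gbps capacity) is an illustrative assumption, not the proposed SDX interface.

        # Hypothetical admission check for multi-path advance reservations.
        def fits(reservations, start, end, bw, capacity=100):
            """True if bw Gbps can be added on [start, end) without exceeding
            capacity, given existing reservations as (s, e, b) tuples."""
            points = {start} | {s for s, e, b in reservations if start <= s < end}
            for t in points:
                load = sum(b for s, e, b in reservations if s <= t < e)
                if load + bw > capacity:
                    return False
            return True

        def reserve(paths, start, end, bw):
            """Grant the first candidate path on which every inter-domain
            link admits the request for the whole window."""
            for path in paths:
                if all(fits(link, start, end, bw) for link in path):
                    for link in path:
                        link.append((start, end, bw))
                    return path
            return None

        link_a = [(0, 10, 60)]   # existing: 60 Gbps reserved for t in [0, 10)
        link_b = []
        granted = reserve([[link_a], [link_b]], start=5, end=15, bw=50)
        print("granted" if granted is not None else "rejected")   # -> granted via link_b

    Offering alternative paths is precisely what raises the success rate: the request above would be rejected on the congested path alone.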

    Methods and design issues for next generation network-aware applications

    Networks are becoming an essential component of modern cyberinfrastructure, and this work describes methods of designing distributed applications for high-speed networks to improve application scalability, performance and capabilities. As the amount of data generated by scientific applications continues to grow, applications should be designed to use parallel, distributed resources and high-speed networks in order to handle and process it. For scalable application design, developers should move away from the current component-based approach and instead implement an integrated, non-layered architecture in which applications can use specialized low-level interfaces. The main focus of this research is interactive, collaborative visualization of large datasets. This work describes how a visualization application can be improved by using distributed resources and high-speed network links to interactively visualize tens of gigabytes of data and handle terabyte datasets while maintaining high quality. The application supports interactive frame rates, high resolution and collaborative visualization, and sustains remote I/O bandwidths of several Gbps (up to 30 times faster than local I/O). Motivated by the distributed visualization application, this work also investigates remote data access systems. Because wide-area networks may have high latency, the remote I/O system uses an architecture that effectively hides latency. Five remote data access architectures are analyzed, and the results show that an architecture combining bulk and pipeline processing is the best solution for high-throughput remote data access. The resulting system, which also supports high-speed transport protocols and configurable remote operations, is up to 400 times faster than a comparable existing remote data access system. Transport protocols are compared to determine which can best utilize high-speed network connections, concluding that a rate-based protocol is the best solution, being 8 times faster than standard TCP. An HD-based remote teaching experiment is conducted, illustrating the potential of network-aware applications in a production environment. Future research areas are presented, with emphasis on network-aware optimization, execution and deployment scenarios.
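
    The latency-hiding argument can be made concrete with a toy model: if each block request costs one wide-area round trip, issuing requests one at a time pays n round trips, while keeping a window of requests in flight pays roughly n/window. The Python sketch below simulates both; fetch_block, the 50 ms RTT and the window size are illustrative stand-ins for the remote read primitive, not the thesis's actual I/O system.

        # Toy comparison of sequential vs. pipelined remote block reads.
        import time
        from concurrent.futures import ThreadPoolExecutor

        RTT = 0.05   # assumed 50 ms wide-area round trip

        def fetch_block(i):
            time.sleep(RTT)          # simulate one request/response round trip
            return bytes(1024)       # block payload

        def read_sequential(n):
            return [fetch_block(i) for i in range(n)]          # ~ n * RTT

        def read_pipelined(n, window=8):
            with ThreadPoolExecutor(max_workers=window) as pool:
                return list(pool.map(fetch_block, range(n)))   # ~ n/window * RTT

        for reader in (read_sequential, read_pipelined):
            t0 = time.time()
            reader(64)
            print(reader.__name__, round(time.time() - t0, 2), "s")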

    Data transfer scheduling with advance reservation and provisioning

    Over the years, scientific applications have become more complex and more data intensive. Although institutions and organizations gain access to the resources needed for their large-scale applications through the use of distributed resources, complex middleware is required to orchestrate the use of these storage and network resources between collaborating parties and to manage the end-to-end processing of data. We present a new data scheduling paradigm with advance reservation and provisioning. Our methodology provides a basis for provisioning end-to-end high performance data transfers, which requires integration between system, storage and network resources, and coordination between reservation managers and data transfer nodes. This allows researchers/users and higher-level meta-schedulers to use data placement as a service, in which they can plan ahead and reserve time and resources for their data movement operations. We present a novel approach for evaluating time-dependent structures with bandwidth-guaranteed paths, and a practical online scheduling model using advance reservation in dynamic networks with time constraints. In addition, we report a new polynomial-time algorithm that presents possible reservation options and alternatives for earliest completion and shortest transfer duration. We enhance the advance network reservation system by extending the underlying mechanism to provide a new service in which users submit their constraints and the system suggests possible reservation requests satisfying the users' requirements. We have studied the scheduling of data transfer operations with resource and time conflicts, and have developed a new scheduling methodology that considers resource allocation at client sites and bandwidth allocation on the network links connecting resources. Other major contributions of our study include enhanced reliability, adaptability and performance optimization of distributed data placement tasks. While designing this new data scheduling architecture, we also developed other important methodologies such as early error detection, failure awareness, job aggregation, and dynamic adaptation of distributed data placement tasks. The adaptive tuning includes dynamically setting data transfer parameters and controlling the utilization of available network capacity. Our research aims to provide middleware that alleviates the data bottleneck in high performance computing systems.
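
    The earliest-completion option over a time-dependent reservation structure can be illustrated with a simple slotted model: walk forward through time slots, subtracting the volume each slot's available bandwidth can carry, until the transfer finishes. The Python sketch below shows this; the slot granularity, units and bandwidth profile are illustrative assumptions, not the polynomial algorithm itself.

        # Toy earliest-completion computation over time-varying bandwidth.
        def earliest_completion(volume_gb, slots, slot_len_s=60):
            """slots[i] = available bandwidth (GB/s) during slot i.
            Returns completion time in seconds, or None if never finished."""
            remaining = volume_gb
            for i, bw in enumerate(slots):
                cap = bw * slot_len_s                        # volume movable in this slot
                if cap >= remaining:
                    return i * slot_len_s + remaining / bw   # finishes mid-slot
                remaining -= cap
            return None

        # 10 GB transfer over a path whose spare bandwidth varies per minute:
        print(earliest_completion(10, slots=[0.02, 0.05, 0.10, 0.10]))   # -> 178.0 s

    Running the same scan against alternative start times or paths is what lets the system present reservation options for earliest completion versus shortest duration.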