2,388 research outputs found

    A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing

    Full text link
    Data Grids have been adopted as the platform for scientific communities that need to share, access, transport, process and manage large data collections distributed worldwide. They combine high-end computing technologies with high-performance networking and wide-area storage management techniques. In this paper, we discuss the key concepts behind Data Grids and compare them with other data sharing and distribution paradigms such as content delivery networks, peer-to-peer networks and distributed databases. We then provide comprehensive taxonomies that cover various aspects of architecture, data transportation, data replication and resource allocation and scheduling. Finally, we map the proposed taxonomy to various Data Grid systems not only to validate the taxonomy but also to identify areas for future exploration. Through this taxonomy, we aim to categorise existing systems to better understand their goals and their methodology. This would help evaluate their applicability for solving similar problems. This taxonomy also provides a "gap analysis" of this area through which researchers can potentially identify new issues for investigation. Finally, we hope that the proposed taxonomy and mapping also helps to provide an easy way for new practitioners to understand this complex area of research.Comment: 46 pages, 16 figures, Technical Repor

    Methods and design issues for next generation network-aware applications

    Get PDF
    Networks are becoming an essential component of modern cyberinfrastructure and this work describes methods of designing distributed applications for high-speed networks to improve application scalability, performance and capabilities. As the amount of data generated by scientific applications continues to grow, to be able to handle and process it, applications should be designed to use parallel, distributed resources and high-speed networks. For scalable application design developers should move away from the current component-based approach and implement instead an integrated, non-layered architecture where applications can use specialized low-level interfaces. The main focus of this research is on interactive, collaborative visualization of large datasets. This work describes how a visualization application can be improved through using distributed resources and high-speed network links to interactively visualize tens of gigabytes of data and handle terabyte datasets while maintaining high quality. The application supports interactive frame rates, high resolution, collaborative visualization and sustains remote I/O bandwidths of several Gbps (up to 30 times faster than local I/O). Motivated by the distributed visualization application, this work also researches remote data access systems. Because wide-area networks may have a high latency, the remote I/O system uses an architecture that effectively hides latency. Five remote data access architectures are analyzed and the results show that an architecture that combines bulk and pipeline processing is the best solution for high-throughput remote data access. The resulting system, also supporting high-speed transport protocols and configurable remote operations, is up to 400 times faster than a comparable existing remote data access system. Transport protocols are compared to understand which protocol can best utilize high-speed network connections, concluding that a rate-based protocol is the best solution, being 8 times faster than standard TCP. An HD-based remote teaching application experiment is conducted, illustrating the potential of network-aware applications in a production environment. Future research areas are presented, with emphasis on network-aware optimization, execution and deployment scenarios

    Transferring big data across the globe

    Get PDF
    Transmitting data via the Internet is a routine and common task for users today. The amount of data being transmitted by the average user has dramatically increased over the past few years. Transferring a gigabyte of data in an entire day was normal, however users are now transmitting multiple gigabytes in a single hour. With the influx of big data and massive scientific data sets that are measured in tens of petabytes, a user has the propensity to transfer even larger amounts of data. When transferring data sets of this magnitude on public or shared networks, the performance of all workloads in the system will be impacted. This dissertation addresses the issues and challenges inherent with transferring big data over shared networks. A survey of current transfer techniques is provided and these techniques are evaluated in simulated, experimental and live environments. The main contribution of this dissertation is the development of a new, nice model for big data transfers, which is based on a store-and-forward methodology instead of an end-to-end approach. This nice model ensures that big data transfers only occur when there is idle bandwidth that can be repurposed for these large transfers. The nice model improves overall performance and significantly reduces the transmission time for big data transfers. The model allows for efficient transfers regardless of time zone differences or variations in bandwidth between sender and receiver. Nice is the first model that addresses the challenges of transferring big data across the globe

    Failure-awareness and dynamic adaptation in data scheduling

    Get PDF
    Over the years, scientific applications have become more complex and more data intensive. Especially large scale simulations and scientific experiments in areas such as physics, biology, astronomy and earth sciences demand highly distributed resources to satisfy excessive computational requirements. Increasing data requirements and the distributed nature of the resources made I/O the major bottleneck for end-to-end application performance. Existing systems fail to address issues such as reliability, scalability, and efficiency in dealing with wide area data access, retrieval and processing. In this study, we explore data-intensive distributed computing and study challenges in data placement in distributed environments. After analyzing different application scenarios, we develop new data scheduling methodologies and the key attributes for reliability, adaptability and performance optimization of distributed data placement tasks. Inspired by techniques used in microprocessor and operating system architectures, we extend and adapt some of the known low-level data handling and optimization techniques to distributed computing. Two major contributions of this work include (i) a failure-aware data placement paradigm for increased fault-tolerance, and (ii) adaptive scheduling of data placement tasks for improved end-to-end performance. The failure-aware data placement includes early error detection, error classification, and use of this information in scheduling decisions for the prevention of and recovery from possible future errors. The adaptive scheduling approach includes dynamically tuning data transfer parameters over wide area networks for efficient utilization of available network capacity and optimized end-to-end data transfer performance

    Grid Infrastructure for Domain Decomposition Methods in Computational ElectroMagnetics

    Get PDF
    The accurate and efficient solution of Maxwell's equation is the problem addressed by the scientific discipline called Computational ElectroMagnetics (CEM). Many macroscopic phenomena in a great number of fields are governed by this set of differential equations: electronic, geophysics, medical and biomedical technologies, virtual EM prototyping, besides the traditional antenna and propagation applications. Therefore, many efforts are focussed on the development of new and more efficient approach to solve Maxwell's equation. The interest in CEM applications is growing on. Several problems, hard to figure out few years ago, can now be easily addressed thanks to the reliability and flexibility of new technologies, together with the increased computational power. This technology evolution opens the possibility to address large and complex tasks. Many of these applications aim to simulate the electromagnetic behavior, for example in terms of input impedance and radiation pattern in antenna problems, or Radar Cross Section for scattering applications. Instead, problems, which solution requires high accuracy, need to implement full wave analysis techniques, e.g., virtual prototyping context, where the objective is to obtain reliable simulations in order to minimize measurement number, and as consequence their cost. Besides, other tasks require the analysis of complete structures (that include an high number of details) by directly simulating a CAD Model. This approach allows to relieve researcher of the burden of removing useless details, while maintaining the original complexity and taking into account all details. Unfortunately, this reduction implies: (a) high computational effort, due to the increased number of degrees of freedom, and (b) worsening of spectral properties of the linear system during complex analysis. The above considerations underline the needs to identify appropriate information technologies that ease solution achievement and fasten required elaborations. The authors analysis and expertise infer that Grid Computing techniques can be very useful to these purposes. Grids appear mainly in high performance computing environments. In this context, hundreds of off-the-shelf nodes are linked together and work in parallel to solve problems, that, previously, could be addressed sequentially or by using supercomputers. Grid Computing is a technique developed to elaborate enormous amounts of data and enables large-scale resource sharing to solve problem by exploiting distributed scenarios. The main advantage of Grid is due to parallel computing, indeed if a problem can be split in smaller tasks, that can be executed independently, its solution calculation fasten up considerably. To exploit this advantage, it is necessary to identify a technique able to split original electromagnetic task into a set of smaller subproblems. The Domain Decomposition (DD) technique, based on the block generation algorithm introduced in Matekovits et al. (2007) and Francavilla et al. (2011), perfectly addresses our requirements (see Section 3.4 for details). In this chapter, a Grid Computing infrastructure is presented. This architecture allows parallel block execution by distributing tasks to nodes that belong to the Grid. The set of nodes is composed by physical machines and virtualized ones. This feature enables great flexibility and increase available computational power. Furthermore, the presence of virtual nodes allows a full and efficient Grid usage, indeed the presented architecture can be used by different users that run different applications

    Self-organising smart grid architectures for cyber-security

    Get PDF
    PhD ThesisCurrent conventional power systems consist of large-scale centralised generation and unidirectional power flow from generation to demand. This vision for power system design is being challenged by the need to satisfy the energy trilemma, as the system is required to be sustainable, available and secure. Emerging technologies are restructuring the power system; the addition of distributed generation, energy storage and active participation of customers are changing the roles and requirements of the distribution network. Increased controllability and monitoring requirements combined with an increase in controllable technologies has played a pivotal role in the transition towards smart grids. The smart grid concept features a large amount of sensing and monitoring equipment sharing large volumes of information. This increased reliance on the ICT infrastructure, raises the importance of cyber-security due to the number of vulnerabilities which can be exploited by an adversary. The aim of this research was to address the issue of cyber-security within a smart grid context through the application of self-organising communication architectures. The work examined the relevance and potential for self-organisation when performing voltage control in the presence of a denial of service attack event. The devised self-organising architecture used techniques adapted from a range of research domains including underwater sensor networks, wireless communications and smart-vehicle tracking applications. These components were redesigned for a smart grid application and supported by the development of a fuzzy based decision making engine. A multi-agent system was selected as the source platform for delivering the self-organising architecture The application of self-organisation for cyber-security within a smart grid context is a novel research area and one which presents a wide range of potential benefits for a future power system. The results indicated that the developed self-organising architecture was able to avoid control deterioration during an attack event involving up to 24% of the customer population. Furthermore, the system also reduces the communication load on the agents involved in the architecture and demonstrated wider reaching benefits beyond performing voltage control

    On Cloud-based multisource Reliable Multicast Transport in Broadband Multimedia Satellite Networks

    Get PDF
    Multimedia synchronization, Software Over the Air, Personal Information Management on Cloud networks require new reliable protocols, which reduce the traffic load in the core and edge network. This work shows via simulations the performance of an efficient multicast file delivery, which advantage of the distributed file storage in Cloud computing. The performance evaluation focuses on the case of a personal satellite equipment with error prone channels
    • ā€¦
    corecore