212,472 research outputs found

    A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing

    Full text link
    Data Grids have been adopted as the platform for scientific communities that need to share, access, transport, process and manage large data collections distributed worldwide. They combine high-end computing technologies with high-performance networking and wide-area storage management techniques. In this paper, we discuss the key concepts behind Data Grids and compare them with other data sharing and distribution paradigms such as content delivery networks, peer-to-peer networks and distributed databases. We then provide comprehensive taxonomies that cover various aspects of architecture, data transportation, data replication and resource allocation and scheduling. Finally, we map the proposed taxonomy to various Data Grid systems not only to validate the taxonomy but also to identify areas for future exploration. Through this taxonomy, we aim to categorise existing systems to better understand their goals and their methodology. This would help evaluate their applicability for solving similar problems. This taxonomy also provides a "gap analysis" of this area through which researchers can potentially identify new issues for investigation. Finally, we hope that the proposed taxonomy and mapping also helps to provide an easy way for new practitioners to understand this complex area of research.Comment: 46 pages, 16 figures, Technical Repor

    HPS-HDS:High Performance Scheduling for Heterogeneous Distributed Systems

    Get PDF
    Heterogeneous Distributed Systems (HDS) are often characterized by a variety of resources that may or may not be coupled with specific platforms or environments. Such type of systems are Cluster Computing, Grid Computing, Peer-to-Peer Computing, Cloud Computing and Ubiquitous Computing all involving elements of heterogeneity, having a large variety of tools and software to manage them. As computing and data storage needs grow exponentially in HDS, increasing the size of data centers brings important diseconomies of scale. In this context, major solutions for scalability, mobility, reliability, fault tolerance and security are required to achieve high performance. More, HDS are highly dynamic in its structure, because the user requests must be respected as an agreement rule (SLA) and ensure QoS, so new algorithm for events and tasks scheduling and new methods for resource management should be designed to increase the performance of such systems. In this special issues, the accepted papers address the advance on scheduling algorithms, energy-aware models, self-organizing resource management, data-aware service allocation, Big Data management and processing, performance analysis and optimization

    Accelerating Large-scale Data Exploration through Data Diffusion

    Full text link
    Data-intensive applications often require exploratory analysis of large datasets. If analysis is performed on distributed resources, data locality can be crucial to high throughput and performance. We propose a "data diffusion" approach that acquires compute and storage resources dynamically, replicates data in response to demand, and schedules computations close to data. As demand increases, more resources are acquired, thus allowing faster response to subsequent requests that refer to the same data; when demand drops, resources are released. This approach can provide the benefits of dedicated hardware without the associated high costs, depending on workload and resource characteristics. The approach is reminiscent of cooperative caching, web-caching, and peer-to-peer storage systems, but addresses different application demands. Other data-aware scheduling approaches assume dedicated resources, which can be expensive and/or inefficient if load varies significantly. To explore the feasibility of the data diffusion approach, we have extended the Falkon resource provisioning and task scheduling system to support data caching and data-aware scheduling. Performance results from both micro-benchmarks and a large scale astronomy application demonstrate that our approach improves performance relative to alternative approaches, as well as provides improved scalability as aggregated I/O bandwidth scales linearly with the number of data cache nodes.Comment: IEEE/ACM International Workshop on Data-Aware Distributed Computing 200

    Bipartite graph structures for efficient balancing of heterogeneous loads

    Get PDF
    International audienceThis paper considers large scale distributed content service platforms, such as peer-to-peer video-on-demand systems. Such systems feature two basic resources, namely storage and bandwidth. Their efficiency critically depends on two factors: (i) content replication within servers, and (ii) how incoming service requests are matched to servers holding requested content. To inform the corresponding design choices, we make the following contributions. We first show that, for underloaded systems, so-called proportional content placement with a simple greedy strategy for matching requests to servers ensures full system efficiency provided storage size grows logarithmically with the system size. However, for constant storage size, this strategy undergoes a phase transition with severe loss of efficiency as system load approaches criticality. To better understand the role of the matching strategy in this performance degradation, we characterize the asymptotic system efficiency under an optimal matching policy. Our analysis shows that -in contrast to greedy matching- optimal matching incurs an inefficiency that is exponentially small in the server storage size, even at critical system loads. It further allows a characterization of content replication policies that minimize the inefficiency. These optimal policies, which differ markedly from proportional placement, have a simple structure which makes them implementable in practice. On the methodological side, our analysis of matching performance uses the theory of local weak limits of random graphs, and highlights a novel characterization of matching numbers in bipartite graphs, which may both be of independent interest

    System support for keyword-based search in structured Peer-to-Peer systems

    Get PDF
    In this dissertation, we present protocols for building a distributed search infrastructure over structured Peer-to-Peer systems. Unlike existing search engines which consist of large server farms managed by a centralized authority, our approach makes use of a distributed set of end-hosts built out of commodity hardware. These end-hosts cooperatively construct and maintain the search infrastructure. The main challenges with distributing such a system include node failures, churn, and data migration. Localities inherent in query patterns also cause load imbalances and hot spots that severely impair performance. Users of search systems want their results returned quickly, and in ranked order. Our main contribution is to show that a scalable, robust, and distributed search infrastructure can be built over existing Peer-to-Peer systems through the use of techniques that address these problems. We present a decentralized scheme for ranking search results without prohibitive network or storage overhead. We show that caching allows for efficient query evaluation and present a distributed data structure, called the View Tree, that enables efficient storage, and retrieval of cached results. We also present a lightweight adaptive replication protocol, called LAR that can adapt to different kinds of query streams and is extremely effective at eliminating hotspots. Finally, we present techniques for storing indexes reliably. Our approach is to use an adaptive partitioning protocol to store large indexes and employ efficient redundancy techniques to handle failures. Through detailed analysis and experiments we show that our techniques are efficient and scalable, and that they make distributed search feasible

    Conceptual modelling to assess how the interplay of hydrological connectivity, catchment storage and tracer dynamics controls nonstationary water age estimates

    Get PDF
    Acknowledgements We would like to gratefully acknowledge the data provided by SEPA, Iain Malcolm. Mark Speed, Susan Waldron and many MSS staff helped with sample collection and lab analysis. We thank the European Research Council (project GA 335910 VEWA) for funding and are grateful for the constructive comments provided by three anonymous reviewers.Peer reviewedPostprin
    corecore