A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing
Data Grids have been adopted as the platform for scientific communities that
need to share, access, transport, process and manage large data collections
distributed worldwide. They combine high-end computing technologies with
high-performance networking and wide-area storage management techniques. In
this paper, we discuss the key concepts behind Data Grids and compare them with
other data sharing and distribution paradigms such as content delivery
networks, peer-to-peer networks and distributed databases. We then provide
comprehensive taxonomies that cover various aspects of architecture, data
transportation, data replication and resource allocation and scheduling.
Finally, we map the proposed taxonomy to various Data Grid systems not only to
validate the taxonomy but also to identify areas for future exploration.
Through this taxonomy, we aim to categorise existing systems to better
understand their goals and their methodology. This would help evaluate their
applicability for solving similar problems. This taxonomy also provides a "gap
analysis" of this area through which researchers can potentially identify new
issues for investigation. Finally, we hope that the proposed taxonomy and
mapping also help to provide an easy way for new practitioners to understand
this complex area of research. Comment: 46 pages, 16 figures, Technical Report
HPS-HDS: High Performance Scheduling for Heterogeneous Distributed Systems
Heterogeneous Distributed Systems (HDS) are often characterized by a variety of resources that may or may not be coupled with specific platforms or environments. Such systems include Cluster Computing, Grid Computing, Peer-to-Peer Computing, Cloud Computing and Ubiquitous Computing, all of which involve elements of heterogeneity and rely on a large variety of tools and software to manage them. As computing and data storage needs grow exponentially in HDS, increasing the size of data centers brings important diseconomies of scale. In this context, major solutions for scalability, mobility, reliability, fault tolerance and security are required to achieve high performance. Moreover, HDS are highly dynamic in their structure, because user requests must be respected as an agreement rule (SLA) and QoS must be ensured, so new algorithms for event and task scheduling and new methods for resource management should be designed to increase the performance of such systems. In this special issue, the accepted papers address advances in scheduling algorithms, energy-aware models, self-organizing resource management, data-aware service allocation, Big Data management and processing, and performance analysis and optimization.
Accelerating Large-scale Data Exploration through Data Diffusion
Data-intensive applications often require exploratory analysis of large
datasets. If analysis is performed on distributed resources, data locality can
be crucial to high throughput and performance. We propose a "data diffusion"
approach that acquires compute and storage resources dynamically, replicates
data in response to demand, and schedules computations close to data. As demand
increases, more resources are acquired, thus allowing faster response to
subsequent requests that refer to the same data; when demand drops, resources
are released. This approach can provide the benefits of dedicated hardware
without the associated high costs, depending on workload and resource
characteristics. The approach is reminiscent of cooperative caching,
web-caching, and peer-to-peer storage systems, but addresses different
application demands. Other data-aware scheduling approaches assume dedicated
resources, which can be expensive and/or inefficient if load varies
significantly. To explore the feasibility of the data diffusion approach, we
have extended the Falkon resource provisioning and task scheduling system to
support data caching and data-aware scheduling. Performance results from both
micro-benchmarks and a large scale astronomy application demonstrate that our
approach improves performance relative to alternative approaches and
provides improved scalability, as aggregated I/O bandwidth scales linearly with
the number of data cache nodes. Comment: IEEE/ACM International Workshop on Data-Aware Distributed Computing
200
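The data-aware scheduling idea in the abstract above (send a computation to a node that already caches its input, and cache the input on a miss) can be illustrated with a minimal sketch. This is not Falkon's implementation: the class name `DataAwareScheduler`, the method `dispatch`, and the least-loaded tie-breaking rule are all assumptions made for illustration, and dynamic acquisition and release of resources is left out.

```python
from collections import defaultdict


class DataAwareScheduler:
    """Toy dispatcher (hypothetical): prefer a node that already caches the file."""

    def __init__(self, nodes):
        self.cache = {n: set() for n in nodes}  # node -> names of cached files
        self.load = defaultdict(int)            # node -> tasks assigned so far

    def dispatch(self, filename):
        # Nodes that already hold the file locally are preferred.
        holders = [n for n in self.cache if filename in self.cache[n]]
        candidates = holders or list(self.cache)
        # Assumed policy: among candidates, pick the least-loaded node
        # (ties broken by node name so the result is deterministic).
        node = min(candidates, key=lambda n: (self.load[n], n))
        hit = filename in self.cache[node]
        self.cache[node].add(filename)  # on a miss, the file is copied to the node
        self.load[node] += 1
        return node, hit


scheduler = DataAwareScheduler(["a", "b"])
first = scheduler.dispatch("f1")   # no node holds f1 yet: cache miss
second = scheduler.dispatch("f1")  # routed back to the node that now caches f1
third = scheduler.dispatch("f2")   # miss again: goes to the idle node
```

In this toy policy a file is only ever cached where it was first requested; replicating popular files to additional nodes as demand grows, as the abstract describes, would need an extra rule on top of this.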
Bipartite graph structures for efficient balancing of heterogeneous loads
This paper considers large scale distributed content service platforms, such as peer-to-peer video-on-demand systems. Such systems feature two basic resources, namely storage and bandwidth. Their efficiency critically depends on two factors: (i) content replication within servers, and (ii) how incoming service requests are matched to servers holding requested content. To inform the corresponding design choices, we make the following contributions. We first show that, for underloaded systems, so-called proportional content placement with a simple greedy strategy for matching requests to servers ensures full system efficiency provided storage size grows logarithmically with the system size. However, for constant storage size, this strategy undergoes a phase transition with severe loss of efficiency as system load approaches criticality. To better understand the role of the matching strategy in this performance degradation, we characterize the asymptotic system efficiency under an optimal matching policy. Our analysis shows that, in contrast to greedy matching, optimal matching incurs an inefficiency that is exponentially small in the server storage size, even at critical system loads. It further allows a characterization of content replication policies that minimize the inefficiency. These optimal policies, which differ markedly from proportional placement, have a simple structure which makes them implementable in practice. On the methodological side, our analysis of matching performance uses the theory of local weak limits of random graphs, and highlights a novel characterization of matching numbers in bipartite graphs, which may both be of independent interest.
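The gap between greedy and optimal matching of requests to servers can be seen on a tiny instance. The sketch below is an illustration only, not the paper's model or analysis: it assumes each server can serve one request at a time, uses a first-free-server greedy rule, and computes the optimum with the standard augmenting-path (Kuhn) algorithm. The names `greedy_matching`, `maximum_matching`, and `holders` are hypothetical.

```python
def greedy_matching(requests, holders):
    """Serve each request, in arrival order, on the first free server holding it."""
    busy, matched = set(), 0
    for content in requests:
        for server in holders[content]:
            if server not in busy:
                busy.add(server)
                matched += 1
                break
    return matched


def maximum_matching(requests, holders):
    """Augmenting-path (Kuhn) algorithm: most requests servable simultaneously."""
    owner = {}  # server -> index of the request it currently serves

    def try_assign(i, seen):
        for server in holders[requests[i]]:
            if server in seen:
                continue
            seen.add(server)
            # Take a free server, or re-route the request currently using it.
            if server not in owner or try_assign(owner[server], seen):
                owner[server] = i
                return True
        return False

    return sum(try_assign(i, set()) for i in range(len(requests)))


# Content "x" is stored on two servers, content "y" on one of them only.
holders = {"x": ["s1", "s2"], "y": ["s1"]}
requests = ["x", "y"]
greedy_served = greedy_matching(requests, holders)
optimal_served = maximum_matching(requests, holders)
```

Here the greedy rule places the request for "x" on "s1" and then cannot serve "y", while the optimal matching moves "x" to "s2" and serves both requests.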
System support for keyword-based search in structured Peer-to-Peer systems
In this dissertation, we present protocols for building a distributed search infrastructure over structured Peer-to-Peer systems. Unlike existing search engines which consist of large server farms managed by a centralized authority, our approach makes use of a distributed set of end-hosts built out of commodity hardware. These end-hosts cooperatively construct and maintain the search infrastructure.
The main challenges with distributing such a system include node failures, churn, and data migration. Localities inherent in query patterns also cause load imbalances and hot spots that severely impair performance. Users of search systems want their results returned quickly and in ranked order. Our main contribution is to show that a scalable, robust, and distributed search infrastructure can be built over existing Peer-to-Peer systems through the use of techniques that address these problems. We present a decentralized scheme for ranking search results without prohibitive network or storage overhead. We show that caching allows for efficient query evaluation and present a distributed data structure, called the View Tree, that enables efficient storage and retrieval of cached results. We also present a lightweight adaptive replication protocol, called LAR, that can adapt to different kinds of query streams and is extremely effective at eliminating hot spots. Finally, we present techniques for storing indexes reliably. Our approach is to use an adaptive partitioning protocol to store large indexes and employ efficient redundancy techniques to handle failures. Through detailed analysis and experiments we show that our techniques are efficient and scalable, and that they make distributed search feasible.
Conceptual modelling to assess how the interplay of hydrological connectivity, catchment storage and tracer dynamics controls nonstationary water age estimates
Acknowledgements: We would like to gratefully acknowledge the data provided by SEPA, Iain Malcolm. Mark Speed, Susan Waldron and many MSS staff helped with sample collection and lab analysis. We thank the European Research Council (project GA 335910 VEWA) for funding and are grateful for the constructive comments provided by three anonymous reviewers. Peer reviewed. Postprint