Search CORE

769 research outputs found

A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing

Author: Buyya Rajkumar
Ramamohanarao Kotagiri
Venugopal Srikumar
Publication venue
Publication date: 10/06/2005
Field of study

Data Grids have been adopted as the platform for scientific communities that need to share, access, transport, process and manage large data collections distributed worldwide. They combine high-end computing technologies with high-performance networking and wide-area storage management techniques. In this paper, we discuss the key concepts behind Data Grids and compare them with other data sharing and distribution paradigms such as content delivery networks, peer-to-peer networks and distributed databases. We then provide comprehensive taxonomies that cover various aspects of architecture, data transportation, data replication and resource allocation and scheduling. Finally, we map the proposed taxonomy to various Data Grid systems not only to validate the taxonomy but also to identify areas for future exploration. Through this taxonomy, we aim to categorise existing systems to better understand their goals and their methodology. This would help evaluate their applicability for solving similar problems. This taxonomy also provides a "gap analysis" of this area through which researchers can potentially identify new issues for investigation. Finally, we hope that the proposed taxonomy and mapping also helps to provide an easy way for new practitioners to understand this complex area of research.Comment: 46 pages, 16 figures, Technical Repor

arXiv.org e-Print Archive

CiteSeerX

University of Melbourne Institutional Repository

High performance subgraph mining in molecular compounds

Author: M.J. Zaki
O. Weislow
R. Finkel
T. Washio
Y. Chung
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2005
Field of study

Structured data represented in the form of graphs arises in several fields of the science and the growing amount of available data makes distributed graph mining techniques particularly relevant. In this paper, we present a distributed approach to the frequent subgraph mining problem to discover interesting patterns in molecular compounds. The problem is characterized by a highly irregular search tree, whereby no reliable workload prediction is available. We describe the three main aspects of the proposed distributed algorithm, namely a dynamic partitioning of the search space, a distribution process based on a peer-to-peer communication framework, and a novel receiver-initiated, load balancing algorithm. The effectiveness of the distributed method has been evaluated on the well-known National Cancer Institute’s HIV-screening dataset, where the approach attains close-to linear speedup in a network of workstations

KOPS - The Institutional Repository of the University of Konstanz

Central Archive at the University of Reading

Crossref

Dynamic load balancing for the distributed mining of molecular structures

Author: Berthold M.R.
Di Fatta Giuseppe
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2006
Field of study

In molecular biology, it is often desirable to find common properties in large numbers of drug candidates. One family of methods stems from the data mining community, where algorithms to find frequent graphs have received increasing attention over the past years. However, the computational complexity of the underlying problem and the large amount of data to be explored essentially render sequential algorithms useless. In this paper, we present a distributed approach to the frequent subgraph mining problem to discover interesting patterns in molecular compounds. This problem is characterized by a highly irregular search tree, whereby no reliable workload prediction is available. We describe the three main aspects of the proposed distributed algorithm, namely, a dynamic partitioning of the search space, a distribution process based on a peer-to-peer communication framework, and a novel receiverinitiated load balancing algorithm. The effectiveness of the distributed method has been evaluated on the well-known National Cancer Institute’s HIV-screening data set, where we were able to show close-to linear speedup in a network of workstations. The proposed approach also allows for dynamic resource aggregation in a non dedicated computational environment. These features make it suitable for large-scale, multi-domain, heterogeneous environments, such as computational grids

KOPS - The Institutional Repository of the University of Konstanz

Central Archive at the University of Reading

Crossref

Découverte et allocation des ressources pour le traitement de requêtes dans les systèmes grilles

Author: Cokuslu Deniz
Publication venue
Publication date: 02/11/2012
Field of study

De nos jours, les systèmes Grille, grâce à leur importante capacité de calcul et de stockage ainsi que leur disponibilité, constituent l'un des plus intéressants environnements informatiques. Dans beaucoup de différents domaines, on constate l'utilisation fréquente des facilités que les environnements Grille procurent. Le traitement des requêtes distribuées est l'un de ces domaines où il existe de grandes activités de recherche en cours, pour transférer l'environnement sous-jacent des systèmes distribués et parallèles à l'environnement Grille. Dans le cadre de cette thèse, nous nous concentrons sur la découverte des ressources et des algorithmes d'allocation de ressources pour le traitement des requêtes dans les environnements Grille. Pour ce faire, nous proposons un algorithme de découverte des ressources pour le traitement des requêtes dans les systèmes Grille en introduisant le contrôle de topologie auto-stabilisant et l'algorithme de découverte des ressources dirigé par l'élection convergente. Ensuite, nous présentons un algorithme d'allocation des ressources, qui réalise l'allocation des ressources pour les requêtes d'opérateur de jointure simple par la génération d'un espace de recherche réduit pour les nœuds candidats et en tenant compte des proximités des candidats aux sources de données. Nous présentons également un autre algorithme d'allocation des ressources pour les requêtes d'opérateurs de jointure multiple. Enfin, on propose un algorithme d'allocation de ressources, qui apporte une tolérance aux pannes lors de l'exécution de la requête par l'utilisation de la réplication passive d'opérateurs à état. La contribution générale de cette thèse est double. Premièrement, nous proposons un nouvel algorithme de découverte de ressource en tenant compte des caractéristiques des environnements Grille. Nous nous adressons également aux problèmes d'extensibilité et de dynamicité en construisant une topologie efficace sur l'environnement Grille et en utilisant le concept d'auto-stabilisation, et par la suite nous adressons le problème de l'hétérogénéité en proposant l'algorithme de découverte de ressources dirigé par l'élection convergente. La deuxième contribution de cette thèse est la proposition d'un nouvel algorithme d'allocation des ressources en tenant compte des caractéristiques de l'environnement Grille. Nous abordons les problèmes causés par la grande échelle caractéristique en réduisant l'espace de recherche pour les ressources candidats. De ce fait nous réduisons les coûts de communication au cours de l'exécution de la requête en allouant des nœuds au plus près des sources de données. Et enfin nous traitons la dynamicité des nœuds, du point de vue de leur existence dans le système, en proposant un algorithme d'affectation des ressources avec une tolérance aux pannes.Grid systems are today's one of the most interesting computing environments because of their large computing and storage capabilities and their availability. Many different domains profit the facilities of grid environments. Distributed query processing is one of these domains in which there exists large amounts of ongoing research to port the underlying environment from distributed and parallel systems to the grid environment. In this thesis, we focus on resource discovery and resource allocation algorithms for query processing in grid environments. For this, we propose resource discovery algorithm for query processing in grid systems by introducing self-stabilizing topology control and converge-cast based resource discovery algorithms. Then, we propose a resource allocation algorithm, which realizes allocation of resources for single join operator queries by generating a reduced search space for the candidate nodes and by considering proximities of candidates to the data sources. We also propose another resource allocation algorithm for queries with multiple join operators. Lastly, we propose a fault-tolerant resource allocation algorithm, which provides fault-tolerance during the execution of the query by the use of passive replication of stateful operators. The general contribution of this thesis is twofold. First, we propose a new resource discovery algorithm by considering the characteristics of the grid environments. We address scalability and dynamicity problems by constructing an efficient topology over the grid environment using the self-stabilization concept; and we deal with the heterogeneity problem by proposing the converge-cast based resource discovery algorithm. The second main contribution of this thesis is the proposition of a new resource allocation algorithm considering the characteristics of the grid environment. We tackle the scalability problem by reducing the search space for candidate resources. We decrease the communication costs during the query execution by allocating nodes closer to the data sources. And finally we deal with the dynamicity of nodes, in terms of their existence in the system, by proposing the fault-tolerant resource allocation algorithm

Thèses en ligne de l'Université Toulouse III - Paul Sabatier

A Practical Study of Self-Stabilization for Prefix-Tree Based Overlay Networks

Author: Acretoaie Vlad
Caron Eddy
Tedeschi Cédric
Publication venue: HAL CCSD
Publication date: 19/04/2010
Field of study

Service discovery is crucial in the development of fully decentralized computational grids. Among the significant amount of work produced by the convergence of peer-to-peer (P2P) systems and grids, a new kind of overlay networks, based on prefix trees, has emerged. In particular, the Distributed Lexicographic Placement Table (DLPT) approach is a decentralized and dynamic service discovery service. Fault-tolerance within the DLPT approach is achieved through best-effort policies relying on formal self-stabilization results. Self-stabilization means that the tree can become transiently inconsistent, but is guaranteed to autonomously converge to a correct topology after arbitrary crashes, in a finite time. However, during convergence, the tree may not be able to process queries correctly. In this paper, we present some simulation results having several objectives. First, we investigate the interest of self-stabilization for such architectures. Second, we explore, still based on simulation, a simple Time-To-Live policy to avoid useless processing during convergence time

HAL-ENS-LYON

HAL-CentraleSupelec

Crossref

INRIA a CCSD electronic archive server

Hal-Diderot

HAL-Rennes 1

Peer-to-Peer Networks and Computation: Current Trends and Future Perspectives

Author: Awasthi Lalit K.
Gupta Ankur
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 26/01/2012
Field of study

This research papers examines the state-of-the-art in the area of P2P networks/computation. It attempts to identify the challenges that confront the community of P2P researchers and developers, which need to be addressed before the potential of P2P-based systems, can be effectively realized beyond content distribution and file-sharing applications to build real-world, intelligent and commercial software systems. Future perspectives and some thoughts on the evolution of P2P-based systems are also provided

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)

ProActive: an Integrated platform for programming and running applications on grids and P2P systems

Author: Caromel Denis
Delbe Christian
Di Costanzo Alexandre
Leyton Mario
Publication venue: 'Institute of Systematics and Evolution of Animals, Polish Academy of Sciences'
Publication date: 01/01/2006
Field of study

International audienceWe propose a grid programming approach using the ProActive middleware. The proposed strategy addresses several grid concerns, which we have classified into three categories. I. Grid Infrastructure which handles the resource acquisition and creation using deployment descriptors and Peer-to-Peer. II. Grid Technical Services which can provide non-functional transparent services like: fault tolerance, load balancing, and file transfer. III. Grid Higher Level programming with: group communication and hierarchical components. We have validated our approach with several grid programming experiences running applications on heterogeneous Grid resource using more than 1000 CPUs

HAL-UNICE

INRIA a CCSD electronic archive server