769 research outputs found

    A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing

    Data Grids have been adopted as the platform for scientific communities that need to share, access, transport, process and manage large data collections distributed worldwide. They combine high-end computing technologies with high-performance networking and wide-area storage management techniques. In this paper, we discuss the key concepts behind Data Grids and compare them with other data sharing and distribution paradigms such as content delivery networks, peer-to-peer networks and distributed databases. We then provide comprehensive taxonomies that cover various aspects of architecture, data transportation, data replication, and resource allocation and scheduling. Finally, we map the proposed taxonomy to various Data Grid systems, not only to validate the taxonomy but also to identify areas for future exploration. Through this taxonomy, we aim to categorise existing systems to better understand their goals and their methodology. This would help evaluate their applicability for solving similar problems. This taxonomy also provides a "gap analysis" of this area through which researchers can potentially identify new issues for investigation. Finally, we hope that the proposed taxonomy and mapping also help to provide an easy way for new practitioners to understand this complex area of research. Comment: 46 pages, 16 figures, Technical Report

    High performance subgraph mining in molecular compounds

    Structured data represented in the form of graphs arises in several fields of science, and the growing amount of available data makes distributed graph mining techniques particularly relevant. In this paper, we present a distributed approach to the frequent subgraph mining problem to discover interesting patterns in molecular compounds. The problem is characterized by a highly irregular search tree, for which no reliable workload prediction is available. We describe the three main aspects of the proposed distributed algorithm, namely a dynamic partitioning of the search space, a distribution process based on a peer-to-peer communication framework, and a novel receiver-initiated load balancing algorithm. The effectiveness of the distributed method has been evaluated on the well-known National Cancer Institute’s HIV-screening dataset, where the approach attains close to linear speedup in a network of workstations.

    Dynamic load balancing for the distributed mining of molecular structures

    In molecular biology, it is often desirable to find common properties in large numbers of drug candidates. One family of methods stems from the data mining community, where algorithms to find frequent graphs have received increasing attention over the past years. However, the computational complexity of the underlying problem and the large amount of data to be explored essentially render sequential algorithms useless. In this paper, we present a distributed approach to the frequent subgraph mining problem to discover interesting patterns in molecular compounds. This problem is characterized by a highly irregular search tree, for which no reliable workload prediction is available. We describe the three main aspects of the proposed distributed algorithm, namely a dynamic partitioning of the search space, a distribution process based on a peer-to-peer communication framework, and a novel receiver-initiated load balancing algorithm. The effectiveness of the distributed method has been evaluated on the well-known National Cancer Institute’s HIV-screening data set, where we were able to show close to linear speedup in a network of workstations. The proposed approach also allows for dynamic resource aggregation in a non-dedicated computational environment. These features make it suitable for large-scale, multi-domain, heterogeneous environments, such as computational grids.
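As a rough illustration of what "receiver-initiated" means in the abstract above, the following minimal sketch simulates idle workers asking a busy peer for work while an irregular search tree unfolds. It is not the paper's algorithm: the function name `run_receiver_initiated`, the lock-step rounds, the "ask the most loaded peer" rule and the "hand over half" rule are all assumptions made for the example.

```python
from collections import deque


def run_receiver_initiated(initial_tasks, n_workers, expand):
    """Simulate receiver-initiated load balancing in lock-step rounds.

    initial_tasks: tasks placed on worker 0 (all other workers start idle).
    expand: function mapping a task to the list of its child tasks
            (stands in for expanding one node of the search tree).
    Returns (tasks processed per worker, number of work transfers).
    """
    queues = [deque() for _ in range(n_workers)]
    queues[0].extend(initial_tasks)
    processed = [0] * n_workers
    transfers = 0

    while any(queues):
        for w in range(n_workers):
            if not queues[w]:
                # The idle worker (the receiver) initiates the exchange.
                # Assumption: it can see every queue length; a real
                # peer-to-peer system would have to probe its peers.
                donor = max(range(n_workers), key=lambda i: len(queues[i]))
                if len(queues[donor]) > 1:
                    # Assumption: the donor hands over half of its tasks.
                    for _ in range(len(queues[donor]) // 2):
                        queues[w].append(queues[donor].pop())
                    transfers += 1
            if queues[w]:
                task = queues[w].popleft()
                processed[w] += 1
                queues[w].extend(expand(task))
    return processed, transfers
```

For example, calling `run_receiver_initiated([4], 4, lambda n: [n - 1, n - 1] if n > 0 else [])` explores a small binary tree of 31 nodes that starts entirely on worker 0 and is spread over four workers purely through requests from idle ones. No workload prediction is needed, which is the property the abstract emphasizes for irregular search trees.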

    Découverte et allocation des ressources pour le traitement de requêtes dans les systèmes grilles (Resource discovery and allocation for query processing in grid systems)

    Grid systems are among today's most interesting computing environments because of their large computing and storage capabilities and their availability. Many different domains profit from the facilities of grid environments. Distributed query processing is one such domain, with a large amount of ongoing research on porting the underlying environment from distributed and parallel systems to the grid environment. In this thesis, we focus on resource discovery and resource allocation algorithms for query processing in grid environments. To this end, we propose a resource discovery algorithm for query processing in grid systems by introducing self-stabilizing topology control and converge-cast based resource discovery algorithms. Then, we propose a resource allocation algorithm, which allocates resources for single join operator queries by generating a reduced search space for the candidate nodes and by considering the proximity of candidates to the data sources. We also propose another resource allocation algorithm for queries with multiple join operators. Lastly, we propose a fault-tolerant resource allocation algorithm, which provides fault tolerance during the execution of the query through passive replication of stateful operators. The general contribution of this thesis is twofold. First, we propose a new resource discovery algorithm that takes the characteristics of grid environments into account. We address scalability and dynamicity problems by constructing an efficient topology over the grid environment using the self-stabilization concept, and we deal with the heterogeneity problem by proposing the converge-cast based resource discovery algorithm. The second main contribution of this thesis is a new resource allocation algorithm that takes the characteristics of the grid environment into account. We tackle the scalability problem by reducing the search space for candidate resources. We decrease the communication costs during query execution by allocating nodes closer to the data sources. Finally, we deal with the dynamicity of nodes, in terms of their existence in the system, by proposing the fault-tolerant resource allocation algorithm.

    A Practical Study of Self-Stabilization for Prefix-Tree Based Overlay Networks

    Service discovery is crucial in the development of fully decentralized computational grids. Among the significant amount of work produced by the convergence of peer-to-peer (P2P) systems and grids, a new kind of overlay network, based on prefix trees, has emerged. In particular, the Distributed Lexicographic Placement Table (DLPT) approach is a decentralized and dynamic service discovery service. Fault tolerance within the DLPT approach is achieved through best-effort policies relying on formal self-stabilization results. Self-stabilization means that the tree can become transiently inconsistent, but is guaranteed to autonomously converge to a correct topology after arbitrary crashes, in a finite time. However, during convergence, the tree may not be able to process queries correctly. In this paper, we present simulation results with two objectives. First, we investigate the interest of self-stabilization for such architectures. Second, we explore, still based on simulation, a simple Time-To-Live policy to avoid useless processing during convergence time.
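The abstract above does not spell out its Time-To-Live policy, so the following is only a generic sketch of the idea: each forwarded query carries a hop budget and is dropped when the budget runs out, so a transiently inconsistent overlay cannot keep it circulating. The `route` function, the `links` dictionary layout and the longest-prefix forwarding rule are hypothetical and are not taken from DLPT.

```python
def route(links, start, key, ttl):
    """Forward a lookup for `key` from node `start`, one hop at a time.

    links: dict mapping a node label to the labels it can forward to
           (hypothetical layout, not the DLPT data structure).
    ttl:   maximum number of hops the query may travel.
    Returns the label that matches `key`, or None if the query is dropped.
    """
    node = start
    hops_left = ttl
    while True:
        if node == key:
            return node
        if hops_left == 0:
            # Budget exhausted: drop the query instead of forwarding it on.
            return None
        # Candidate next hops: neighbours whose label is a prefix of the key.
        candidates = [n for n in links.get(node, [])
                      if key.startswith(n) and n != node]
        if not candidates:
            return None  # dead end: no neighbour matches the key
        node = max(candidates, key=len)  # follow the longest matching prefix
        hops_left -= 1
```

With a consistent tree such as `{"": ["a", "b"], "a": ["ab", "ac"], "ab": ["abc"]}`, a lookup for `"abc"` starting at the root `""` reaches its target in three hops. With a corrupted link set such as `{"": ["a"], "a": ["ab"], "ab": ["a"]}`, the same lookup would bounce between `"a"` and `"ab"` indefinitely; the hop budget makes it return `None` after `ttl` hops.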

    Peer-to-Peer Networks and Computation: Current Trends and Future Perspectives

    This research paper examines the state of the art in the area of P2P networks and computation. It attempts to identify the challenges that confront the community of P2P researchers and developers, which need to be addressed before the potential of P2P-based systems can be effectively realized beyond content distribution and file-sharing applications to build real-world, intelligent and commercial software systems. Future perspectives and some thoughts on the evolution of P2P-based systems are also provided.

    ProActive: an Integrated platform for programming and running applications on grids and P2P systems

    We propose a grid programming approach using the ProActive middleware. The proposed strategy addresses several grid concerns, which we have classified into three categories. I. Grid Infrastructure, which handles resource acquisition and creation using deployment descriptors and Peer-to-Peer. II. Grid Technical Services, which can provide transparent non-functional services such as fault tolerance, load balancing, and file transfer. III. Grid Higher Level programming, with group communication and hierarchical components. We have validated our approach with several grid programming experiments, running applications on heterogeneous Grid resources using more than 1000 CPUs.