1,328 research outputs found

    A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing

    Data Grids have been adopted as the platform for scientific communities that need to share, access, transport, process and manage large data collections distributed worldwide. They combine high-end computing technologies with high-performance networking and wide-area storage management techniques. In this paper, we discuss the key concepts behind Data Grids and compare them with other data sharing and distribution paradigms such as content delivery networks, peer-to-peer networks and distributed databases. We then provide comprehensive taxonomies that cover various aspects of architecture, data transportation, data replication, and resource allocation and scheduling. Finally, we map the proposed taxonomy to various Data Grid systems, not only to validate the taxonomy but also to identify areas for future exploration. Through this taxonomy, we aim to categorise existing systems to better understand their goals and methodology, which helps in evaluating their applicability to similar problems. The taxonomy also provides a "gap analysis" of the area through which researchers can identify new issues for investigation. We also hope that the proposed taxonomy and mapping help new practitioners understand this complex area of research. Comment: 46 pages, 16 figures, Technical Repor

    Semantic Query Reformulation in Social PDMS

    We consider social peer-to-peer data management systems (PDMS), where each peer maintains both semantic mappings between its schema and those of some acquaintances, and social links with peer friends. In this context, reformulating a query from a peer's schema into other peers' schemas is a hard problem, as it may generate as many rewritings as there are mappings from that peer to the outside and, transitively, onward, eventually traversing the entire network. However, not all the obtained rewritings are relevant to a given query. In this paper, we address this problem by inspecting semantic mappings and social links to find only relevant rewritings. We propose a new notion of 'relevance' of a query with respect to a mapping and, based on this notion, a new semantic query reformulation approach for social PDMS, which achieves great accuracy and flexibility. To rapidly find the most interesting mappings, we combine several techniques: (i) social links are expressed as FOAF (Friend of a Friend) links to characterize peers' friendship, and compact mapping summaries are used to obtain mapping descriptions; (ii) local semantic views are special views that contain information about external mappings; and (iii) gossiping techniques improve the search for relevant mappings. Our experimental evaluation, based on a prototype on top of PeerSim and a simulated network, demonstrates that our solution yields greater recall than traditional query translation approaches proposed in the literature. Comment: 29 pages, 8 figures, query rewriting in PDM

    A Framework For Efficient Data Distribution In Peer-to-peer Networks.

    Peer-to-peer (P2P) models are based on user altruism: a user shares its content with other users in the pool and is also interested in the content held by other nodes. Most P2P systems in their current form are not fair in terms of the content served by a peer and the service obtained from the swarm. Most systems suffer from the free-rider problem, where many peers with high uplink capacity contribute much more than they should while many others get a free ride when downloading content. This leaves high-capacity nodes with little or no motivation to contribute. Often such resourceful nodes exit the swarm or do not participate at all. This is unfavorable for P2P networks in general, where participation is essential. As the number of users in the swarm increases, the swarm becomes more robust and scalable. Other important issues in present-day P2P systems are suboptimal Quality of Service (QoS) in terms of download time, end-to-end latency and jitter rate, uplink utilization, excessive cross-ISP traffic, and security and cheating threats. These problems motivate the present work. To this end, we present an efficient data distribution framework for P2P networks in the media streaming and file sharing domains. Experiments with our model, an alliance-based peering scheme for media streaming, show that such a scheme distributes data to the swarm members in a near-optimal way. Alliances are small groups of nodes that share data and other vital information for symbiotic association. We show that alliance formation is a loosely coupled and effective way to organize the peers, and that our model maps to a small-world network; such networks form efficient overlay structures and are robust to network perturbations such as churn.
We present a comparative, simulation-based study of our model against CoolStreaming/DONet (a popular model) and a quantitative performance evaluation. Simulation results show that our model scales well under varying workloads and conditions, delivers near-optimal levels of QoS, reduces cross-ISP traffic considerably and, in most cases, performs on par with or better than CoolStreaming/DONet. In the next phase of our work, we focused on the BitTorrent P2P model, as it is the most widely used file sharing protocol. Many studies in academia and industry have shown that although BitTorrent scales very well, it is far from optimal in terms of fairness to end users, download time and uplink utilization. Furthermore, random peering and data distribution in such a model lead to suboptimal performance. Lately, a new breed of BitTorrent clients such as BitTyrant has demonstrated successful strategic attacks against BitTorrent. Strategic peers configure the BitTorrent client software so that they obtain good download speeds for little or no contribution. Such strategic nodes exploit the altruism in the swarm, consume resources at the expense of honest nodes and create an unfair swarm. The presence of nodes with heterogeneous bandwidth generates further unfairness. We investigate and propose a new token-based anti-strategic policy that could be used in BitTorrent to minimize free-riding by strategic clients. We also propose other policies against strategic attacks, including a smart tracker that denies strategic clients' requests for the peer list multiple times, and blacklisting misbehaving nodes that do not follow the protocol policies. These policies help to stop the strategic behavior of peers to a large extent and improve overall system performance. We also quantify and validate the benefits of using a bandwidth peer-matching policy.
Our simulation results show that with the changes proposed above, uplink utilization and mean download time in a BitTorrent network improve considerably. Strategic clients are left with little or no incentive to behave greedily. This reduces free riding and creates a fairer swarm with very little computational overhead. Finally, we show that our model is self-healing: user behavior changes from selfish to altruistic in the presence of the aforementioned policies

    Study of the Topology Mismatch Problem in Peer-to-Peer Networks

    Peer-to-peer (P2P) technology offers many advantages over other systems such as distributed messaging systems, the client-server model and cloud-based systems. These advantages include, but are not limited to, high scalability and low cost. On the other hand, P2P systems suffer from a bottleneck problem caused by topology mismatch. Topology mismatch occurs in an unstructured P2P network when the participating peers choose their neighbors at random, so that the resulting P2P overlay does not match its underlying physical network. This leads to lengthy communication between peers and redundant traffic in the underlying network [1]. The performance of most P2P systems suffers from this mismatch between the overlay topology and the underlying physical network topology, which causes a large volume of redundant traffic in the Internet and slows performance. This paper surveys P2P topology mismatch problems and the solutions adapted for different applications

    Adaptive and secured resource management in distributed and Internet systems

    The effectiveness of computer system resource management has always been determined by two major factors: (1) workload demands and management objectives, and (2) updates in computer technology. These two factors change dynamically, and resource management systems must adapt to the changes in a timely manner. This dissertation attempts to address several important and related resource management issues. We first study memory system utilization in centralized servers by improving the memory performance of sorting algorithms, which provides a fundamental understanding of memory system organization and its performance optimization for data-intensive workloads. To reduce different types of cache misses, we restructure the mergesort and quicksort algorithms by integrating tiling, padding, and buffering techniques and by repartitioning the data set. Our study shows substantial performance improvements from our new methods. We have further extended the work to improve load sharing for utilizing global memory resources in distributed systems. Aiming to reduce the memory resource contention caused by page faults and I/O activities, we have developed and examined load sharing policies that consider effective usage of global memory in addition to CPU load balancing in both homogeneous and heterogeneous clusters. Extending our research from clusters to Internet systems, we have further investigated memory and storage utilization in Web caching systems. We have proposed several novel management schemes to restructure and decentralize the existing caching system by exploiting data locality at different levels of the global memory hierarchy and by effectively sharing data objects among the clients and their proxy caches. Our decentralized Web caching system design raises data integrity and communication anonymity issues, which are also security concerns for general peer-to-peer systems.
We propose an integrity protocol to ensure data integrity, and several protocols to achieve mutual communication anonymity between an information requester and a provider. The potential impact and contributions of this dissertation are briefly stated as follows: (1) The two major research topics identified in this dissertation are fundamentally important for the growth and development of information technology, and will remain in demand for the long term. (2) Our proposed cache-effective sorting methods bridge a serious gap between the analytical complexity of algorithms and their execution complexity in practice, a gap caused by the increasingly deep memory hierarchy in computer systems. This approach can also be used to improve memory performance at different levels of the memory hierarchy, such as I/O and file systems. (3) Our load sharing principle of giving high priority to requests for data accesses in memory and I/O adapts to technology changes in a timely manner and responds effectively to the increasing demand of data-intensive applications. (4) Our proposed decentralized Web caching framework and its resource management schemes present a comprehensive case study for examining the P2P model. Our results and experiences can be used for related and further studies in distributed computing. (5) The proposed data integrity and communication anonymity protocols address limits and weaknesses of existing ones, and lay a solid foundation for us to continue our work in this important area
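
The tiling idea mentioned in this abstract can be illustrated with a minimal sketch. This is not the dissertation's implementation: the function name `tiled_mergesort`, the `tile_size` parameter and its default are assumptions made for illustration, and padding and buffering are not shown. The sketch only shows the two-phase structure in which tiles small enough to fit in cache are sorted first and the sorted tiles are then merged. In Python the cache benefit itself is not observable; the same structure would matter in a lower-level language.

```python
import heapq
from typing import List, Sequence


def tiled_mergesort(data: Sequence[int], tile_size: int = 4096) -> List[int]:
    """Sort `data` by sorting fixed-size tiles, then merging the sorted tiles.

    `tile_size` is a hypothetical tuning knob standing in for "number of
    elements that fit in cache"; the default is arbitrary.
    """
    if tile_size <= 0:
        raise ValueError("tile_size must be positive")

    # Phase 1: sort each tile independently, so each sort touches only
    # a small, contiguous part of the input.
    runs = [sorted(data[i:i + tile_size]) for i in range(0, len(data), tile_size)]

    # Phase 2: multi-way merge of the sorted tiles into one sorted list.
    return list(heapq.merge(*runs))
```

For example, `tiled_mergesort([5, 3, 1, 4, 2], tile_size=2)` sorts the tiles `[5, 3]`, `[1, 4]` and `[2]` separately before merging them.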

    Mathematical analysis of scheduling policies in peer-to-peer video streaming networks

    Peer-to-peer networks are self-managed virtual communities, built at the application layer on top of the Internet infrastructure, where users (called peers) share resources (bandwidth, memory, processing) to achieve a common goal. Video distribution is the most challenging application, given bandwidth constraints. There are basically three video services. The simplest is download, where a set of servers holds the original content and users must download it completely before playback. The second is video on demand, where peers join a virtual network whenever they issue a request for video content and start a progressive online download. The last is live video, where video content is generated, distributed and viewed simultaneously. This thesis studies design aspects of live and on-demand video distribution. It presents a mathematical analysis of the stability and capacity of hybrid, peer-assisted on-demand distribution architectures. Peers start concurrent downloads of multiple contents and disconnect whenever they wish. The expected evolution of the system is predicted with a deterministic fluid model, assuming Poisson arrivals and exponential departures. A sub-model with sequential (non-simultaneous) downloads is globally and structurally stable, regardless of the network parameters. Little's Law is used to determine the mean residence time of users in a stationary sequential on-demand system. It is proved theoretically that the hybrid philosophy of peer cooperation always outperforms the pure client-server technology
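
As a small worked illustration of the Little's Law step mentioned in this abstract: in a stationary system, the mean number of users L, the arrival rate λ and the mean residence time W satisfy L = λW, so W = L / λ. The sketch below is not taken from the thesis; the function name and the example figures are assumptions chosen for illustration.

```python
def mean_residence_time(mean_users_in_system: float, arrival_rate: float) -> float:
    """Return W = L / lambda, the mean residence time given by Little's Law.

    mean_users_in_system: L, the long-run average number of peers in the system.
    arrival_rate: lambda, the average number of peers arriving per unit time.
    """
    if arrival_rate <= 0:
        raise ValueError("arrival_rate must be positive")
    if mean_users_in_system < 0:
        raise ValueError("mean_users_in_system must be non-negative")
    return mean_users_in_system / arrival_rate
```

With hypothetical values of 120 peers in the system on average and 2 arrivals per second, `mean_residence_time(120.0, 2.0)` gives a mean residence time of 60 seconds.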

    Ontology-based Search Algorithms over Large-Scale Unstructured Peer-to-Peer Networks

    Peer-to-Peer (P2P) systems have emerged as a promising paradigm for structuring large-scale distributed systems. They provide a robust, scalable and decentralized way to share and publish data. Unstructured P2P systems have gained much popularity in recent years for their wide applicability and simplicity. However, efficient resource discovery remains a fundamental challenge for unstructured P2P networks due to the lack of a network structure. To effectively harness the power of unstructured P2P systems, the challenges in distributed knowledge management and information search need to be overcome. Current attempts to solve the problems of knowledge management and search have focused on simple term-based routing indices and keyword search queries. Many P2P resource discovery applications will require more complex query functionality, as users will publish semantically rich data and need efficient content location algorithms that find target content at moderate cost. Therefore, effective knowledge and data management techniques and search tools for information retrieval are imperative and lasting. In my dissertation, I present a suite of protocols that assist in efficient content location and knowledge management in unstructured P2P overlays. The basis of these schemes is their ability to learn from past peer interactions and improve their performance over time. My work aims to provide effective and bandwidth-efficient searching and data sharing in unstructured P2P environments. A suite of algorithms is presented that provides peers in unstructured P2P overlays with the state necessary to efficiently locate, disseminate and replicate objects. Also, existing approaches to federated search are adapted, and new methods are developed for semantic knowledge representation, resource selection, and knowledge evolution for efficient search in dynamic and distributed P2P network environments.
Furthermore, autonomous and decentralized algorithms that reorganize an unstructured network topology into one with desired search-enhancing properties are proposed in a network evolution model to facilitate effective and efficient semantic search in dynamic environments