361 research outputs found

    Study of the Topology Mismatch Problem in Peer-to-Peer Networks

    Get PDF
    The advantages of peer-to-peer (P2P) technology are innumerable when compared to other systems like Distributed Messaging System, Client-Server model, Cloud based systems. The vital advantages are not limited to high scalability and low cost. On the other hand the p2p system suffers from a bottle-neck problem caused by topology mismatch. Topology mismatch occurs in an unstructured peer-to-peer (P2P) network when the peers participating in the communication choose their neighbors in random fashion, such that the resultant P2P network mismatches its underlying physical network, resulting in a lengthy communication between the peers and redundant network traffics generated in the underlying network[1] However, most P2P system performance suffers from the mismatch between the overlays topology and the underlying physical network topology, causing a large volume of redundant traffic in the Internet slowing the performance. This paper surveys the P2P topology mismatch problems and the solutions adapted for different applications

    EGOIST: Overlay Routing Using Selfish Neighbor Selection

    Full text link
    A foundational issue underlying many overlay network applications ranging from routing to P2P file sharing is that of connectivity management, i.e., folding new arrivals into an existing overlay, and re-wiring to cope with changing network conditions. Previous work has considered the problem from two perspectives: devising practical heuristics for specific applications designed to work well in real deployments, and providing abstractions for the underlying problem that are analytically tractable, especially via game-theoretic analysis. In this paper, we unify these two thrusts by using insights gleaned from novel, realistic theoretic models in the design of Egoist ā€“ a prototype overlay routing system that we implemented, deployed, and evaluated on PlanetLab. Using measurements on PlanetLab and trace-based simulations, we demonstrate that Egoist's neighbor selection primitives significantly outperform existing heuristics on a variety of performance metrics, including delay, available bandwidth, and node utilization. Moreover, we demonstrate that Egoist is competitive with an optimal, but unscalable full-mesh approach, remains highly effective under significant churn, is robust to cheating, and incurs minimal overhead. Finally, we discuss some of the potential benefits Egoist may offer to applications.National Science Foundation (CISE/CSR 0720604, ENG/EFRI 0735974, CISE/CNS 0524477, CNS/NeTS 0520166, CNS/ITR 0205294; CISE/EIA RI 0202067; CAREER 04446522); European Commission (RIDS-011923

    Knowledge extraction from unstructured data and classification through distributed ontologies

    Get PDF
    The World Wide Web has changed the way humans use and share any kind of information. The Web removed several access barriers to the information published and has became an enormous space where users can easily navigate through heterogeneous resources (such as linked documents) and can easily edit, modify, or produce them. Documents implicitly enclose information and relationships among them which become only accessible to human beings. Indeed, the Web of documents evolved towards a space of data silos, linked each other only through untyped references (such as hypertext references) where only humans were able to understand. A growing desire to programmatically access to pieces of data implicitly enclosed in documents has characterized the last efforts of the Web research community. Direct access means structured data, thus enabling computing machinery to easily exploit the linking of different data sources. It has became crucial for the Web community to provide a technology stack for easing data integration at large scale, first structuring the data using standard ontologies and afterwards linking them to external data. Ontologies became the best practices to define axioms and relationships among classes and the Resource Description Framework (RDF) became the basic data model chosen to represent the ontology instances (i.e. an instance is a value of an axiom, class or attribute). Data becomes the new oil, in particular, extracting information from semi-structured textual documents on the Web is key to realize the Linked Data vision. In the literature these problems have been addressed with several proposals and standards, that mainly focus on technologies to access the data and on formats to represent the semantics of the data and their relationships. With the increasing of the volume of interconnected and serialized RDF data, RDF repositories may suffer from data overloading and may become a single point of failure for the overall Linked Data vision. One of the goals of this dissertation is to propose a thorough approach to manage the large scale RDF repositories, and to distribute them in a redundant and reliable peer-to-peer RDF architecture. The architecture consists of a logic to distribute and mine the knowledge and of a set of physical peer nodes organized in a ring topology based on a Distributed Hash Table (DHT). Each node shares the same logic and provides an entry point that enables clients to query the knowledge base using atomic, disjunctive and conjunctive SPARQL queries. The consistency of the results is increased using data redundancy algorithm that replicates each RDF triple in multiple nodes so that, in the case of peer failure, other peers can retrieve the data needed to resolve the queries. Additionally, a distributed load balancing algorithm is used to maintain a uniform distribution of the data among the participating peers by dynamically changing the key space assigned to each node in the DHT. Recently, the process of data structuring has gained more and more attention when applied to the large volume of text information spread on the Web, such as legacy data, news papers, scientific papers or (micro-)blog posts. This process mainly consists in three steps: \emph{i)} the extraction from the text of atomic pieces of information, called named entities; \emph{ii)} the classification of these pieces of information through ontologies; \emph{iii)} the disambigation of them through Uniform Resource Identifiers (URIs) identifying real world objects. As a step towards interconnecting the web to real world objects via named entities, different techniques have been proposed. The second objective of this work is to propose a comparison of these approaches in order to highlight strengths and weaknesses in different scenarios such as scientific and news papers, or user generated contents. We created the Named Entity Recognition and Disambiguation (NERD) web framework, publicly accessible on the Web (through REST API and web User Interface), which unifies several named entity extraction technologies. Moreover, we proposed the NERD ontology, a reference ontology for comparing the results of these technologies. Recently, the NERD ontology has been included in the NIF (Natural language processing Interchange Format) specification, part of the Creating Knowledge out of Interlinked Data (LOD2) project. Summarizing, this dissertation defines a framework for the extraction of knowledge from unstructured data and its classification via distributed ontologies. A detailed study of the Semantic Web and knowledge extraction fields is proposed to define the issues taken under investigation in this work. Then, it proposes an architecture to tackle the single point of failure issue introduced by the RDF repositories spread within the Web. Although the use of ontologies enables a Web where data is structured and comprehensible by computing machinery, human users may take advantage of it especially for the annotation task. Hence, this work describes an annotation tool for web editing, audio and video annotation in a web front end User Interface powered on the top of a distributed ontology. Furthermore, this dissertation details a thorough comparison of the state of the art of named entity technologies. The NERD framework is presented as technology to encompass existing solutions in the named entity extraction field and the NERD ontology is presented as reference ontology in the field. Finally, this work highlights three use cases with the purpose to reduce the amount of data silos spread within the Web: a Linked Data approach to augment the automatic classification task in a Systematic Literature Review, an application to lift educational data stored in Sharable Content Object Reference Model (SCORM) data silos to the Web of data and a scientific conference venue enhancer plug on the top of several data live collectors. Significant research efforts have been devoted to combine the efficiency of a reliable data structure and the importance of data extraction techniques. This dissertation opens different research doors which mainly join two different research communities: the Semantic Web and the Natural Language Processing community. The Web provides a considerable amount of data where NLP techniques may shed the light within it. The use of the URI as a unique identifier may provide one milestone for the materialization of entities lifted from a raw text to real world object

    Semantic web service search: a brief survey

    Full text link
    Scalable means for the search of relevant web services are essential for the development of intelligent service-based applications in the future Internet. Key idea of semantic web services is to enable such applications to perform a high-precision search and automated composition of services based on formal ontology-based representations of service semantics. In this paper, we briefly survey the state of the art of semantic web service search

    LibraRing: An Architecture for Distributed Digital Libraries Based on DHTs

    Full text link
    Abstract. We present a digital library architecture based on distributed hash tables. We discuss the main components of this architecture and the protocols for offering information retrieval and information filtering functionality. We present an experimental evaluation of our proposals.

    Analysis of current middleware used in peer-to-peer and grid implementations for enhancement by catallactic mechanisms

    Get PDF
    This deliverable describes the work done in task 3.1, Middleware analysis: Analysis of current middleware used in peer-to-peer and grid implementations for enhancement by catallactic mechanisms from work package 3, Middleware Implementation. The document is divided in four parts: The introduction with application scenarios and middleware requirements, Catnets middleware architecture, evaluation of existing middleware toolkits, and conclusions. -- Die Arbeit definiert Anforderungen an Grid und Peer-to-Peer Middleware Architekturen und analysiert diese auf ihre Eignung fĆ¼r die prototypische Umsetzung der Katallaxie. Eine Middleware-Architektur fĆ¼r die Umsetzung der Katallaxie in Application Layer Netzwerken wird vorgestellt.Grid Computing

    Scalable and Distributed Resource Management Protocols for Cloud and Big Data Clusters

    Get PDF
    Cloud data centers require an operating system to manage resources and satisfy operational requirements and management objectives. The growth of popularity in cloud services causes the appearance of a new spectrum of services with sophisticated workload and resource management requirements. Also, data centers are growing by addition of various type of hardware to accommodate the ever-increasing requests of users. Nowadays a large percentage of cloud resources are executing data-intensive applications which need continuously changing workload fluctuations and specific resource management. To this end, cluster computing frameworks are shifting towards distributed resource management for better scalability and faster decision making. Such systems benefit from the parallelization of control and are resilient to failures. Throughout this thesis we investigate algorithms, protocols and techniques to address these challenges in large-scale data centers. We introduce a distributed resource management framework which consolidates virtual machine to as few servers as possible to reduce the energy consumption of data center and hence decrease the cost of cloud providers. This framework can characterize the workload of virtual machines and hence handle trade-off energy consumption and Service Level Agreement (SLA) of customers efficiently. The algorithm is highly scalable and requires low maintenance cost with dynamic workloads and it tries to minimize virtual machines migration costs. We also introduce a scalable and distributed probe-based scheduling algorithm for Big data analytics frameworks. This algorithm can efficiently address the problem job heterogeneity in workloads that has appeared after increasing the level of parallelism in jobs. The algorithm is massively scalable and can reduce significantly average job completion times in comparison with the-state of-the-art. Finally, we propose a probabilistic fault-tolerance technique as part of the scheduling algorithm
    • ā€¦
    corecore