172 research outputs found

    UniWiki: A Collaborative P2P System for Distributed Wiki Applications

    The ever-growing demand for digital information raises the need for content distribution architectures providing high storage capacity, data availability and good performance. While many simple solutions for the scalable distribution of quasi-static content exist, there are still no approaches that can ensure both scalability and consistency for highly dynamic content, such as the data managed inside wikis. We propose a peer-to-peer solution for distributing and managing dynamic content that combines two widely studied technologies: distributed hash tables (DHTs) and optimistic replication. In our "universal wiki" engine architecture (UniWiki), any number of front-ends can be added on top of a reliable, inexpensive and consistent DHT-based storage, ensuring both read and write scalability as well as suitability for large-scale scenarios. The implementation is based on a Distributed Interception Middleware, thus separating distribution, replication and consistency responsibilities, and making our system transparently usable by third-party wiki engines.
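
    The abstract above only outlines the storage scheme, so the fragment below is a minimal illustrative sketch (not UniWiki's actual code) of how wiki pages could be keyed into a DHT and reconciled optimistically. The DHT class, its put/get interface and the last-writer-wins read are simplifying assumptions introduced here for illustration only.

    import hashlib
    import time

    class DHT:
        """Toy in-memory stand-in for a DHT (an assumption, not UniWiki's storage API)."""
        def __init__(self):
            self.store = {}
        def put(self, key, value):
            self.store[key] = value
        def get(self, key):
            return self.store.get(key)

    def page_key(title):
        # Hash the page title so pages spread uniformly over the DHT key space.
        return hashlib.sha1(title.encode("utf-8")).hexdigest()

    def save_page(dht, title, text, author):
        # Optimistic write: append a timestamped revision without any locking.
        revisions = dht.get(page_key(title)) or []
        dht.put(page_key(title), revisions + [{"text": text, "author": author, "ts": time.time()}])

    def read_page(dht, title):
        # Naive reconciliation for the sketch: return the latest revision (last-writer-wins).
        revisions = dht.get(page_key(title)) or []
        return max(revisions, key=lambda r: r["ts"])["text"] if revisions else ""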

    UniWiki: A Reliable and Scalable Peer-to-Peer System for Distributing Wiki Applications

    The ever-growing demand for digital information raises the need for content distribution architectures providing high storage capacity, data availability and good performance. While many simple solutions for the scalable distribution of quasi-static content exist, there are still no approaches that can ensure both scalability and consistency for highly dynamic content, such as the data managed inside wikis. In this paper, we propose a peer-to-peer solution for distributing and managing dynamic content that combines two widely studied technologies: distributed hash tables (DHTs) and optimistic replication. In our "universal wiki" engine architecture (UniWiki), any number of front-ends can be added on top of a reliable, inexpensive and consistent DHT-based storage, ensuring both read and write scalability. The implementation is based on a Distributed Interception Middleware, thus separating distribution, replication and consistency responsibilities, and making our system usable by third-party wiki engines in a transparent way. UniWiki has proven viable and fairly efficient in large-scale scenarios.

    Building a collaborative peer-to-peer wiki system on a structured overlay

    The ever-growing demand for digital information raises the need for content distribution architectures providing high storage capacity, data availability and good performance. While many simple solutions for the scalable distribution of quasi-static content exist, there are still no approaches that can ensure both scalability and consistency for highly dynamic content, such as the data managed inside wikis. We propose a peer-to-peer solution for distributing and managing dynamic content that combines two widely studied technologies: Distributed Hash Tables (DHTs) and optimistic replication. In our "universal wiki" engine architecture (UniWiki), any number of front-ends can be added on top of a reliable, inexpensive and consistent DHT-based storage, ensuring both read and write scalability as well as suitability for large-scale scenarios. The implementation is based on Damon, a distributed AOP middleware, thus separating distribution, replication and consistency responsibilities, and making our system transparently usable by third-party wiki engines. Finally, UniWiki has proven viable and fairly efficient in large-scale scenarios.

    Multi-synchronous Collaborative Semantic Wikis

    Semantic wikis have opened an interesting way to mix Web 2.0 advantages with the Semantic Web approach. However, compared to other collaborative tools, wikis do not support all collaborative editing modes, such as offline work or multi-synchronous editing. The lack of multi-synchronous support in wikis is problematic, especially when working with semantic wikis: in these systems it is often important to change multiple pages simultaneously in order to refactor the semantic wiki structure. In this paper, we present a new semantic wiki model called Multi-Synchronous Semantic Wiki (MS2W). This model extends semantic wikis with multi-synchronous support, allowing a P2P network of semantic wikis to be created. Semantic wiki pages can be replicated on several semantic servers, and MS2W ensures CCI consistency on these pages by relying on the Logoot algorithm.
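
    Since the abstract relies on the Logoot algorithm, the sketch below illustrates the core idea of Logoot-style position identifiers in a simplified form: each line of a page gets an identifier made of (digit, site) pairs compared lexicographically, and an insert allocates an identifier between its two neighbours. BASE, SITE, the sentinels and the allocation strategy are simplifications assumed here, not the published algorithm.

    import random

    SITE = "replica-A"    # hypothetical replica identifier
    BASE = 2 ** 16        # digit space per level (simplified)
    BEGIN = ((0, ""),)    # sentinel before the first line of a page
    END = ((BASE, ""),)   # sentinel after the last line of a page

    def between(prev, nxt, site=SITE):
        # Allocate a position identifier that sorts strictly between prev and nxt.
        pos, depth, bounded_above = [], 0, True
        while True:
            lo = prev[depth][0] if depth < len(prev) else 0
            hi = nxt[depth][0] if (bounded_above and depth < len(nxt)) else BASE
            if hi - lo > 1:                 # room at this level: pick a digit in between
                pos.append((random.randint(lo + 1, hi - 1), site))
                return tuple(pos)
            pos.append((lo, prev[depth][1] if depth < len(prev) else site))
            if lo < hi:                     # the prefix is now strictly below nxt
                bounded_above = False
            depth += 1

    # Identifiers never change, so concurrent inserts on different replicas
    # converge simply by sorting lines on their identifiers.
    first = between(BEGIN, END)
    second = between(first, END)
    assert BEGIN < first < second < END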

    Access Control in Weakly Consistent Systems

    Eventually consistent models have become popular in recent years in data storage systems for cloud environments, as they give users better availability and lower latency. In this model, replicas may be temporarily inconsistent, and various solutions have been proposed to deal with this inconsistency and ensure the eventual convergence of data. However, defining and enforcing access control policies under this model is still an open challenge. Implementing access control for these systems raises its own difficulties, since the information about permissions is itself kept in a weakly consistent form. In this dissertation, a solution to this problem is proposed that prevents unauthorized access to and modification of data. The proposed solution allows concurrent modifications of the security policies and ensures their convergence when they are used to verify and enforce access control on the associated data. We present an evaluation of the proposed model, showing that the solution behaves correctly in challenging situations, and we discuss its application to scenarios that feature peer-to-peer communication between clients and additional replicas on the clients, with the goal of providing lower latency and reducing the load on centralized components.
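
    The dissertation's actual protocol is not reproduced in the abstract, so the following is only a generic sketch of the underlying idea: keeping the permissions themselves in a conflict-free replicated set so that concurrently edited policies converge. The ORSet class and the (user, right) encoding are illustrative assumptions, not the proposed solution.

    import uuid

    class ORSet:
        """Observed-remove set: an add wins over a concurrent remove (illustrative only)."""
        def __init__(self):
            self.adds = {}      # element -> set of unique add tags
            self.removes = {}   # element -> set of tags observed at removal time

        def add(self, element):
            self.adds.setdefault(element, set()).add(uuid.uuid4().hex)

        def remove(self, element):
            observed = self.adds.get(element, set())
            self.removes.setdefault(element, set()).update(observed)

        def contains(self, element):
            return bool(self.adds.get(element, set()) - self.removes.get(element, set()))

        def merge(self, other):
            # Convergence: union tag sets pairwise; the order of merges does not matter.
            for elem, tags in other.adds.items():
                self.adds.setdefault(elem, set()).update(tags)
            for elem, tags in other.removes.items():
                self.removes.setdefault(elem, set()).update(tags)

    # Permissions as (user, right) pairs; two replicas edit the policy concurrently.
    policy_a, policy_b = ORSet(), ORSet()
    policy_a.add(("alice", "write"))
    policy_b.add(("bob", "read"))
    policy_a.merge(policy_b); policy_b.merge(policy_a)
    assert policy_a.contains(("bob", "read")) and policy_b.contains(("alice", "write"))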

    Knowledge extraction from unstructured data and classification through distributed ontologies

    The World Wide Web has changed the way humans use and share any kind of information. The Web removed several barriers to accessing published information and became an enormous space where users can easily navigate through heterogeneous resources (such as linked documents) and can easily edit, modify, or produce them. Documents implicitly enclose information and relationships that are accessible only to human beings. Indeed, the Web of documents evolved towards a space of data silos, linked to each other only through untyped references (such as hypertext references) that only humans were able to understand. A growing desire to programmatically access pieces of data implicitly enclosed in documents has characterized the latest efforts of the Web research community. Direct access means structured data, thus enabling computing machinery to easily exploit the linking of different data sources. It became crucial for the Web community to provide a technology stack for easing data integration at large scale, first structuring the data using standard ontologies and afterwards linking them to external data. Ontologies became the best practice to define axioms and relationships among classes, and the Resource Description Framework (RDF) became the basic data model chosen to represent ontology instances (i.e. an instance is a value of an axiom, class or attribute). Data has become the new oil; in particular, extracting information from semi-structured textual documents on the Web is key to realizing the Linked Data vision. In the literature these problems have been addressed with several proposals and standards, which mainly focus on technologies to access the data and on formats to represent the semantics of the data and their relationships. With the increasing volume of interconnected and serialized RDF data, RDF repositories may suffer from data overloading and may become a single point of failure for the overall Linked Data vision. One of the goals of this dissertation is to propose a thorough approach to manage large-scale RDF repositories and to distribute them in a redundant and reliable peer-to-peer RDF architecture. The architecture consists of a logic to distribute and mine the knowledge and of a set of physical peer nodes organized in a ring topology based on a Distributed Hash Table (DHT). Each node shares the same logic and provides an entry point that enables clients to query the knowledge base using atomic, disjunctive and conjunctive SPARQL queries. The consistency of the results is increased using a data redundancy algorithm that replicates each RDF triple in multiple nodes so that, in the case of peer failure, other peers can retrieve the data needed to resolve the queries. Additionally, a distributed load-balancing algorithm is used to maintain a uniform distribution of the data among the participating peers by dynamically changing the key space assigned to each node in the DHT. Recently, the process of data structuring has gained more and more attention when applied to the large volume of text spread on the Web, such as legacy data, newspapers, scientific papers or (micro-)blog posts. This process mainly consists of three steps: (i) the extraction from the text of atomic pieces of information, called named entities; (ii) the classification of these pieces of information through ontologies; (iii) the disambiguation of them through Uniform Resource Identifiers (URIs) identifying real-world objects.
    As a step towards interconnecting the Web to real-world objects via named entities, different techniques have been proposed. The second objective of this work is to propose a comparison of these approaches in order to highlight strengths and weaknesses in different scenarios, such as scientific papers, newspaper articles, or user-generated content. We created the Named Entity Recognition and Disambiguation (NERD) web framework, publicly accessible on the Web (through a REST API and a web user interface), which unifies several named entity extraction technologies. Moreover, we proposed the NERD ontology, a reference ontology for comparing the results of these technologies. Recently, the NERD ontology has been included in the NIF (Natural language processing Interchange Format) specification, part of the Creating Knowledge out of Interlinked Data (LOD2) project. Summarizing, this dissertation defines a framework for the extraction of knowledge from unstructured data and its classification via distributed ontologies. A detailed study of the Semantic Web and knowledge extraction fields is proposed to define the issues taken under investigation in this work. Then, it proposes an architecture to tackle the single-point-of-failure issue introduced by the RDF repositories spread across the Web. Although the use of ontologies enables a Web where data is structured and comprehensible by computing machinery, human users may also take advantage of it, especially for the annotation task. Hence, this work describes an annotation tool for web editing and audio and video annotation, with a web front-end user interface built on top of a distributed ontology. Furthermore, this dissertation details a thorough comparison of the state of the art of named entity technologies. The NERD framework is presented as a technology that encompasses existing solutions in the named entity extraction field, and the NERD ontology is presented as a reference ontology in the field. Finally, this work highlights three use cases aimed at reducing the number of data silos spread across the Web: a Linked Data approach to augment the automatic classification task in a Systematic Literature Review, an application that lifts educational data stored in Sharable Content Object Reference Model (SCORM) data silos to the Web of Data, and a scientific conference venue enhancer plugin on top of several live data collectors. Significant research efforts have been devoted to combining the efficiency of a reliable data structure with the importance of data extraction techniques. This dissertation opens several research directions that mainly join two research communities: the Semantic Web and the Natural Language Processing communities. The Web provides a considerable amount of data on which NLP techniques may shed light. The use of URIs as unique identifiers may provide a milestone for the materialization of entities lifted from raw text to real-world objects.
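
    The indexing, replication and load-balancing algorithms of the dissertation are not detailed in the abstract; the sketch below only illustrates the commonly used technique it alludes to, in which each RDF triple is stored under the hashes of its subject, predicate and object and replicated on successor nodes, so that atomic triple patterns can be answered despite peer failures. The Ring class, the replication factor and the hashing scheme are assumptions made for the example.

    import hashlib
    from bisect import bisect_right

    def h(term):
        return int(hashlib.sha1(term.encode()).hexdigest(), 16) % (2 ** 32)

    class Ring:
        """Toy DHT ring: each triple is indexed three times and replicated on R successors."""
        def __init__(self, node_ids, replicas=2):
            self.ids = sorted(node_ids)
            self.nodes = {n: {} for n in self.ids}
            self.replicas = replicas

        def _successors(self, key):
            # First `replicas` nodes clockwise from the key's position on the ring.
            start = bisect_right(self.ids, key) % len(self.ids)
            return [self.ids[(start + i) % len(self.ids)] for i in range(self.replicas)]

        def insert(self, s, p, o):
            # Index the triple under its subject, predicate and object keys.
            for key in (h(s), h(p), h(o)):
                for node in self._successors(key):
                    self.nodes[node].setdefault(key, set()).add((s, p, o))

        def match(self, s=None, p=None, o=None):
            # Resolve an atomic pattern via any bound term, then filter the candidates.
            bound = s or p or o
            candidates = set()
            for node in self._successors(h(bound)):
                candidates |= self.nodes[node].get(h(bound), set())
            return {t for t in candidates
                    if (s is None or t[0] == s)
                    and (p is None or t[1] == p)
                    and (o is None or t[2] == o)}

    ring = Ring(node_ids=[h("node%d" % i) for i in range(4)])
    ring.insert("ex:NERD", "rdf:type", "ex:Framework")
    assert ring.match(p="rdf:type") == {("ex:NERD", "rdf:type", "ex:Framework")}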

    Mapster: A Peer-to-Peer Data Sharing Environment

    This paper describes a system called Mapster that allows users in a P2P network to share their databases. The research addresses problems of heterogeneity and scalability in P2P databases. To provide fine-grained access to users' databases, schema matching and a super-peer topology are used. The schema matching component allows information to be translated by semi-automatically determining the mappings between the databases within the P2P network. A super-peer topology enables the schema matching techniques to operate effectively in large, dynamic, heterogeneous networks.
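
    Mapster's matching algorithms are not given in the abstract; the fragment below is only a toy illustration of the general idea of a super-peer that holds semi-automatically derived mappings from each peer's schema to a shared vocabulary and rewrites query attributes before forwarding them. All class, peer and attribute names are hypothetical.

    class SuperPeer:
        """Holds attribute mappings from each member peer's schema to a shared vocabulary."""
        def __init__(self):
            self.to_shared = {}    # peer -> {local attribute: shared term}

        def register(self, peer, mapping):
            self.to_shared[peer] = mapping

        def translate(self, source_peer, target_peer, query_attrs):
            # Compose source->shared and shared->target to rewrite the query attributes.
            shared = self.to_shared[source_peer]
            inverse_target = {v: k for k, v in self.to_shared[target_peer].items()}
            return [inverse_target[shared[a]] for a in query_attrs
                    if shared.get(a) in inverse_target]

    sp = SuperPeer()
    sp.register("peer1", {"surname": "lastName", "tel": "phone"})
    sp.register("peer2", {"family_name": "lastName", "phone_no": "phone"})
    # A query over peer1's schema is rewritten into peer2's schema before forwarding.
    assert sp.translate("peer1", "peer2", ["surname", "tel"]) == ["family_name", "phone_no"]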

    Conflict-Free Replicated Data Types in Dynamic Environments

    Over the years, mobile devices have become increasingly popular and have gained improved computation capabilities, allowing them to perform more complex tasks such as collaborative applications. Given the characteristics of mobile networks, which are highly dynamic environments where users may experience regular involuntary disconnection periods, a central question arises: how to maintain data consistency. This issue is most pronounced in collaborative environments where multiple users interact with each other, sharing a replicated state that may diverge due to concurrency conflicts and loss of updates. To maintain consistency, one of today's best solutions is Conflict-Free Replicated Data Types (CRDTs), which ensure low latency and automatic conflict resolution, guaranteeing eventual consistency of the shared data. However, a limitation often found in CRDTs and the systems that employ them is the need to know the replicas to which state changes must be disseminated. This is a problem, since it is impractical to maintain such knowledge in an environment where clients may leave and join at any time and may be disconnected due to the unreliability of mobile network communications. In this thesis, we present the study and extension of the CRDT concept to dynamic environments by introducing the P/S-CRDTs model, in which CRDTs are coupled with the publish/subscribe interaction scheme and additional mechanisms that ensure users are able to cooperate and maintain consistency while accounting for the volatile behaviour of mobile networks. The experimental results show that, in volatile disconnection scenarios, mobile users in collaborative activity maintain consistency among themselves and that, compared to other available CRDT models, the P/S-CRDTs model removes the need to know to which replicas updates must be disseminated, while keeping network traffic at appropriate levels.
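
    The P/S-CRDTs model itself is not reproduced here; the sketch below only illustrates the general principle the abstract describes, namely pairing a state-based CRDT with a publish/subscribe channel so that a replica never needs to know which other replicas exist. The Broker and GCounter classes are illustrative stand-ins, not the thesis's implementation.

    class GCounter:
        """State-based grow-only counter; merge is an entrywise maximum."""
        def __init__(self, replica_id):
            self.replica_id = replica_id
            self.counts = {}

        def increment(self):
            self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + 1

        def merge(self, other_counts):
            for rid, n in other_counts.items():
                self.counts[rid] = max(self.counts.get(rid, 0), n)

        def value(self):
            return sum(self.counts.values())

    class Broker:
        """Minimal publish/subscribe hub: publishers never learn who the subscribers are."""
        def __init__(self):
            self.subscribers = {}   # topic -> list of callbacks

        def subscribe(self, topic, callback):
            self.subscribers.setdefault(topic, []).append(callback)

        def publish(self, topic, state):
            for deliver in self.subscribers.get(topic, []):
                deliver(state)

    broker = Broker()
    a, b = GCounter("A"), GCounter("B")
    broker.subscribe("doc/42", a.merge)
    broker.subscribe("doc/42", b.merge)

    a.increment()
    broker.publish("doc/42", dict(a.counts))   # A publishes its state, unaware of B
    b.increment()
    broker.publish("doc/42", dict(b.counts))
    assert a.value() == b.value() == 2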

    Cloud-edge hybrid applications

    Many modern applications are designed to provide interactions among users, including multi-user games, social networks and collaborative tools. Users expect application response times to be in the order of milliseconds, to foster interaction and interactivity. The design of these applications typically adopts a client-server model, where all interactions are mediated by a centralized component. This approach introduces availability and fault-tolerance issues, which can be mitigated by replicating the server component, and even by relying on geo-replicated solutions in cloud computing infrastructures. Even in this case, the client-server communication model leads to unnecessary latency penalties for geographically close clients and high operational costs for the application provider. This dissertation proposes a cloud-edge hybrid model with secure and efficient propagation and consistency mechanisms. This model combines client-side replication and client-to-client propagation to provide low latency and minimize the dependency on the server infrastructure, fostering availability and fault tolerance. To realize this model, this work makes the following key contributions. First, the cloud-edge hybrid model is materialized by a system design where clients maintain replicas of the data and synchronize in a peer-to-peer fashion, and servers are used to assist clients' operation. We study how to bring most of the application logic to the client side, using the centralized service primarily for durability, access control, discovery, and overcoming internetwork limitations. Second, we define protocols for weakly consistent data replication, including a novel CRDT model (∆-CRDTs). We provide a study on partial replication, exploring the challenges and fundamental limitations in providing causal consistency and the difficulty in supporting client-side replicas due to their ephemeral nature. Third, we study how client misbehaviour can impact the guarantees of causal consistency. We propose new secure weak consistency models for insecure settings, and algorithms to enforce such consistency models. The experimental evaluation of our contributions has shown their specific benefits and limitations compared with the state of the art. In general, the cloud-edge hybrid model leads to faster application response times, lower client-to-client latency, higher system scalability (as fewer clients need to connect to servers at the same time), the possibility to work offline or disconnected from the server, and reduced server bandwidth usage. In summary, we propose a cloud-and-edge hybrid that provides lower user-to-user latency, availability under server disconnections, and improved server scalability, while being efficient, reliable, and secure.
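
    The ∆-CRDT protocols defined in the dissertation are not reproduced in the abstract; the fragment below only sketches the general delta-state idea, in which a mutator returns a small delta that is shipped and merged instead of the full state. The DeltaGSet class is an illustrative assumption, not the proposed model.

    class DeltaGSet:
        """Grow-only set whose mutator returns a delta to ship instead of the full state."""
        def __init__(self):
            self.items = set()

        def add(self, item):
            delta = {item} - self.items   # only what the receiver might be missing
            self.items |= delta
            return delta

        def merge(self, delta):
            self.items |= delta           # idempotent, commutative, associative

    client, server = DeltaGSet(), DeltaGSet()
    delta = client.add("edit-17")         # produced at the edge replica
    server.merge(delta)                   # only the small delta crosses the network
    assert "edit-17" in server.items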

    Ontology engineering and routing in distributed knowledge management applications

    • …