172 research outputs found
UniWiki: A Collaborative P2P System for Distributed Wiki Applications
The ever-growing request for digital information raises the need for content distribution architectures providing high storage capacity, data availability and good performance. While many simple solutions for scalable distribution of quasi-static content exist, there are still no approaches that can ensure both scalability and consistency for the case of highly dynamic content, such as the data managed inside wikis. We propose a peer-to-peer solution for distributing and managing dynamic content that combines two widely studied technologies: distributed hash tables (DHT) and optimistic replication. In our "universal wiki" engine architecture (UniWiki), on top of a reliable, inexpensive and consistent DHT-based storage, any number of front-ends can be added, ensuring both read and write scalability, as well as suitability for large-scale scenarios. The implementation is based on a Distributed Interception Middleware, thus separating distribution, replication, and consistency responsibilities, and also making our system transparently usable by third-party wiki engines.
UniWiki: A Reliable and Scalable Peer-to-Peer System for Distributing Wiki Applications
The ever-growing request for digital information raises the need for content distribution architectures providing high storage capacity, data availability and good performance. While many simple solutions for scalable distribution of quasi-static content exist, there are still no approaches that can ensure both scalability and consistency for the case of highly dynamic content, such as the data managed inside wikis. In this paper, we propose a peer-to-peer solution for distributing and managing dynamic content that combines two widely studied technologies: distributed hash tables (DHT) and optimistic replication. In our "universal wiki" engine architecture (UniWiki), on top of a reliable, inexpensive and consistent DHT-based storage, any number of front-ends can be added, ensuring both read and write scalability. The implementation is based on a Distributed Interception Middleware, thus separating distribution, replication, and consistency responsibilities, and also making our system usable by third-party wiki engines in a transparent way. UniWiki has proved viable and fairly efficient in large-scale scenarios.
Building a collaborative peer-to-peer wiki system on a structured overlay
The ever-growing request for digital information raises the need for content distribution architectures providing high storage capacity, data availability and good performance. While many simple solutions for scalable distribution of quasi-static content exist, there are still no approaches that can ensure both scalability and consistency for the case of highly dynamic content, such as the data managed inside wikis. We propose a peer-to-peer solution for distributing and managing dynamic content that combines two widely studied technologies: Distributed Hash Tables (DHT) and optimistic replication. In our "universal wiki" engine architecture (UniWiki), on top of a reliable, inexpensive and consistent DHT-based storage, any number of front-ends can be added, ensuring both read and write scalability, as well as suitability for large-scale scenarios. The implementation is based on Damon, a distributed AOP middleware, thus separating distribution, replication, and consistency responsibilities, and also making our system transparently usable by third-party wiki engines. Finally, UniWiki has proved viable and fairly efficient in large-scale scenarios.
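The three UniWiki abstracts above share the same core idea: wiki content lives in a DHT and concurrent edits are reconciled by optimistic replication. The following Python sketch only illustrates that combination, with hypothetical names (WikiDHT, PageReplica), and is not the authors' implementation, which relies on a distributed interception/AOP middleware.

# Minimal sketch (hypothetical names) of the UniWiki idea: wiki pages are stored
# in a DHT keyed by page title, and each replica keeps an operation set that is
# merged by union, so concurrent edits from different front-ends converge
# without coordination.
import hashlib
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Edit:
    op_id: str        # globally unique id (site id + local counter)
    line: int
    text: str

@dataclass
class PageReplica:
    ops: set = field(default_factory=set)

    def apply(self, edit: Edit):
        self.ops.add(edit)

    def merge(self, other: "PageReplica"):
        # Optimistic replication: merge is a commutative, idempotent union,
        # so replicas stored on different DHT nodes converge.
        self.ops |= other.ops

class WikiDHT:
    def __init__(self, nodes):
        self.nodes = {n: {} for n in nodes}   # node id -> {page title: PageReplica}

    def _responsible(self, title: str) -> str:
        digest = int(hashlib.sha1(title.encode()).hexdigest(), 16)
        return sorted(self.nodes)[digest % len(self.nodes)]

    def put(self, title: str, edit: Edit):
        node = self._responsible(title)
        self.nodes[node].setdefault(title, PageReplica()).apply(edit)

    def get(self, title: str) -> PageReplica:
        return self.nodes[self._responsible(title)].get(title, PageReplica())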
Multi-synchronous Collaborative Semantic Wikis
Semantic wikis have opened an interesting way to mix Web 2.0 advantages with the Semantic Web approach. However, compared to other collaborative tools, wikis do not support all collaborative editing modes, such as offline work or multi-synchronous editing. The lack of multi-synchronous support in wikis is problematic, especially when working with semantic wikis: in these systems, it is often important to change multiple pages simultaneously in order to refactor the semantic wiki structure. In this paper, we present a new model of semantic wiki called Multi-Synchronous Semantic Wiki (MS2W). This model extends semantic wikis with multi-synchronous support that allows building a P2P network of semantic wikis. Semantic wiki pages can be replicated on several semantic wiki servers. MS2W ensures CCI (causality, convergence, intention preservation) consistency on these pages, relying on the Logoot algorithm.
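The convergence guarantee above rests on Logoot. As a rough, simplified sketch (not the full algorithm; the function names are hypothetical), Logoot gives every line an immutable position identifier, a list of (digit, site id) pairs ordered lexicographically, and inserting between two lines means generating an identifier that sorts strictly between theirs, so concurrent inserts at different replicas converge to the same order.

# Simplified sketch of the Logoot idea (assumes p < q; ignores site-id
# tie-breaking at copied levels, which the real algorithm handles).
import random

BASE = 2 ** 16
BEGIN = [(0, 0)]
END = [(BASE - 1, 0)]

def between(p, q, site_id):
    """Generate an identifier ordered between p and q (assumes p < q)."""
    new = []
    for i in range(max(len(p), len(q)) + 1):
        lo = p[i][0] if i < len(p) else 0
        hi = q[i][0] if i < len(q) else BASE - 1
        if hi - lo > 1:
            # enough room at this level: pick a digit strictly in between
            new.append((random.randint(lo + 1, hi - 1), site_id))
            return new
        # no room: copy the lower bound and descend one level deeper
        new.append((lo, site_id if i >= len(p) else p[i][1]))
    raise ValueError("could not allocate an identifier between p and q")

# e.g. between(BEGIN, END, site_id=1) yields a fresh position for the first line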
Access Control in Weakly Consistent Systems
Eventually consistent models have become popular in recent years in data storage systems for cloud environments, giving users better availability and lower latency. In this model, replicas may be temporarily inconsistent, and various solutions have been proposed to deal with this inconsistency and ensure the eventual convergence of the data. However, defining and enforcing access control policies under this model is still an open challenge.
Implementing access control policies for these systems raises its own challenges, given that the information about permissions is itself kept in a weakly consistent form. This dissertation proposes a solution to this problem that prevents unauthorized access to and modification of data.
The proposed solution allows concurrent modifications of the security policies while ensuring their convergence when they are used to verify and enforce access control over the associated data. The dissertation presents an evaluation of the proposed model, showing that the solution behaves correctly in challenging situations, and discusses its application in scenarios that feature peer-to-peer communication between clients and additional replicas on the clients, with the goal of lowering latency and reducing the load on centralized components.
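For illustration only, and not the dissertation's actual protocol, the sketch below shows one common way to keep permissions convergent under weak consistency: a replicated last-writer-wins map that each replica consults locally before applying an operation. All names are hypothetical.

# Access-control policy kept as a last-writer-wins map replicated alongside the
# data. Each replica checks operations against its local copy; because merge is
# deterministic (max timestamp wins, ties broken by replica id), all replicas
# eventually agree on which permissions hold.
from dataclasses import dataclass, field

@dataclass
class PolicyReplica:
    # (user, right) -> (allowed, timestamp, replica_id)
    entries: dict = field(default_factory=dict)

    def set_right(self, user, right, allowed, ts, rid):
        key = (user, right)
        current = self.entries.get(key)
        if current is None or (ts, rid) > (current[1], current[2]):
            self.entries[key] = (allowed, ts, rid)

    def merge(self, other: "PolicyReplica"):
        for (user, right), (allowed, ts, rid) in other.entries.items():
            self.set_right(user, right, allowed, ts, rid)

    def check(self, user, right) -> bool:
        entry = self.entries.get((user, right))
        return bool(entry and entry[0])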
Knowledge extraction from unstructured data and classification through distributed ontologies
The World Wide Web has changed the way humans use and share any kind of information. The Web removed several barriers to accessing published information and has become an enormous space where users can easily navigate through heterogeneous resources (such as linked documents) and can easily edit, modify, or produce them. Documents implicitly enclose information and relationships that are accessible only to human beings. Indeed, the Web of documents evolved into a space of data silos, linked to each other only through untyped references (such as hypertext references) that only humans were able to understand. A growing desire to programmatically access the pieces of data implicitly enclosed in documents has characterized recent efforts of the Web research community. Direct access means structured data, enabling computing machinery to easily exploit the linking of different data sources. It has become crucial for the Web community to provide a technology stack that eases data integration at large scale, first structuring the data using standard ontologies and afterwards linking them to external data. Ontologies became the best practice for defining axioms and relationships among classes, and the Resource Description Framework (RDF) became the basic data model chosen to represent ontology instances (i.e. an instance is a value of an axiom, class or attribute). Data has become the new oil; in particular, extracting information from semi-structured textual documents on the Web is key to realizing the Linked Data vision. In the literature these problems have been addressed by several proposals and standards, which mainly focus on technologies to access the data and on formats to represent the semantics of the data and their relationships.
With the increasing volume of interconnected and serialized RDF data, RDF repositories may suffer from data overloading and may become a single point of failure for the overall Linked Data vision. One of the goals of this dissertation is to propose a thorough approach to managing large-scale RDF repositories and to distribute them in a redundant and reliable peer-to-peer RDF architecture. The architecture consists of a logic to distribute and mine the knowledge and a set of physical peer nodes organized in a ring topology based on a Distributed Hash Table (DHT). Each node shares the same logic and provides an entry point that enables clients to query the knowledge base using atomic, disjunctive and conjunctive SPARQL queries. The consistency of the results is increased using a data redundancy algorithm that replicates each RDF triple on multiple nodes so that, in case of peer failure, other peers can retrieve the data needed to resolve the queries. Additionally, a distributed load-balancing algorithm is used to maintain a uniform distribution of the data among the participating peers by dynamically changing the key space assigned to each node in the DHT.
Recently, the process of data structuring has gained more and more attention when applied to the large volume of text information spread on the Web, such as legacy data, newspapers, scientific papers or (micro-)blog posts. This process mainly consists of three steps: (i) the extraction from the text of atomic pieces of information, called named entities; (ii) the classification of these pieces of information through ontologies; (iii) the disambiguation of these entities through Uniform Resource Identifiers (URIs) identifying real-world objects.
As a step towards interconnecting the Web to real-world objects via named entities, different techniques have been proposed. The second objective of this work is to compare these approaches in order to highlight their strengths and weaknesses in different scenarios, such as scientific papers, news articles, or user-generated content. We created the Named Entity Recognition and Disambiguation (NERD) web framework, publicly accessible on the Web (through a REST API and a web user interface), which unifies several named entity extraction technologies. Moreover, we proposed the NERD ontology, a reference ontology for comparing the results of these technologies. Recently, the NERD ontology has been included in the NIF (Natural language processing Interchange Format) specification, part of the Creating Knowledge out of Interlinked Data (LOD2) project.
Summarizing, this dissertation defines a framework for the extraction of knowledge from unstructured data and its classification via distributed ontologies. A detailed study of the Semantic Web and knowledge extraction fields is proposed to define the issues under investigation in this work. The dissertation then proposes an architecture to tackle the single-point-of-failure issue introduced by the RDF repositories spread across the Web. Although the use of ontologies enables a Web where data is structured and comprehensible by computing machinery, human users may also take advantage of it, especially for the annotation task. Hence, this work describes an annotation tool for web editing and for audio and video annotation, with a web front-end user interface built on top of a distributed ontology. Furthermore, this dissertation details a thorough comparison of the state of the art in named entity technologies. The NERD framework is presented as a technology that encompasses existing solutions in the named entity extraction field, and the NERD ontology is presented as a reference ontology for the field. Finally, this work highlights three use cases aimed at reducing the number of data silos spread across the Web: a Linked Data approach to augment the automatic classification task in a systematic literature review, an application to lift educational data stored in Sharable Content Object Reference Model (SCORM) data silos to the Web of Data, and a scientific conference venue enhancer plugged on top of several live data collectors. Significant research efforts have been devoted to combining the efficiency of a reliable data structure with the importance of data extraction techniques. This dissertation opens several research directions that mainly bridge two communities: the Semantic Web and the Natural Language Processing community. The Web provides a considerable amount of data on which NLP techniques may shed light. The use of the URI as a unique identifier may provide one milestone towards the materialization of entities lifted from raw text to real-world objects.
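The distributed RDF repository described above follows a well-known DHT triple-store pattern. The sketch below illustrates that general technique, hashing each triple term onto a ring and replicating every key on the next k nodes, with hypothetical names; it is not the dissertation's exact algorithms.

# Hedged sketch: each RDF triple is indexed under the hash of its subject,
# predicate and object, and every key is replicated on the next k nodes of the
# ring so lookups survive peer failures.
import hashlib
from bisect import bisect_left

def h(term: str) -> int:
    return int(hashlib.sha1(term.encode()).hexdigest(), 16) % (2 ** 32)

class RdfRing:
    def __init__(self, node_ids, replicas=2):
        self.ring = sorted(node_ids)                 # positions on the identifier circle
        self.replicas = replicas
        self.store = {n: set() for n in self.ring}   # node -> set of triples

    def _successors(self, key: int):
        start = bisect_left(self.ring, key) % len(self.ring)
        return [self.ring[(start + i) % len(self.ring)] for i in range(self.replicas)]

    def insert(self, s, p, o):
        triple = (s, p, o)
        for term in (s, p, o):                       # index by subject, predicate, object
            for node in self._successors(h(term)):
                self.store[node].add(triple)

    def lookup(self, term):
        # any replica of the key can answer; here we simply ask the first one
        node = self._successors(h(term))[0]
        return {t for t in self.store[node] if term in t}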
Mapster: A Peer-to-Peer Data Sharing Environment
This paper describes a system called Mapster that allows users in a P2P network to share their databases. The research addresses problems of heterogeneity and scalability in P2P databases. To provide fine-grained access to users' databases, schema matching and a super-peer topology are used. The schema matching component allows information to be translated by semi-automatically determining the mappings between the databases within the P2P network. A super-peer topology enables the schema matching techniques to operate effectively in large, dynamic, heterogeneous networks.
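As a toy illustration of the schema-matching idea (the attribute names and the translate helper are hypothetical, not Mapster's API), a super-peer holding mappings between two peers' schemas can rewrite a query before forwarding it.

# Hypothetical mapping table held by a super-peer: peerA's attributes -> peerB's.
mappings = {
    ("peerA", "peerB"): {"surname": "last_name", "dob": "birth_date"},
}

def translate(query: dict, source: str, target: str) -> dict:
    """Rewrite a query expressed in the source schema into the target schema."""
    mapping = mappings.get((source, target), {})
    return {mapping.get(attr, attr): value for attr, value in query.items()}

# e.g. translate({"surname": "Silva"}, "peerA", "peerB") -> {"last_name": "Silva"}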
Conflict-Free Replicated Data Types in Dynamic Environments
Over the years, mobile devices have become increasingly popular and have gained improved computation capabilities, allowing them to perform more complex tasks such as collaborative applications. Given the weak characteristics of mobile networks, which represent highly dynamic environments where users may experience regular involuntary disconnection periods, the big question is how to maintain data consistency. This issue is most pronounced in collaborative environments where multiple users interact with each other, sharing a replicated state that may diverge due to concurrency conflicts and loss of updates.
To maintain consistency, one of today's best solutions is Conflict-Free Replicated Data Types (CRDTs), which ensure low latency and automatic conflict resolution, guaranteeing eventual consistency of the shared data. However, a limitation often found in CRDTs and the systems that employ them is the need to know the replicas to which state changes must be disseminated. This constitutes a problem, since it is infeasible to maintain such knowledge in an environment where clients may join and leave at any time and may be disconnected by the unreliability of mobile network communications.
In this thesis, we present the study and extension of the CRDT concept to dynamic environments by introducing the P/S-CRDTs model, where CRDTs are coupled with the publish/subscribe interaction scheme and with additional mechanisms that allow users to cooperate and maintain consistency while accounting for the volatile behaviour of mobile networks. The experimental results show that, in volatile disconnection scenarios, mobile users in collaborative activity maintain consistency among themselves, and that, compared to other available CRDT models, the P/S-CRDTs model removes the need to know to whom updates must be disseminated, while keeping network traffic at reasonable levels.
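The following sketch illustrates the general idea the thesis describes, coupling a CRDT with publish/subscribe so replicas never need a membership view. The classes and the in-memory broker are hypothetical, not the P/S-CRDTs implementation.

# A CRDT replica publishes its updates to a topic instead of addressing other
# replicas directly, so joining and leaving devices only (un)subscribe and
# never need to track membership.
from collections import defaultdict

class Broker:
    def __init__(self):
        self.subscribers = defaultdict(list)        # topic -> replicas

    def subscribe(self, topic, replica):
        self.subscribers[topic].append(replica)

    def publish(self, topic, update, sender):
        for replica in self.subscribers[topic]:
            if replica is not sender:
                replica.deliver(update)

class GCounterReplica:
    def __init__(self, rid, broker, topic="counter"):
        self.rid, self.counts = rid, defaultdict(int)
        self.broker, self.topic = broker, topic
        broker.subscribe(topic, self)

    def increment(self):
        self.counts[self.rid] += 1
        self.broker.publish(self.topic, dict(self.counts), sender=self)

    def deliver(self, remote_counts):
        for rid, c in remote_counts.items():        # merge: entry-wise maximum
            self.counts[rid] = max(self.counts[rid], c)

    def value(self):
        return sum(self.counts.values())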
Cloud-edge hybrid applications
Many modern applications are designed to provide interactions among users, including multi-user games, social networks and collaborative tools. Users expect application response time to be in the order of milliseconds, to foster interaction and interactivity.
The design of these applications typically adopts a client-server model, where all interactions are mediated by a centralized component. This approach introduces availability and fault-tolerance issues, which can be mitigated by replicating the server component, and even by relying on geo-replicated solutions in cloud computing infrastructures. Even in this case, the client-server communication model leads to unnecessary latency penalties for geographically close clients and high operational costs for the application provider.
This dissertation proposes a cloud-edge hybrid model with secure and efficient propagation and consistency mechanisms. This model combines client-side replication and client-to-client propagation to provide low latency and minimize the dependency on the server infrastructure, fostering availability and fault tolerance. To realize this model, this work makes the following key contributions.
First, the cloud-edge hybrid model is materialized by a system design where clients maintain replicas of the data and synchronize in a peer-to-peer fashion, and servers are used to assist clients' operation. We study how to bring most of the application logic to the client side, using the centralized service primarily for durability, access control, discovery, and overcoming internetwork limitations.
Second, we define protocols for weakly consistent data replication, including a novel CRDT model (∆-CRDTs). We provide a study on partial replication, exploring the challenges and fundamental limitations in providing causal consistency, and the difficulty in supporting client-side replicas due to their ephemeral nature.
Third, we study how client misbehaviour can impact the guarantees of causal consistency. We propose new secure weak consistency models for insecure settings, and algorithms to enforce such consistency models.
The experimental evaluation of our contributions has shown their specific benefits and limitations compared with the state of the art. In general, the cloud-edge hybrid model leads to faster application response times, lower client-to-client latency, higher system scalability (as fewer clients need to connect to servers at the same time), the possibility to work offline or disconnected from the server, and reduced server bandwidth usage.
In summary, we propose a hybrid of cloud and edge that provides lower user-to-user latency, availability under server disconnections, and improved server scalability, while being efficient, reliable, and secure.
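As a hedged sketch of the cloud-edge hybrid pattern described above (all names are hypothetical, not the dissertation's system), clients hold replicas, propagate updates directly to peers, and contact the server only to persist a periodic checkpoint.

# Clients synchronize peer-to-peer; the server provides durability only.
class DurabilityServer:
    def __init__(self):
        self.snapshots = {}

    def persist(self, cid, state):
        self.snapshots[cid] = state

class ClientReplica:
    def __init__(self, cid, peers, server):
        self.cid, self.peers, self.server = cid, peers, server
        self.state = {}                      # key -> (value, logical timestamp, writer id)

    def put(self, key, value, ts):
        self.state[key] = (value, ts, self.cid)
        for peer in self.peers:              # client-to-client propagation
            peer.receive(key, self.state[key])

    def receive(self, key, entry):
        current = self.state.get(key)
        if current is None or entry[1:] > current[1:]:   # last writer wins
            self.state[key] = entry

    def checkpoint(self):
        self.server.persist(self.cid, dict(self.state))  # durability only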
- …