    Scalable XML Collaborative Editing with Undo (short paper)

    Commutative Replicated Data Types (CRDTs) are a new class of algorithms that ensure scalable consistency of replicated data. They have been successfully applied to collaborative text editing without complex concurrency control. In this paper, we present a CRDT for editing XML data. Compared to existing approaches to collaborative XML editing, our approach is more scalable and handles all aspects of XML editing: elements, contents, attributes, and undo. Undo is recognized as an important feature for collaborative editing because it allows users to overcome system complexity through error recovery or collaborative conflict resolution.
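The abstract does not describe the algorithm itself, so the following is only a minimal sketch of the commutativity property that CRDTs rely on, not the paper's method. It assumes a last-writer-wins rule for the attributes of a single XML element; the `AttributeMap` class, the `(clock, site id)` tag scheme, and the site names are all hypothetical, and undo is not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# A tag totally orders writes: (logical clock, site id). Hypothetical scheme.
Tag = Tuple[int, str]


@dataclass
class AttributeMap:
    """Last-writer-wins map for the attributes of one XML element (illustrative)."""
    entries: Dict[str, Tuple[Tag, Optional[str]]] = field(default_factory=dict)

    def apply(self, name: str, value: Optional[str], tag: Tag) -> None:
        """Apply a set (value is a string) or a delete (value is None).

        The write with the largest tag wins, so the final state does not
        depend on the order in which operations are delivered.
        """
        current = self.entries.get(name)
        if current is None or tag > current[0]:
            self.entries[name] = (tag, value)

    def visible(self) -> Dict[str, str]:
        """Return the attributes that are currently set (deleted ones hidden)."""
        return {n: v for n, (_, v) in self.entries.items() if v is not None}


# Two concurrent operations on the same attribute, issued at different sites.
op_a = ("lang", "en", (1, "site-A"))
op_b = ("lang", "fr", (1, "site-B"))

# Each replica receives the operations in a different order.
replica1, replica2 = AttributeMap(), AttributeMap()
replica1.apply(*op_a)
replica1.apply(*op_b)
replica2.apply(*op_b)
replica2.apply(*op_a)

# Both replicas end in the same state without any locking or coordination.
converged = replica1.visible() == replica2.visible()
```

Because the tie between equal clocks is broken by site id, every replica picks the same winner regardless of delivery order.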

    A Framework for XML-based Integration of Data, Visualization and Analysis in a Biomedical Domain

    Biomedical data are becoming increasingly complex and heterogeneous in nature. The data are stored in distributed information systems, using a variety of data models, and are processed by increasingly complex tools that analyze and visualize them. In this paper, we present our framework for integrating biomedical research data and tools into a single Web front end. Our framework is applied to the University of Washington’s Human Brain Project. Specifically, we present solutions to four integration tasks: definition of complex mappings from relational sources to XML, distributed XQuery processing, generation of heterogeneous output formats, and the integration of heterogeneous data visualization and analysis tools.

    Integrating XQuery and P2P in MonetDB/XQuery*

    MonetDB/XQuery* is a fully functional, publicly available XML DBMS that has been extended with distributed and P2P data management functionality. Our (minimal) XQuery language extension, XRPC, adds the concept of RPC to XQuery and exploits the set-at-a-time database processing model to optimize the networking cost through a technique called Bulk RPC. We describe our approach to including the services offered by diverse P2P network structures (such as DHTs) in a way that avoids any further intrusion into the XQuery language and semantics, and show how this, similarly to Bulk RPC, will lead to further query optimization opportunities where the XDBMS interacts with the underlying P2P network. We also discuss some P2P data management applications where MonetDB/XQuery* is being used (a small in-home scenario and a wide-area collaborative application). As this research is work in progress, we outline some research questions on our path towards defining and realizing P2P XDBMS technology.
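The idea of reducing networking cost through set-at-a-time processing can be illustrated with a minimal sketch. It assumes that the saving comes from batching many calls to the same remote function on the same peer into one request; the `bulk_rpc_batches` function, the call shape, and the peer and function names are hypothetical and do not reproduce the XRPC message format.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (peer, function name, arguments): a hypothetical shape for one remote call.
Call = Tuple[str, str, tuple]


def bulk_rpc_batches(calls: List[Call]) -> Dict[Tuple[str, str], List[tuple]]:
    """Group calls by (peer, function) so each group can travel in one message."""
    batches = defaultdict(list)
    for peer, function, args in calls:
        batches[(peer, function)].append(args)
    return dict(batches)


# Three per-tuple calls, as a query might generate inside a loop.
calls = [
    ("peer1.example.org", "getPerson", ("p1",)),
    ("peer2.example.org", "getPerson", ("p2",)),
    ("peer1.example.org", "getPerson", ("p3",)),
]

# Two network messages instead of three: one per (peer, function) pair.
batches = bulk_rpc_batches(calls)
```

With many iterations and few distinct peers, the number of round trips depends on the number of peers rather than on the number of tuples.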

    Document replication strategies for geographically distributed web search engines

    Large-scale web search engines are composed of multiple data centers that are geographically distant from each other. Typically, a user query is processed in a data center that is geographically close to the origin of the query, over a replica of the entire web index. Compared to a centralized, single-center search engine, this architecture offers lower query response times because the network latencies between users and data centers are reduced. However, it does not scale well with increasing index sizes and query traffic volumes because queries are evaluated on the entire web index, which has to be replicated and maintained in all data centers. As a remedy to this scalability problem, we propose a document replication framework in which documents are selectively replicated on data centers based on regional user interests. Within this framework, we propose three different document replication strategies, each optimizing a different objective: reducing the potential search quality loss, the average query response time, or the total query workload of the search system. For all three strategies, we consider two alternative types of capacity constraints on the index sizes of data centers. Moreover, we investigate the performance impact of query forwarding and result caching. We evaluate our strategies via detailed simulations, using a large query log and a document collection obtained from the Yahoo! web search engine.

    Providing packages of relevant ATM information: An ontology-based approach

    ATM information providers publish reports and notifications of different types using standardized information exchange models. For a typical information user, e.g., an aircraft pilot, only a fraction of the published information is relevant for a particular task. Filtering out irrelevant information from different information sources is in itself a challenging task, yet it is only a first step in providing relevant information, with further challenges concerning maintenance, auditability, availability, integration, comprehensibility, and traceability. This paper presents the Semantic Container approach, which employs ontology-based faceted information filtering and allows for the packaging of filtered information and associated metadata in semantic containers, thus facilitating reuse of filtered information at different levels. The paper formally defines an abstract model of ontology-based information filtering and the structure of semantic containers, their composition, versioning, discovery, and replicated physical allocation. The paper further discusses different usage scenarios, the role of semantic containers in SWIM, an architecture for a semantic container management system, and a proof-of-concept prototype. Finally, the paper discusses a blockchain-based notary service to realize tamper-proof version histories for semantic containers.

    Grid Data Management: Open Problems and New Issues

    Initially developed for the scientific community, Grid computing is now gaining much interest in important areas such as enterprise information systems. This makes data management critical, since the techniques must scale up while addressing the autonomy, dynamicity, and heterogeneity of the data sources. In this paper, we discuss the main open problems and new issues related to Grid data management. We first recall the main principles behind data management in distributed systems and the basic techniques. We then specify the requirements for Grid data management. Finally, we introduce the main techniques needed to address these requirements. This implies revisiting distributed database techniques in major ways, in particular by using P2P techniques.

    An Ontology-Oriented Architecture for Dealing With Heterogeneous Data Applied to Telemedicine Systems

    Current trends in medicine regarding the accessibility, quantity, and quality of information, as well as quality of service, differ greatly from those of former decades. The current state requires new methods for dealing with the enormous and growing amounts of data on the Web and in other heterogeneous sources, such as sensors, social networks, and unstructured data, normally referred to as big data. Traditional approaches are not enough, at least on their own, although they were frequently used in hybrid architectures in the past. In this paper, we propose an architecture to process big data, including heterogeneous sources of information. We have defined an ontology-oriented architecture in which a core ontology is used as a knowledge base and allows data from different heterogeneous sources to be integrated. We have used natural language processing and artificial intelligence methods to process and mine data in the health sector to uncover the knowledge hidden in diverse data sources. Our approach has been applied to the field of personalized medicine (study, diagnosis, and treatment of diseases customized for each patient) and has been used in a telemedicine system. A case study focused on diabetes is presented to demonstrate the validity of the proposed model. This work was supported in part by the Spanish Ministry of Economy and Competitiveness (MINECO) under Project SEQUOIA-UA (TIN2015-63502-C3-3-R) and Project RESCATA (TIN2015-65100-R) and in part by the Spanish Research Agency (AEI) and the European Regional Development Fund (FEDER) under Project CloudDriver4Industry (TIN2017-89266-R).

    Query processing in P2P systems

    Peer-to-peer (P2P) computing offers new opportunities for building highly distributed data systems. Unlike client-server computing, P2P is a very dynamic environment where peers can join and leave the network at any time. This yields important advantages such as operation without central coordination, peer autonomy, and scalability to large numbers of peers. However, providing high-level data management services is difficult. Most techniques designed for distributed database systems, which statically exploit schema and network information, no longer apply. New techniques are needed that are decentralized, dynamic, and self-adaptive. In this paper, we survey the techniques that have been developed for query processing in P2P systems. We first give an overview of existing P2P networks and compare their properties from the perspective of data management. Then, we discuss the approaches used for schema mapping. Next, we describe the algorithms that have been proposed for query routing, focusing on query routing in unstructured networks and DHTs. Finally, we present the techniques that have been proposed for processing complex queries, e.g. top-k queries, in P2P systems, in particular in DHTs.
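To make the notion of a top-k query over peers concrete, the following is a minimal sketch of the simplest possible strategy, not one of the surveyed algorithms. It assumes that each item is stored on exactly one peer together with its final score, so the global top-k is always contained in the union of the local top-k lists; the function names and sample data are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

Scored = Tuple[float, str]  # (score, item id)


def local_top_k(items: Dict[str, float], k: int) -> List[Scored]:
    """Return the k highest-scoring items held by one peer."""
    return heapq.nlargest(k, ((score, item) for item, score in items.items()))


def global_top_k(peers: List[Dict[str, float]], k: int) -> List[Scored]:
    """Merge the local top-k lists of all peers into the global top-k.

    Each peer ships at most k candidates, so the traffic is bounded by
    k times the number of peers rather than by the total data size.
    """
    candidates: List[Scored] = []
    for items in peers:
        candidates.extend(local_top_k(items, k))
    return heapq.nlargest(k, candidates)


# Three peers, each holding a disjoint set of scored items.
peers = [
    {"doc1": 0.9, "doc2": 0.4},
    {"doc3": 0.7, "doc4": 0.8},
    {"doc5": 0.1},
]
result = global_top_k(peers, 2)
```

When an item's score is spread over several peers, this simple merge is no longer sufficient, and a threshold-style algorithm would be needed.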

    PEWS Editor, a front end for the PEWS language

    Advisor: Martin A. Musicante. Includes appendix. Dissertation (master's) - Universidade Federal do Paraná, Setor de Ciências Exatas, Programa de Pós-Graduação em Informática. Defense: Curitiba, 2006. Includes bibliography. Abstract: PEWS is a web service composition language. PEWS compositions can be used to describe both simple and composite web services. Simple web services are built from Java programs, whereas composite web services are built by composing existing web services. PEWS has an XML version called XPEWS, which allows the language to be used at the interface level as well. To make PEWS easier to use, this work presents the development of a front end in the form of a plug-in for the Eclipse platform, allowing closer integration with other tools such as XML, WSDL, and Java editors. Using the plug-in can help reduce the development time of compositions by enabling the checking of coding errors and the generation of XPEWS code, thereby increasing developer productivity. Finally, a case study is carried out that analyzes the PEWS language from the standpoint of its expressiveness, evaluating the language against a framework composed of workflow patterns. This case study allows us to present a comparison with other web service composition languages based on the same framework.

    Semantos: a semantically smart information query language

    Enterprise Information Integration (EII) is rapidly becoming one of the pillars of modern corporate information systems. Given the spread and diversity of information sources in an enterprise, it has become increasingly difficult for decision makers to access relevant and accurate information at the opportune time. It has therefore become critical to seamlessly integrate the diverse information stores found in an organization into a single coherent data source. This is the job of EII, and one of the key components in making it work is harnessing the implied meaning, or semantics, hidden within data sources. Modern EII systems are capable of harnessing semantic information and ontologies to make integration across data stores possible. These systems do not, however, allow a consumer of the integration service to build queries with semantic meaning. This is because most EII systems use XQuery, SQL, or both as query languages, neither of which has the capability to build semantically rich queries. In this thesis, Semantos (from the Greek word sema for “sign or token”) is proposed as a viable alternative: an XML-based information query language that is capable of exploiting ontologies, enabling consumers to build semantically enriched queries. We explore the requirements that Semantos needs to satisfy as a semantically smart information query language. From these requirements we design and develop a software implementation. The benefit of Semantos is that it possesses a query structure that allows automated processes to decompose and restructure queries without human intervention. We demonstrate the applicability of Semantos using two realistic examples: a query enhancement service and a query translation service.
    Both demonstrate the ability of a Semantos query to be manipulated by automated services to achieve information integration goals. Dissertation (MSc)--University of Pretoria, 2009. Computer Science.