4 research outputs found

    Compacting XML Structures Using a Dynamic Labeling Scheme

    Abstract. Due to the growing popularity of XML as a data exchange and storage format, the need to develop efficient techniques for storing and querying XML documents has emerged. A common approach to achieve this is to use labeling techniques. However, their main problem is that they either do not support updating XML data dynamically or impose huge storage requirements. On the other hand, given the verbosity and redundancy of XML, which can increase the cost of processing XML documents, compaction of XML documents has become an increasingly important research issue. In this paper, we propose an approach called CXDLS that combines the strengths of both labeling and compaction techniques. Our approach exploits repetitive consecutive subtrees and tags to compact the structure of XML documents by taking advantage of the ORDPATH labeling scheme. In addition, it stores the compacted structure and the data values separately. Using our proposed approach, it is possible to support efficient query and update processing on compacted XML documents and to reduce storage space dramatically. Results of a comprehensive performance study are provided to show the advantages of CXDLS.
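
    The abstract relies on the ORDPATH labeling scheme without showing what such labels look like. The sketch below is not CXDLS itself; it only illustrates, under stated assumptions, the general ORDPATH properties a scheme of this kind can build on: labels are dot-separated integers, odd components mark tree levels, even components are "caret" values used for later insertions, and document order follows component-wise comparison. The function names are hypothetical and the labels are assumed to be valid node labels.

```python
def parse_label(label):
    """Split an ORDPATH-style label such as '1.3.5' into integer components."""
    return tuple(int(part) for part in label.split("."))


def is_ancestor(a, b):
    """True if node label a is a proper prefix of node label b.

    Assumes both arguments are valid node labels (ending in an odd component).
    """
    pa, pb = parse_label(a), parse_label(b)
    return len(pa) < len(pb) and pb[:len(pa)] == pa


def depth(label):
    """Count only odd components; even 'caret' components add no level."""
    return sum(1 for component in parse_label(label) if component % 2 != 0)


def document_order(labels):
    """Sort labels by component-wise comparison, which yields document order."""
    return sorted(labels, key=parse_label)


# '1.4.1' is a label for a node inserted between the siblings '1.3' and '1.5'
# without relabeling either of them.
labels = ["1.5", "1.4.1", "1.3.1", "1.3", "1"]
ordered = document_order(labels)
```

    Because the inserted node reuses the gap between existing ordinals, no existing label changes on update, which is the property that makes this kind of scheme dynamic.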

    Building a Data Warehouse for Twitter Stream Exploration

    In recent years, Twitter has evolved into an extremely popular social network and has revolutionized the ways of interacting and exchanging information on the Internet. By making its public stream available through a set of APIs, Twitter has triggered a wave of research initiatives aimed at analysis and knowledge discovery from the data about its users and their messaging activities. While most of the projects and tools are tailored towards solving specific tasks, we pursue the goal of providing an application-independent and universal analytical platform for supporting any kind of analysis and knowledge discovery. We employ the well-established data warehousing technology with its underlying multidimensional data model, an ETL routine for loading and consolidating data from different sources, OLAP functionality for exploring the data, and data mining tools for more sophisticated analysis. In this work, we describe the process of transforming the original stream into a set of related multidimensional cubes and demonstrate how the resulting data warehouse can be used for solving a variety of analytical tasks. We expect our proposed approach to be applicable to analyzing the data of other social networks as well.

    Repositório de registos electrónicos de saúde baseado em OpenEHR

    Master's degree in Computer and Telematics Engineering. An Electronic Health Record (EHR) aggregates all relevant medical information regarding a single patient, allowing a patient-centric storage approach. The complete medical history of a patient is thus stored together in one record, which saves time and cost by allowing health care institutions to share information. To make this sharing possible, the format in which the information is saved has to be agreed upon. There are many standards that define the way health information is stored, exchanged, and retrieved. One of these standards is the Open Electronic Health Record (OpenEHR). The goal of this thesis is to create a repository that can store and manage patient records that follow the OpenEHR standard. The implementation consists of three software components: an Extensible Markup Language (XML) repository to store health information, a set of services for managing and querying the stored information, and a web interface to demonstrate the implemented functionality.

    Pushing XPath Accelerator to its Limits

    Two competing encoding concepts are known to scale well with growing amounts of XML data: XPath Accelerator encoding, implemented by MonetDB for in-memory documents, and X-Hive's Persistent DOM for on-disk storage. We identified two ways to improve XPath Accelerator and present prototypes for the respective techniques: BaseX boosts in-memory performance with optimized data and value index structures, while Idefix introduces native block-oriented persistence with logarithmic update behavior for true scalability, overcoming main-memory constraints. An easy-to-use Java-based benchmarking framework was developed and used to consistently compare these competing techniques and perform scalability measurements. The established XMark benchmark was applied to all four systems under test. Additional full-text-sensitive queries against the well-known DBLP database complement the XMark results. Not only did the latest version of X-Hive surprise with good scalability and performance numbers, but both BaseX and Idefix also hold their promise to push XPath Accelerator to its limits: BaseX efficiently exploits available main memory to speed up XML queries, while Idefix surpasses main-memory constraints and rivals the on-disk leadership of X-Hive. The competition between XPath Accelerator and Persistent DOM is definitely relaunched.
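
    For readers unfamiliar with the encoding the abstract builds on, the sketch below illustrates the core idea commonly associated with XPath Accelerator: each node receives a preorder and a postorder rank, and XPath axes become range conditions over those two numbers. It is a minimal in-memory illustration, not the storage layout of MonetDB, BaseX, or Idefix; the function names and the dictionary-based table are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def encode(root):
    """Assign preorder and postorder ranks to every element below root."""
    table = []
    counters = {"pre": 0, "post": 0}

    def visit(node):
        row = {"tag": node.tag, "pre": counters["pre"]}
        counters["pre"] += 1
        table.append(row)  # rows end up sorted by preorder rank
        for child in node:
            visit(child)
        row["post"] = counters["post"]  # assigned once all children are done
        counters["post"] += 1

    visit(root)
    return table


def descendants(table, context):
    """Descendant axis: later in preorder, earlier in postorder."""
    return [r for r in table
            if r["pre"] > context["pre"] and r["post"] < context["post"]]


def ancestors(table, context):
    """Ancestor axis: earlier in preorder, later in postorder."""
    return [r for r in table
            if r["pre"] < context["pre"] and r["post"] > context["post"]]


doc = ET.fromstring("<a><b><c/><d/></b><e/></a>")
table = encode(doc)
by_tag = {row["tag"]: row for row in table}
```

    With this table, `descendants(table, by_tag["b"])` selects the rows for `c` and `d`, and `ancestors(table, by_tag["d"])` selects `a` and `b`, without walking the tree again.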