10 research outputs found

    Storing and Querying Probabilistic XML Using a Probabilistic Relational DBMS

    This work explores the feasibility of storing and querying probabilistic XML in a probabilistic relational database. Our approach is to adapt known techniques for mapping XML to relational data such that the possible worlds are preserved. We show that this approach can work for any XML-to-relational technique by adapting a representative schema-based technique (inlining) as well as a representative schemaless technique (XPath Accelerator). We investigate the maturity of probabilistic relational databases for this task through experiments with one of the state-of-the-art systems, called Trio.

    Management of medical images through the Web

    In recent years, the use of digital images for medical research and diagnosis has increased considerably. As a result, new and better applications are needed to manage large amounts of medical information efficiently. The most widely used formats for encoding medical images are DICOM and Analyze 7.5. DICOM is a standard for digital imaging and communications in medicine, and Analyze is a proprietary format designed for the same purpose. However, it is difficult to exchange and integrate DICOM or Analyze information outside the scope of specific medical equipment. This drawback hinders their use and integration in a broader environment such as the Web. On the one hand, XML is the standard for information exchange and data transport between multiple applications and, on the other, XML databases are emerging as the best alternative for storing and managing XML information. In this work we present a Web information system for storing DICOM and Analyze files, in an integrated way, in an XML database. For this development, the XML schemas of the DICOM and Analyze formats were obtained, and the architecture for integrating XML documents into the database was defined.

    Gestión de imágenes médicas a través de la Web

    In recent years, the use of digital images for medical research and diagnosis has increased considerably. As a result, new and better applications are needed to manage large amounts of medical information efficiently. Keywords: medical images, XML, DICOM, Web information systems

    Efficient processing of complex XSD using Hive and Spark

    eXtensible Markup Language (XML) files are widely used in industry because of their flexibility in representing many kinds of data. Applications such as financial records, social networks, and mobile networks use complex XML schemas with nested types, contents, and/or extension bases on existing complex elements or large real-world files. A great number of these files are generated each day, and this has influenced the development of Big Data tools for their parsing and reporting, such as Apache Hive and Apache Spark. For these reasons, multiple studies have proposed new techniques and evaluated the processing of XML files with Big Data systems. However, such works usually consider only the simplest XML schemas, even though real data sets are composed of complex schemas. Therefore, to shed light on complex XML schema processing for real-life applications with Big Data tools, we present an approach that combines three techniques for parsing XML files: cataloging, deserialization, and positional explode. For cataloging, the elements of the XML schema are mapped into root, arrays, structures, values, and attributes. Based on these elements, the deserialization and positional explode are straightforwardly implemented. To demonstrate the validity of our proposal, we develop a case study by implementing a test environment to illustrate the methods using real data sets provided from performance management of two mobile network vendors. Our main results establish the validity of the proposed method for different versions of Apache Hive and Apache Spark, report the query execution times for Apache Hive internal and external tables and Apache Spark data frames, and compare the query performance of Apache Hive with that of Apache Spark. Another contribution is a case study in which a novel solution is proposed for data analysis in the performance management systems of mobile networks. Unidad de Gestión de Investigación y Proyección Social from the Escuela Politécnica Naciona

    Polymorphic Data Modeling

    There are currently no data modeling standards for NoSQL document store databases. This work proposes a standard to fill the void. The proposed standard is based on our new data modeling pattern, named the Polymorphic Table Pattern. The pattern embraces the “schemaless” nature of document store NoSQL while allowing data modelers to use their existing skillsets. The concepts of our proposed modeling have been demonstrated against MongoDB.

    Integration of large datasets for plant model organisms

    This dissertation is concerned with bioinformatics data integration. The first chapter illustrates the current state of biological pathway databases in general, and in particular, plant pathway databases. Key studies are cited to illustrate the potential benefits that may come from further research into integration methods. Different models are explored to interface with the various stakeholders of biological data repositories. A public website (http://www.metnetonline.org) was built to address the role of a bioinformatics data warehouse as a server for external third parties. A dedicated API (MetNetAPI: http://www.metnetonline.org/api) accommodates bioinformaticians (and software developers in general) who wish to build advanced applications on top of MetNet. The API (implemented as .NET and Java libraries) was designed to be as user-friendly to programmers as the public website is to end-users. Finally, a hybrid model is examined: the use of XML as a repository for information integration, downstream processing, and data manipulation. An overview of the use of XML in biological applications is included. MetNetAPI functions according to certain principles; a subset of the API is abstracted and implemented to interface with a range of other public databases. This results in a new bioinformatics toolkit that can be used to mix and match data from heterogeneous sources in a transparent manner. An example would be the grafting of protein-protein interaction data on top of araCyc pathways. Biological network data is often distributed over a variety of independently modeled databases. This dissertation makes two contributions to the field of bioinformatics: A new service - MetNet Online - is now operating which offers access to the earlier created and integrated MetNetDB data repository. The service is geared toward end-users, students and researchers alike, as well as seasoned bioinformatics software developers who wish to build their own applications on top of an already integrated datasource. Furthermore, integrated databases are only useful when they can be synchronized with their respective external sources. Thus, a framework was created that allows for a systematic approach to such integration efforts. In closing, this work provides a roadmap to maintain current as well as prepare for future integrated biological database projects.

    An XML-based framework for electronic business document integration with relational databases

    Small and medium enterprises (SMEs) are becoming increasingly engaged in B2B interactions. The ubiquity of the Internet and the quasi-reliance on electronic document exchanges with larger trading partners have fostered this move. The main technical challenge that this brings to SMEs is that of business document integration: they need to exchange business documents with heterogeneous document formats and also integrate these documents with internal information systems. Often they cannot afford expensive, customized, and proprietary solutions for document exchange and storage. Rather, they need cost-effective approaches based on open standards and backed by easy-to-use information systems. In this dissertation, we investigate the problem of business document integration for SMEs following a design science methodology. We propose a framework and conceptual architecture for a business document integration system (BDIS). By studying existing business document formats, we recommend using the GS1 XML standard format as the intermediate format for business documents in BDIS. The GS1 standards are widely used in supply chains and logistics globally. We present an architecture for BDIS consisting of two layers: one for the design of an internal information system based on relational databases, capable of storing XML business documents, and the other enabling the exchange of heterogeneous business documents at runtime. For the design layer, we leverage and extend existing XML schema conversion approaches to propose a customized and novel approach for converting GS1 XML document schemas into relational schemas. For the runtime layer, we propose wrappers as architectural components for the conversion of various electronic document formats into the GS1 XML format. We demonstrate our approach through a case study involving a GS1 XML business document. We have implemented a prototype BDIS. We have evaluated and compared it with existing research and commercial tools for XML to relational schema conversion. The results show that it generates operational and simpler relational schemas for GS1 XML documents. In conclusion, the proposed framework enables SMEs to engage effectively in electronic business.

    Uma proposta de mapeamento do modelo XML Schema para o modelo relacional

    Dissertation (master's) - Universidade Federal de Santa Catarina, Centro Tecnológico. Programa de Pós-Graduação em Ciência da Computação. The use of XML as a standard for data interchange creates the need for a common schema to be followed by the systems involved. The most widely used mechanisms for defining XML schemas are DTD and XML Schema. With these technologies, it is possible to define the structure to be followed by the XML documents being exchanged, establishing an information exchange protocol that is independent of the data storage mechanism used by the systems. In this context, the communicating systems must be able to transform their XML data model into the data model used by the system and vice versa. The relational data model is used by many of these systems, given the wide availability of Database Management Systems (DBMSs) that adopt this model. To that end, these systems must develop mechanisms to export the data in their tables in XML format and also to decompose XML documents and store them in the DBMS. These mechanisms must be generic, dynamic, and efficient to ensure adequate data interchange. Accordingly, this work proposes a rule-based mechanism for transforming an XML data schema, defined using XML Schema, into a relational data schema that can be used by relational DBMSs available on the market. As a specific contribution, this work carries out a detailed analysis of the concepts of the XML Schema model, taking these concepts into account in the rules for transforming XML Schema into a relational data schema.

    The Fourth International VLDB Workshop on Management of Uncertain Data
