
    Model-Driven Integration of Compression Algorithms in Column-Store Database Systems

    Abstract. Modern database systems are often able to store their entire data in main memory. Aside from increased main memory capacities, a further driver for in-memory database systems was the shift to a decomposition storage model in combination with lightweight data compression algorithms. Using both of these storage design concepts, large datasets can be held and processed in main memory with a low memory footprint. In recent years, a large corpus of lightweight data compression algorithms has been developed to efficiently support different data characteristics. In this paper, we present our novel model-driven concept for integrating this large and evolving corpus of lightweight data compression algorithms into column-store database systems. The core components of our concept are (i) a unified conceptual model for lightweight compression algorithms, (ii) the specification of algorithms as platform-independent model instances, (iii) the transformation of model instances into low-level system code, and (iv) the integration of that low-level system code into a storage layer.
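
    The paper's unified model and code generator are not reproduced in this abstract. As a rough, hypothetical illustration of the model-driven idea, the Python sketch below treats a compression algorithm as a platform-independent "model instance" (here just an ordered list of operator names) that is turned into executable code by composing operators from a registry; the operators (delta and run-length encoding) and all names are illustrative assumptions, not the paper's actual model.

        # Minimal, illustrative sketch of the model-driven idea: a compression algorithm
        # is described as a platform-independent "model instance" (a list of operator
        # names) and turned into executable code by composing operators from a registry.
        # All names are hypothetical and do not reproduce the paper's actual model.

        from typing import Callable, Dict, List

        Column = List[int]

        def delta_encode(col: Column) -> Column:
            # Store differences between consecutive values (smaller numbers compress better).
            return [col[0]] + [b - a for a, b in zip(col, col[1:])] if col else []

        def rle_encode(col: Column) -> Column:
            # Flatten (value, run_length) pairs into a single integer list.
            out: Column = []
            for v in col:
                if len(out) >= 2 and out[-2] == v:
                    out[-1] += 1
                else:
                    out += [v, 1]
            return out

        OPERATORS: Dict[str, Callable[[Column], Column]] = {
            "delta": delta_encode,
            "rle": rle_encode,
        }

        def compile_model(model: List[str]) -> Callable[[Column], Column]:
            """'Transform' a model instance (a list of operator names) into runnable code."""
            stages = [OPERATORS[name] for name in model]
            def compressor(col: Column) -> Column:
                for stage in stages:
                    col = stage(col)
                return col
            return compressor

        if __name__ == "__main__":
            compress = compile_model(["delta", "rle"])
            print(compress([5, 5, 5, 8, 8, 8, 8]))  # delta-encode, then run-length encode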

    TOPAZ: a tool kit for the assembly of transaction managers for non-standard applications

    'Advanced database applications', such as CAD/CAM, CASE, large AI applications or image and voice processing, place demands on transaction management which differ substantially from those in traditional database applications. In particular, there is a need to support 'enriched' data models (which include, for example, complex objects or version and configuration management), 'synergistic' cooperative work, and application- or user-supported consistency. Unfortunately, the demands are not only sophisticated but also diversified, which means that different application areas might even place contradictory demands on transaction management. This paper deals with these problems and offers a solution by introducing a flexible and adaptable tool kit approach for transaction management.
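
    TOPAZ itself is not described in code in this abstract; the sketch below is only a hypothetical illustration of a tool-kit style transaction manager assembled from interchangeable policy components, so that strict and cooperative application areas can configure different behaviour. All class and policy names are invented for the example.

        # Illustrative sketch only, not TOPAZ itself: a "tool kit" style transaction
        # manager assembled from interchangeable policy components, so different
        # application areas (strict ACID vs. cooperative design work) can pick the
        # behaviour they need.

        from abc import ABC, abstractmethod
        from typing import Dict, Optional

        class LockPolicy(ABC):
            @abstractmethod
            def may_write(self, obj: str, holder: Optional[str], requester: str) -> bool: ...

        class StrictPolicy(LockPolicy):
            # Classical behaviour: only one transaction may write an object at a time.
            def may_write(self, obj, holder, requester):
                return holder is None or holder == requester

        class CooperativePolicy(LockPolicy):
            # Relaxed behaviour: members of the same working group may share write access.
            def __init__(self, groups: Dict[str, str]):
                self.groups = groups
            def may_write(self, obj, holder, requester):
                return holder is None or self.groups.get(holder) == self.groups.get(requester)

        class TransactionManager:
            """Assembled from a policy component instead of hard-wiring one behaviour."""
            def __init__(self, policy: LockPolicy):
                self.policy = policy
                self.write_locks: Dict[str, str] = {}

            def write(self, txn: str, obj: str) -> bool:
                holder = self.write_locks.get(obj)
                if self.policy.may_write(obj, holder, txn):
                    self.write_locks[obj] = txn
                    return True
                return False

        if __name__ == "__main__":
            tm = TransactionManager(CooperativePolicy(groups={"t1": "cad-team", "t2": "cad-team"}))
            print(tm.write("t1", "gear_model"), tm.write("t2", "gear_model"))  # True True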

    A proposal for a high-performance architecture for PACS systems based on database extensions

    Advisor: Prof. Dr. Aldo Von Wangenheim. Doctoral thesis, Universidade Federal do Paraná, Setor de Ciências Exatas, Programa de Pós-Graduação em Informática. Defended in Curitiba, 25/07/2014. Includes references.
    Abstract: The use of digital images in medical diagnosis is observable in a number of application scenarios and at different scales, growing in terms of volume of data and of the medical specialties covered. To organize this digital content, composed of image datasets in the DICOM (Digital Imaging and Communications in Medicine) standard, it is usual to adopt a PACS (Picture Archiving and Communication System), an architecture built as an aggregation of heterogeneous hardware and software components. Some of these components form the PACS storage layer, responsible for the persistence of every digital image acquired or visualized/manipulated through the system. Despite employing highly specialized components (e.g., a DBMS, Database Management System), current PACS storage layers are treated as simple data repositories, assuming a passive role (i.e., without implementing business rules) when compared to other components. In this work, a new, simplified PACS architecture is proposed, based on modifications to its storage layer. The modifications replace the current passive role with an active one, using extensibility and data-distribution resources available in its components but not employed today. The proposed architecture focuses on communication and data storage, using DBMS extensions and heterogeneous structures for storing conventional and non-conventional data, and provides high performance in terms of scalability, support for large volumes of content, and decentralized query processing. Structurally, the proposed architecture is composed of a set of modules designed to exploit the extensibility options available in DBMSs, incorporating characteristics and functionality originally distributed as business rules among other PACS components. At the prototype level, experimental results indicate the viability of the proposal, showing performance gains in metadata search and DICOM image retrieval when compared to conventional PACS architectures. The flexibility of the proposal regarding the adoption of heterogeneous storage technologies is also evaluated positively, allowing the PACS storage layer to be extended in terms of scalability, processing power, fault tolerance and content representation. Keywords: PACS, DICOM, DBMS, extensibility, High-Performance Computing.
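
    As a hedged illustration of the "active storage layer" idea described above, the sketch below pushes a business rule (a hypothetical anonymisation function plus an index-maintenance trigger) into the database itself. SQLite is used only because it ships with Python; the schema and the rule are assumptions made for the example, not the thesis' actual design, which targets server-side extensions of full DBMSs.

        # Hedged sketch of the "active storage layer" idea: business logic runs inside
        # the database via a user-defined function and a trigger, instead of in a
        # separate PACS component. SQLite and the schema here are illustrative only.

        import sqlite3

        def anonymize(patient_name: str) -> str:
            # Example business rule pushed into the storage layer.
            return "ANON-" + str(abs(hash(patient_name)) % 10_000)

        conn = sqlite3.connect(":memory:")
        conn.create_function("anonymize", 1, anonymize)   # extend the DBMS with application logic
        conn.executescript("""
            CREATE TABLE dicom_image (id INTEGER PRIMARY KEY, patient_name TEXT, modality TEXT);
            CREATE TABLE image_index (image_id INTEGER, anon_name TEXT, modality TEXT);
            -- The storage layer itself maintains the search index on every insert.
            CREATE TRIGGER index_on_insert AFTER INSERT ON dicom_image
            BEGIN
                INSERT INTO image_index VALUES (NEW.id, anonymize(NEW.patient_name), NEW.modality);
            END;
        """)
        conn.execute("INSERT INTO dicom_image (patient_name, modality) VALUES (?, ?)", ("DOE^JOHN", "CT"))
        print(conn.execute("SELECT * FROM image_index").fetchall())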

    Exploring parallelism with object oriented database management system

    The object oriented approach to database management systems aims to remove the limitations of current systems by providing enhanced semantic capabilities and more flexible facilities, including the encapsulation of operations as well as data in the specification of an object. Such systems are certainly more complex than existing database management systems. Although they are complex, current object oriented database management systems are built for von Neumann (purely sequential) machines. Such implementations inevitably lead to major problems involving efficiency and performance, so new implementation techniques need to be investigated. One possible solution to the efficiency and performance problems is to use parallel processing techniques. Thus, the aim of this research is to propose aspects in which parallel processing can be introduced within the scope of object oriented database management systems and to identify ways in which performance can be improved. A prototype of the main components of an object oriented database system called KBZ has been implemented to test some of these parallel processing aspects. The thesis starts with an introduction and background to the research. It then describes major parallel system architectures for an object oriented database management system. Techniques such as distributing a large volume of data among various processors (transputers), performing processing in the background of the system to reduce response time, and performing parallel input/output processing are presented. The initial prototype, PKBZ version-1, is then described; in particular, the logical and physical representation of object classes, how they communicate through message sending, and the different types of message supported. Two prototype versions exist. The initial prototype was designed to investigate the parallel implementation and general functionality of the system. The second version provides greater flexibility and incorporates enhanced functionality to allow experimentation. The enhancements in the second version are also discussed in the thesis, and the experimental results using different transputer configurations are illustrated and analyzed.
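
    The KBZ/PKBZ prototype code is not available in this abstract; the following Python sketch only illustrates the general pattern of partitioning objects across parallel workers and invoking their methods through message passing, loosely mirroring the transputer-style communication described above. Operating-system processes stand in for transputers, and all classes and object identifiers are made up for the example.

        # Rough, illustrative sketch (not the KBZ/PKBZ prototype): object instances are
        # partitioned across worker processes, and method calls are delivered as
        # messages, mirroring message sending between distributed object partitions.

        from multiprocessing import Process, Queue

        def worker(partition: dict, requests: Queue, replies: Queue) -> None:
            # Each worker owns a partition of the objects and serves messages against it.
            while True:
                msg = requests.get()
                if msg is None:
                    break
                oid, method, arg = msg
                obj = partition[oid]
                replies.put((oid, getattr(obj, method)(arg)))

        class Part:
            def __init__(self, weight): self.weight = weight
            def heavier_than(self, limit): return self.weight > limit

        if __name__ == "__main__":
            partitions = [{1: Part(10), 2: Part(3)}, {3: Part(8)}]
            requests, replies = [Queue() for _ in partitions], Queue()
            procs = [Process(target=worker, args=(p, q, replies)) for p, q in zip(partitions, requests)]
            for proc in procs:
                proc.start()
            requests[0].put((1, "heavier_than", 5))   # message to the worker holding object 1
            requests[1].put((3, "heavier_than", 5))
            print(replies.get(), replies.get())
            for q in requests:
                q.put(None)                            # tell workers to stop
            for proc in procs:
                proc.join()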

    Function-based indexing for object-oriented databases

    Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1994. Includes bibliographical references (p. 167-171). By Deborah Jing-Hwa Hwang.

    Just-in-time Analytics Over Heterogeneous Data and Hardware

    Industry and academia are continuously becoming more data-driven and data-intensive, relying on the analysis of a wide variety of datasets to gain insights. At the same time, data variety increases continuously across multiple axes. First, data comes in multiple formats, such as the binary tabular data of a DBMS, raw textual files, and domain-specific formats. Second, different datasets follow different data models, such as the relational and the hierarchical one. Data location also varies: some datasets reside in a central "data lake", whereas others lie in remote data sources. In addition, users execute widely different analysis tasks over all these data types. Finally, the process of gathering and integrating diverse datasets introduces several inconsistencies and redundancies in the data, such as duplicate entries for the same real-world concept. In summary, heterogeneity significantly affects the way data analysis is performed. In this thesis, we aim for data virtualization: abstracting data out of its original form and manipulating it regardless of the way it is stored or structured, without a performance penalty. To achieve data virtualization, we design and implement systems that i) mask heterogeneity through the use of heterogeneity-aware, high-level building blocks and ii) offer fast responses through on-demand adaptation techniques. Regarding the high-level building blocks, we use a query language and algebra to handle multiple collection types, such as relations and hierarchies, to express transformations between these collection types, and to express complex data cleaning tasks over them. In addition, we design a location-aware compiler and optimizer that masks away the complexity of accessing multiple remote data sources. Regarding on-demand adaptation, we present a design to produce a new system per query. The design uses customization mechanisms that trigger runtime code generation to mimic the system most appropriate to answer a query fast: query operators are thus created based on the query workload and the underlying data models; the data access layer is created based on the underlying data formats. In addition, we exploit emerging hardware by customizing the system implementation based on the available heterogeneous processors (CPUs and GPGPUs). We thus pair each workload with its ideal processor type. The end result is a just-in-time database system that is specific to the query, data, workload, and hardware instance. This thesis redesigns the data management stack to natively cater for data heterogeneity and exploit hardware heterogeneity. Instead of centralizing all relevant datasets, converting them to a single representation, and loading them into a monolithic, static, suboptimal system, our design embraces heterogeneity. Overall, our design decouples the type of performed analysis from the original data layout; users can perform their analysis across data stores, data models, and data formats, but at the same time experience the performance offered by a custom system that has been built on demand to serve their specific use case.
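
    The thesis' engine is not reproduced here; the sketch below is a small, hypothetical illustration of the on-demand specialization idea: a tiny data access function is generated at runtime as Python source, specialized to the file format and to the columns a query touches, and then compiled with exec(). The formats, column names, and helper names are assumptions made for the example.

        # Hedged illustration of on-demand specialization (not the thesis' actual
        # engine): a tiny "access layer" is generated at runtime as Python source,
        # specialized to the file format and to the projected columns.

        import csv, io, json
        from typing import Callable, List

        def generate_scanner(fmt: str, wanted: List[str]) -> Callable[[str], list]:
            # Emit source code for a scan operator specialized to the format and projection.
            if fmt == "csv":
                body = (
                    "def scan(text):\n"
                    "    rows = csv.DictReader(io.StringIO(text))\n"
                    f"    return [tuple(r[c] for c in {wanted!r}) for r in rows]\n"
                )
            elif fmt == "jsonl":
                body = (
                    "def scan(text):\n"
                    f"    return [tuple(json.loads(line)[c] for c in {wanted!r})\n"
                    "            for line in text.splitlines() if line]\n"
                )
            else:
                raise ValueError(fmt)
            namespace = {"csv": csv, "io": io, "json": json}
            exec(compile(body, f"<scanner:{fmt}>", "exec"), namespace)
            return namespace["scan"]

        if __name__ == "__main__":
            scan_csv = generate_scanner("csv", ["name", "age"])
            print(scan_csv("name,age,city\nAda,36,London\n"))          # [('Ada', '36')]
            scan_json = generate_scanner("jsonl", ["name"])
            print(scan_json('{"name": "Ada", "age": 36}\n'))           # [('Ada',)]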

    An Architecture for the Compilation of Persistent Polymorphic Reflective Higher-Order Languages

    Persistent Application Systems are potentially very large and long-lived application systems which use information technology: computers, communications, networks, software and databases. They are vital to the organisations that depend on them, and they have to be adaptable to organisational and technological changes and evolvable without serious interruption of service. Persistent Programming Languages are a promising technology that facilitates the task of incrementally building and maintaining persistent application systems. This thesis identifies a number of technical challenges in making persistent programming languages scalable, with adequate performance and sufficient longevity, and in amortising costs by providing general services. A new architecture to support the compilation of long-lived, large-scale applications is proposed. This architecture comprises an intermediate language to be used by front-ends, high-level and machine-independent optimisers, low-level optimisers, and code generators of target machine code. The intermediate target language, TPL, has been designed to allow compiler writers to utilise common technology for several different orthogonally persistent higher-order reflective languages. The goal is to reuse optimisation and code-generation or interpretation technology with a variety of front-ends. A subsidiary goal is to provide an experimental framework for those investigating optimisation and code generation. TPL has a simple, clean type system and will support orthogonally persistent, reflective, higher-order, polymorphic languages. TPL allows code generation and abstraction over details of the underlying software and hardware layers. An experiment to build a prototype of the proposed architecture was designed, developed and evaluated. The experimental work includes a language processor, and examples of its use are presented in this dissertation. The design space was covered by describing the implications of the goals of supporting the anticipated class of languages while ensuring long-term persistence of data and programs and sufficient efficiency. For each of the goals, the design decisions were evaluated in the face of the results.
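
    TPL itself is not shown in this abstract; the following sketch only illustrates the shape of the proposed architecture, in which different front-ends lower programs to one shared intermediate representation that common optimisation passes and back-ends consume. The toy IR, the constant-folding pass, and the interpreter are all invented for the example.

        # Illustrative only, not TPL: a tiny shared intermediate representation, one
        # machine-independent optimisation pass, and a back-end stand-in (an
        # interpreter) that any number of front-ends could target.

        from dataclasses import dataclass
        from typing import Union

        @dataclass
        class Const:
            value: int

        @dataclass
        class Add:
            left: "Expr"
            right: "Expr"

        Expr = Union[Const, Add]

        def fold_constants(node: Expr) -> Expr:
            """A machine-independent optimisation pass over the intermediate form."""
            if isinstance(node, Add):
                left, right = fold_constants(node.left), fold_constants(node.right)
                if isinstance(left, Const) and isinstance(right, Const):
                    return Const(left.value + right.value)
                return Add(left, right)
            return node

        def evaluate(node: Expr) -> int:
            """Stand-in for a back-end: interpret the intermediate form directly."""
            if isinstance(node, Const):
                return node.value
            return evaluate(node.left) + evaluate(node.right)

        if __name__ == "__main__":
            # Two hypothetical front-ends could both lower their programs to this IR.
            program = Add(Const(1), Add(Const(2), Const(3)))
            print(fold_constants(program), evaluate(program))  # Const(value=6) 6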