175 research outputs found

    Schema Vacuuming in Temporal Databases

    Get PDF
    Temporal databases facilitate the support of historical information by providing functions for indicating the intervals during which a tuple was applicable (along one or more temporal dimensions). Because data are never deleted, only superceded, temporal databases are inherently append-only resulting, over time, in a large historical sequence of database states. Data vacuuming in temporal databases allows for this sequence to be shortened by strategically, and irrevocably, deleting obsolete data. Schema versioning allows users to maintain a history of database schemata without compromising the semantics of the data or the ability to view data through historical schemata. While the techniques required for data vacuuming in temporal databases have been relatively well covered, the associated area of vacuuming schemata has received less attention. This paper discusses this issue and proposes a mechanism that fits well with existing methods for data vacuuming and schema versioning

    Upstream database and digital asset management in variable data printing

    Get PDF
    This study outlines the upstream database and digital asset management issues for variable data printing. The goal is to clarify what work environment and processes are needed during digital asset and data preparation. A literature review was conducted and complemented with the experiential experience of establishing and using a variable data preparation and testing platform

    Transactional and analytical data management on persistent memory

    Get PDF
    Die zunehmende Anzahl von Smart-Geräten und Sensoren, aber auch die sozialen Medien lassen das Datenvolumen und damit die geforderte Verarbeitungsgeschwindigkeit stetig wachsen. Gleichzeitig müssen viele Anwendungen Daten persistent speichern oder sogar strenge Transaktionsgarantien einhalten. Die neuartige Speichertechnologie Persistent Memory (PMem) mit ihren einzigartigen Eigenschaften scheint ein natürlicher Anwärter zu sein, um diesen Anforderungen effizient nachzukommen. Sie ist im Vergleich zu DRAM skalierbarer, günstiger und dauerhaft. Im Gegensatz zu Disks ist sie deutlich schneller und direkt adressierbar. Daher wird in dieser Dissertation der gezielte Einsatz von PMem untersucht, um den Anforderungen moderner Anwendung gerecht zu werden. Nach der Darlegung der grundlegenden Arbeitsweise von und mit PMem, konzentrieren wir uns primär auf drei Aspekte der Datenverwaltung. Zunächst zerlegen wir mehrere persistente Daten- und Indexstrukturen in ihre zugrundeliegenden Entwurfsprimitive, um Abwägungen für verschiedene Zugriffsmuster aufzuzeigen. So können wir ihre besten Anwendungsfälle und Schwachstellen, aber auch allgemeine Erkenntnisse über das Entwerfen von PMem-basierten Datenstrukturen ermitteln. Zweitens schlagen wir zwei Speicherlayouts vor, die auf analytische Arbeitslasten abzielen und eine effiziente Abfrageausführung auf beliebigen Attributen ermöglichen. Während der erste Ansatz eine verknüpfte Liste von mehrdimensionalen gruppierten Blöcken verwendet, handelt es sich beim zweiten Ansatz um einen mehrdimensionalen Index, der Knoten im DRAM zwischenspeichert. Drittens zeigen wir unter Verwendung der bisherigen Datenstrukturen und Erkenntnisse, wie Datenstrom- und Ereignisverarbeitungssysteme mit transaktionaler Zustandsverwaltung verbessert werden können. Dabei schlagen wir ein neuartiges Transactional Stream Processing (TSP) Modell mit geeigneten Konsistenz- und Nebenläufigkeitsprotokollen vor, die an PMem angepasst sind. Zusammen sollen die diskutierten Aspekte eine Grundlage für die Entwicklung noch ausgereifterer PMem-fähiger Systeme bilden. Gleichzeitig zeigen sie, wie Datenverwaltungsaufgaben PMem ausnutzen können, indem sie neue Anwendungsgebiete erschließen, die Leistung, Skalierbarkeit und Wiederherstellungsgarantien verbessern, die Codekomplexität vereinfachen sowie die ökonomischen und ökologischen Kosten reduzieren.The increasing number of smart devices and sensors, but also social media are causing the volume of data and thus the demanded processing speed to grow steadily. At the same time, many applications need to store data persistently or even comply with strict transactional guarantees. The novel storage technology Persistent Memory (PMem), with its unique properties, seems to be a natural candidate to meet these requirements efficiently. Compared to DRAM, it is more scalable, less expensive, and durable. In contrast to disks, it is significantly faster and directly addressable. Therefore, this dissertation investigates the deliberate employment of PMem to fit the needs of modern applications. After presenting the fundamental work of and with PMem, we focus primarily on three aspects of data management. First, we disassemble several persistent data and index structures into their underlying design primitives to reveal the trade-offs for various access patterns. It allows us to identify their best use cases and vulnerabilities but also to gain general insights into the design of PMem-based data structures. Second, we propose two storage layouts that target analytical workloads and enable an efficient query execution on arbitrary attributes. While the first approach employs a linked list of multi-dimensional clustered blocks that potentially span several storage layers, the second approach is a multi-dimensional index that caches nodes in DRAM. Third, we show how to improve stream and event processing systems involving transactional state management using the preceding data structures and insights. In this context, we propose a novel Transactional Stream Processing (TSP) model with appropriate consistency and concurrency protocols adapted to PMem. Together, the discussed aspects are intended to provide a foundation for developing even more sophisticated PMemenabled systems. At the same time, they show how data management tasks can take advantage of PMem by opening up new application domains, improving performance, scalability, and recovery guarantees, simplifying code complexity, plus reducing economic and environmental costs

    Translating Data Between Xml Schema and 6Nf Conceptual Models

    Get PDF
    Sixth Normal Form (6NF) is a term used in relational database theory by Christopher Date to describe databases which decompose relational variables to irreducible elements. While this form may be unimportant for non-temporal data, it is certainly important for data containing temporal variables of a point-in-time or interval nature. With the advent of Data Warehousing 2.0 (DW 2.0), there is now an increased emphasis on using fully-temporalized databases in the context of data warehousing, in particular with approaches such as the Anchor Model and Data Vault. In this work, we will explore the concepts of temporal data, 6NF conceptual database models, and their relationship with DW 2.0. Further, we will evaluate the Anchor Model and Data Vault as design methods in which to capture temporal data. Using these methods, we will define a process for translating 6NF-compliant data into a standard eXtensible Markup Language (XML) Schema which can be used to describe and present such data to disparate systems in a structured format suitable for data exchange. Further, we will discuss benefits, advantages, and limitations to using XML representations of 6NF models for the transfer of data into a data warehouse

    Towards a big data reference architecture

    Get PDF

    Attribute-Level Versioning: A Relational Mechanism for Version Storage and Retrieval

    Get PDF
    Data analysts today have at their disposal a seemingly endless supply of data and repositories hence, datasets from which to draw. New datasets become available daily thus making the choice of which dataset to use difficult. Furthermore, traditional data analysis has been conducted using structured data repositories such as relational database management systems (RDBMS). These systems, by their nature and design, prohibit duplication for indexed collections forcing analysts to choose one value for each of the available attributes for an item in the collection. Often analysts discover two or more datasets with information about the same entity. When combining this data and transforming it into a form that is usable in an RDBMS, analysts are forced to deconflict the collisions and choose a single value for each duplicated attribute containing differing values. This deconfliction is the source of a considerable amount of guesswork and speculation on the part of the analyst in the absence of professional intuition. One must consider what is lost by discarding those alternative values. Are there relationships between the conflicting datasets that have meaning? Is each dataset presenting a different and valid view of the entity or are the alternate values erroneous? If so, which values are erroneous? Is there a historical significance of the variances? The analysis of modern datasets requires the use of specialized algorithms and storage and retrieval mechanisms to identify, deconflict, and assimilate variances of attributes for each entity encountered. These variances, or versions of attribute values, contribute meaning to the evolution and analysis of the entity and its relationship to other entities. A new, distinct storage and retrieval mechanism will enable analysts to efficiently store, analyze, and retrieve the attribute versions without unnecessary complexity or additional alterations of the original or derived dataset schemas. This paper presents technologies and innovations that assist data analysts in discovering meaning within their data and preserving all of the original data for every entity in the RDBMS
    corecore