18 research outputs found

    Transactional and analytical data management on persistent memory

    The growing number of smart devices and sensors, as well as social media, causes the volume of data, and thus the required processing speed, to increase steadily. At the same time, many applications need to store data persistently or even comply with strict transactional guarantees. The novel storage technology Persistent Memory (PMem), with its unique properties, seems to be a natural candidate to meet these requirements efficiently. Compared to DRAM, it is more scalable, less expensive, and durable. In contrast to disks, it is significantly faster and directly addressable. Therefore, this dissertation investigates the deliberate employment of PMem to fit the needs of modern applications. After presenting the fundamentals of working with PMem, we focus primarily on three aspects of data management. First, we disassemble several persistent data and index structures into their underlying design primitives to reveal the trade-offs for various access patterns. This allows us to identify their best use cases and weaknesses, but also to gain general insights into the design of PMem-based data structures.
Second, we propose two storage layouts that target analytical workloads and enable efficient query execution on arbitrary attributes. While the first approach employs a linked list of multi-dimensional clustered blocks that potentially span several storage layers, the second approach is a multi-dimensional index that caches nodes in DRAM. Third, we show how to improve stream and event processing systems involving transactional state management using the preceding data structures and insights. In this context, we propose a novel Transactional Stream Processing (TSP) model with appropriate consistency and concurrency protocols adapted to PMem. Together, the discussed aspects are intended to provide a foundation for developing even more sophisticated PMem-enabled systems. At the same time, they show how data management tasks can take advantage of PMem by opening up new application domains, improving performance, scalability, and recovery guarantees, reducing code complexity, and lowering economic and environmental costs.
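
The abstract above describes persistent data structures assembled from basic design primitives and a storage layout built on a linked list of clustered blocks. As a rough illustration of the kind of primitive involved, the following C++ sketch shows how appends to a PMem-resident block list might be made durable. It is not the dissertation's actual layout: the single-attribute records, the names (persist, PBlock, BlockList), and the use of x86 CLWB/SFENCE intrinsics are all illustrative assumptions, and block allocation as well as handling of full blocks are omitted.

// Illustrative only: a singly linked list of fixed-size blocks assumed to live
// in PMem, where each appended record is flushed before the counter that makes
// it visible, so a crash never exposes a torn record.
#include <immintrin.h>   // _mm_clwb / _mm_sfence (requires CLWB support)
#include <cstddef>
#include <cstdint>

constexpr std::size_t RECORDS_PER_BLOCK = 64;

// Flush the cache lines covering [addr, addr + len) and order the flushes.
static void persist(const void* addr, std::size_t len) {
    auto line = reinterpret_cast<std::uintptr_t>(addr) & ~std::uintptr_t{63};
    auto end  = reinterpret_cast<std::uintptr_t>(addr) + len;
    for (; line < end; line += 64)
        _mm_clwb(reinterpret_cast<void*>(line));
    _mm_sfence();
}

struct PBlock {                                  // assumed to reside in PMem
    std::uint64_t count = 0;                     // number of valid records
    std::uint64_t records[RECORDS_PER_BLOCK] = {};
    PBlock*       next  = nullptr;               // next block in the list
};

struct BlockList {
    PBlock* head = nullptr;                      // head pointer, also in PMem

    // Append into the head block (allocation of new blocks is omitted).
    void append(std::uint64_t value) {
        PBlock* b = head;
        b->records[b->count] = value;
        persist(&b->records[b->count], sizeof(value));   // record first ...
        b->count += 1;                                    // ... then commit it
        persist(&b->count, sizeof(b->count));
    }
};

The design choice mirrored here is that ordering durable writes (record before counter) can stand in for a separate log entry for the append.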

    Architectural Principles for Database Systems on Storage-Class Memory

    Database systems have long been optimized to hide the higher latency of storage media, yielding complex persistence mechanisms. With the advent of large DRAM capacities, it became possible to keep a full copy of the data in DRAM. Systems that leverage this possibility, such as main-memory databases, keep two copies of the data in two different formats: one in main memory and the other in storage. The two copies are kept synchronized using snapshotting and logging. This main-memory-centric architecture yields nearly two orders of magnitude faster analytical processing than traditional, disk-centric architectures. The rise of Big Data emphasized the importance of such systems with an ever-increasing need for more main memory. However, DRAM is hitting its scalability limits: It is intrinsically hard to further increase its density. Storage-Class Memory (SCM) is a group of novel memory technologies that promise to alleviate DRAM’s scalability limits. They combine the non-volatility, density, and economic characteristics of storage media with byte-addressability and a latency close to that of DRAM. Therefore, SCM can serve as persistent main memory, thereby bridging the gap between main memory and storage. In this dissertation, we explore the impact of SCM as persistent main memory on database systems. Assuming a hybrid SCM-DRAM hardware architecture, we propose a novel software architecture for database systems that places primary data in SCM and directly operates on it, eliminating the need for explicit I/O. This architecture yields many benefits: First, it obviates the need to reload data from storage to main memory during recovery, as data is discovered and accessed directly in SCM. Second, it allows replacing the traditional logging infrastructure with fine-grained, cheap micro-logging at data-structure level. Third, secondary data can be stored in DRAM and reconstructed during recovery. Fourth, system runtime information can be stored in SCM to improve recovery time. Finally, the system may retain and continue in-flight transactions in case of system failures. However, SCM is no panacea as it raises unprecedented programming challenges. Given its byte-addressability and low latency, processors can access, read, modify, and persist data in SCM using load/store instructions at CPU cache-line granularity. The path from CPU registers to SCM is long and mostly volatile, including store buffers and CPU caches, leaving the programmer with little control over when data is persisted. Therefore, there is a need to enforce the order and durability of SCM writes using persistence primitives, such as cache-line flushing instructions. This in turn creates new failure scenarios, such as missing or misplaced persistence primitives. We devise several building blocks to overcome these challenges. First, we identify the programming challenges of SCM and present a sound programming model that solves them. Then, we tackle memory management, as the first required building block for a database system, by designing a highly scalable SCM allocator, named PAllocator, that fulfills the versatile needs of database systems. Thereafter, we propose the FPTree, a highly scalable hybrid SCM-DRAM persistent B+-Tree that bridges the gap between the performance of transient and persistent B+-Trees. Using these building blocks, we realize our envisioned database architecture in SOFORT, a hybrid SCM-DRAM columnar transactional engine.
We propose an SCM-optimized MVCC scheme that eliminates write-ahead logging from the critical path of transactions. Since SCM-resident data is near-instantly available upon recovery, the new recovery bottleneck is rebuilding DRAM-based data. To alleviate this bottleneck, we propose a novel recovery technique that achieves nearly instant responsiveness of the database by accepting queries right after recovering SCM-based data, while rebuilding DRAM-based data in the background. Additionally, SCM brings new failure scenarios that existing testing tools cannot detect. Hence, we propose an online testing framework that is able to automatically simulate power failures and detect missing or misplaced persistence primitives. Finally, our proposed building blocks can serve to build more complex systems, paving the way for future database systems on SCM.
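
The recovery behaviour described above, answering queries on SCM-resident primary data right after restart while DRAM-resident secondary data is rebuilt in the background, can be sketched as follows. This is not SOFORT's actual code; Engine, primary, and index are hypothetical stand-ins, and the example ignores how the primary data is mapped from SCM and how the background thread would be managed in a real system.

// Illustrative only: serve lookups immediately after recovery, falling back
// to a scan of the (assumed SCM-resident) primary data until the DRAM-based
// index has been rebuilt by a background thread.
#include <atomic>
#include <cstdint>
#include <map>
#include <thread>
#include <vector>

struct Engine {
    std::vector<std::uint64_t> primary;           // primary data, assumed SCM-resident
    std::map<std::uint64_t, std::size_t> index;   // secondary data, DRAM-only, lost on failure
    std::atomic<bool> index_ready{false};

    // Recovery: primary data needs no reload; rebuild the index lazily.
    void recover() {
        std::thread([this] {
            for (std::size_t i = 0; i < primary.size(); ++i)
                index.emplace(primary[i], i);
            index_ready.store(true, std::memory_order_release);
        }).detach();
    }

    // Queries are accepted immediately: use the index once it is ready,
    // otherwise scan the primary data.
    long lookup(std::uint64_t key) const {
        if (index_ready.load(std::memory_order_acquire)) {
            auto it = index.find(key);
            return it == index.end() ? -1 : static_cast<long>(it->second);
        }
        for (std::size_t i = 0; i < primary.size(); ++i)
            if (primary[i] == key) return static_cast<long>(i);
        return -1;
    }
};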

    Aikajanojen analysointiohjelmiston toteutus tietoturvapoikkeamien tutkintaan (Implementation of timeline analysis software for investigating information security incidents)

    Organizations today are trying to manage the many risks they perceive to be threatening the security of their valuable information assets, but these risks often materialize as security incidents. Managing risks proactively is important, but equally important and challenging is to respond efficiently to incidents that have already occurred, to minimize their impact on business processes. A part of managing security incidents is the technical analysis of any related computer systems, also known as digital forensic investigation. As a result of collecting evidence such as log files from these systems, analysts end up with large amounts of data, which can form a timeline of events. These events describe different actions performed on the system in question. Analysing the timelines to find any events of interest is challenging due to the vast amount of data available on modern systems. The goal of this thesis is to create a software program to support the analysis of very large timelines as a part of digital forensic investigations. As a result, we have implemented software with an efficient query interface, which supports iterative exploration of the data and more complex analytical queries. Furthermore, we use a timeline visualization to compactly represent different properties of the data, which enables analysts to detect potential anomalies efficiently. This software also serves as a platform for future work, to experiment with more automated analysis techniques. We evaluated the software in a case study, in which it showed a high degree of flexibility and performance compared to more traditional ways of working.

    A distributed rule-based expert system for large event stream processing

    Rule-based expert systems (RBSs) provide an efficient solution to many problems that involve event stream processing. With today’s needs to process larger streams, many approaches have been proposed to distribute the rule engines behind RBSs. However, some issues limit the potential of distributed RBSs in the current big data era, such as the load imbalance caused by their distribution methods and the low parallelism originating from the continuous operator model. To address these issues, we propose a new architecture for distributing rule engines. This architecture adopts the dynamic job assignment and micro-batching strategies, which have recently arisen in the big data community, to remove the load imbalance and increase the parallelism of distributed rule engines. An automated transformation framework based on Model-driven Architecture (MDA) is presented, which can be used to transform current rule engines to work on the proposed architecture. This work is validated by a two-step verification. In addition, we propose a generic benchmark for evaluating the performance of distributed rule engines. The performance of the proposed architecture is discussed and directions for future research are suggested. The contribution of this study can be viewed from two different angles: for the rule-based system community, this thesis documents an improvement to rule engines by fully adopting big data technologies; for the big data community, it is an early proposal to process large event streams using a well-crafted rule-based system. Our results show the proposed approach can benefit both research communities.
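
As a rough sketch of the two strategies the abstract names, the following C++ fragment groups incoming events into micro-batches and assigns each batch to the currently least-loaded worker rather than binding events to fixed operator instances. It is not the thesis's actual engine; Dispatcher, Worker, and Batch are hypothetical names, and rule matching as well as threading are left out.

// Illustrative only: micro-batching plus dynamic job assignment.
#include <cstddef>
#include <queue>
#include <string>
#include <utility>
#include <vector>

struct Event { std::string payload; };
using Batch = std::vector<Event>;

struct Worker {
    std::queue<Batch> pending;        // batches waiting to be matched against the rules
    std::size_t load() const { return pending.size(); }
};

class Dispatcher {
    std::vector<Worker> workers;
    Batch current;
    std::size_t batch_size;
public:
    Dispatcher(std::size_t n_workers, std::size_t size)
        : workers(n_workers), batch_size(size) {}

    // Micro-batching: buffer incoming events and hand off a full batch at once.
    void on_event(Event e) {
        current.push_back(std::move(e));
        if (current.size() >= batch_size) {
            dispatch(std::move(current));
            current.clear();          // reuse the moved-from buffer
        }
    }

    // Dynamic job assignment: pick the currently least-loaded worker, which
    // avoids the load imbalance of a fixed partitioning of the event stream.
    void dispatch(Batch b) {
        std::size_t best = 0;
        for (std::size_t i = 1; i < workers.size(); ++i)
            if (workers[i].load() < workers[best].load()) best = i;
        workers[best].pending.push(std::move(b));
    }
};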