39 research outputs found

    BlendDB : blending table layouts to support efficient browsing of relational databases

    Get PDF
    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008.Includes bibliographical references (p. 63-65).The physical implementation of most relational databases follows their logical description, where each relation is stored in its own file or collection of files on disk. Such an implementation is good for queries that filter or aggregate large portions of a single table, and provides reasonable performance for queries that join many records from one table to another. It is much less ideal, however, for join queries that follow paths from a small number of tuples in one table to small collections of tuples in other tables to accumulate facts about a related collection of objects (e.g., co-authors of a particular author in a publications database), since answering such queries involves one or more random I/Os per table involved in the path. If the primary workload of a database consists of many such path queries, as is likely to be the case when supporting browsing-oriented applications, performance will be quite poor. This thesis focuses on optimizing the performance of these kinds of path queries in a system called BlendDB, a relational database that supports on-disk co-location of tuples from different relations. To make BlendDB efficient, the thesis will propose a clustering algorithm that, given knowledge of the database workload, co-locates the tuples of multiple relations if they join along common paths. To support the claim of improved performance, the thesis will include experiments in which BlendDB provides better performance than traditional relational databases on queries against the IMDB movie dataset. Additionally, this thesis will show that BlendDB provides commensurate performance to materialized views while using less disk space, and can achieve better performance than materialized views in exchange for more disk space when users navigate between related items in the database.by Adam Marcus.S.M

    Transactional Client-Server Cache Consistency: Alternatives and Performance

    Get PDF
    Client-server database systems based on a page server model can exploit client memory resources by caching copies of pages across transaction boundaries. Caching reduces the need to obtain data from servers or other sites on the network. In order to ensure that such caching does not result in the violation of transaction semantics, a cache consistency maintenance algorithm is required. Many such algorithms have been proposed in the literature and, as all provide the same functionality, performance is a primary concern in choosing among them. In this paper we provide a taxonomy that describes the design space for transactional cache consistency maintenance algorithms and show how proposed algorithms relate to one another. We then investigate the performance of six of these algorithms, and use these results to examine the tradeoffs inherent in the design choices identified in the taxonomy. The insight gained in this manner is then used to reflect upon the characteristics of other algorithms that have been proposed. The results show that the interactions among dimensions of the design space can impact performance in many ways, and that classifications of algorithms as simply Pessimistic" or Optimistic" do not accurately characterize the similarities and differences among the many possible cache consistency algorithms. (Also cross-referenced as UMIACS-TR-95-84

    Adaptive Caching of Distributed Components

    Get PDF
    Die Zugriffslokalität referenzierter Daten ist eine wichtige Eigenschaft verteilter Anwendungen. Lokales Zwischenspeichern abgefragter entfernter Daten (Caching) wird vielfach bei der Entwicklung solcher Anwendungen eingesetzt, um diese Eigenschaft auszunutzen. Anschliessende Zugriffe auf diese Daten können so beschleunigt werden, indem sie aus dem lokalen Zwischenspeicher bedient werden. Gegenwärtige Middleware-Architekturen bieten dem Anwendungsprogrammierer jedoch kaum Unterstützung für diesen nicht-funktionalen Aspekt. Die vorliegende Arbeit versucht deshalb, Caching als separaten, konfigurierbaren Middleware-Dienst auszulagern. Durch die Einbindung in den Softwareentwicklungsprozess wird die frühzeitige Modellierung und spätere Wiederverwendung caching-spezifischer Metadaten gewährleistet. Zur Laufzeit kann sich das entwickelte System außerdem bezüglich der Cachebarkeit von Daten adaptiv an geändertes Nutzungsverhalten anpassen.Locality of reference is an important property of distributed applications. Caching is typically employed during the development of such applications to exploit this property by locally storing queried data: Subsequent accesses can be accelerated by serving their results immediately form the local store. Current middleware architectures however hardly support this non-functional aspect. The thesis at hand thus tries outsource caching as a separate, configurable middleware service. Integration into the software development lifecycle provides for early capturing, modeling, and later reuse of cachingrelated metadata. At runtime, the implemented system can adapt to caching access characteristics with respect to data cacheability properties, thus healing misconfigurations and optimizing itself to an appropriate configuration. Speculative prefetching of data probably queried in the immediate future complements the presented approach

    Client-based Logging: A New Paradigm of Distributed Transaction Management

    Full text link
    The proliferation of inexpensive workstations and networks has created a new era in distributed computing. At the same time, non-traditional applications such as computer-aided design (CAD), computer-aided software engineering (CASE), geographic-information systems (GIS), and office-information systems (OIS) have placed increased demands for high-performance transaction processing on database systems. The combination of these factors gives rise to significant challenges in the design of modern database systems. In this thesis, we propose novel techniques whose aim is to improve the performance and scalability of these new database systems. These techniques exploit client resources through client-based transaction management. Client-based transaction management is realized by providing logging facilities locally even when data is shared in a global environment. This thesis presents several recovery algorithms which utilize client disks for storing recovery related information (i.e., log records). Our algorithms work with both coarse and fine-granularity locking and they do not require the merging of client logs at any time. Moreover, our algorithms support fine-granularity locking with multiple clients permitted to concurrently update different portions of the same database page. The database state is recovered correctly when there is a complex crash as well as when the updates performed by different clients on a page are not present on the disk version of the page, even though some of the updating transactions have committed. This thesis also presents the implementation of the proposed algorithms in a memory-mapped storage manager as well as a detailed performance study of these algorithms using the OO1 database benchmark. The performance results show that client-based logging is superior to traditional server-based logging. This is because client-based logging is an effective way to reduce dependencies on server CPU and disk resources and, thus, prevents the server from becoming a performance bottleneck as quickly when the number of clients accessing the database increases

    Views and concerns and interrelationships : Lessons learned from developing the multi-View software engineering environment PIROL

    Get PDF
    Software-Entwicklungsumgebungen sind komplexe Systeme mit besonderen Anforderungen an Modularität und Anpaßbarkeit. Diese Arbeit beschreibt die Entwicklung der Umgebung PIROL. Die Beschreibung ist dabei in eine Abfolge der folgenden 12 Themen gegliedert. (1) Metamodellierung ist das Grundkonzept, nach dem PIROL seine Daten gemäß einem objektorientierten Datenmodell zerlegt, so daß beliebige Werkzeuge auch auf die Daten anderer Werkzeuge auf sinnvolle Art und Weise zuzugreifen können. (2) Das Metamodell wird zur persistenten Speicherung der Daten auf Konzepte des Repositories H-PCTE abgebildet. (3) Die Granularität eines Metamodells ist für die Effektivität und Effizienz des Gesamtsystems entscheidend. PIROL unterstützt hybride Modellierung als Kompromiß beider Extreme. (4) Durch Methoden des Metamodells wird Verhaltensmodellierung für verschiedenste Aufgaben unterstützt. (5) Ausnahmebehandlung wird systematisch unterstützt. (6) Verschiedene Mechanismen zur Wahrung der Datenintegrität sind enthalten. (7) Das System wurde nach einer Client-Server Architektur entwickelt, deren zentrale Komponente eine "Workbench" ist, die die Repository-Sprache Lua/P ausführt. (8) Steuerungsintegration erlaubt durch verteilte Steuerflüsse das enge Zusammenspiel lose gekoppelter Komponenten. (9) Die koordinierte Zusammenarbeit mehrerer Benutzer wird unterstützt. (10) Die logische Unabhängigkeit von Werkzeugen wird durch das neue Konzept der Dynamic View Connectors erreicht. (11) Allgemeine Dienste sind in der Umgebung einheitlich verfügbar. (12) Das System unterstützt die Weiterentwicklung. All diese Themengebiete sind sehr eng miteinander verzahnt und die Darstellung ist zu großem Teil der gegenseitigen Beeinflussung gewidmet. Es wird gezeigt, wie eine Großzahl der Entwurfsentscheidungen genau aus diesen Beeinflussungen motiviert sind. Die Beschreibung folgt damit dem Konzept der "Concern Interaction Matrix", das hier zur Bewältigung von Komplexität vorgeschlagen wird. Dabei werden Charakteristika einzelner Anliegen und einzelner Zusammenhänge herausgearbeitet. Die Beschreibung PIROLs wird durch die Liste der integrierten Werkzeuge, Ansätze von Laufzeit-Messungen und einige Betrachtungen zur Beurteilung abgerundet. Abschließend werden verschiedene Konzepte rund um den Begriff "Sichten" erörtert. Sichten sind ein zentrale Anliegen von PIROL. Außerdem generalisiert die Diskussion über die mehrdimensionale Darstellung des Hauptteiles. Es werden Begrifflichkeit, Konzepte und Techniken für Sichten in der Softwaretechnik vorgestellt und diskutiert. Dabei wird die Brücke geschlagen von Sichten in objektorientierten Datenbanken, über aspekt-orientierte Softwareentwicklung bis hin zum allgemeinen "Concern Modeling", zu dem die o.g. Methode einen Beitrag leisten soll. Sichten werden dabei als ein zentrales Konzept der Softwaretechnik neben Abstraktion und Zerlegung beurteilt. Dynamic View Connectors sind ein wesentlicher Beitrag von PIROL, durch den Datenbanksichten und aspektorientierte Programmierung zusammengeführt werden. Zwar ist der Sichten-Begriff längst nicht so scharf definiert, wie die Begriffe Abstraktion und Zerlegung, aber gerade die Überlappungen und Diskrepanzen, die durch Sichten abgebildet werden können, machen dies Konzept zu einem starken Strukturierungsprinzip, das zwar einigen Aufwand zur Behandlung von Inkonsistenzen erfordert, aber andererseits hilft, komplexe Systeme handhabbar und wartbar zu gestalten.Software engineering environments are complex systems with special requirements regarding modularity and adaptability. This thesis describes the development of the environment PIROL. The description is structured as a sequence of the following 12 concerns: (1) Meta modeling is the basic concept by which PIROL decomposes its data in accordance to an object-oriented data model. This allows arbitrary tools to access data of other tools in a meaningful way. (2) For persistent storage the model is mapped to the concepts of the repository H-PCTE. (3) The granularity of a meta model determines effectiveness and efficiency of the system. PIROL supports hybrid modeling as a compromise between extremes. (4) By methods of the meta model behavior modeling is supported for a wide range of tasks. (5) Exception handling is supported systematically. (6) Several mechanisms for preserving data integrity are integrated. (7) The system follows a client-server architecture. As its, central component the "workbench" executes the repository language LuaP. (8) Control integration allows for close cooperation of loosely coupled components by means of distributed control flows. (9) The coordinated cooperation of multiple users is supported. (10) Logical independence of tools is achieved by the novel concept of Dynamic View Connectors. (11) Common services are available throughout the environment in a uniform way. (12) The system is prepared for evolution. All these concerns are tightly interlocked. A considerable share of the presentation is dedicated to such mutual interactions. Is is shown, how a large number of design decisions is motivated exactly by these interactions. The description follows the concept of a "Concern Interaction Matrix" which is proposed for managing complexity. Characteristics of concerns and their interactions are elaborated. The description of PIROL is completed by a list of integrated tools, initial performance measurements and evaluation. Finally, several concepts relating to the notion of "views" are discussed. Views are a central concern of PIROL. Furthermore, the discussion generalizes over the multi-dimensional presentation in the body of this thesis. Notions, concepts and techniques for views in software engineering are presented and discussed. This discussion connects views in object-oriented databases, aspect-oriented software development and general "concern modeling", to which the method of "Concern Interaction Matrices" contributes. Views are regarded as a central concept of software engineering at the same level as abstraction and decomposition. Dynamic View Connectors are a significant contribution of PIROL that combines database views and aspect-oriented programming. The notion of "views" is defined with far less precision than abstraction and decomposition, but indeed by the overlap and mismatches, which can be captured by views, this concept is a strong principle for structuring software and information. Effort is needed for handling inconsistencies as they may arise, but after all, views are a suitable means for managing the complexity of systems and for designing these systems for evolution

    IDEAS-1997-2021-Final-Programs

    Get PDF
    This document records the final program for each of the 26 meetings of the International Database and Engineering Application Symposium from 1997 through 2021. These meetings were organized in various locations on three continents. Most of the papers published during these years are in the digital libraries of IEEE(1997-2007) or ACM(2008-2021)

    Atomic Transfer for Distributed Systems

    Get PDF
    Building applications and information systems increasingly means dealing with concurrency and faults stemming from distribution of system components. Atomic transactions are a well-known method for transferring the responsibility for handling concurrency and faults from developers to the software\u27s execution environment, but incur considerable execution overhead. This dissertation investigates methods that shift some of the burden of concurrency control into the network layer, to reduce response times and increase throughput. It anticipates future programmable network devices, enabling customized high-performance network protocols. We propose Atomic Transfer (AT), a distributed algorithm to prevent race conditions due to messages crossing on a path of network switches. Switches check request messages for conflicts with response messages traveling in the opposite direction. Conflicting requests are dropped, obviating the request\u27s receiving host from detecting and handling the conflict. AT is designed to perform well under high data contention, as concurrency control effort is balanced across a network instead of being handled by the contended endpoint hosts themselves. We use AT as the basis for a new optimistic transactional cache consistency algorithm, supporting execution of atomic applications caching shared data. We then present a scalable refinement, allowing hierarchical consistent caches with predictable performance despite high data update rates. We give detailed I/O Automata models of our algorithms along with correctness proofs. We begin with a simplified model, assuming static network paths and no message loss, and then refine it to support dynamic network paths and safe handling of message loss. We present a trie-based data structure for accelerating conflict-checking on switches, with benchmarks suggesting the feasibility of our approach from a performance stand-point
    corecore