12 research outputs found

    B-tree indexes for high update rates

    In some applications, data capture dominates query processing. For example, monitoring moving objects often requires more insertions and updates than queries. Data gathering using automated sensors often exhibits this imbalance. More generally, indexing streams apparently is considered an unsolved problem. For those applications, B-tree indexes are reasonable choices if some trade-off decisions are tilted towards optimization of updates rather than of queries. This paper surveys techniques that let B-trees sustain very high update rates, up to multiple orders of magnitude higher than traditional B-trees, at the expense of query processing performance. Perhaps not surprisingly, some of these techniques are reminiscent of those employed during index creation, index rebuild, etc., while others are derived from other well-known technologies such as differential files and log-structured file systems.
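    The write-optimized techniques surveyed here echo differential files: updates are absorbed in a small side structure and folded into the main index in batches. The following toy Python sketch (hypothetical class and parameter names, not an algorithm from the paper) illustrates that trade-off: inserts become cheap buffer writes, while lookups must consult the buffer before the main structure.

    import bisect

    class DifferentialIndex:
        """Toy model of a differential-file style index: updates are absorbed
        by a small in-memory buffer and periodically merged into the sorted
        main store in one bulk pass, trading query work for insert throughput."""

        def __init__(self, buffer_limit=1024):
            self.main = []            # sorted list of (key, value) pairs, stands in for the B-tree
            self.buffer = {}          # recent inserts/updates, not yet merged
            self.buffer_limit = buffer_limit

        def insert(self, key, value):
            self.buffer[key] = value              # O(1) absorb; no leaf split, no random I/O
            if len(self.buffer) >= self.buffer_limit:
                self._merge()

        def _merge(self):
            # One sequential pass over the main store, like a batched index rebuild.
            merged = dict(self.main)
            merged.update(self.buffer)
            self.main = sorted(merged.items())
            self.buffer.clear()

        def lookup(self, key):
            # Queries pay extra: the buffer must be consulted before the main store.
            if key in self.buffer:
                return self.buffer[key]
            i = bisect.bisect_left(self.main, (key,))
            if i < len(self.main) and self.main[i][0] == key:
                return self.main[i][1]
            return None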

    Template B+ trees: an index scheme for fast data streams with distributed append-only stores

    Distributed systems are now commonly used to manage massive data flooding from the physical world, such as user-generated content from online social media and communication records from mobile phones. The new generation of distributed data management systems, such as HBase, Cassandra and Riak, are designed to perform queries and tuple insertions only. Other database operations such as deletions and updates are simulated by appending the keys associated with the target tuples to operation logs. Such an append-only store architecture maximizes the processing throughput on incoming data, but potentially incurs higher costs during query processing, because additional computation is needed to generate consistent snapshots of the database. Indexing is the key to enable efficient query processing by fast data retrieval and aggregation under such a system architecture. This thesis presents a new in-memory indexing scheme for distributed append-only stores. Our new scheme utilizes traditional index structures based on B+ trees and their variants to create an efficient in-memory template-based tree without the overhead of expensive node splits. We also propose the use of optimized domain partitioning and multi-thread insertion techniques to exploit the advantages of the template B+ tree structure. Our empirical evaluations show that insertion throughput is five times higher with template B+ trees than with HBase, on a variety of real and synthetic workloads.
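    As a rough illustration of the template idea (hypothetical structure and names, not the thesis's implementation), the sketch below pre-partitions the key domain so that inserts route to pre-allocated buckets and never trigger node splits; because the partitions are disjoint, separate threads could fill them independently.

    import bisect
    from collections import defaultdict

    class TemplateTreeSketch:
        """Illustrative sketch of a template-style in-memory index: the key
        domain is partitioned up front, so inserts route to a pre-allocated
        bucket and never trigger node splits."""

        def __init__(self, domain_min, domain_max, partitions):
            width = (domain_max - domain_min) / partitions
            # Precomputed partition boundaries play the role of the tree "template".
            self.bounds = [domain_min + i * width for i in range(1, partitions)]
            self.buckets = defaultdict(list)

        def insert(self, key, value):
            # Route by the static template; no structural modification is needed,
            # so inserts on disjoint partitions could proceed on separate threads.
            p = bisect.bisect_right(self.bounds, key)
            self.buckets[p].append((key, value))

        def lookup(self, key):
            p = bisect.bisect_right(self.bounds, key)
            return [v for k, v in self.buckets[p] if k == key]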

    Flash Aware Database Management System

    Flash memory is valued in many applications as a storage medium due to its fast access speed, low power consumption and non-volatile characteristics. Our survey covers the characteristics of flash disks, the architecture of flash disks, various indexing structures for magnetic and flash disks, storage techniques for hard disks and magnetic disks, and query processing techniques for flash disks. We present a detailed survey of the storage, indexing and query processing techniques developed to make a database system flash aware. We have implemented most of these techniques in a database system prototype named Mubase developed at IITH, including the FD-tree index structure for flash disks. We present experimental results on the TPC-H dataset demonstrating the benefits of the flash-aware storage and query processing techniques.
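    The FD-tree belongs to a family of flash-aware indexes that turn small random writes into sequential merges between levels. The sketch below shows that leveling pattern in miniature; the class, capacities and fanout are illustrative assumptions and do not reproduce the FD-tree or the Mubase implementation.

    class LeveledFlashIndexSketch:
        """Very simplified sketch of the leveling idea behind flash-aware indexes:
        small random writes are absorbed in a tiny top level and pushed down by
        sequential merges, which suits flash's cheap sequential writes and
        expensive in-place updates."""

        def __init__(self, head_capacity=4, fanout=4):
            self.head_capacity = head_capacity
            self.fanout = fanout
            self.levels = [[]]        # levels[0] is the small in-memory head

        def insert(self, key, value):
            self.levels[0].append((key, value))
            if len(self.levels[0]) > self.head_capacity:
                self._push_down(0)

        def _push_down(self, i):
            # Merge an overflowing level into the next one with one sequential write.
            if i + 1 == len(self.levels):
                self.levels.append([])
            merged = dict(self.levels[i + 1])
            merged.update(dict(self.levels[i]))     # newer entries win on key collisions
            self.levels[i], self.levels[i + 1] = [], sorted(merged.items())
            cap = self.head_capacity * (self.fanout ** (i + 1))
            if len(self.levels[i + 1]) > cap:
                self._push_down(i + 1)

        def lookup(self, key):
            # Queries may touch every level, the usual price of write optimization.
            for level in self.levels:
                for k, v in reversed(level):
                    if k == key:
                        return v
            return None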

    Insert-aware partitioning and indexing techniques for skewed database workloads

    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2010. Cataloged from PDF version of thesis. Includes bibliographical references (p. 65-68). Many data-intensive websites are characterized by a dataset that grows much faster than the rate at which users access the data, possibly combined with high insertion rates. In such systems, the growing size of the dataset leads to a larger overhead for maintaining and accessing indexes even while the query workload becomes increasingly skewed. Additionally, index update costs can be a non-trivial proportion of the overall system cost. Shinobi introduces a cost model that takes index update costs into account, and proposes database design algorithms that optimally partition tables, drop indexes from partitions that are queried infrequently, and maintain these partitions as workloads change. We show a 60x performance improvement over traditionally indexed tables using a real-world query workload derived from a traffic monitoring application, and over 8x improvement for a Wikipedia workload. by Eugene Wu. S.M.
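    A minimal sketch of the kind of insert-aware decision such a cost model enables is given below; the cost formula, parameter names and numbers are illustrative assumptions, not the actual model from the thesis.

    def choose_indexed_partitions(partitions, index_scan_saving, index_update_cost):
        """Toy version of an insert-aware indexing decision: keep a partition's
        index only when the estimated query savings outweigh the estimated cost
        of maintaining the index under that partition's insert load."""
        keep = []
        for p in partitions:
            query_benefit = p["queries_per_hour"] * index_scan_saving
            update_penalty = p["inserts_per_hour"] * index_update_cost
            if query_benefit > update_penalty:
                keep.append(p["name"])
        return keep

    # Example: a skewed workload where only the "hot" partition deserves an index.
    partitions = [
        {"name": "hot",  "queries_per_hour": 5000, "inserts_per_hour": 200},
        {"name": "warm", "queries_per_hour": 50,   "inserts_per_hour": 2000},
        {"name": "cold", "queries_per_hour": 1,    "inserts_per_hour": 8000},
    ]
    print(choose_indexed_partitions(partitions, index_scan_saving=2.0, index_update_cost=0.5))
    # -> ['hot']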

    Redesign of database algorithms for next generation non-volatile memory technology

    Master's thesis (Master of Science)

    Indexing record bulks on flash memory (Tietuekimppujen indeksointi flash-muistilla)

    In database applications, bulk operations that affect multiple records at once are common. They are performed when operating on single records at a time is not efficient enough. They can arise in several ways: some applications naturally have bulk operations (such as a sales database that is updated daily), while others perform them routinely as part of some other operation. While bulk operations have been studied for decades, their use with flash memory has been studied less. Flash memory, an increasingly popular alternative or complement to magnetic hard disks, has far better seek times, low power consumption and other characteristics desirable for database applications. However, erasing data is a costly operation, which means that designing index structures specifically for flash disks is worthwhile. This thesis investigates flash memory in the context of data structures in general, identifies some common design traits, and incorporates those traits into a novel index structure, the bulk index. The bulk index is an index structure for bulk operations on flash memory, and it was experimentally compared to a flash-based index structure that has shown impressive results, the Lazy Adaptive Tree (LA-tree for short). The bulk insertion experiments were made with varying-sized elementary bulks, i.e. maximal sets of inserted keys that fall between two consecutive keys in the existing data. The bulk index consistently performed better than the LA-tree, and especially well on bulk insertion experiments with many very small or a few very large elementary bulks, or with large inserted bulks. At best it was more than 4 times as fast. On range searches it performed up to 50% faster than the LA-tree, doing better on large ranges. Range deletions were also shown to be constant-time on the bulk index.
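    The notion of an elementary bulk is concrete enough to sketch: given sorted existing keys and a sorted batch of new keys, each elementary bulk is a maximal run of new keys landing in the same gap between consecutive existing keys. The helper below (hypothetical name, assuming sorted inputs of comparable keys) computes that partitioning.

    import bisect

    def elementary_bulks(existing_keys, inserted_keys):
        """Split a batch of inserted keys into elementary bulks: maximal groups
        of new keys that fall between the same two consecutive keys of the
        existing, sorted data."""
        bulks = []
        current_gap = None
        for key in inserted_keys:
            gap = bisect.bisect_right(existing_keys, key)   # index of the gap the key falls into
            if gap != current_gap:
                bulks.append([])                             # a new elementary bulk starts
                current_gap = gap
            bulks[-1].append(key)
        return bulks

    # Example: three elementary bulks relative to the existing keys 10, 20, 30.
    print(elementary_bulks([10, 20, 30], [1, 2, 11, 12, 35]))
    # -> [[1, 2], [11, 12], [35]]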

    Robust Stream Indexing

    Continuous data streams are at the core of many demanding and complex applications. In addition to online processing by a data stream system, data streams must also be stored long-term in a database. Modern hardware can usually persist data streams with very high throughput and low latency. However, parts of the stream must also be retrievable efficiently in order to extract knowledge from the data. Decades of research have produced an enormous variety of index structures that can reduce query costs for many specific applications. Although the efficiency of stream index structures has improved considerably, increasing their robustness remains a major challenge, because the continuous arrival of data entails constant index maintenance. This maintenance consumes resources, which leads to lower or fluctuating performance of regular insert and query operations. Improving robustness can significantly reduce operating costs and improve usability. The main goal of this work is therefore to improve the robustness of data stream indexing. B-trees are well-studied and widely used index structures. Since they are a central component of many database systems, improving the robustness of B-trees has a far-reaching impact. When new data is continuously inserted into a B-tree, nodes are split. For a B-tree newly created by bulk loading, these splits occur in waves that affect both insert operations and queries. This work shows that adaptations to bulk-loading algorithms can reduce or eliminate these waves. Index structures optimized for data streams, such as log-structured merge-trees, avoid the waves of node splits that occur in B-trees. However, since these index structures consist of multiple components, the components must be combined by a merge operation to keep query costs low, which leads to periodic reorganization activity. As an alternative, this work introduces Continuous Merging. The main idea is a continuous merge sort algorithm that leads to more robust performance of data stream indexing. Stream index structures are often part of a more complex database system. ChronicleDB is an event database system optimized for writing temporal data streams. The improvements to B-trees and Continuous Merging are related to the overall design of ChronicleDB. In addition, this work makes general improvements to ChronicleDB that exploit the particular characteristics of temporal data. The results lead to a more robust event database system.
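    One plausible way to dampen the split waves described above (an illustrative assumption, not necessarily the thesis's algorithm) is to bulk-load leaves at staggered fill factors so they do not all overflow at the same time; the sketch below shows the idea.

    def bulk_load_leaves(sorted_records, capacity, fill_levels=(0.6, 0.7, 0.8, 0.9)):
        """Sketch of a split-wave-avoiding bulk load: instead of packing every
        leaf to the same fill factor (which makes all leaves overflow, and split,
        at roughly the same time under steady inserts), leaves are packed to
        staggered fill levels so later splits are spread out over time. The
        specific fill levels are illustrative assumptions."""
        leaves = []
        i = 0
        while i < len(sorted_records):
            fill = fill_levels[len(leaves) % len(fill_levels)]
            size = max(1, int(capacity * fill))
            leaves.append(sorted_records[i:i + size])
            i += size
        return leaves

    # Example: 100-key bulk load into leaves of capacity 10 with varying fill.
    leaves = bulk_load_leaves(list(range(100)), capacity=10)
    print([len(leaf) for leaf in leaves])   # e.g. [6, 7, 8, 9, 6, 7, ...]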