    Asymptotically optimal declustering schemes for 2-dim range queries

    Declustering techniques have been widely adopted in parallel storage systems (e.g. disk arrays) to speed up bulk retrieval of multidimensional data. A declustering scheme distributes data items among multiple disks, thus enabling parallel data access and reducing query response time. We measure the performance of a declustering scheme as its worst-case additive deviation from the ideal scheme. The goal is thus to design declustering schemes with as small an additive error as possible. We describe a number of declustering schemes with additive error O(log M) for 2-dimensional range queries, where M is the number of disks. These are the first results giving an O(log M) upper bound for all values of M. Our second result is a lower bound on the additive error. It is known that, except for a few stringent cases, the additive error of any 2-dimensional declustering scheme is at least one. We strengthen this lower bound to Ω((log M)^((d−1)/2)) for d-dimensional schemes and to Ω(log M) for 2-dimensional schemes, thus proving that the 2-dimensional schemes described in this paper are (asymptotically) optimal. These results are obtained by establishing a connection to geometric discrepancy. We also present simulation results to evaluate the performance of these schemes in practice.
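
    The additive-error measure used in this abstract can be illustrated with a small sketch. The code below does not implement the paper's O(log M) schemes; it assumes the classic disk-modulo assignment (tile (i, j) goes to disk (i + j) mod M) purely as a stand-in, and the function names `disk_modulo` and `additive_error` are hypothetical. It checks every rectangular range query on a small n x n grid by brute force and reports the largest gap between the busiest disk's load and the ideal load ceil(area / M).

```python
from itertools import product
from math import ceil


def disk_modulo(i, j, m):
    """Stand-in scheme (an assumption, not the paper's): tile (i, j) -> disk (i + j) mod m."""
    return (i + j) % m


def additive_error(n, m, scheme=disk_modulo):
    """Worst-case additive error of `scheme` over all range queries on an n x n grid.

    For each rectangle, the response time is the load of the busiest disk;
    the ideal response time is ceil(area / m). Brute force, so keep n small.
    """
    worst = 0
    for r0, c0 in product(range(n), repeat=2):
        for r1 in range(r0, n):
            for c1 in range(c0, n):
                load = [0] * m
                for i in range(r0, r1 + 1):
                    for j in range(c0, c1 + 1):
                        load[scheme(i, j, m)] += 1
                area = (r1 - r0 + 1) * (c1 - c0 + 1)
                worst = max(worst, max(load) - ceil(area / m))
    return worst


if __name__ == "__main__":
    for disks in (2, 3, 4):
        print(disks, additive_error(disks, disks))
```

    Any other assignment function with the same signature can be passed as `scheme` to compare schemes on the same grid.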

    Efficient Processing of Range Queries in Main Memory

    Database systems employ index structures as a means to accelerate search queries. In recent years, the research community has proposed many different in-memory approaches that, unlike disk-based systems, optimize for cache misses instead of disk I/O and make use of the increased parallel capabilities of modern CPUs. However, these techniques mainly focus on single-key lookups and neglect equally important range queries. Range queries are a ubiquitous operator in data management, commonly used in numerous domains such as genomic analysis, sensor networks, and online analytical processing. The main goal of this dissertation is thus to improve the capabilities of main-memory database systems with regard to executing range queries. To this end, we first propose a cache-optimized, updateable main-memory index structure, the cache-sensitive skip list, which targets the execution of range queries on single database columns. Second, we study the performance of multidimensional range queries on modern hardware, where data are stored in main memory and processors support SIMD instructions and multi-threading. We re-evaluate a previous rule of thumb suggesting that, on disk-based systems, scans outperform index structures for selectivities of approximately 15-20% or more. To increase the practical relevance of our analysis, we also contribute a novel benchmark consisting of several realistic multidimensional range queries applied to real-world genomic data. Third, based on the outcomes of our experimental analysis, we devise a novel, fast and space-efficient main-memory index structure, the BB-Tree, which supports multidimensional range and point queries and provides a parallel search operator that leverages the multi-threading capabilities of modern CPUs.
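
    To make the notions of a multidimensional range query and its selectivity concrete, here is a minimal sketch of the full-scan baseline against which index structures are typically compared. It is not the BB-Tree or the cache-sensitive skip list; the names `range_scan` and `selectivity` and the tuple-based data layout are assumptions made for illustration only.

```python
def range_scan(points, low, high):
    """Return all points p with low[d] <= p[d] <= high[d] in every dimension d.

    This is a plain sequential scan: every point is inspected once.
    """
    return [
        p for p in points
        if all(lo <= x <= hi for x, lo, hi in zip(p, low, high))
    ]


def selectivity(points, low, high):
    """Fraction of the data set that the range query returns."""
    return len(range_scan(points, low, high)) / len(points)


if __name__ == "__main__":
    data = [(1, 5), (2, 2), (3, 9), (4, 4), (8, 1)]
    print(range_scan(data, low=(2, 2), high=(4, 5)))
    print(selectivity(data, low=(2, 2), high=(4, 5)))
```

    A scan touches every point regardless of the query, which is why its cost relative to an index depends on the selectivity of the query.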


    This document records the final program for each of the 26 meetings of the International Database and Engineering Application Symposium from 1997 through 2021. These meetings were held in various locations on three continents. Most of the papers published during these years are in the digital libraries of IEEE (1997-2007) or ACM (2008-2021).

    Distributed multimedia systems

    A distributed multimedia system (DMS) is an integrated communication, computing, and information system that enables the processing, management, delivery, and presentation of synchronized multimedia information with quality-of-service guarantees. Multimedia information may include discrete media data, such as text, data, and images, and continuous media data, such as video and audio. Such a system enhances human communications by exploiting both visual and aural senses and provides the ultimate flexibility in work and entertainment, allowing one to collaborate with remote participants, view movies on demand, access on-line digital libraries from the desktop, and so forth. In this paper, we present a technical survey of a DMS. We give an overview of distributed multimedia systems, examine the fundamental concept of digital media, identify the applications, and survey the important enabling technologies.

    An Agent-Based Variogram Modeller: Investigating Intelligent, Distributed-Component Geographical Information Systems

    Geo-Information Science (GIScience) is the field of study that addresses substantive questions concerning the handling, analysis and visualisation of spatial data. Geo-Information Systems (GIS), including software, data acquisition and organisational arrangements, are the key technologies underpinning GIScience. A GIS is normally tailored to the service it is supposed to perform. However, there is often a need to perform a function that is not supported by the GIS tool being used. The normal solution in these circumstances is to look for another tool that can perform the service, and often for an expert to use that tool. This is expensive, time consuming and certainly stressful for geographical data analysts. On the other hand, GIS is often used in conjunction with other technologies to form a geocomputational environment. One of the complex tools in geocomputation is geostatistics. One of its functions is to provide the means to determine the extent of spatial dependencies within geographical data and processes. Spatial datasets are often large and complex. Currently, agent systems are being integrated into GIS to offer flexibility and allow better data analysis. The thesis looks into the current application of agents within the GIS community and determines whether they are used to represent data or processes, or to act as a service. The thesis also examines the applicability of an agent-oriented paradigm as a service-based GIS, with the possibility of providing greater interoperability and reducing resource requirements (human and tools). In particular, analysis was undertaken to determine the need to introduce enhanced features to agents in order to maximise their effectiveness in GIS. This was achieved by addressing the complexity of software agent design and implementation for the GIS environment and by suggesting possible solutions to the problems encountered.
The characteristics and features of software agents (which include the dynamic binding of plans to software agents in order to tackle the levels of complexity and range of contexts) were examined, together with current GIScience and the applications of agent technology to GIS, with agents as entities, objects and processes. These concepts and their functionalities for GIS are then analysed and discussed. The extent of agent functionality, an analysis of the gaps, and the use of these technologies to express a distributed service providing an agent-based GIS framework are then presented. Thus, a general agent-based framework for GIS was devised, along with a novel agent-based architecture for a specific part of GIS, the variogram, to examine the applicability of the agent-oriented paradigm to GIS. An examination of the current mechanisms for constructing variograms and their underlying processes and functions was undertaken, and these processes were then embedded into a novel agent architecture for GIS. Once the software agent implementation had been achieved, the corresponding tool was tested and validated, internally for code errors and externally to determine its functional requirements and whether it enhances the GIS process of dealing with data. Thereafter, it is compared with other known service-based GIS agents, and its advantages and disadvantages are analysed.
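
    For readers unfamiliar with the variogram mentioned above, the following sketch computes a standard empirical semivariogram, gamma(h) = (1 / (2 N(h))) * sum of (z_i - z_j)^2 over the N(h) point pairs whose separation falls in lag bin h. It is a plain, non-agent illustration, not the thesis's architecture; the function name `empirical_variogram`, the 2-D coordinate tuples, and the equal-width lag bins are assumptions.

```python
from math import hypot


def empirical_variogram(coords, values, lag_width, n_lags):
    """Empirical semivariogram over equal-width lag bins.

    Bin k collects point pairs whose distance d satisfies
    k * lag_width <= d < (k + 1) * lag_width. Returns one semivariance
    per bin, or None for bins that contain no pairs.
    """
    sums = [0.0] * n_lags
    counts = [0] * n_lags
    for a in range(len(coords)):
        for b in range(a + 1, len(coords)):
            d = hypot(coords[a][0] - coords[b][0], coords[a][1] - coords[b][1])
            k = int(d // lag_width)
            if k < n_lags:
                sums[k] += (values[a] - values[b]) ** 2
                counts[k] += 1
    return [s / (2 * c) if c else None for s, c in zip(sums, counts)]


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
    z = [1.0, 2.0, 4.0]
    print(empirical_variogram(pts, z, lag_width=1.0, n_lags=3))
```

    Semivariance that grows with lag distance, as in this toy input, indicates that nearby observations are more alike than distant ones, which is the spatial dependence the variogram is meant to quantify.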

    The Third NASA Goddard Conference on Mass Storage Systems and Technologies

    This report contains copies of nearly all of the technical papers and viewgraphs presented at the Goddard Conference on Mass Storage Systems and Technologies held in October 1993. The conference served as an informational exchange forum for topics primarily relating to the ingestion and management of massive amounts of data and the attendant problems involved. Discussion topics include the necessary use of computers in the solution of today's infinitely complex problems, the need for greatly increased storage densities in both optical and magnetic recording media, currently popular storage media and magnetic media storage risk factors, and data archiving standards, including a talk on the current status of the IEEE Storage Systems Reference Model (RM). Additional topics addressed system performance, data storage system concepts, communications technologies, data distribution systems, data compression, and error detection and correction.

    Eighth Biennial Report : April 2005 – March 2007

    Spatial Database Support for Virtual Engineering

    The development, design, manufacturing and maintenance of modern engineering products is a very expensive and complex task. Shorter product cycles and a greater diversity of models are becoming decisive competitive factors in the hard-fought automobile and aircraft markets. To support engineers in creating complex products under time pressure, systems are required which answer collision and similarity queries effectively and efficiently. In order to achieve industrial strength, the required specialized functionality has to be integrated into fully-fledged database systems, so that fundamental services of these systems can be fully reused, including transactions, concurrency control and recovery. This thesis aims at the development of theoretically sound and practically realizable algorithms which effectively and efficiently detect colliding and similar complex spatial objects. After a short introductory Part I, we look in Part II at different spatial index structures and discuss their integrability into object-relational database systems. Based on this discussion, we present two generic approaches for accelerating collision queries. The first approach exploits available statistical information in order to accelerate the query process. The second approach is based on a cost-based decomposition of complex spatial objects. The two approaches complement each other and can be used independently or together. In a broad experimental evaluation based on real-world test data sets, we demonstrate the usefulness of the presented techniques, which allow interactive query response times even for large data sets of complex objects. In Part III of the thesis, we discuss several similarity models for spatial objects. We show by means of a new evaluation method that data-partitioning similarity models yield more meaningful results than space-partitioning similarity models. We introduce a very effective similarity model which is based on a new paradigm in similarity search, namely the use of vector set represented objects. In order to guarantee efficient query processing, suitable filters are introduced for accelerating similarity queries on complex spatial objects. Based on clustering and the introduced similarity models, we present an industrial prototype which helps the user to navigate through massive data sets.

    The 1990 Goddard Conference on Space Applications of Artificial Intelligence

    The papers presented at the 1990 Goddard Conference on Space Applications of Artificial Intelligence are given. The purpose of this annual conference is to provide a forum in which current research and development directed at space applications of artificial intelligence can be presented and discussed. The proceedings fall into the following areas: Planning and Scheduling, Fault Monitoring/Diagnosis, Image Processing and Machine Vision, Robotics/Intelligent Control, Development Methodologies, Information Management, and Knowledge Acquisition.