
    Multi-Step Processing of Spatial Joins

    Spatial joins are one of the most important operations for combining spatial objects of several relations. In this paper, spatial join processing is studied in detail for extended spatial objects in two-dimensional data space. We present an approach for spatial join processing that is based on three steps. First, a spatial join is performed on the minimum bounding rectangles of the objects, returning a set of candidates. Various approaches for accelerating this step of join processing were examined at last year's conference [BKS 93a]. In this paper, we focus on the problem of how to compute the answers from the set of candidates, which is handled by the following two steps. First, sophisticated approximations are used to identify answers as well as to filter out false hits from the set of candidates. For this purpose, we investigate various types of conservative and progressive approximations. In the last step, the exact geometry of the remaining candidates is tested against the join predicate. The time required for computing spatial join predicates can be reduced substantially when objects are adequately organized in main memory. In our approach, objects are first decomposed into simple components which are exclusively organized by a main-memory resident spatial data structure. Overall, we present a complete approach to spatial join processing on complex spatial objects. The performance of the individual steps of our approach is evaluated with data sets from real cartographic applications. The results show that our approach reduces the total execution time of the spatial join by factors.
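    The three-step pipeline described above can be illustrated with a short sketch. The Python code below is a minimal, assumed illustration of the filter-and-refine idea: axis-aligned minimum bounding rectangles serve as the first filter, a hypothetical approx_test stands in for the conservative/progressive approximation step, and a hypothetical exact_intersects predicate stands in for the exact geometry test. It is a sketch of the general scheme, not the paper's implementation.

        from typing import Callable, Iterable, List, Tuple

        Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)

        def mbr_intersects(a: Rect, b: Rect) -> bool:
            # Step 1 filter: do two minimum bounding rectangles overlap?
            return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

        def spatial_join(r: Iterable[Tuple[int, Rect, object]],
                         s: Iterable[Tuple[int, Rect, object]],
                         approx_test: Callable[[object, object], str],
                         exact_intersects: Callable[[object, object], bool]) -> List[Tuple[int, int]]:
            # approx_test returns 'hit', 'miss', or 'unknown' based on
            # conservative/progressive approximations (step 2);
            # exact_intersects is the exact geometry predicate (step 3).
            answers = []
            s = list(s)
            for id_a, mbr_a, geom_a in r:
                for id_b, mbr_b, geom_b in s:
                    if not mbr_intersects(mbr_a, mbr_b):   # step 1: MBR filter
                        continue
                    verdict = approx_test(geom_a, geom_b)  # step 2: approximation filter
                    if verdict == 'hit':
                        answers.append((id_a, id_b))
                    elif verdict == 'unknown':             # step 3: exact geometry test
                        if exact_intersects(geom_a, geom_b):
                            answers.append((id_a, id_b))
            return answers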

    Data Processing with FPGAs on Modern Architectures

    Trends in hardware, the prevalence of the cloud, and the rise of highly demanding applications have ushered in an era of specialization that is quickly changing how data is processed at scale. These changes are likely to continue and accelerate in the coming years as new technologies are adopted and deployed: smart NICs, smart storage, smart memory, disaggregated storage, disaggregated memory, specialized accelerators (GPUs, TPUs, FPGAs), and a wealth of ASICs created specifically to deal with computationally expensive tasks (e.g., cryptography or compression). In this tutorial, we focus on data processing on FPGAs, a technology that has received less attention than, e.g., TPUs or GPUs but that is increasingly being deployed in the cloud for data processing tasks due to the architectural flexibility of FPGAs, along with their ability to process data at line rate, something not possible with other types of processors or accelerators. In the tutorial, we will cover what FPGAs are, their characteristics, their advantages and disadvantages, as well as examples of deployments in industry and how they are used in various data processing tasks. We will introduce FPGA programming with high-level languages and describe hardware and software resources available to researchers. The tutorial includes case studies borrowed from research done in collaboration with companies that illustrate the potential of FPGAs in data processing and how software and hardware are evolving to take advantage of the possibilities offered by FPGAs. The use cases include: (1) approximate nearest neighbor search, which is relevant to databases and machine learning, (2) remote disaggregated memory, showing how the cloud architecture is evolving and demonstrating the potential for operator offloading and line-rate data processing, and (3) a recommendation system as an application with tight latency constraints.
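    As a point of reference for the first use case, the sketch below shows in plain Python the brute-force top-k nearest neighbor scan that an accelerator would stream over, i.e., the kind of repetitive distance computation an FPGA can evaluate at line rate. The dataset, dimensionality, and value of k are illustrative assumptions, not part of the tutorial material.

        import numpy as np

        def top_k_neighbors(database: np.ndarray, query: np.ndarray, k: int = 10) -> np.ndarray:
            # Brute-force scan: squared L2 distance from the query to every database vector,
            # then a partial sort to pick the k closest. Approximate NN methods prune this scan.
            dists = np.sum((database - query) ** 2, axis=1)
            return np.argpartition(dists, k)[:k]

        # Illustrative usage with random data (assumed, not from the tutorial).
        rng = np.random.default_rng(0)
        db = rng.standard_normal((100_000, 128)).astype(np.float32)
        q = rng.standard_normal(128).astype(np.float32)
        print(top_k_neighbors(db, q, k=5))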

    Enhanced Query Processing on Complex Spatial and Temporal Data

    Innovative technologies in the areas of multimedia and mechanical engineering, as well as novel methods for data acquisition in different scientific fields, including geoscience, environmental science, medicine, biology and astronomy, enable a more exact representation of the data and thus a more precise data analysis. The resulting quantitative and qualitative growth of spatial and temporal data in particular leads to new challenges for the management and processing of complex structured objects and requires efficient and effective methods for data analysis. Spatial data describe objects in space by a well-defined extension, a specific location, and their relationships to other objects. Classical representatives of complex structured spatial objects are three-dimensional CAD data from mechanical engineering and two-dimensional bounded regions from geography. For industrial applications, efficient collision and intersection queries are of great importance. Temporal data describe time-dependent processes, for instance the duration of specific events or time-varying attributes of objects. Time series are among the most popular and complex types of temporal data and are the most important form of description for time-varying processes. An elementary type of query in time series databases is the similarity query, which serves as a basic query for data mining applications. The main target of this thesis is to develop effective and efficient algorithms supporting collision queries on spatial data as well as similarity queries on temporal data, in particular time series. The presented concepts are based on the efficient management of interval sequences, which are suitable representations for both spatial and temporal data. The effective analysis of the underlying objects is efficiently supported by adequate access methods. First, this thesis deals with collision queries on complex spatial objects, which can be reduced to intersection queries on interval sequences. We introduce statistical methods for the grouping of subsequences. Combined with the concept of multi-step query processing, these methods enable the user to accelerate the query process drastically. Furthermore, we develop a cost model for the multi-step query process on interval sequences in distributed systems. The proposed approach successfully supports a cost-based query strategy. Second, we introduce a novel similarity measure for time series. It allows the user to focus on specific time series amplitudes for the similarity measurement. The new similarity model defines two time series to be similar iff they show similar temporal behavior with respect to being below or above a specific threshold. This type of query is primarily required in natural science applications. The main goal of this new query method is the detection of anomalies and the adaptation to new requirements in the area of data mining in time series databases. In addition, a semi-supervised cluster analysis method is presented which is based on the introduced similarity model for time series. The efficiency and effectiveness of the proposed techniques are extensively discussed, and their advantages over existing methods are experimentally demonstrated by means of datasets derived from real-world applications.
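    The threshold-based similarity model can be made concrete with a small sketch. The Python code below is a minimal illustration, under assumed names and an assumed overlap measure, of how a time series might be reduced to an interval sequence of above-threshold periods and how two such sequences could then be compared by the overlap of their intervals; it sketches the idea, not the thesis's actual algorithm.

        from typing import List, Sequence, Tuple

        Interval = Tuple[int, int]  # half-open [start, end) index range

        def above_threshold_intervals(series: Sequence[float], threshold: float) -> List[Interval]:
            # Reduce a time series to the interval sequence of periods above the threshold.
            intervals, start = [], None
            for i, value in enumerate(series):
                if value > threshold and start is None:
                    start = i
                elif value <= threshold and start is not None:
                    intervals.append((start, i))
                    start = None
            if start is not None:
                intervals.append((start, len(series)))
            return intervals

        def interval_overlap_similarity(a: List[Interval], b: List[Interval]) -> float:
            # Jaccard-style similarity of two interval sequences: total overlap length
            # divided by the length of their union (an assumed, illustrative measure).
            def total(intervals): return sum(e - s for s, e in intervals)
            overlap = 0
            for s1, e1 in a:
                for s2, e2 in b:
                    overlap += max(0, min(e1, e2) - max(s1, s2))
            union = total(a) + total(b) - overlap
            return overlap / union if union else 1.0

        # Two series are considered similar iff their above-threshold behavior overlaps strongly.
        x = [0.1, 0.9, 1.2, 0.3, 0.8, 1.1, 0.2]
        y = [0.2, 1.0, 1.1, 0.4, 0.2, 1.3, 0.1]
        print(interval_overlap_similarity(above_threshold_intervals(x, 0.5),
                                          above_threshold_intervals(y, 0.5)))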

    Spatial Database Support for Virtual Engineering

    The development, design, manufacturing and maintenance of modern engineering products are very expensive and complex tasks. Shorter product cycles and a greater diversity of models are becoming decisive competitive factors in the fiercely contested automobile and aircraft markets. To support engineers in creating complex products under time pressure, systems are required that answer collision and similarity queries effectively and efficiently. In order to achieve industrial strength, the required specialized functionality has to be integrated into fully-fledged database systems, so that fundamental services of these systems can be fully reused, including transactions, concurrency control and recovery. This thesis aims at the development of theoretically sound and practically realizable algorithms which effectively and efficiently detect colliding and similar complex spatial objects. After a short introductory Part I, we look in Part II at different spatial index structures and discuss their integrability into object-relational database systems. Based on this discussion, we present two generic approaches for accelerating collision queries. The first approach exploits available statistical information in order to accelerate the query process. The second approach is based on a cost-based decomposition of complex spatial objects. In a broad experimental evaluation based on real-world test data sets, we demonstrate the usefulness of the presented techniques, which allow interactive query response times even for large data sets of complex objects. In Part III of the thesis, we discuss several similarity models for spatial objects. We show by means of a new evaluation method that data-partitioning similarity models yield more meaningful results than space-partitioning similarity models. We introduce a very effective similarity model which is based on a new paradigm in similarity search, namely the use of vector-set represented objects. In order to guarantee efficient query processing, suitable filters are introduced for accelerating similarity queries on complex spatial objects. Based on clustering and the introduced similarity models, we present an industrial prototype which helps the user to navigate through massive data sets.
    A fast and frictionless development process for new products is an important factor in the economic success of many companies, particularly in the aerospace and automotive industries. So that engineers can develop ever more demanding products in ever shorter time, effective and efficient collision and similarity queries on complex spatial objects are needed. To meet the high requirements of productive use, correspondingly specialized access methods must be integrated into fully-fledged database systems, so that central database services such as transactions, concurrency control and recovery are ensured. The goal of this doctoral thesis is therefore to develop effective and efficient algorithms for collision and similarity queries on complex spatial objects and to integrate them into commercial object-relational database systems. In the first part of the thesis, various spatial index structures for the efficient processing of collision queries are discussed and examined with regard to their ability to be integrated into object-relational database systems. Building on this, two generic methods for accelerating collision queries are presented. The first method uses statistical information from spatial index structures to accelerate a given query. The second method is based on a cost-based decomposition of complex spatial database objects. These two methods complement each other and can be used independently or in combination. An extensive experimental evaluation shows that the two presented methods enable interactive collision queries on large data volumes and complex objects. In the second part of the thesis, various similarity models for spatial objects are presented. It is shown experimentally that data-partitioning models are more effective than space-partitioning methods. Furthermore, suitable filter techniques for accelerating the query process are developed and examined experimentally. Based on clustering and the developed similarity models, an industrial-strength prototype is presented that helps users navigate through large amounts of data.
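    The vector-set similarity paradigm mentioned in Part III can be illustrated with a small sketch. The Python code below shows one plausible instance of a vector-set distance, a minimal one-to-one matching between two sets of feature vectors computed with SciPy's linear_sum_assignment; the random feature vectors and the choice of Euclidean ground distance are illustrative assumptions, not the thesis's exact model.

        import numpy as np
        from scipy.optimize import linear_sum_assignment

        def minimal_matching_distance(a: np.ndarray, b: np.ndarray) -> float:
            # Distance between two objects represented as sets of feature vectors
            # (rows of a and b): cost of a minimal one-to-one matching under
            # Euclidean ground distance. Assumes both sets have the same cardinality.
            cost = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
            rows, cols = linear_sum_assignment(cost)  # optimal assignment
            return float(cost[rows, cols].sum())

        # Illustrative usage: two CAD-like objects, each summarized by 5 feature vectors.
        rng = np.random.default_rng(1)
        obj_a = rng.standard_normal((5, 3))
        obj_b = rng.standard_normal((5, 3))
        print(minimal_matching_distance(obj_a, obj_b))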