
    Schema architectures and their relationships to transaction processing in distributed database systems

    We discuss the different types of schema architectures which could be supported by distributed database systems, making a clear distinction between logical, physical, and federated distribution. We elaborate on the additional mapping information required in architectures based on logical distribution in order to support retrieval as well as update operations. We illustrate the problems of schema integration and data integration in multidatabase systems and discuss their impact on query processing. Finally, we discuss different issues relevant to the cooperation (or noncooperation) of local database systems in a heterogeneous multidatabase system and their relationship to the schema architecture and transaction processing.
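
    As a rough illustration of the mapping information such an architecture needs (a minimal sketch with hypothetical names, not taken from the paper), the following records how a horizontally fragmented relation is placed across sites, so that a mediator can answer retrievals by touching every fragment and route an update to the single fragment that owns the row:

    # Hypothetical sketch: mapping information for a logically distributed relation.
    from dataclasses import dataclass
    from typing import Callable, List

    @dataclass
    class Fragment:
        site: str                          # site holding this horizontal fragment
        predicate: Callable[[dict], bool]  # membership test for rows

    class GlobalRelation:
        def __init__(self, name: str, fragments: List[Fragment]):
            self.name = name
            self.fragments = fragments

        def sites_for_scan(self) -> List[str]:
            # retrieval must visit every fragment
            return [f.site for f in self.fragments]

        def site_for_insert(self, row: dict) -> str:
            # updates need the extra mapping information: which fragment owns the row
            for f in self.fragments:
                if f.predicate(row):
                    return f.site
            raise ValueError("row matches no fragmentation predicate")

    customer = GlobalRelation("customer", [
        Fragment("site_eu", lambda r: r["region"] == "EU"),
        Fragment("site_us", lambda r: r["region"] == "US"),
    ])
    print(customer.sites_for_scan())                   # ['site_eu', 'site_us']
    print(customer.site_for_insert({"region": "EU"}))  # site_eu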

    Query processing of geometric objects with free form boundaries in spatial databases

    The increasing demand for the use of database systems as an integrating factor in CAD/CAM applications has necessitated the development of database systems with appropriate modelling and retrieval capabilities. One essential problem is the treatment of geometric data, which has led to the development of spatial databases. Unfortunately, most proposals only deal with simple geometric objects like multidimensional points and rectangles. On the other hand, there has been a rapid development in the field of representing geometric objects with free form curves or surfaces, initiated by engineering applications such as mechanical engineering, aviation or astronautics. Therefore, we propose a concept for the realization of spatial retrieval operations on geometric objects with free form boundaries, such as B-spline or Bezier curves, which can easily be integrated in a database management system. The key concept is the encapsulation of geometric operations in a so-called query processor. First, this enables the definition of an interface allowing the integration into the data model and the definition of the query language of a database system for complex objects. Second, the approach allows the use of an arbitrary representation of the geometric objects. After a short description of the query processor, we propose some representations for free form objects determined by B-spline or Bezier curves. The goal of efficient query processing in a database environment is achieved using a combination of decomposition techniques and spatial access methods. Finally, we present some experimental results indicating that the performance of decomposition techniques is clearly superior to traditional query processing strategies for geometric objects with free form boundaries.
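
    To give a flavour of the decomposition idea (an illustrative sketch only; the sampling-based decomposition and the function names are our own simplification, not the paper's query processor), the following approximates a cubic Bezier boundary by per-segment bounding boxes that a spatial access method could use as a cheap filter step before any exact geometric test:

    # Illustrative sketch: decompose a cubic Bezier curve into segment bounding boxes.
    from typing import List, Tuple

    Point = Tuple[float, float]

    def bezier_point(ctrl: List[Point], t: float) -> Point:
        # de Casteljau evaluation of a Bezier curve at parameter t
        q = list(ctrl)
        for level in range(len(q) - 1, 0, -1):
            q = [((1 - t) * q[i][0] + t * q[i + 1][0],
                  (1 - t) * q[i][1] + t * q[i + 1][1]) for i in range(level)]
        return q[0]

    def decompose(ctrl: List[Point], n: int = 16) -> List[Tuple[float, float, float, float]]:
        # bounding boxes (xmin, ymin, xmax, ymax) of n consecutive curve segments
        pts = [bezier_point(ctrl, i / n) for i in range(n + 1)]
        return [(min(a[0], b[0]), min(a[1], b[1]), max(a[0], b[0]), max(a[1], b[1]))
                for a, b in zip(pts, pts[1:])]

    def may_intersect(box, query) -> bool:
        # filter step: only segments whose box overlaps the query window need an exact test
        return not (box[2] < query[0] or query[2] < box[0] or
                    box[3] < query[1] or query[3] < box[1])

    boxes = decompose([(0.0, 0.0), (1.0, 2.0), (3.0, 2.0), (4.0, 0.0)])
    candidates = [b for b in boxes if may_intersect(b, (2.0, 0.5, 3.0, 1.5))]
    print(len(candidates), "of", len(boxes), "segments survive the filter step")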

    On transformation of query scheduling strategies in distributed and heterogeneous database systems

    This work considers the problem of optimal query processing in heterogeneous and distributed database systems. A global query submitted at a local site is decomposed into a number of queries processed at the remote sites. The partial results returned by the queries are integrated at the local site. The paper addresses the problem of optimally scheduling the queries so as to minimize the time spent on integrating the partial results into the final answer. A global data model defined in this work provides a unified view of the heterogeneous data structures located at the remote sites, and a system of operations is defined to express the complex data integration procedures. This work shows that transforming an entirely simultaneous query processing strategy into a hybrid (simultaneous/sequential) strategy may in some cases lead to significantly faster data integration. We show how to detect such cases, what conditions must be satisfied to transform the schedules, and how to transform the schedules into more efficient ones.
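
    A back-of-the-envelope cost model (our own assumed model, not the paper's formal framework) illustrates why transforming a purely simultaneous schedule into a hybrid one can pay off when partial results arrive at very different times: the merges of early results can overlap with the wait for the slowest site.

    # Assumed toy cost model: merging two partial results costs their combined size.
    def simultaneous_finish(arrivals, sizes, cost_per_unit=1.0):
        # wait for every partial result, then run the pairwise merges back to back
        clock, merged = max(arrivals), sizes[0]
        for size in sizes[1:]:
            clock += cost_per_unit * (merged + size)
            merged += size
        return clock

    def hybrid_finish(arrivals, sizes, cost_per_unit=1.0):
        # merge partial results in arrival order, overlapping merge work with waiting
        pending = sorted(zip(arrivals, sizes))
        clock, merged = pending[0]
        for arrival, size in pending[1:]:
            clock = max(clock, arrival) + cost_per_unit * (merged + size)
            merged += size
        return clock

    arrivals = [1.0, 2.0, 10.0]   # seconds until each remote site answers
    sizes = [5.0, 5.0, 1.0]       # relative sizes of the partial results
    print(simultaneous_finish(arrivals, sizes))  # 31.0
    print(hybrid_finish(arrivals, sizes))        # 23.0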

    Integrating analytics with relational databases

    The database research community has made tremendous strides in developing powerful database engines that allow for efficient analytical query processing. However, these powerful systems have gone largely unused by analysts and data scientists. This poor adoption is caused primarily by the state of database-client integration. In this thesis we attempt to overcome this challenge by investigating how we can facilitate efficient and painless integration of analytical tools and relational database management systems. We focus our investigation on the three primary methods for database-client integration: client-server connections, in-database processing, and embedding the database inside the client application.
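
    As a small illustration of the embedded integration style (using Python's standard-library sqlite3 module purely as an example of an in-process engine, not as the system studied in the thesis), the database runs inside the client process, so analytical results land directly in client data structures without per-row network round trips:

    # Embedded style: the engine lives in the client process (sqlite3 is simply the
    # engine that ships with Python; any embedded engine would do).
    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sales(region TEXT, amount REAL)")
    con.executemany("INSERT INTO sales VALUES (?, ?)",
                    [("EU", 10.0), ("EU", 5.0), ("US", 7.5)])

    # the analyst's query runs in-process; rows come back as ordinary Python tuples
    totals = con.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region").fetchall()
    print(totals)   # [('EU', 15.0), ('US', 7.5)]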

    Building a P2P RDF Store for Edge Devices

    The Semantic Web technologies have been used in the Internet of Things (IoT) to facilitate data interoperability and address data heterogeneity issues. The Resource Description Framework (RDF) model is employed in the integration of IoT data, with RDF engines serving as gateways for semantic integration. However, storing and querying RDF data obtained from distributed sources across a dynamic network of edge devices presents a challenging task. The distributed nature of the edge shares similarities with Peer-to-Peer (P2P) systems, including node heterogeneity and limited availability and resources. The nodes primarily undertake tasks related to data storage and processing. Therefore, P2P models appear to present an attractive approach for constructing distributed RDF stores. Based on P-Grid, a data indexing mechanism for load balancing and range query processing in P2P systems, this paper proposes a design for storing and sharing RDF data on P2P networks of low-cost edge devices. Our design aims to integrate P-Grid with an edge-based RDF storage solution, RDF4Led, to build a P2P RDF engine. This integration can maintain RDF data access and query processing while scaling with increasing data and network size. We demonstrate the scaling behavior of our implementation on a P2P network involving up to 16 nodes of Raspberry Pi 4 devices.
    Comment: Accepted to IoT Conference 202
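
    A highly simplified sketch of the prefix-based placement idea (the key function, class names, and peers below are toy examples of ours, not P-Grid or RDF4Led code) shows how a triple could be routed to the peer owning the matching key prefix; order-preserving keys are what let such prefix routing also serve range queries:

    # Toy P2P placement sketch: peers own binary key prefixes, and triples are routed
    # to the peer whose prefix matches the key derived from the triple.
    from typing import List, Tuple

    Triple = Tuple[str, str, str]

    def key_bits(triple: Triple, bits: int = 8) -> str:
        # toy order-preserving key: first character of the subject, as a bit string
        return format(ord(triple[0][0]) % (1 << bits), f"0{bits}b")

    class Peer:
        def __init__(self, prefix: str):
            self.prefix = prefix            # portion of the key space this peer owns
            self.store: List[Triple] = []

    def route(peers: List[Peer], triple: Triple) -> Peer:
        key = key_bits(triple)
        # the longest matching prefix decides which peer stores the triple
        return max((p for p in peers if key.startswith(p.prefix)),
                   key=lambda p: len(p.prefix))

    peers = [Peer("0"), Peer("10"), Peer("11")]
    t = ("sensor:42", "ex:hasTemp", '"21.5"')
    route(peers, t).store.append(t)
    print([(p.prefix, len(p.store)) for p in peers])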

    Query-Time Data Integration

    Today, data is collected at ever-increasing scale and variety, opening up enormous potential for new insights and data-centric products. However, in many cases the volume and heterogeneity of new data sources preclude up-front integration using traditional ETL processes and data warehouses. In some cases, it is even unclear if and in what context the collected data will be utilized. Therefore, there is a need for agile methods that defer the effort of integration until the usage context is established. This thesis introduces Query-Time Data Integration as an alternative concept to traditional up-front integration. It aims at enabling users to issue ad-hoc queries on their own data as if all potential other data sources were already integrated, without declaring specific sources and mappings to use. Automated data search and integration methods are then coupled directly with query processing on the available data. The ambiguity and uncertainty introduced through fully automated retrieval and mapping methods is compensated by answering those queries with ranked lists of alternative results. Each result is then based on different data sources or query interpretations, allowing users to pick the result most suitable to their information need. To this end, this thesis makes three main contributions. Firstly, we introduce a novel method for Top-k Entity Augmentation, which is able to construct a top-k list of consistent integration results from a large corpus of heterogeneous data sources. It improves on the state of the art by producing a set of individually consistent but mutually diverse alternative solutions, while minimizing the number of data sources used. Secondly, based on this novel augmentation method, we introduce the DrillBeyond system, which is able to process Open World SQL queries, i.e., queries referencing arbitrary attributes not defined in the queried database. The original database is then augmented at query time with Web data sources providing those attributes. Its hybrid augmentation/relational query processing enables the use of ad-hoc data search and integration in data analysis queries, and improves both performance and quality when compared to using separate systems for the two tasks. Finally, we study the management of large-scale dataset corpora such as data lakes or Open Data platforms, which are used as data sources for our augmentation methods. We introduce Publish-time Data Integration as a new technique for data curation systems managing such corpora, which aims at improving the individual reusability of datasets without requiring up-front global integration. This is achieved by automatically generating metadata and format recommendations, allowing publishers to enhance their datasets with minimal effort. Collectively, these three contributions are the foundation of a Query-time Data Integration architecture that enables ad-hoc data search and integration queries over large heterogeneous dataset collections.
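
    To convey the flavour of the entity-augmentation problem (a toy greedy set-cover sketch with made-up source names, not the thesis's actual top-k algorithm), the following builds several alternative covers of the requested entities, each using few sources, and penalizes reuse so that the alternatives stay mutually diverse:

    # Toy sketch: alternative covers of the requested entities from candidate sources.
    from typing import Dict, List, Set

    def greedy_cover(entities: Set[str], sources: Dict[str, Set[str]],
                     used_before: Set[str]) -> List[str]:
        remaining, cover = set(entities), []
        while remaining:
            # prefer sources covering many remaining entities; penalize sources
            # already used in earlier alternatives to encourage diversity
            best = max(sources, key=lambda s: len(sources[s] & remaining)
                                              - (2.0 if s in used_before else 0.0))
            if not sources[best] & remaining:
                break                        # the rest cannot be covered at all
            cover.append(best)
            remaining -= sources[best]
        return cover

    def top_k_augmentations(entities, sources, k=3):
        used, results = set(), []
        for _ in range(k):
            cover = greedy_cover(entities, sources, used)
            results.append(cover)
            used.update(cover)
        return results

    sources = {"web_table_1": {"DE", "FR"}, "web_table_2": {"FR", "IT"},
               "web_table_3": {"DE", "IT"}, "web_table_4": {"DE", "FR", "IT"}}
    print(top_k_augmentations({"DE", "FR", "IT"}, sources))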

    An XML Query Engine for Network-Bound Data

    XML has become the lingua franca for data exchange and integration across administrative and enterprise boundaries. Nearly all data providers are adding XML import or export capabilities, and standard XML Schemas and DTDs are being promoted for all types of data sharing. The ubiquity of XML has removed one of the major obstacles to integrating data from widely disparate sources, namely the heterogeneity of data formats. However, general-purpose integration of data across the wide area also requires a query processor that can query data sources on demand, receive streamed XML data from them, and combine and restructure the data into new XML output, while providing good performance for both batch-oriented and ad-hoc, interactive queries. This is the goal of the Tukwila data integration system, the first system that focuses on network-bound, dynamic XML data sources. In contrast to previous approaches, which must read, parse, and often store entire XML objects before querying them, Tukwila can return query results even as the data is streaming into the system. Tukwila is built with a new system architecture that extends adaptive query processing and relational-engine techniques into the XML realm, as facilitated by a pair of operators that incrementally evaluate a query's input path expressions as data is read. In this paper, we describe the Tukwila architecture and its novel aspects, and we experimentally demonstrate that Tukwila provides better overall query performance and faster initial answers than existing systems, and has excellent scalability.
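
    The streaming idea can be illustrated with the standard library alone (this sketch uses xml.etree.ElementTree.iterparse and is not Tukwila's operator implementation): matches of a simple path are emitted as soon as their closing tags arrive, before the rest of the document has been parsed.

    # Incremental, streaming evaluation of a trivial path over XML input.
    import io
    import xml.etree.ElementTree as ET

    def stream_matches(xml_stream, tag):
        # yields matching elements while the rest of the document is still unread
        for event, elem in ET.iterparse(xml_stream, events=("end",)):
            if elem.tag == tag:
                yield elem.attrib.get("name"), (elem.text or "").strip()
                elem.clear()                 # free memory for already-emitted data

    doc = io.BytesIO(
        b"<catalog>"
        b"<book name='a'>XML Basics</book>"
        b"<book name='b'>Query Processing</book>"
        b"</catalog>")
    for name, title in stream_matches(doc, "book"):
        print(name, title)   # the first result is available before </catalog> is read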