1,584 research outputs found

Definition of cross-domain indexes and ordering functions in relational algebra and its usage in relational database management systems

Author: Pinto Paulo Jorge Gonçalves
Publication venue: STRL
Publication date: 01/01/2010

This thesis defines a mathematical model that describes a “Unique Constraint Domain”, followed by a mathematical definition of the “Ordered Unique Constraint Domain”. With those definitions, a cross-domain ordering is also defined. It is then shown that relationships between tables in a Relational Database Management System can be defined in forms other than the usual ones, using cross-domain indexes based on cross-domain ordering. It is shown that all foreign keys in a database can be transformed into indexes, with the benefit of faster data access, and that this technique is consistent with current modeling techniques. It is shown how the index structure, with indexes defined as functions, can support relationship roles, relationships that span more than two tables, and special sort orders. Adding to a relation a mathematical function that can sort that relation, while demonstrating that the closure property of relations is preserved, shows that this mathematical model can be used as an extension of the base relational model. Next, it is shown that with this new technique commercial database engines should not suffer degraded performance, because all supporting structures are already present, and that in some cases better performance might be achieved. Code for a prototype based on a commercial database engine is included as an annex to show how the technique can be used.

De Montfort University Open Research Archive

OpenGrey Repository

A Survey on Array Storage, Query Languages, and Systems

Author: Cheng Yu
Rusu Florin
Publication date: 19/02/2013

Since scientific investigation is one of the most important providers of massive amounts of ordered data, there is a renewed interest in array data processing in the context of Big Data. To the best of our knowledge, a unified resource that summarizes and analyzes array processing research over its long existence is currently missing. In this survey, we provide a guide for past, present, and future research in array processing. The survey is organized along three main topics. The array storage part discusses all the aspects related to partitioning arrays into chunks. The identification of a reduced set of array operators that can form the foundation of an array query language is analyzed across multiple such proposals. Lastly, we survey real systems for array processing. The result is a thorough survey on array data storage and processing that should be consulted by anyone interested in this research topic, independent of experience level. The survey is not complete, though. We greatly appreciate pointers towards any work we might have forgotten to mention. Comment: 44 pages.
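
To make "array partitioning into chunks" concrete, here is a minimal sketch of regular (fixed-shape) chunking, one common partitioning scheme. It is not taken from the survey: the function names `chunk_of` and `partition` are hypothetical, and the dense, in-memory enumeration of cells is an assumption made only for illustration.

```python
from itertools import product
from typing import Dict, List, Tuple

Coord = Tuple[int, ...]


def chunk_of(cell: Coord, chunk_shape: Coord) -> Tuple[Coord, Coord]:
    """Return (chunk coordinates, offset inside that chunk) for one cell."""
    chunk = tuple(c // s for c, s in zip(cell, chunk_shape))
    offset = tuple(c % s for c, s in zip(cell, chunk_shape))
    return chunk, offset


def partition(shape: Coord, chunk_shape: Coord) -> Dict[Coord, List[Coord]]:
    """Group every cell coordinate of a dense array by the chunk that holds it."""
    chunks: Dict[Coord, List[Coord]] = {}
    for cell in product(*(range(n) for n in shape)):
        chunk, _ = chunk_of(cell, chunk_shape)
        chunks.setdefault(chunk, []).append(cell)
    return chunks
```

With a chunk shape of (4, 4), for instance, cell (5, 7) falls in chunk (1, 1) at offset (1, 3). Real array systems may use irregular or overlapping chunks instead; this sketch covers only the regular case.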

arXiv.org e-Print Archive

CiteSeerX

Deductive Optimization of Relational Data Storage

Author: Feser John K.
Madden Samuel
Solar-Lezama Armando
Tang Nan
Publication date: 05/02/2020

Optimizing physical data storage and data retrieval are two key database management problems. In this paper, we propose a language that can express a wide range of physical database layouts, going well beyond the row- and column-based methods that are widely used in database management systems. We use deductive synthesis to turn a high-level relational representation of a database query into a highly optimized low-level implementation which operates on a specialized layout of the dataset. We build a compiler for this language and conduct experiments using a popular database benchmark, which show that the performance of these specialized queries is competitive with a state-of-the-art in-memory compiled database system.

arXiv.org e-Print Archive

DSpace@MIT

Query optimization in a memory-resident domain relational calculus database system

Author: AMMANN A. C.
BITTON D.
BLASGEN M. W.
CLOCKSIN W. F.
FRANKFORTH D
HAMMER U.
HILLER F. S.
IBM SP
KNUTH D.
Kyu-Young Whang
PECHERER R.M.
Ravi Krishnamurthy
REINER D. S.
ULLMAN J.D.
VANDER ZANDEN B. T.
WARREN D. H.
Publication venue: 'Association for Computing Machinery (ACM)'

Crossref

Table Augmentation in Data Lakes

Author: DASSERETO FEDERICO
Publication venue: Università degli studi di Genova
Publication date: 25/05/2023

Data lakes are centralized repositories that store large quantities of raw, unstructured, and structured data, allowing for ad-hoc data analysis, exploratory data analysis, and machine learning. However, the lack of metadata and schema in data lakes makes it challenging to work with tabular data and find related information stored in different tables. How to retrieve these tables efficiently at large scale in a data lake setting is still an open problem. The thesis introduces a novel approach to table augmentation that enables efficient data integration from multiple sources in a data lake. Table augmentation involves adding new data to an existing table in a horizontal fashion, by retrieving tables that can be horizontally concatenated to a query table. The proposed approach, named TASH, consists of several components, including data lake hashing, join search, similarity, and augmentation. TASH is a framework based on a spatial index in which tables are mapped and queried. Its goal is to identify the most useful columns for subsequent machine learning tasks. The table retrieval process employs a combination of set containment search and similarity search. Candidate tables are initially identified using set containment search and then ranked based on their similarity to the query. Experimental results demonstrate that TASH can effectively identify joinable tables and select the most relevant features, thereby enabling efficient table augmentation in data lakes. This research contributes to the field of big data by providing a practical solution to the challenges of data integration and analysis in data lake environments.
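
The two-stage retrieval described in the abstract (candidates found by set containment, then ranked by similarity) can be sketched as follows. This is not the TASH implementation: it scans every column by brute force rather than using a hashing scheme or spatial index, and the names `containment`, `jaccard`, `retrieve`, the `min_containment` threshold, and the choice of Jaccard as the similarity measure are all assumptions made for illustration.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple


def containment(query: Set[Hashable], candidate: Set[Hashable]) -> float:
    """Fraction of the query values that also appear in the candidate column."""
    if not query:
        return 0.0
    return len(query & candidate) / len(query)


def jaccard(a: Set[Hashable], b: Set[Hashable]) -> float:
    """Jaccard similarity; an assumed stand-in for the similarity measure."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(
    query_column: Iterable[Hashable],
    lake: Dict[str, Iterable[Hashable]],
    min_containment: float = 0.5,
) -> List[Tuple[str, float]]:
    """Stage 1: keep columns that contain enough of the query values.
    Stage 2: rank the surviving columns by similarity to the query."""
    q = set(query_column)
    candidates = []
    for name, values in lake.items():
        column = set(values)
        if containment(q, column) >= min_containment:
            candidates.append((name, jaccard(q, column)))
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)
```

Containment is asymmetric, so a large column that covers the whole query passes the first stage even when its Jaccard similarity is low. That is why the two measures are applied in sequence, not merged into one score.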

Archivio istituzionale della ricerca - Università di Genova

Cost-based Optimization of Multistore Query Plans

Author: Forresi C
Francia M
Gallinucci E
Golfarelli M
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2022

Multistores are data management systems that enable query processing across different and heterogeneous databases; besides the distribution of data, complexity factors like schema heterogeneity and data replication must be resolved through integration and data fusion activities. Our multistore solution relies on a dataspace to provide the user with an integrated view of the available data and enables the formulation and execution of GPSJ queries. In this paper, we propose a technique to optimize the execution of GPSJ queries by formulating and evaluating different execution plans on the multistore. In particular, we outline different strategies to carry out joins and data fusion by relying on different schema representations; then, a self-learning black-box cost model is used to estimate execution times and select the most efficient plan. The experiments assess the effectiveness of the cost model in choosing the best execution plan for the given queries and exploit multiple multistore benchmarks to investigate the factors that influence the performance of different plans.
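
The plan-selection step in the abstract amounts to estimating a cost for each candidate plan and keeping the cheapest one. A minimal sketch, assuming the cost model is available as an opaque callable: the name `choose_plan` and the representation of plans as arbitrary Python objects are hypothetical, and the self-learning aspect of the paper's model is not reproduced.

```python
from typing import Callable, Iterable, Tuple, TypeVar

Plan = TypeVar("Plan")


def choose_plan(
    plans: Iterable[Plan],
    estimate: Callable[[Plan], float],
) -> Tuple[Plan, float]:
    """Return the plan with the lowest estimated execution time and that estimate.

    `estimate` is treated as a black box: it may be a learned regressor,
    a lookup table, or anything else that maps a plan to a number.
    """
    candidates = list(plans)
    if not candidates:
        raise ValueError("at least one candidate plan is required")
    # Pair each cost with the plan's position so ties resolve to the earliest plan
    # and the plans themselves never need to be comparable.
    best_cost, best_index = min(
        (estimate(plan), index) for index, plan in enumerate(candidates)
    )
    return candidates[best_index], best_cost
```

Because the estimator is passed in as a function, the selection logic does not depend on how the estimates are produced.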

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Content And Multimedia Database Management Systems

Author: Vries Arjen Paul de
Publication venue: University of Twente, Centre for Telematics and Information Technology (CTIT)
Publication date: 01/01/1999

A database management system is a general-purpose software system that facilitates the processes of defining, constructing, and manipulating databases for various applications. The main characteristic of the ‘database approach’ is that it increases the value of data by its emphasis on data independence. DBMSs, and in particular those based on the relational data model, have been very successful at the management of administrative data in the business domain. This thesis investigates data management in multimedia digital libraries and its implications for the design of database management systems. The main problem of multimedia data management is providing access to the stored objects. The content structure of administrative data is easily represented in alphanumeric values. Thus, database technology has primarily focused on handling the objects’ logical structure. In the case of multimedia data, however, representation of content is far from trivial, and it is not supported by current database management systems.

CiteSeerX

VU Research Portal

CWI's Institutional Repository

University of Twente Research Information