5,701 research outputs found
Profiling relational data: a survey
Profiling data to determine metadata about a given dataset is an important and frequent activity of any IT professional and researcher and is necessary for various use-cases. It encompasses a vast array of methods to examine datasets and produce metadata. Among the simpler results are statistics, such as the number of null values and distinct values in a column, its data type, or the most frequent patterns of its data values. Metadata that are more difficult to compute involve multiple columns, namely correlations, unique column combinations, functional dependencies, and inclusion dependencies. Further techniques detect conditional properties of the dataset at hand. This survey provides a classification of data profiling tasks and comprehensively reviews the state of the art for each class. In addition, we review data profiling tools and systems from research and industry. We conclude with an outlook on the future of data profiling beyond traditional profiling tasks and beyond relational databases
Relaxed Functional Dependencies - A Survey of Approaches
Recently, there has been a renovated interest in functional dependencies due to the possibility of employing them in several advanced database operations, such as data cleaning, query relaxation, record matching, and so forth. In particular, the constraints defined for canonical functional dependencies have been relaxed to capture inconsistencies in real data, patterns of semantically related data, or semantic relationships in complex data types. In this paper, we have surveyed 35 of such functional dependencies, providing a classification criteria, motivating examples, and a systematic analysis of them
Conceptual Based Hidden Data Analytics and Reduction Method for System Interface Enhancement Through Handheld devices
With the increasing demand placed on online systems by users, many organizations and companies are seeking to enhance their online interfaces to facilitate the search process on their hidden databases. Usually, users issue queries to a hidden database by using the search template provided by the system. In this thesis, a new approach based mainly on hidden database reduction preserving functional dependencies is developed for enhancing the online system interface through a small screen device. The developed approach is applied to online market systems like eBay. Offline hidden data analysis is used to discover attributes and their domains and different functional dependencies. In this thesis, a comparative study between several methods for mining functional dependencies shows the advantage of conceptual methods for data reduction. In addition, by using online consecutive reductions on search results, we adopted a method of displaying results in order of decreasing relevance. The validation of the proposed designed and developed methods prove their generality and suitability for system interfacing through continuous data reductions.NPRP-07-794-1-145 grant from the Qatar National Research Fund (a member of Qatar foundation
From Causes for Database Queries to Repairs and Model-Based Diagnosis and Back
In this work we establish and investigate connections between causes for
query answers in databases, database repairs wrt. denial constraints, and
consistency-based diagnosis. The first two are relatively new research areas in
databases, and the third one is an established subject in knowledge
representation. We show how to obtain database repairs from causes, and the
other way around. Causality problems are formulated as diagnosis problems, and
the diagnoses provide causes and their responsibilities. The vast body of
research on database repairs can be applied to the newer problems of computing
actual causes for query answers and their responsibilities. These connections,
which are interesting per se, allow us, after a transition -inspired by
consistency-based diagnosis- to computational problems on hitting sets and
vertex covers in hypergraphs, to obtain several new algorithmic and complexity
results for database causality.Comment: To appear in Theory of Computing Systems. By invitation to special
issue with extended papers from ICDT 2015 (paper arXiv:1412.4311
A Taxonomy of Workflow Management Systems for Grid Computing
With the advent of Grid and application technologies, scientists and
engineers are building more and more complex applications to manage and process
large data sets, and execute scientific experiments on distributed resources.
Such application scenarios require means for composing and executing complex
workflows. Therefore, many efforts have been made towards the development of
workflow management systems for Grid computing. In this paper, we propose a
taxonomy that characterizes and classifies various approaches for building and
executing workflows on Grids. We also survey several representative Grid
workflow systems developed by various projects world-wide to demonstrate the
comprehensiveness of the taxonomy. The taxonomy not only highlights the design
and engineering similarities and differences of state-of-the-art in Grid
workflow systems, but also identifies the areas that need further research.Comment: 29 pages, 15 figure
- …