60 research outputs found
Representation Independent Analytics Over Structured Data
Database analytics algorithms leverage quantifiable structural properties of
the data to predict interesting concepts and relationships. The same
information, however, can be represented using many different structures and
the structural properties observed over particular representations do not
necessarily hold for alternative structures. Thus, there is no guarantee that
current database analytics algorithms will still provide the correct insights,
no matter what structures are chosen to organize the database. Because these
algorithms tend to be highly effective over some choices of structure, such as
that of the databases used to validate them, but not so effective with others,
database analytics has largely remained the province of experts who can find
the desired forms for these algorithms. We argue that in order to make database
analytics usable, we should use or develop algorithms that are effective over a
wide range of choices of structural organizations. We introduce the notion of
representation independence, study its fundamental properties for a wide range
of data analytics algorithms, and empirically analyze the amount of
representation independence of some popular database analytics algorithms. Our
results indicate that most algorithms are not generally representation
independent, and we identify the characteristics of heuristics that remain
more representation independent under certain representational shifts.
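The core observation above can be sketched concretely. The following Python fragment is a hypothetical illustration (the relation and attribute names are ours, not the paper's): the same information is stored under two structural organizations, rejoins losslessly, and yet simple structural statistics differ between the two, which is why a heuristic keyed to such statistics scores the two databases differently.

```python
# Representation A: one wide relation person(name, dept, title)
wide = [
    ("ann", "cs", "prof"),
    ("bob", "cs", "student"),
]

# Representation B: a vertical decomposition into two relations sharing the key
dept = {"ann": "cs", "bob": "cs"}
title = {"ann": "prof", "bob": "student"}

# The information content is identical: B rejoins losslessly into A ...
rejoined = sorted((n, dept[n], title[n]) for n in dept)
assert rejoined == sorted(wide)

# ... yet structural statistics -- (tuples, attributes) per relation --
# differ, so a representation-dependent heuristic keyed to them would
# score the two databases differently despite identical facts.
stats_a = [(len(wide), 3)]
stats_b = [(len(dept), 2), (len(title), 2)]
print(stats_a != stats_b)   # True
```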
Schema Independent Relational Learning
Learning novel concepts and relations from relational databases is an
important problem with many applications in database systems and machine
learning. Relational learning algorithms learn the definition of a new relation
in terms of existing relations in the database. Nevertheless, the same data set
may be represented under different schemas for various reasons, such as
efficiency, data quality, and usability. Unfortunately, the output of current
relational learning algorithms tends to vary quite substantially over the
choice of schema, both in terms of learning accuracy and efficiency. This
variation complicates their off-the-shelf application. In this paper, we
introduce and formalize the property of schema independence of relational
learning algorithms, and study both the theoretical and empirical dependence of
existing algorithms on the common class of (de)composition schema
transformations. We study both sample-based learning algorithms, which learn
from sets of labeled examples, and query-based algorithms, which learn by
asking queries to an oracle. We prove that current relational learning
algorithms are generally not schema independent. For query-based learning
algorithms, we show that the (de)composition transformations influence their
query complexity. We propose Castor, a sample-based relational learning
algorithm that achieves schema independence by leveraging data dependencies. We
support the theoretical results with an empirical study that demonstrates the
schema dependence/independence of several algorithms on existing benchmark and
real-world datasets under (de)compositions.
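A toy Python sketch of the schema-independence property described above (predicate names in the spirit of relational-learning benchmarks, data invented by us): one target concept is expressed over two schemas related by a composition transformation, and a schema-independent learner should produce equivalent definitions over both.

```python
# Schema 1: advisedby(student, prof) and taughtby(course, prof)
advisedby = {("s1", "p1"), ("s2", "p2"), ("s3", "p1")}
taughtby = {("c1", "p1"), ("c2", "p2")}

# Schema 2 composes the two into supervises(student, prof, course)
supervises = {(s, p, c) for (s, p) in advisedby
              for (c, q) in taughtby if p == q}

# Target concept: pairs of distinct students sharing an advisor.
def sameadvisor_schema1():
    return {(a, b) for (a, p) in advisedby
            for (b, q) in advisedby if p == q and a != b}

def sameadvisor_schema2():
    return {(a, b) for (a, p, _) in supervises
            for (b, q, _) in supervises if p == q and a != b}

# The two rewritings of the learned definition agree on the instance, as a
# schema-independent learner would require.
assert sameadvisor_schema1() == sameadvisor_schema2()
print(sorted(sameadvisor_schema1()))   # [('s1', 's3'), ('s3', 's1')]
```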
Master of Science thesis
Data quality has become a significant issue in healthcare as large preexisting databases are integrated to provide greater depth for research and process improvement. Large-scale data integration exposes and compounds data quality issues latent in source systems. Although the problems related to data quality in transactional databases have been identified and well addressed, the application of data quality constraints to large-scale data repositories has not, and requires novel applications of traditional concepts and methodologies. Despite an abundance of data quality theory, tools, and software, there is no consensual technique available to guide developers in the identification of data integrity issues and the application of data quality rules in warehouse-type applications. Data quality measures are frequently developed on an ad hoc basis, or methods designed to assure data quality in transactional systems are loosely applied to analytic data stores. These measures are inadequate to address the complex data quality issues in large, integrated data repositories, particularly in the healthcare domain with its heterogeneous source systems. This study derives a taxonomy of data quality rules from relational database theory. It describes the development and implementation of data quality rules in the Analytic Health Repository at Intermountain Healthcare and situates those rules in the taxonomy. Further, it identifies areas in which more rigorous data quality should be explored. This comparison demonstrates the superiority of a structured approach to data quality rule identification.
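A minimal Python sketch of three rule classes that a taxonomy derived from relational database theory would contain, namely domain, key (entity), and referential integrity. The table and column names are illustrative assumptions of ours, not Intermountain Healthcare's.

```python
patients = [
    {"id": 1, "sex": "F"},
    {"id": 2, "sex": "M"},
    {"id": 2, "sex": "X"},   # duplicate key and out-of-domain value
]
labs = [
    {"patient_id": 1, "loinc": "718-7"},
    {"patient_id": 9, "loinc": "718-7"},   # dangling reference
]

def domain_violations(rows, col, allowed):
    """Rows whose value for col falls outside the declared domain."""
    return [r for r in rows if r[col] not in allowed]

def key_violations(rows, key):
    """Rows that repeat a previously seen key value."""
    seen, dups = set(), []
    for r in rows:
        if r[key] in seen:
            dups.append(r)
        seen.add(r[key])
    return dups

def ref_violations(child, fk, parent, pk):
    """Child rows whose foreign key matches no parent key."""
    keys = {r[pk] for r in parent}
    return [r for r in child if r[fk] not in keys]

print(len(domain_violations(patients, "sex", {"F", "M"})))      # 1
print(len(key_violations(patients, "id")))                      # 1
print(len(ref_violations(labs, "patient_id", patients, "id")))  # 1
```

Running such checks against an integrated repository, rather than relying on the source systems' transactional constraints, is the kind of structured approach the thesis argues for.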
The normalization of frames as a superclass of relations
M.Sc. (Computer Science). Knowledge representation suffers from certain problems that result not from inadequacies in knowledge representation schemes themselves, but from the way in which they are used and implemented. In the first part of this dissertation we examine the relational model (as used in relational database management systems) and frames (a knowledge representation scheme used in expert systems), as proposed by M. Minsky [MIN75]. We then provide our own definition of frames. In the second part, we examine similarities between the two models (the relational model and our frame model), establishing frames as a superclass of relations. We then define normalization for frames and examine how normalization might solve some of the problems we have identified. We then examine the integration of knowledge-based systems and database management systems and classify our normalization of frames as such an attempt. We conclude by examining the place of normalization within the expert system development life cycle.
A SQL front-end semantic data model
SQLSDM is a front-end semantic data model to a SQL relational database management system (RDBMS). SQLSDM provides a more semantically complete RDBMS through the implementation of a domain and referential integrity scheme. SQLSDM provides integrity definition functions and a subsystem to interpret SQL commands. Integrity system tables are created through the use of SQLSDM's domain definition command and SQL's CREATE TABLE command. As SQL database update commands are interpreted, SQLSDM uses these integrity tables to enforce domain and referential integrity. SQLSDM operates virtually transparently to the user and provides greater database consistency and semantic control. Furthermore, SQLSDM is designed and engineered as a portable front end that may be implemented on any SQL relational database management system.
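The SQLSDM idea can be sketched in a few lines of Python. This is a hypothetical illustration, not SQLSDM's actual design: a front end keeps its own integrity tables and consults them before handing an update to the underlying store; the table layouts and function names here are our invention.

```python
domains = {("emp", "age"): range(16, 100)}            # domain definitions
foreign_keys = {("emp", "dept_id"): ("dept", "id")}   # referential rules
db = {"dept": [{"id": 10}], "emp": []}                # the underlying store

def insert(table, row):
    """Check domain and referential integrity, then apply the insert."""
    for (t, col), dom in domains.items():
        if t == table and row.get(col) not in dom:
            raise ValueError(f"domain violation on {t}.{col}")
    for (t, col), (pt, pk) in foreign_keys.items():
        if t == table and row.get(col) not in {r[pk] for r in db[pt]}:
            raise ValueError(f"referential violation on {t}.{col}")
    db[table].append(row)

insert("emp", {"age": 30, "dept_id": 10})   # passes both checks
try:
    insert("emp", {"age": 30, "dept_id": 99})
except ValueError as err:
    print(err)   # referential violation on emp.dept_id
```

The interposition is transparent to the caller in the same sense the abstract describes: valid updates go through unchanged, and only violating ones are rejected before reaching the store.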