Representation Independent Analytics Over Structured Data
Database analytics algorithms leverage quantifiable structural properties of
the data to predict interesting concepts and relationships. The same
information, however, can be represented using many different structures and
the structural properties observed over particular representations do not
necessarily hold for alternative structures. Thus, there is no guarantee that
current database analytics algorithms will still provide the correct insights,
no matter what structures are chosen to organize the database. Because these
algorithms tend to be highly effective over some choices of structure, such as
that of the databases used to validate them, but not so effective with others,
database analytics has largely remained the province of experts who can find
the desired forms for these algorithms. We argue that in order to make database
analytics usable, we should use or develop algorithms that are effective over a
wide range of choices of structural organizations. We introduce the notion of
representation independence, study its fundamental properties for a wide range
of data analytics algorithms, and empirically analyze the amount of
representation independence of some popular database analytics algorithms. Our
results indicate that most algorithms are not generally representation
independent, and they identify the characteristics of the heuristics that are
more representation independent under certain representational shifts.
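As a minimal sketch of the problem this abstract describes, the snippet below stores the same publication facts under two graph structures and applies a common-neighbors score, a simple structural heuristic chosen here only for illustration. The node names, the reified `pub1`/`pub2` nodes, and the choice of heuristic are assumptions and are not taken from the paper.

```python
def common_neighbors(edges, u, v):
    """Count the nodes adjacent to both u and v in an undirected edge list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return len(adj.get(u, set()) & adj.get(v, set()))

# Representation A (hypothetical): papers link directly to their conference.
rep_a = [("paper1", "vldb"), ("paper2", "vldb")]

# Representation B (hypothetical): the same facts, but each "published in"
# fact is reified as its own node between the paper and the conference.
rep_b = [("paper1", "pub1"), ("pub1", "vldb"),
         ("paper2", "pub2"), ("pub2", "vldb")]

score_a = common_neighbors(rep_a, "paper1", "paper2")
score_b = common_neighbors(rep_b, "paper1", "paper2")
print(score_a, score_b)
```

Under these assumptions the two papers share a neighbor in the first structure and none in the second, so a heuristic that relies on this property gives different answers over the same information.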
Worst-case Optimal Query Answering for Greedy Sets of Existential Rules and Their Subclasses
The need for an ontological layer on top of data, associated with advanced
reasoning mechanisms able to exploit the semantics encoded in ontologies, has
been acknowledged both in the database and knowledge representation
communities. We focus in this paper on the ontological query answering problem,
which consists of querying data while taking ontological knowledge into
account. More specifically, we establish complexities of the conjunctive query
entailment problem for classes of existential rules (also called
tuple-generating dependencies, Datalog+/- rules, or forall-exists-rules). Our
contribution is twofold. First, we introduce the class of greedy
bounded-treewidth sets (gbts) of rules, which covers guarded rules, and their
most well-known generalizations. We provide a generic algorithm for query
entailment under gbts, which is worst-case optimal for combined complexity with
or without bounded predicate arity, as well as for data complexity and query
complexity. Secondly, we classify several gbts classes, whose complexity was
unknown, with respect to combined complexity (with both unbounded and bounded
predicate arity) and data complexity to obtain a comprehensive picture of the
complexity of existential rule fragments that are based on diverse guardedness
notions. Upper bounds are provided by showing that the proposed algorithm is
optimal for all of them.
Schema Independent Relational Learning
Learning novel concepts and relations from relational databases is an
important problem with many applications in database systems and machine
learning. Relational learning algorithms learn the definition of a new relation
in terms of existing relations in the database. Nevertheless, the same data set
may be represented under different schemas for various reasons, such as
efficiency, data quality, and usability. Unfortunately, the output of current
relational learning algorithms tends to vary quite substantially over the
choice of schema, both in terms of learning accuracy and efficiency. This
variation complicates their off-the-shelf application. In this paper, we
introduce and formalize the property of schema independence of relational
learning algorithms, and study both the theoretical and empirical dependence of
existing algorithms on the common class of (de)composition schema
transformations. We study both sample-based learning algorithms, which learn
from sets of labeled examples, and query-based algorithms, which learn by
asking queries to an oracle. We prove that current relational learning
algorithms are generally not schema independent. For query-based learning
algorithms we show that the (de)composition transformations influence their
query complexity. We propose Castor, a sample-based relational learning
algorithm that achieves schema independence by leveraging data dependencies. We
support the theoretical results with an empirical study that demonstrates the
schema dependence/independence of several algorithms on existing benchmark and
real-world datasets under (de)compositions.
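To make the notion of a (de)composition concrete, here is a small sketch, assuming a hypothetical relation `student(name, advisor, dept)` in which `name` is a key. It splits the relation into two and rebuilds it with a natural join. The relation, its attributes, and the helper names are illustrative assumptions; this is not the Castor algorithm.

```python
# Hypothetical composed schema: student(name, advisor, dept), keyed on name.
composed = {
    ("ann", "bob", "cs"),
    ("carl", "dana", "math"),
}

def decompose(rows):
    """Split student(name, advisor, dept) into two relations that share name."""
    advised_by = {(name, advisor) for name, advisor, _ in rows}
    in_dept = {(name, dept) for name, _, dept in rows}
    return advised_by, in_dept

def compose(advised_by, in_dept):
    """Rebuild the original relation with a natural join on name."""
    return {
        (name, advisor, dept)
        for (name, advisor) in advised_by
        for (other, dept) in in_dept
        if name == other
    }

advised_by, in_dept = decompose(composed)
roundtrip = compose(advised_by, in_dept)
print(roundtrip == composed)
```

Both schemas carry the same information because the join is lossless when `name` determines the other attributes. A definition that needs one literal over the composed schema needs a join of two literals over the decomposed one, which is the kind of difference a schema-dependent learner can be sensitive to.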
CREOLE: a Universal Language for Creating, Requesting, Updating and Deleting Resources
In the context of Service-Oriented Computing, applications can be developed
following the REST (Representational State Transfer) architectural style. This
style corresponds to a resource-oriented model, where resources are manipulated
via CRUD (Create, Request, Update, Delete) interfaces. The diversity of CRUD
languages due to the absence of a standard leads to composition problems
related to adaptation, integration and coordination of services. To overcome
these problems, we propose a pivot architecture built around a universal
language to manipulate resources, called CREOLE, a CRUD Language for Resource
Edition. In this architecture, scripts written in existing CRUD languages, like
SQL, are compiled into Creole and then executed over different CRUD interfaces.
After stating the requirements for a universal language for manipulating
resources, we formally describe the language and informally motivate its
definition with respect to the requirements. We then concretely show how the
architecture solves adaptation, integration and coordination problems in the
case of photo management in Flickr and Picasa, two well-known service-oriented
applications. Finally, we propose a roadmap for future work.
Comment: In Proceedings FOCLASA 2010, arXiv:1007.499
Datalog± Ontology Consolidation
Knowledge bases in the form of ontologies are receiving increasing attention because they make it possible to clearly represent the available knowledge, which includes both the knowledge itself and the constraints imposed on it by the domain or the users. In particular, Datalog± ontologies are attractive because of their decidability and their ability to deal with the massive amounts of data found in real-world environments; however, as is the case with many other ontological languages, their application in collaborative environments often leads to inconsistency-related issues. In this paper we introduce the notion of incoherence for Datalog± ontologies, in terms of satisfiability of sets of constraints, and show how, under specific conditions, incoherence leads to inconsistent Datalog± ontologies. The main contribution of this work is a novel approach to restoring both consistency and coherence in Datalog± ontologies. The proposed approach is based on kernel contraction, and restoration is performed by applying incision functions that select formulas to delete. However, instead of working over the minimal incoherent/inconsistent sets found in the ontologies, our operators produce incisions over non-minimal structures called clusters. We present a construction for consolidation operators, along with the properties they are expected to satisfy. Finally, we establish the relation between the construction and the properties by means of a representation theorem. Although this proposal is presented for the consolidation of Datalog± ontologies, these operators can be applied to other types of ontological languages, such as Description Logics, making them apt for use in collaborative environments like the Semantic Web.
Affiliation: Deagustini, Cristhian Ariel David. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Ciencias e Ingeniería de la Computación. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación. Instituto de Ciencias e Ingeniería de la Computación; Argentina
Affiliation: Martinez, Maria Vanina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Ciencias e Ingeniería de la Computación. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación. Instituto de Ciencias e Ingeniería de la Computación; Argentina
Affiliation: Falappa, Marcelo Alejandro. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Ciencias e Ingeniería de la Computación. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación. Instituto de Ciencias e Ingeniería de la Computación; Argentina
Affiliation: Simari, Guillermo Ricardo. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Ciencias e Ingeniería de la Computación. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación. Instituto de Ciencias e Ingeniería de la Computación; Argentina
Teaching an RDBMS about ontological constraints
In the presence of an ontology, query answers must reflect not only data explicitly present in the database, but also implicit data, which holds due to the ontology, even though it is not present in the database. A large and useful set of ontology languages enjoys FOL reducibility of query answering: answering a query can be reduced to evaluating a certain first-order logic (FOL) formula (obtained from the query and ontology) against only the explicit facts. We present a novel query optimization framework for ontology-based data access settings enjoying FOL reducibility. Our framework is based on searching within a set of alternative equivalent FOL queries, i.e., FOL reformulations, one with minimal evaluation cost when evaluated through a relational database system. We apply this framework to the DL-LiteR Description Logic underpinning the W3C's OWL2 QL ontology language, and demonstrate through experiments its performance benefits when two leading SQL systems, one open-source and one commercial, are used for evaluating the FOL query reformulations.
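FOL reducibility can be illustrated with a small sketch, assuming a hypothetical ontology that contains only two subclass axioms. The query `Teacher(x)` is rewritten into a SQL union over the explicit tables, so implicit answers are returned without materializing them. The table names, the axioms, and the `reformulate` helper are assumptions made for this example. The sketch produces a single reformulation and does not perform the cost-based search among alternative reformulations that the abstract describes.

```python
import sqlite3

# Hypothetical ontology: each pair (sub, sup) states that every sub is a sup.
SUBCLASS_OF = [("Professor", "Teacher"), ("Lecturer", "Teacher")]

def subsumees(concept):
    """Return the concept plus every concept the axioms place below it."""
    found, frontier = {concept}, [concept]
    while frontier:
        current = frontier.pop()
        for sub, sup in SUBCLASS_OF:
            if sup == current and sub not in found:
                found.add(sub)
                frontier.append(sub)
    return sorted(found)

def reformulate(concept):
    """Build a SQL union that answers concept(x) over explicit facts only."""
    return " UNION ".join(f'SELECT id FROM "{c}"' for c in subsumees(concept))

# Explicit facts only: p1 and l1 are never stored as teachers.
conn = sqlite3.connect(":memory:")
for table in ("Teacher", "Professor", "Lecturer"):
    conn.execute(f'CREATE TABLE "{table}" (id TEXT)')
conn.execute("INSERT INTO Teacher VALUES ('t1')")
conn.execute("INSERT INTO Professor VALUES ('p1')")
conn.execute("INSERT INTO Lecturer VALUES ('l1')")

answers = sorted(row[0] for row in conn.execute(reformulate("Teacher")))
print(answers)
```

The database engine sees only an ordinary SQL query. Different but equivalent rewritings of the same query can have very different evaluation costs, which is the space a reformulation-based optimizer would explore.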