6,317 research outputs found
An introduction to Graph Data Management
A graph database is a database where the data structures for the schema
and/or instances are modeled as a (labeled)(directed) graph or generalizations
of it, and where querying is expressed by graph-oriented operations and type
constructors. In this article we present the basic notions of graph databases,
give an historical overview of its main development, and study the main current
systems that implement them
Spontaneous Analogy by Piggybacking on a Perceptual System
Most computational models of analogy assume they are given a delineated
source domain and often a specified target domain. These systems do not address
how analogs can be isolated from large domains and spontaneously retrieved from
long-term memory, a process we call spontaneous analogy. We present a system
that represents relational structures as feature bags. Using this
representation, our system leverages perceptual algorithms to automatically
create an ontology of relational structures and to efficiently retrieve analogs
for new relational structures from long-term memory. We provide a demonstration
of our approach that takes a set of unsegmented stories, constructs an ontology
of analogical schemas (corresponding to plot devices), and uses this ontology
to efficiently find analogs within new stories, yielding significant
time-savings over linear analog retrieval at a small accuracy cost.Comment: Proceedings of the 35th Meeting of the Cognitive Science Society,
201
Automated schema matching techniques: an exploratory study
Manual schema matching is a problem for many database applications that use multiple data sources including data warehousing and e-commerce applications. Current research attempts to address this problem by developing algorithms to automate aspects of the schema-matching task. In this paper, an approach using an external dictionary facilitates automated discovery of the semantic meaning of database schema terms. An experimental study was conducted to evaluate the performance and accuracy of five schema-matching techniques with the proposed approach, called SemMA. The proposed approach and results are compared with two existing semi-automated schema-matching approaches and suggestions for future research are made
Relational Foundations For Functorial Data Migration
We study the data transformation capabilities associated with schemas that
are presented by directed multi-graphs and path equations. Unlike most
approaches which treat graph-based schemas as abbreviations for relational
schemas, we treat graph-based schemas as categories. A schema is a
finitely-presented category, and the collection of all -instances forms a
category, -inst. A functor between schemas and , which can be
generated from a visual mapping between graphs, induces three adjoint data
migration functors, -inst-inst, -inst -inst, and -inst -inst. We present an algebraic query
language FQL based on these functors, prove that FQL is closed under
composition, prove that FQL can be implemented with the
select-project-product-union relational algebra (SPCU) extended with a
key-generation operation, and prove that SPCU can be implemented with FQL
SODA: Generating SQL for Business Users
The purpose of data warehouses is to enable business analysts to make better
decisions. Over the years the technology has matured and data warehouses have
become extremely successful. As a consequence, more and more data has been
added to the data warehouses and their schemas have become increasingly
complex. These systems still work great in order to generate pre-canned
reports. However, with their current complexity, they tend to be a poor match
for non tech-savvy business analysts who need answers to ad-hoc queries that
were not anticipated. This paper describes the design, implementation, and
experience of the SODA system (Search over DAta Warehouse). SODA bridges the
gap between the business needs of analysts and the technical complexity of
current data warehouses. SODA enables a Google-like search experience for data
warehouses by taking keyword queries of business users and automatically
generating executable SQL. The key idea is to use a graph pattern matching
algorithm that uses the metadata model of the data warehouse. Our results with
real data from a global player in the financial services industry show that
SODA produces queries with high precision and recall, and makes it much easier
for business users to interactively explore highly-complex data warehouses.Comment: VLDB201
Functorial Data Migration
In this paper we present a simple database definition language: that of
categories and functors. A database schema is a small category and an instance
is a set-valued functor on it. We show that morphisms of schemas induce three
"data migration functors", which translate instances from one schema to the
other in canonical ways. These functors parameterize projections, unions, and
joins over all tables simultaneously and can be used in place of conjunctive
and disjunctive queries. We also show how to connect a database and a
functional programming language by introducing a functorial connection between
the schema and the category of types for that language. We begin the paper with
a multitude of examples to motivate the definitions, and near the end we
provide a dictionary whereby one can translate database concepts into
category-theoretic concepts and vice-versa.Comment: 30 page
Probabilistic Relational Model Benchmark Generation
The validation of any database mining methodology goes through an evaluation
process where benchmarks availability is essential. In this paper, we aim to
randomly generate relational database benchmarks that allow to check
probabilistic dependencies among the attributes. We are particularly interested
in Probabilistic Relational Models (PRMs), which extend Bayesian Networks (BNs)
to a relational data mining context and enable effective and robust reasoning
over relational data. Even though a panoply of works have focused, separately ,
on the generation of random Bayesian networks and relational databases, no work
has been identified for PRMs on that track. This paper provides an algorithmic
approach for generating random PRMs from scratch to fill this gap. The proposed
method allows to generate PRMs as well as synthetic relational data from a
randomly generated relational schema and a random set of probabilistic
dependencies. This can be of interest not only for machine learning researchers
to evaluate their proposals in a common framework, but also for databases
designers to evaluate the effectiveness of the components of a database
management system
- …