50,577 research outputs found
SODA: Generating SQL for Business Users
The purpose of data warehouses is to enable business analysts to make better
decisions. Over the years the technology has matured and data warehouses have
become extremely successful. As a consequence, more and more data has been
added to the data warehouses and their schemas have become increasingly
complex. These systems still work great in order to generate pre-canned
reports. However, with their current complexity, they tend to be a poor match
for non tech-savvy business analysts who need answers to ad-hoc queries that
were not anticipated. This paper describes the design, implementation, and
experience of the SODA system (Search over DAta Warehouse). SODA bridges the
gap between the business needs of analysts and the technical complexity of
current data warehouses. SODA enables a Google-like search experience for data
warehouses by taking keyword queries of business users and automatically
generating executable SQL. The key idea is to use a graph pattern matching
algorithm that uses the metadata model of the data warehouse. Our results with
real data from a global player in the financial services industry show that
SODA produces queries with high precision and recall, and makes it much easier
for business users to interactively explore highly-complex data warehouses.Comment: VLDB201
Finding Patterns in a Knowledge Base using Keywords to Compose Table Answers
We aim to provide table answers to keyword queries against knowledge bases.
For queries referring to multiple entities, like "Washington cities population"
and "Mel Gibson movies", it is better to represent each relevant answer as a
table which aggregates a set of entities or entity-joins within the same table
scheme or pattern. In this paper, we study how to find highly relevant patterns
in a knowledge base for user-given keyword queries to compose table answers. A
knowledge base can be modeled as a directed graph called knowledge graph, where
nodes represent entities in the knowledge base and edges represent the
relationships among them. Each node/edge is labeled with type and text. A
pattern is an aggregation of subtrees which contain all keywords in the texts
and have the same structure and types on node/edges. We propose efficient
algorithms to find patterns that are relevant to the query for a class of
scoring functions. We show the hardness of the problem in theory, and propose
path-based indexes that are affordable in memory. Two query-processing
algorithms are proposed: one is fast in practice for small queries (with small
patterns as answers) by utilizing the indexes; and the other one is better in
theory, with running time linear in the sizes of indexes and answers, which can
handle large queries better. We also conduct extensive experimental study to
compare our approaches with a naive adaption of known techniques.Comment: VLDB 201
A Symbolic Execution Algorithm for Constraint-Based Testing of Database Programs
In so-called constraint-based testing, symbolic execution is a common
technique used as a part of the process to generate test data for imperative
programs. Databases are ubiquitous in software and testing of programs
manipulating databases is thus essential to enhance the reliability of
software. This work proposes and evaluates experimentally a symbolic ex-
ecution algorithm for constraint-based testing of database programs. First, we
describe SimpleDB, a formal language which offers a minimal and well-defined
syntax and seman- tics, to model common interaction scenarios between pro-
grams and databases. Secondly, we detail the proposed al- gorithm for symbolic
execution of SimpleDB models. This algorithm considers a SimpleDB program as a
sequence of operations over a set of relational variables, modeling both the
database tables and the program variables. By inte- grating this relational
model of the program with classical static symbolic execution, the algorithm
can generate a set of path constraints for any finite path to test in the
control- flow graph of the program. Solutions of these constraints are test
inputs for the program, including an initial content for the database. When the
program is executed with respect to these inputs, it is guaranteed to follow
the path with re- spect to which the constraints were generated. Finally, the
algorithm is evaluated experimentally using representative SimpleDB models.Comment: 12 pages - preliminary wor
Reasoning & Querying – State of the Art
Various query languages for Web and Semantic Web data, both for practical use and as an area of research in the scientific community, have emerged in recent years. At the same time, the broad adoption of the internet where keyword search is used in many applications, e.g. search engines, has familiarized casual users with using keyword queries to retrieve information on the internet. Unlike this easy-to-use querying, traditional query languages require knowledge of the language itself as well as of the data to be queried. Keyword-based query languages for XML and RDF bridge the gap between the two, aiming at enabling simple querying of semi-structured data, which is relevant e.g. in the context of the emerging Semantic Web. This article presents an overview of the field of keyword querying for XML and RDF
A generalized strategy for building resident database interfaces
A strategy for building resident interfaces to host heterogeneous distributed data base management systems is developed. The strategy is used to construct several interfaces. A set of guidelines is developed for users to construct their own interfaces
06472 Abstracts Collection - XQuery Implementation Paradigms
From 19.11.2006 to 22.11.2006, the Dagstuhl Seminar 06472 ``XQuery Implementation Paradigms'' was held in the International Conference and Research Center (IBFI), Schloss Dagstuhl. During the seminar, several participants presented their current research, and ongoing work and open problems were discussed. Abstracts of the presentations given during the seminar as well as abstracts of seminar results and ideas are put together in this paper. The first section describes the seminar topics and goals in general. Links to extended abstracts or full papers are provided, if available
- …