A Call to Arms: Revisiting Database Design
Good database design is crucial to obtain a sound, consistent database, and -
in turn - good database design methodologies are the best way to achieve the
right design. These methodologies are taught to most Computer Science
undergraduates, as part of any Introduction to Database class. They can be
considered part of the "canon", and indeed, the overall approach to database
design has been unchanged for years. Moreover, none of the major database
research assessments identify database design as a strategic research
direction.
Should we conclude that database design is a solved problem?
Our thesis is that database design remains a critical unsolved problem.
Hence, it should be the subject of more research. Our starting point is the
observation that traditional database design is not used in practice - and if
it were used it would result in designs that are not well adapted to current
environments. In short, database design has failed to keep up with the times.
In this paper, we put forth arguments to support our viewpoint, analyze the
root causes of this situation and suggest some avenues of research.
Encoding databases satisfying a given set of dependencies
Consider a relation schema with a set of dependency constraints. A fundamental question is what is the minimum space in which the possible instances of the schema can be "stored". We study the following model: encode the instances by a function that maps the set of possible instances, in a decodable way, into the set of words of a given length over the binary alphabet. The problem is to find the minimum length needed. This minimum is called the information content of the database. We investigate several cases where the set of dependency constraints consists of relatively simple sets of functional or multivalued dependencies. We also consider the following natural extension: is it possible to encode the instances in such a way that small changes in the instance cause a small change in the code? © 2012 Springer-Verlag
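As a rough illustration of the counting argument behind this model (not the paper's own construction), the sketch below enumerates all instances of a tiny two-attribute schema by brute force and computes the shortest fixed-length binary code that can distinguish them. The assumptions are ours: an instance is any set of tuples over small finite domains, the empty relation counts as an instance, and the function names `satisfies_fd`, `count_instances`, and `min_code_length` are hypothetical.

```python
from itertools import combinations, product


def satisfies_fd(instance, lhs, rhs):
    """Check a functional dependency lhs -> rhs (attribute positions) on a set of tuples."""
    seen = {}
    for row in instance:
        key = tuple(row[i] for i in lhs)
        val = tuple(row[i] for i in rhs)
        # setdefault stores val the first time a key is seen; a later mismatch violates the FD.
        if seen.setdefault(key, val) != val:
            return False
    return True


def count_instances(domains, fds):
    """Count the sets of tuples over the given domains that satisfy every FD (brute force)."""
    tuples = list(product(*domains))
    count = 0
    for size in range(len(tuples) + 1):
        for instance in combinations(tuples, size):
            if all(satisfies_fd(instance, lhs, rhs) for lhs, rhs in fds):
                count += 1
    return count


def min_code_length(n_instances):
    """Smallest n with 2**n >= n_instances, i.e. the shortest injective fixed-length binary code."""
    return (n_instances - 1).bit_length()


# Assumed toy schema R(A, B) with |dom(A)| = 2 and |dom(B)| = 3.
domains = [range(2), range(3)]
unconstrained = count_instances(domains, [])
with_fd = count_instances(domains, [((0,), (1,))])  # FD: A -> B

print(unconstrained, min_code_length(unconstrained))
print(with_fd, min_code_length(with_fd))
```

Under these assumptions the dependency A -> B shrinks the set of legal instances, so fewer bits are needed to tell them apart. The brute-force enumeration is exponential and only meant for toy domains.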
Twelve Theses on Reactive Rules for the Web
Reactivity, the ability to detect and react to events, is an
essential functionality in many information systems. In particular, Web
systems such as online marketplaces, adaptive (e.g., recommender) systems,
and Web services, react to events such as Web page updates or
data posted to a server.
This article investigates issues of relevance in designing high-level programming
languages dedicated to reactivity on the Web. It presents
twelve theses on features desirable for a language of reactive rules tuned
to programming Web and Semantic Web applications.
An Expressive Language and Efficient Execution System for Software Agents
Software agents can be used to automate many of the tedious, time-consuming
information processing tasks that humans currently have to complete manually.
However, to do so, agent plans must be capable of representing the myriad of
actions and control flows required to perform those tasks. In addition, since
these tasks can require integrating multiple sources of remote information
(typically a slow, I/O-bound process), it is desirable to make execution as
efficient as possible. To address both of these needs, we present a flexible
software agent plan language and a highly parallel execution system that enable
the efficient execution of expressive agent plans. The plan language allows
complex tasks to be more easily expressed by providing a variety of operators
for flexibly processing the data as well as supporting subplans (for
modularity) and recursion (for indeterminate looping). The executor is based on
a streaming dataflow model of execution to maximize the amount of operator and
data parallelism possible at runtime. We have implemented both the language and
executor in a system called THESEUS. Our results from testing THESEUS show that
streaming dataflow execution can yield significant speedups over both
traditional serial (von Neumann) and non-streaming dataflow-style
execution that existing software and robot agent execution systems currently
support. In addition, we show how plans written in the language we present can
represent certain types of subtasks that cannot be accomplished using the
languages supported by network query engines. Finally, we demonstrate that the
increased expressivity of our plan language does not hamper performance;
specifically, we show how data can be integrated from multiple remote sources
just as efficiently using our architecture as is possible with a
state-of-the-art streaming-dataflow network query engine.
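The streaming idea in this abstract can be sketched with Python generators. This is not THESEUS or its plan language, only a minimal single-threaded illustration, under our own assumptions, of how a downstream operator can emit a result before the upstream source has been fully read. The operator names `fetch`, `keep_even`, and `double` are hypothetical, and the sketch shows pipelining only, not the operator and data parallelism the authors describe.

```python
def run_pipeline(rows):
    """Run a three-operator streaming pipeline and record the order of events."""
    events = []

    def fetch():
        # Stand-in for a slow remote source that delivers rows one at a time.
        for row in rows:
            events.append(("fetch", row))
            yield row

    def keep_even(stream):
        # Selection operator: passes a row downstream as soon as it arrives.
        for row in stream:
            if row % 2 == 0:
                events.append(("select", row))
                yield row

    def double(stream):
        # Final operator: emits each result without waiting for the full input.
        for row in stream:
            events.append(("emit", row * 2))
            yield row * 2

    results = list(double(keep_even(fetch())))
    return results, events


results, events = run_pipeline([1, 2, 3, 4])
for event in events:
    print(event)
```

In the recorded event order, the first result is emitted before the third row is fetched. A non-streaming executor would instead finish every fetch before any downstream operator starts.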
Normalization Theory for XML
Specifications of XML documents typically consist of typing information (e.g., a DTD) and integrity constraints. Just like relational schema specifications, not all are good: some are prone to redundancies and update anomalies. In the relational world we have a well-developed theory of data design (also known as normalization). A few definitions of XML normal forms have been proposed, but the main question is why a particular design is good. In the XML world, we still lack universally accepted query languages such as relational algebra, or update languages that let us reason about storage redundancies, lossless decompositions, and update anomalies. A better approach, therefore, is to come up with notions of good design based on the intrinsic properties of the model itself. We present such an approach, based on Shannon's information theory, and show how it applies to relational normal forms as well as to XML design, for both native and relational storage.
A 3d geoscience information system framework
Two-dimensional geographical information systems are extensively used in the geosciences to create and analyse maps. However, these systems are unable to represent the Earth's subsurface in three spatial dimensions. The objective of this thesis is to overcome this deficiency, to provide a general framework for a 3d geoscience information system (GIS), and to contribute to the public discussion about the development of an infrastructure for geological observation data, geomodels, and geoservices. Following the objective, the requirements for a 3d GIS are analysed. According to the requirements, new geologically sensible query functionality for geometrical, topological and geological properties has been developed and the integration of 3d geological modeling and data management system components in a generic framework has been accomplished. The 3d geoscience information system framework presented here is characterized by the following features:
- Storage of geological observation data and geomodels in an XML database server. According to a new data model, geological observation data can be referenced by a set of geomodels.
- Functionality for querying observation data and 3d geomodels based on their 3d geometrical, topological, material, and geological properties was developed and implemented as a plug-in for a 3d geomodeling user application.
- For database queries, the standard XML query language has been extended with 3d spatial operators. The spatial database query operations are computed using an XML application server which has been developed for this specific purpose.
This technology allows sophisticated 3d spatial and geological database queries. Using the developed methods, queries such as "Select all sandstone horizons which are intersected by the set of faults F" can be answered. This request contains a topological and a geological material parameter.
The combination of queries with other GIS methods, like visual and statistical analysis, allows geoscience investigations in a novel 3d GIS environment. More generally, a 3d GIS enables geologists to read and understand a 3d digital geomodel in the same way as they read a conventional 2d geological map.
Static and dynamic semantics of NoSQL languages
We present a calculus for processing semistructured data that spans
differences of application area among several novel query languages, broadly
categorized as "NoSQL". This calculus lets users define their own operators,
capturing a wider range of data processing capabilities, whilst providing a
typing precision so far typical only of primitive hard-coded operators. The
type inference algorithm is based on semantic type checking, resulting in type
information that is both precise, and flexible enough to handle structured and
semistructured data. We illustrate the use of this calculus by encoding a large
fragment of Jaql, including operations and iterators over JSON, embedded SQL
expressions, and co-grouping, and show how the encoding directly yields a
typing discipline for Jaql as it is, namely without the addition of any type
definition or type annotation in the code.