A relational dataflow database
A model of a relational database system based on the principles of functional, data-driven computation is proposed. Relations (sets of data tuples) are represented as streams of values carried by independent tokens among operators of an unraveling dataflow network. Values may be "updated" by circulating the database through an update operator. To perform a query on the database, streams involved in that query are replicated and submitted as inputs to dataflow programs (graphs) obtained by translating relational algebra expressions.
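A minimal sketch of this idea, not taken from the paper itself: relations as streams of tuples, relational-algebra operators as dataflow nodes, modeled here with Python generators. All operator and attribute names are invented for illustration.

```python
# Illustrative sketch only: relations as streams of tuples flowing through
# dataflow operators, modeled with Python generators.

def select(stream, predicate):
    """Selection operator: forwards only tuples satisfying the predicate."""
    for t in stream:
        if predicate(t):
            yield t

def project(stream, attrs):
    """Projection operator: keeps only the named attributes of each tuple."""
    for t in stream:
        yield {a: t[a] for a in attrs}

# A query is a composition of operators applied to a replicated input stream.
employees = [
    {"name": "Ada", "dept": "CS", "salary": 120},
    {"name": "Ben", "dept": "EE", "salary": 90},
]
result = list(project(select(employees, lambda t: t["salary"] > 100), ["name"]))
# result == [{"name": "Ada"}]
```

Because generators are lazy, each tuple flows through the operator pipeline one at a time, loosely mirroring tokens moving through a dataflow graph.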
A data-driven model for parallel interpretation of logic programms [sic]
The main objective of this paper is to present a model of computation which permits logic programs to be executed on a highly-parallel computer architecture. It demonstrates how logic programs may be converted into collections of dataflow graphs in which resolution is viewed as a process of finding matches between certain graph templates and portions of the dataflow graphs. This graph fitting process is carried out by tokens propagating asynchronously through the dataflow graph; thus computation is entirely data-driven, without the need for any centralized control. It is shown that at the implementation level the proposed model is very similar to a general dataflow system and hence a dataflow architecture could easily be extended to support the proposed model.
AGM, a dataflow database machine
In recent years, a number of database machines consisting of large numbers of parallel processing elements have been proposed. Unfortunately, one of the main limitations to parallelism in database processing is the I/O bandwidth of the underlying storage devices. One way to solve this problem is to use multiple parallel disk units. The main problem with this approach, however, is the lack of a computational model capable of utilizing the potential of any significant number of such devices. This paper presents a database model which is based on the principles of data-driven computation. According to this model, the database is represented as a network in which each node is conceptually an independent processing element, capable of communicating with other nodes by exchanging messages along the network arcs. To answer a query, one or more such messages, called tokens, are created and injected into the network. These then propagate asynchronously through the network in search of results satisfying the given query. To investigate the performance of the proposed system, we have implemented the model on a simulated computer architecture. The results of the simulation experiments indicate that the model is capable of exploiting the potential I/O bandwidth of a large number of disk units as well as the computational power of the associated processing elements.
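The token-propagation idea can be sketched as follows. This is a sequential simplification under our own assumptions (a dictionary-based graph, a breadth-first worklist standing in for asynchronous message passing); none of the names come from AGM itself.

```python
# Hedged sketch: the database is a graph of nodes; a query token is
# injected at a start node and forwarded along arcs, collecting the data
# of every reachable node that satisfies the query predicate.
from collections import deque

def propagate(graph, data, start, predicate):
    """Inject a token at `start` and propagate it through the network."""
    results, visited = [], set()
    tokens = deque([start])              # pending tokens, one per node to visit
    while tokens:
        node = tokens.popleft()
        if node in visited:              # a node handles each token once
            continue
        visited.add(node)
        if predicate(data[node]):
            results.append(data[node])
        tokens.extend(graph.get(node, []))   # forward copies along each arc
    return results

graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
data = {"A": 1, "B": 7, "C": 3, "D": 9}
print(propagate(graph, data, "A", lambda v: v > 5))  # [7, 9]
```

In the model described by the abstract, each node would process its incoming tokens independently and concurrently; the central `deque` here only stands in for that message traffic.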
Set operations in semantic data models
Class creation by set operations has largely been ignored in the literature. Precise semantics of set operations on complex objects require a clear distinction between the dual notions of a set and a type, both of which are present in a class. Our paper fills this gap by presenting a framework for executing set-theoretic operations on the class construct. The proposed set operations determine both the type description of the derived class as well as its set membership. For the former, we develop inheritance rules for property characteristics such as single- versus multi-valued and required versus optional. For the latter, we borrow the object identity concept from data modeling research. Our framework allows for property inheritance among classes that are not necessarily is-a related.
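To make the "type description of the derived class" concrete, here is a speculative sketch of one such inheritance rule for class union. The specific rules shown (required only if required in both operands, multi-valued if multi-valued in either) are a plausible guess for illustration, not the paper's actual rules.

```python
# Speculative example: deriving property characteristics for a union class.
# The rules encoded here are illustrative guesses, not taken from the paper.

def derive_union(props_a, props_b):
    """Type description of A union B, keeping only shared properties."""
    derived = {}
    for name in set(props_a) & set(props_b):   # only shared properties survive
        derived[name] = {
            # membership is broader, so a property stays required only if
            # every member of either operand class is guaranteed to have it
            "required": props_a[name]["required"] and props_b[name]["required"],
            # multi-valued in either operand forces multi-valued in the union
            "multi": props_a[name]["multi"] or props_b[name]["multi"],
        }
    return derived

a = {"email": {"required": True, "multi": False},
     "phone": {"required": True, "multi": True}}
b = {"email": {"required": False, "multi": False}}
print(derive_union(a, b))  # {'email': {'required': False, 'multi': False}}
```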
Asynchronous data retrieval from an object-oriented database
We present an object-oriented semantic database model which, similar to other object-oriented systems, combines the virtues of four concepts: the functional data model, a property inheritance hierarchy, abstract data types and message-driven computation. The main emphasis is on the last of these four concepts. We describe generic procedures that permit queries to be processed in a purely message-driven manner. A database is represented as a network of nodes and directed arcs, in which each node is a logical processing element, capable of communicating with other nodes by exchanging messages. This eliminates the need for shared memory and for centralized control during query processing. Hence, the model is suitable for implementation on a multiprocessor computer architecture, consisting of large numbers of loosely coupled processing elements.
Automatic view schema generation in object-oriented databases
An object-oriented data schema is a complex structure of classes interrelated via generalization and property decomposition relationships. We define an object-oriented view to be a virtual schema graph with possibly restructured generalization and decomposition hierarchies - rather than just one individual virtual class as proposed in the literature. In this paper, we propose a methodology, called MultiView, for supporting multiple such view schemata. MultiView is anchored on the following complementary ideas: (a) the view definer derives virtual classes and then integrates them into one consistent global schema graph and (b) the view definer specifies arbitrarily complex view schemata on this augmented global schema. The focus of this paper is, however, on the second, less explored, issue. This part of the view definition is performed using the following two steps: (1) view class selection and (2) view schema graph generation. For the first, we have developed a view definition language that can be used by the view definer to specify the selection of the desired view classes from the global schema. For the second, we have developed two algorithms that automatically augment the set of selected view classes to generate a complete, minimal and consistent view class generalization hierarchy. The first algorithm has linear complexity but it assumes that the global schema graph is a tree. The second algorithm overcomes this restricting assumption and thus allows for multiple inheritance, but it does so at the cost of a higher complexity.
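A simplified sketch of the tree case of step (2), under our own assumptions: the abstract says MultiView's algorithms augment the selected class set, whereas this toy variant only links each selected class to its nearest selected ancestor in a tree-shaped global schema. All names are invented for illustration.

```python
# Hedged, simplified variant of view-hierarchy generation over a tree schema
# (child -> parent map). Not the paper's actual algorithm, which also adds
# classes to the selected set to keep the hierarchy complete.

def view_hierarchy(parent, selected):
    """For each selected class, walk up the child->parent chain to the
    nearest ancestor that is also selected, and emit an is-a edge."""
    chosen = set(selected)
    edges = []
    for c in selected:
        p = parent.get(c)
        while p is not None and p not in chosen:
            p = parent.get(p)               # skip unselected intermediates
        if p is not None:
            edges.append((c, p))            # c is-a p in the view schema
    return edges

parent = {"Person": None, "Student": "Person", "TA": "Student", "Staff": "Person"}
edges = view_hierarchy(parent, ["Person", "TA"])
# The unselected class Student is bypassed: [("TA", "Person")]
```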
Bayesian Cluster Enumeration Criterion for Unsupervised Learning
We derive a new Bayesian Information Criterion (BIC) by formulating the problem of estimating the number of clusters in an observed data set as maximization of the posterior probability of the candidate models. Given that some mild assumptions are satisfied, we provide a general BIC expression for a broad class of data distributions. This serves as a starting point when deriving the BIC for specific distributions. Along this line, we provide a closed-form BIC expression for multivariate Gaussian distributed variables. We show that incorporating the data structure of the clustering problem into the derivation of the BIC results in an expression whose penalty term is different from that of the original BIC. We propose a two-step cluster enumeration algorithm. First, a model-based unsupervised learning algorithm partitions the data according to a given set of candidate models. Subsequently, the number of clusters is determined as the one associated with the model for which the proposed BIC is maximal. The performance of the proposed two-step algorithm is tested using synthetic and real data sets.
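The two-step enumeration scheme can be illustrated with the *classical* BIC on one-dimensional data (the paper derives a different penalty term; the clustering routine, parameter count, and data below are all our own simplifications).

```python
# Illustrative two-step enumeration: step 1 fits a model per candidate k
# (here a tiny 1-D k-means); step 2 picks the k whose BIC score is maximal.
# Uses the classical BIC penalty, not the paper's derived one.
import math

def kmeans_1d(xs, k, iters=25):
    xs = sorted(xs)
    # initialize centers at evenly spaced order statistics
    centers = [xs[(2 * i + 1) * len(xs) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in xs:
            j = min(range(k), key=lambda i: abs(x - centers[i]))
            groups[j].append(x)
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

def bic(xs, k):
    centers, groups = kmeans_1d(xs, k)
    n = len(xs)
    sse = sum((x - c) ** 2 for g, c in zip(groups, centers) for x in g)
    var = max(sse / n, 1e-9)                 # pooled variance estimate
    loglik = -0.5 * n * (math.log(2 * math.pi * var) + 1)
    params = 2 * k                           # rough count: mean + weight per cluster
    return loglik - 0.5 * params * math.log(n)

data = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]        # two well-separated groups
best_k = max(range(1, 4), key=lambda k: bic(data, k))
# best_k == 2
```

The structure mirrors the abstract: an unsupervised partitioning per candidate model, then a criterion evaluated once per candidate.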
Evaluating aggregate functions on possibilistic data
The need for extending information management systems to handle the imprecision of information found in the real world has been recognized. Fuzzy set theory together with possibility theory represent a uniform framework for extending the relational database model with these features. However, none of the existing proposals for handling imprecision in the literature has dealt with queries involving a functional evaluation of a set of items, traditionally referred to as aggregation. Two kinds of aggregate operators, namely, scalar aggregates and aggregate functions, exist. Both are important for most real-world applications, and are thus supported by traditional languages like SQL or QUEL. This paper presents a framework for handling these two types of aggregates in the context of imprecise information. We consider three cases, specifically, aggregates within vague queries on precise data, aggregates within precisely specified queries on possibilistic data, and aggregates within vague queries on imprecise data. These extensions are based on fuzzy set-theoretical concepts such as the extension principle, the sigma-count operation, and the possibilistic expected value. The consistency and completeness of the proposed operations are shown.
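Of the concepts named above, the sigma-count is the easiest to illustrate: the fuzzy cardinality of a set is the sum of membership degrees. The sketch below shows a vague COUNT over precise data; the membership function and its thresholds are our own assumptions, not the paper's.

```python
# Illustrative sigma-count aggregate: a fuzzy COUNT for the vague
# predicate "tall" over precise height data. The membership function
# below (0 under 170 cm, 1 over 190 cm, linear between) is an assumption.

def tall(height_cm):
    """Piecewise-linear membership degree of 'tall'."""
    return min(max((height_cm - 170) / 20, 0.0), 1.0)

def sigma_count(values, membership):
    """Fuzzy cardinality: sum the degree to which each value satisfies
    the vague predicate, instead of counting 0/1 matches."""
    return sum(membership(v) for v in values)

heights = [160, 175, 185, 195]
print(sigma_count(heights, tall))  # 0 + 0.25 + 0.75 + 1.0 = 2.0
```

A crisp COUNT with a 180 cm cutoff would return 2 as well, but the sigma-count degrades gracefully as borderline values move toward either threshold.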