17,642 research outputs found
Using Visualization to Support Data Mining of Large Existing Databases
In this paper. we present ideas how visualization technology can be used to improve the difficult process of querying very large databases. With our VisDB system, we try to provide visual support not only for the query specification process. but also for evaluating query results and. thereafter, refining the query accordingly. The main idea of our system is to represent as many data items as possible by the pixels of the display device. By arranging and coloring the pixels according to the relevance for the query, the user gets a visual impression of the resulting data set and of its relevance for the query. Using an interactive query interface, the user may change the query dynamically and receives immediate feedback by the visual representation of the resulting data set. By using multiple windows for different parts of the query, the user gets visual feedback for each part of the query and, therefore, may easier understand the overall result. To support complex queries, we introduce the notion of approximate joins which allow the user to find data items that only approximately fulfill join conditions. We also present ideas how our technique may be extended to support the interoperation of heterogeneous databases. Finally, we discuss the performance problems that are caused by interfacing to existing database systems and present ideas to solve these problems by using data structures supporting a multidimensional search of the database
On Defining SPARQL with Boolean Tensor Algebra
The Resource Description Framework (RDF) represents information as
subject-predicate-object triples. These triples are commonly interpreted as a
directed labelled graph. We propose an alternative approach, interpreting the
data as a 3-way Boolean tensor. We show how SPARQL queries - the standard
queries for RDF - can be expressed as elementary operations in Boolean algebra,
giving us a complete re-interpretation of RDF and SPARQL. We show how the
Boolean tensor interpretation allows for new optimizations and analyses of the
complexity of SPARQL queries. For example, estimating the size of the results
for different join queries becomes much simpler
Numeric Input Relations for Relational Learning with Applications to Community Structure Analysis
Most work in the area of statistical relational learning (SRL) is focussed on
discrete data, even though a few approaches for hybrid SRL models have been
proposed that combine numerical and discrete variables. In this paper we
distinguish numerical random variables for which a probability distribution is
defined by the model from numerical input variables that are only used for
conditioning the distribution of discrete response variables. We show how
numerical input relations can very easily be used in the Relational Bayesian
Network framework, and that existing inference and learning methods need only
minor adjustments to be applied in this generalized setting. The resulting
framework provides natural relational extensions of classical probabilistic
models for categorical data. We demonstrate the usefulness of RBN models with
numeric input relations by several examples.
In particular, we use the augmented RBN framework to define probabilistic
models for multi-relational (social) networks in which the probability of a
link between two nodes depends on numeric latent feature vectors associated
with the nodes. A generic learning procedure can be used to obtain a
maximum-likelihood fit of model parameters and latent feature values for a
variety of models that can be expressed in the high-level RBN representation.
Specifically, we propose a model that allows us to interpret learned latent
feature values as community centrality degrees by which we can identify nodes
that are central for one community, that are hubs between communities, or that
are isolated nodes. In a multi-relational setting, the model also provides a
characterization of how different relations are associated with each community
Towards More Data-Aware Application Integration (extended version)
Although most business application data is stored in relational databases,
programming languages and wire formats in integration middleware systems are
not table-centric. Due to costly format conversions, data-shipments and faster
computation, the trend is to "push-down" the integration operations closer to
the storage representation.
We address the alternative case of defining declarative, table-centric
integration semantics within standard integration systems. For that, we replace
the current operator implementations for the well-known Enterprise Integration
Patterns by equivalent "in-memory" table processing, and show a practical
realization in a conventional integration system for a non-reliable,
"data-intensive" messaging example. The results of the runtime analysis show
that table-centric processing is promising already in standard, "single-record"
message routing and transformations, and can potentially excel the message
throughput for "multi-record" table messages.Comment: 18 Pages, extended version of the contribution to British
International Conference on Databases (BICOD), 2015, Edinburgh, Scotlan
Exposing Multi-Relational Networks to Single-Relational Network Analysis Algorithms
Many, if not most network analysis algorithms have been designed specifically
for single-relational networks; that is, networks in which all edges are of the
same type. For example, edges may either represent "friendship," "kinship," or
"collaboration," but not all of them together. In contrast, a multi-relational
network is a network with a heterogeneous set of edge labels which can
represent relationships of various types in a single data structure. While
multi-relational networks are more expressive in terms of the variety of
relationships they can capture, there is a need for a general framework for
transferring the many single-relational network analysis algorithms to the
multi-relational domain. It is not sufficient to execute a single-relational
network analysis algorithm on a multi-relational network by simply ignoring
edge labels. This article presents an algebra for mapping multi-relational
networks to single-relational networks, thereby exposing them to
single-relational network analysis algorithms.Comment: ISSN:1751-157
- …