1,411 research outputs found
GraphX: Unifying Data-Parallel and Graph-Parallel Analytics
From social networks to language modeling, the growing scale and importance
of graph data has driven the development of numerous new graph-parallel systems
(e.g., Pregel, GraphLab). By restricting the computation that can be expressed
and introducing new techniques to partition and distribute the graph, these
systems can efficiently execute iterative graph algorithms orders of magnitude
faster than more general data-parallel systems. However, the same restrictions
that enable the performance gains also make it difficult to express many of the
important stages in a typical graph-analytics pipeline: constructing the graph,
modifying its structure, or expressing computation that spans multiple graphs.
As a consequence, existing graph analytics pipelines compose graph-parallel and
data-parallel systems using external storage systems, leading to extensive data
movement and complicated programming model.
To address these challenges we introduce GraphX, a distributed graph
computation framework that unifies graph-parallel and data-parallel
computation. GraphX provides a small, core set of graph-parallel operators
expressive enough to implement the Pregel and PowerGraph abstractions, yet
simple enough to be cast in relational algebra. GraphX uses a collection of
query optimization techniques such as automatic join rewrites to efficiently
implement these graph-parallel operators. We evaluate GraphX on real-world
graphs and workloads and demonstrate that GraphX achieves comparable
performance as specialized graph computation systems, while outperforming them
in end-to-end graph pipelines. Moreover, GraphX achieves a balance between
expressiveness, performance, and ease of use
Framework for Live Synchronization of RDF Views of Relational Data
This Demo presents a framework for the live synchronization of an RDF view defined on top of relational database. In the proposed framework, rules are responsible for computing and publishing the changeset required for the RDB-RDF view to stay synchronized with the relational database. The computed changesets are then used for the incremental maintenance of the RDB_RDF views as well as application views. The Demo is based on the LinkedBrainz Live tool, developed to validate the proposed framework
A Potpourri of Reason Maintenance Methods
We present novel methods to compute changes to materialized
views in logic databases like those used by rule-based reasoners.
Such reasoners have to address the problem of changing axioms in the
presence of materializations of derived atoms. Existing approaches have
drawbacks: some require to generate and evaluate large transformed programs
that are in Datalog - while the source program is in Datalog and
significantly smaller; some recompute the whole extension of a predicate
even if only a small part of this extension is affected by the change.
The methods presented in this article overcome these drawbacks and derive
additional information useful also for explanation, at the price of an
adaptation of the semi-naive forward chaining
Twelve Theses on Reactive Rules for the Web
Reactivity, the ability to detect and react to events, is an
essential functionality in many information systems. In particular, Web
systems such as online marketplaces, adaptive (e.g., recommender) systems,
and Web services, react to events such as Web page updates or
data posted to a server.
This article investigates issues of relevance in designing high-level programming
languages dedicated to reactivity on the Web. It presents
twelve theses on features desirable for a language of reactive rules tuned
to programming Web and Semantic Web applications
Constructive Reasoning for Semantic Wikis
One of the main design goals of social software, such as wikis, is to
support and facilitate interaction and collaboration. This dissertation
explores challenges that arise from extending social software with
advanced facilities such as reasoning and semantic annotations and
presents tools in form of a conceptual model, structured tags, a rule
language, and a set of novel forward chaining and reason maintenance
methods for processing such rules that help to overcome the
challenges.
Wikis and semantic wikis were usually developed in an ad-hoc
manner, without much thought about the underlying concepts. A conceptual
model suitable for a semantic wiki that takes advanced features
such as annotations and reasoning into account is proposed. Moreover,
so called structured tags are proposed as a semi-formal knowledge
representation step between informal and formal annotations.
The focus of rule languages for the Semantic Web has been predominantly
on expert users and on the interplay of rule languages
and ontologies. KWRL, the KiWi Rule Language, is proposed as a
rule language for a semantic wiki that is easily understandable for
users as it is aware of the conceptual model of a wiki and as it
is inconsistency-tolerant, and that can be efficiently evaluated as it
builds upon Datalog concepts.
The requirement for fast response times of interactive software
translates in our work to bottom-up evaluation (materialization) of
rules (views) ahead of time – that is when rules or data change, not
when they are queried. Materialized views have to be updated when
data or rules change. While incremental view maintenance was intensively
studied in the past and literature on the subject is abundant,
the existing methods have surprisingly many disadvantages – they
do not provide all information desirable for explanation of derived
information, they require evaluation of possibly substantially larger
Datalog programs with negation, they recompute the whole extension
of a predicate even if only a small part of it is affected by a
change, they require adaptation for handling general rule changes.
A particular contribution of this dissertation consists in a set of
forward chaining and reason maintenance methods with a simple declarative
description that are efficient and derive and maintain information
necessary for reason maintenance and explanation. The reasoning
methods and most of the reason maintenance methods are described
in terms of a set of extended immediate consequence operators the
properties of which are proven in the classical logical programming
framework. In contrast to existing methods, the reason maintenance methods in this dissertation work by evaluating the original Datalog
program – they do not introduce negation if it is not present in the input
program – and only the affected part of a predicate’s extension is
recomputed. Moreover, our methods directly handle changes in both
data and rules; a rule change does not need to be handled as a special
case.
A framework of support graphs, a data structure inspired by justification
graphs of classical reason maintenance, is proposed. Support
graphs enable a unified description and a formal comparison of the
various reasoning and reason maintenance methods and define a notion
of a derivation such that the number of derivations of an atom is
always finite even in the recursive Datalog case.
A practical approach to implementing reasoning, reason maintenance,
and explanation in the KiWi semantic platform is also investigated. It
is shown how an implementation may benefit from using a graph
database instead of or along with a relational database
Modeling views in the layered view model for XML using UML
In data engineering, view formalisms are used to provide flexibility to users and user applications by allowing them to extract and elaborate data from the stored data sources. Conversely, since the introduction of Extensible Markup Language (XML), it is fast emerging as the dominant standard for storing, describing, and interchanging data among various web and heterogeneous data sources. In combination with XML Schema, XML provides rich facilities for defining and constraining user-defined data semantics and properties, a feature that is unique to XML. In this context, it is interesting to investigate traditional database features, such as view models and view design techniques for XML. However, traditional view formalisms are strongly coupled to the data language and its syntax, thus it proves to be a difficult task to support views in the case of semi-structured data models. Therefore, in this paper we propose a Layered View Model (LVM) for XML with conceptual and schemata extensions. Here our work is three-fold; first we propose an approach to separate the implementation and conceptual aspects of the views that provides a clear separation of concerns, thus, allowing analysis and design of views to be separated from their implementation. Secondly, we define representations to express and construct these views at the conceptual level. Thirdly, we define a view transformation methodology for XML views in the LVM, which carries out automated transformation to a view schema and a view query expression in an appropriate query language. Also, to validate and apply the LVM concepts, methods and transformations developed, we propose a view-driven application development framework with the flexibility to develop web and database applications for XML, at varying levels of abstraction
A unified view of data-intensive flows in business intelligence systems : a survey
Data-intensive flows are central processes in today’s business intelligence (BI) systems, deploying different technologies to deliver data, from a multitude of data sources, in user-preferred and analysis-ready formats. To meet complex requirements of next generation BI systems, we often need an effective combination of the traditionally batched extract-transform-load (ETL) processes that populate a data warehouse (DW) from integrated data sources, and more real-time and operational data flows that integrate source data at runtime. Both academia and industry thus must have a clear understanding of the foundations of data-intensive flows and the challenges of moving towards next generation BI environments. In this paper we present a survey of today’s research on data-intensive flows and the related fundamental fields of database theory. The study is based on a proposed set of dimensions describing the important challenges of data-intensive flows in the next generation BI setting. As a result of this survey, we envision an architecture of a system for managing the lifecycle of data-intensive flows. The results further provide a comprehensive understanding of data-intensive flows, recognizing challenges that still are to be addressed, and how the current solutions can be applied for addressing these challenges.Peer ReviewedPostprint (author's final draft
- …