1,411 research outputs found

    GraphX: Unifying Data-Parallel and Graph-Parallel Analytics

    Full text link
    From social networks to language modeling, the growing scale and importance of graph data has driven the development of numerous new graph-parallel systems (e.g., Pregel, GraphLab). By restricting the computation that can be expressed and introducing new techniques to partition and distribute the graph, these systems can efficiently execute iterative graph algorithms orders of magnitude faster than more general data-parallel systems. However, the same restrictions that enable the performance gains also make it difficult to express many of the important stages in a typical graph-analytics pipeline: constructing the graph, modifying its structure, or expressing computation that spans multiple graphs. As a consequence, existing graph analytics pipelines compose graph-parallel and data-parallel systems using external storage systems, leading to extensive data movement and complicated programming model. To address these challenges we introduce GraphX, a distributed graph computation framework that unifies graph-parallel and data-parallel computation. GraphX provides a small, core set of graph-parallel operators expressive enough to implement the Pregel and PowerGraph abstractions, yet simple enough to be cast in relational algebra. GraphX uses a collection of query optimization techniques such as automatic join rewrites to efficiently implement these graph-parallel operators. We evaluate GraphX on real-world graphs and workloads and demonstrate that GraphX achieves comparable performance as specialized graph computation systems, while outperforming them in end-to-end graph pipelines. Moreover, GraphX achieves a balance between expressiveness, performance, and ease of use

    Framework for Live Synchronization of RDF Views of Relational Data

    Get PDF
    This Demo presents a framework for the live synchronization of an RDF view defined on top of relational database. In the proposed framework, rules are responsible for computing and publishing the changeset required for the RDB-RDF view to stay synchronized with the relational database. The computed changesets are then used for the incremental maintenance of the RDB_RDF views as well as application views. The Demo is based on the LinkedBrainz Live tool, developed to validate the proposed framework

    A Potpourri of Reason Maintenance Methods

    Get PDF
    We present novel methods to compute changes to materialized views in logic databases like those used by rule-based reasoners. Such reasoners have to address the problem of changing axioms in the presence of materializations of derived atoms. Existing approaches have drawbacks: some require to generate and evaluate large transformed programs that are in Datalog - while the source program is in Datalog and significantly smaller; some recompute the whole extension of a predicate even if only a small part of this extension is affected by the change. The methods presented in this article overcome these drawbacks and derive additional information useful also for explanation, at the price of an adaptation of the semi-naive forward chaining

    Twelve Theses on Reactive Rules for the Web

    Get PDF
    Reactivity, the ability to detect and react to events, is an essential functionality in many information systems. In particular, Web systems such as online marketplaces, adaptive (e.g., recommender) systems, and Web services, react to events such as Web page updates or data posted to a server. This article investigates issues of relevance in designing high-level programming languages dedicated to reactivity on the Web. It presents twelve theses on features desirable for a language of reactive rules tuned to programming Web and Semantic Web applications

    Constructive Reasoning for Semantic Wikis

    Get PDF
    One of the main design goals of social software, such as wikis, is to support and facilitate interaction and collaboration. This dissertation explores challenges that arise from extending social software with advanced facilities such as reasoning and semantic annotations and presents tools in form of a conceptual model, structured tags, a rule language, and a set of novel forward chaining and reason maintenance methods for processing such rules that help to overcome the challenges. Wikis and semantic wikis were usually developed in an ad-hoc manner, without much thought about the underlying concepts. A conceptual model suitable for a semantic wiki that takes advanced features such as annotations and reasoning into account is proposed. Moreover, so called structured tags are proposed as a semi-formal knowledge representation step between informal and formal annotations. The focus of rule languages for the Semantic Web has been predominantly on expert users and on the interplay of rule languages and ontologies. KWRL, the KiWi Rule Language, is proposed as a rule language for a semantic wiki that is easily understandable for users as it is aware of the conceptual model of a wiki and as it is inconsistency-tolerant, and that can be efficiently evaluated as it builds upon Datalog concepts. The requirement for fast response times of interactive software translates in our work to bottom-up evaluation (materialization) of rules (views) ahead of time – that is when rules or data change, not when they are queried. Materialized views have to be updated when data or rules change. While incremental view maintenance was intensively studied in the past and literature on the subject is abundant, the existing methods have surprisingly many disadvantages – they do not provide all information desirable for explanation of derived information, they require evaluation of possibly substantially larger Datalog programs with negation, they recompute the whole extension of a predicate even if only a small part of it is affected by a change, they require adaptation for handling general rule changes. A particular contribution of this dissertation consists in a set of forward chaining and reason maintenance methods with a simple declarative description that are efficient and derive and maintain information necessary for reason maintenance and explanation. The reasoning methods and most of the reason maintenance methods are described in terms of a set of extended immediate consequence operators the properties of which are proven in the classical logical programming framework. In contrast to existing methods, the reason maintenance methods in this dissertation work by evaluating the original Datalog program – they do not introduce negation if it is not present in the input program – and only the affected part of a predicate’s extension is recomputed. Moreover, our methods directly handle changes in both data and rules; a rule change does not need to be handled as a special case. A framework of support graphs, a data structure inspired by justification graphs of classical reason maintenance, is proposed. Support graphs enable a unified description and a formal comparison of the various reasoning and reason maintenance methods and define a notion of a derivation such that the number of derivations of an atom is always finite even in the recursive Datalog case. A practical approach to implementing reasoning, reason maintenance, and explanation in the KiWi semantic platform is also investigated. It is shown how an implementation may benefit from using a graph database instead of or along with a relational database

    Modeling views in the layered view model for XML using UML

    Get PDF
    In data engineering, view formalisms are used to provide flexibility to users and user applications by allowing them to extract and elaborate data from the stored data sources. Conversely, since the introduction of Extensible Markup Language (XML), it is fast emerging as the dominant standard for storing, describing, and interchanging data among various web and heterogeneous data sources. In combination with XML Schema, XML provides rich facilities for defining and constraining user-defined data semantics and properties, a feature that is unique to XML. In this context, it is interesting to investigate traditional database features, such as view models and view design techniques for XML. However, traditional view formalisms are strongly coupled to the data language and its syntax, thus it proves to be a difficult task to support views in the case of semi-structured data models. Therefore, in this paper we propose a Layered View Model (LVM) for XML with conceptual and schemata extensions. Here our work is three-fold; first we propose an approach to separate the implementation and conceptual aspects of the views that provides a clear separation of concerns, thus, allowing analysis and design of views to be separated from their implementation. Secondly, we define representations to express and construct these views at the conceptual level. Thirdly, we define a view transformation methodology for XML views in the LVM, which carries out automated transformation to a view schema and a view query expression in an appropriate query language. Also, to validate and apply the LVM concepts, methods and transformations developed, we propose a view-driven application development framework with the flexibility to develop web and database applications for XML, at varying levels of abstraction

    A unified view of data-intensive flows in business intelligence systems : a survey

    Get PDF
    Data-intensive flows are central processes in today’s business intelligence (BI) systems, deploying different technologies to deliver data, from a multitude of data sources, in user-preferred and analysis-ready formats. To meet complex requirements of next generation BI systems, we often need an effective combination of the traditionally batched extract-transform-load (ETL) processes that populate a data warehouse (DW) from integrated data sources, and more real-time and operational data flows that integrate source data at runtime. Both academia and industry thus must have a clear understanding of the foundations of data-intensive flows and the challenges of moving towards next generation BI environments. In this paper we present a survey of today’s research on data-intensive flows and the related fundamental fields of database theory. The study is based on a proposed set of dimensions describing the important challenges of data-intensive flows in the next generation BI setting. As a result of this survey, we envision an architecture of a system for managing the lifecycle of data-intensive flows. The results further provide a comprehensive understanding of data-intensive flows, recognizing challenges that still are to be addressed, and how the current solutions can be applied for addressing these challenges.Peer ReviewedPostprint (author's final draft

    Web ontology reasoning with logic databases [online]

    Get PDF
    corecore