5 research outputs found

    Storing and querying evolving knowledge graphs on the web

    Get PDF

    Syntactic and Semantic Interoperability Among Hydrologic Models

    Get PDF
    Development of integrated hydrologic models requires coupling of multidisciplinary, independent models and collaboration between different scientific communities. Component-based modeling provides an approach for the integration of models from different disciplines. A key advantage of component-based modeling is that it allows components to be created, tested, reused, extended, and maintained by a large group of model developers and end users. One significant challenge that must be addressed in creating an integrated hydrologic model using a component-based approach is enhancing the interoperability of components between different modeling communities and frameworks. The major goal of this work is to advance the integration of water-related model components coming from different disciplines using the information underlying these models. This is achieved through addressing three specific research objectives. The first objective is to investigate the ability of component-based architecture to simulate feedback loops between hydrologic model components that share a boundary condition, and how data are transferred between temporally misaligned model components. The second objective is to promote the interoperability of components across water-related disciplinary boundaries and modeling frameworks by establishing an ontology for components' metadata. The third study objective is to develop a domain-level ontology for defining hydrologic processes
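    The first objective above involves passing a shared boundary condition between components whose time steps do not line up. As a minimal illustration only (not code from the thesis), the Python sketch below interpolates one hypothetical component's coarse-time-step output onto another component's finer time grid before each exchange; the component names, time steps, values, and the choice of linear interpolation are all assumptions made for the example.

```python
# Toy illustration (not from the thesis): exchanging a boundary condition
# between two hypothetical components that advance with different time steps.

def interpolate(times, values, t):
    """Linearly interpolate (times, values) at time t; clamp at the ends."""
    if t <= times[0]:
        return values[0]
    if t >= times[-1]:
        return values[-1]
    for i in range(1, len(times)):
        if t <= times[i]:
            w = (t - times[i - 1]) / (times[i] - times[i - 1])
            return values[i - 1] + w * (values[i] - values[i - 1])

# Hypothetical "groundwater" component: produces a water-table level every 6 h.
gw_times = [0.0, 6.0, 12.0, 18.0, 24.0]      # hours
gw_level = [10.0, 10.2, 10.5, 10.4, 10.1]    # metres (made-up values)

# Hypothetical "surface water" component: runs with a 1 h step and needs the
# groundwater level as its boundary condition at every step.
for t in range(0, 25):
    boundary = interpolate(gw_times, gw_level, float(t))
    # ... advance the surface-water component one step using `boundary` ...
    print(f"t={t:2d} h  boundary condition = {boundary:.3f} m")
```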

    Near Data Processing for Efficient and Trusted Systems

    Full text link
    We live in a world which constantly produces data at a rate which only increases with time. Conventional processor architectures fail to process this abundant data in an efficient manner as they expend significant energy in instruction processing and moving data over deep memory hierarchies. Furthermore, to process large amounts of data in a cost-effective manner, there is increased demand for remote computation. While cloud service providers have come up with innovative solutions to cater to this increased demand, the security concerns users feel for their data remain a strong impediment to their wide-scale adoption. An exciting technique in our repertoire to deal with these challenges is near-data processing. Near-data processing (NDP) is a data-centric paradigm which moves computation to where data resides. This dissertation exploits NDP both to process the data deluge we face efficiently and to design low-overhead secure hardware. To this end, we first propose Compute Caches, a novel NDP technique. Simple augmentations to the underlying SRAM design enable caches to perform commonly used operations. In-place computation in caches not only avoids excessive data movement over the memory hierarchy, but also significantly reduces instruction processing energy as independent sub-units inside caches perform computation in parallel. Compute Caches significantly improve the performance and reduce the energy expended for a suite of data-intensive applications. Second, this dissertation identifies security advantages of NDP. While the memory bus side channel has received much attention, a low-overhead hardware design which defends against it remains elusive. We observe that smart memory, memory with compute capability, can dramatically simplify this problem. To exploit this observation, we propose InvisiMem, which uses the logic layer in the smart memory to implement cryptographic primitives, which aid in addressing the memory bus side channel efficiently. Our solutions obviate the need for expensive constructs like Oblivious RAM (ORAM) and Merkle trees, and have one to two orders of magnitude lower overheads for performance, space, energy, and memory bandwidth, compared to prior solutions. This dissertation also addresses a related vulnerability, the page fault side channel, in which the Operating System (OS) induces page faults to learn an application's address trace and deduces application secrets from it. To tackle it, we propose Sanctuary, which obfuscates the page fault channel while allowing the OS to manage memory as a resource. To do so, we design a novel construct, Oblivious Page Management (OPAM), which is derived from ORAM but is customized for the page management context. We employ near-memory page moves to reduce OPAM overhead and also propose a novel memory partition to reduce the OPAM transactions required. For a suite of cloud applications which process sensitive data, we show that the page fault channel can be tackled at reasonable overheads.
    PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/144139/1/shaizeen_1.pd
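    The Compute Caches idea summarised above amounts to operating on data where it already sits in the cache instead of streaming it word by word through the core. The Python sketch below is only a back-of-the-envelope model (not taken from the dissertation) that counts operand bytes crossing the cache/core boundary for a bulk bitwise operation, comparing a conventional load/operate/store flow with an assumed in-cache operation; the 64-byte line size, 8-byte word size, and the assumption that in-place operation moves no operand data are illustrative.

```python
# Back-of-the-envelope sketch (not from the dissertation): operand traffic for
# a bulk bitwise operation with and without in-cache ("compute cache")
# operation. All parameters are assumptions made for illustration.

LINE_BYTES = 64          # assumed cache-line size
WORD_BYTES = 8           # assumed register width

def conventional_traffic(n_bytes: int) -> int:
    """Load both operands into registers word by word, store the result."""
    words = n_bytes // WORD_BYTES
    return words * WORD_BYTES * 3        # 2 operand loads + 1 result store per word

def compute_cache_traffic(n_bytes: int) -> int:
    """Assume operands already reside in cache lines that compute in place,
    so no operand words cross the cache/core boundary."""
    return 0

n = 1 << 20  # 1 MiB operands
print("conventional :", conventional_traffic(n), "bytes moved")
print("compute cache:", compute_cache_traffic(n), "bytes moved")
print("cache lines that could operate in parallel:", n // LINE_BYTES)
```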

    Behaviour on Linked Data - Specification, Monitoring, and Execution

    Get PDF
    People, organisations, and machines around the globe make use of web technologies to communicate. For instance, 4.16 bn people with access to the internet made 4.6 bn pages on the web accessible using the transfer protocol HTTP, organisations such as Amazon built ecosystems around the HTTP-based access to their businesses under the headline RESTful APIs, and the Linking Open Data movement has made billions of facts available on the web in the data model RDF via HTTP. Moreover, under the headline Web of Things, people use RDF and HTTP to access sensors and actuators on the Internet of Things. The necessary communication requires interoperable systems at a truly global scale, for which web technologies provide the necessary standards regarding the transfer and the representation of data: the HTTP protocol specifies how to transfer messages, besides defining the semantics of sending/receiving different types of messages, and the RDF family of languages specifies how to represent the data in the messages, besides providing means to elaborate the semantics of the data in the messages. The combination of HTTP and RDF (together with the shared assumption of HTTP and RDF to use URIs as identifiers) is called Linked Data. While the representation of static data in the context of Linked Data has been formally grounded in mathematical logic, a formal treatment of dynamics and behaviour on Linked Data is largely missing. We regard behaviour in this context as the way in which a system (e.g. a user agent or server) works, and this behaviour manifests itself in dynamic data. Using a formal treatment of behaviour on Linked Data, we could specify applications that use or provide Linked Data in a way that allows for formal analysis (e.g. expressivity, validation, verification). Using an experimental treatment of behaviour, or a treatment of the behaviour's manifestation in dynamic data, we could better design the handling of Linked Data in applications. Hence, in this thesis, we investigate the notion of behaviour in the context of Linked Data. Specifically, we investigate the research question of how to capture the dynamics of Linked Data to inform the design of applications. The first contribution is a corpus that we built and analysed to monitor dynamic Linked Data on the web to study the update behaviour. We provide an extensive analysis to set up a long-term study of the dynamics of Linked Data on the web. We analyse data from the long-term study for dynamics on the level of accessing changing documents and on the level of changes within the documents. The second contribution is a model of computation for Linked Data that allows for expressing executable specifications of application behaviour. We provide a mapping from the conceptual foundations of the standards around Linked Data to Abstract State Machines, a Turing-complete model of computation rooted in mathematical logic. The third contribution is a workflow ontology and corresponding operational semantics to specify applications that execute and monitor behaviour in the context of Linked Data. Our approach allows for monitoring and executing behaviour specified in workflow models and respects the assumptions of the standards and practices around Linked Data. We evaluate our findings using the experimental corpus of dynamic Linked Data on the web and a synthetic benchmark from the Internet of Things, specifically the domain of building automation
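    The first contribution above rests on repeatedly dereferencing Linked Data documents and recording how they change. As a generic sketch (not the thesis's actual harvesting pipeline), the Python snippet below polls a small set of document IRIs over HTTP and keeps a content hash per document, which is enough to detect change at the document level; the example IRIs and the fixed polling interval are assumptions.

```python
# Minimal change-detection sketch (not the thesis's pipeline): poll a few
# Linked Data documents and note when their serialized content changes.
import hashlib
import time
import urllib.request

# Hypothetical document IRIs; any dereferenceable RDF documents would do.
DOCUMENTS = [
    "http://example.org/dataset/doc1.ttl",
    "http://example.org/dataset/doc2.ttl",
]
POLL_SECONDS = 3600  # assumed polling interval

last_hash = {}

def snapshot(iri: str) -> str:
    """Fetch the document and return a hash of its serialized content."""
    req = urllib.request.Request(iri, headers={"Accept": "text/turtle"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return hashlib.sha256(resp.read()).hexdigest()

while True:
    for iri in DOCUMENTS:
        try:
            digest = snapshot(iri)
        except OSError as err:
            print(f"{iri}: fetch failed ({err})")
            continue
        if last_hash.get(iri) not in (None, digest):
            print(f"{iri}: document changed")
        last_hash[iri] = digest
    time.sleep(POLL_SECONDS)
```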

    Rules and RDF Streams - A Position Paper

    No full text
    Abstract. We propose a minor extension of the Graph Store Protocol and the SPARQL syntax that should be sufficient to enable the application of rules to some kinds of RDF Streams (as defined by the RDF Stream Processing W3C Community Group) in order to generate new RDF streams. The IRI identifying an RDF stream is extended with a query string field-value pair defining a sequence of windows of the stream. The extended IRI is passed to a data concentrator, which recasts the stream as a dynamic RDF dataset. A SPARQL query may then be applied to this dynamic dataset whenever it is updated, i.e. when the content of the next window has been fully received. The SPARQL query uses the CONSTRUCT form to generate a new element of the resultant RDF stream. The approach is illustrated with a prototypical example from the healthcare domain.

Business Case
RDF Graph Stores are increasingly used in the private and public sector to manage knowledge. In practice, although these Graph Stores can be dynamic, most individual transactions involve either complete replacement or small changes. When transactions are made by many users simultaneously or arrive by streaming from data sources, the rate of change can be quite fast relative to the timescale of query processing. Effectively delivering an up-to-date view (i.e. the results of a stored query) of a dynamic Graph Store in real time requires a different approach to query processing than that implemented in most RDF query engines currently available. The RDF standards have not addressed dynamics to a significant extent, other than to informally propose a name: "We informally use the term RDF source to refer to a persistent yet mutable source or container of RDF graphs. An RDF source is a resource that may be said to have a state that can change over time. A snapshot of the state can be expressed as an RDF graph." The RDF Stream Processing Community Group, in turn, models an RDF stream as a sequence of RDF datasets called "timestamped RDF graphs", where a particular triple in the default graph has been designated as the timestamp triple. A sequence of overlapping windows (contiguous subsequences) can, in general, be considered as snapshots of a dynamic Graph Store [4], such that whenever the window shifts to a new position, a few stream elements expire from, and a few new stream elements are added to, the dynamic Graph Store.

Business Drivers
Autonomous, distributed sources will eventually yield data in quantities limited only by the aggregate bandwidth of wireless transmission media. It has been estimated that the market for devices and services in support of this processing will reach a value in excess of twelve digits by 2020 [2]. Based on the relative expenditures by established telecommunication firms, such as AT&T or Deutsche Telekom, the software budget constitutes roughly one percent of their gross income. The task of purposefully processing this data promises to be formidable. A service that leverages the advantages of RDF data processing to provide a mechanism for rule-based integration, validation, and evaluation of diverse streaming data sources will therefore offer a significant advantage.
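The mechanism proposed in the abstract above (window the stream via a parameter on its IRI, let a data concentrator maintain the corresponding dynamic dataset, and re-apply a CONSTRUCT rule each time a window has been fully received) can be sketched briefly. The Python sketch below uses rdflib and a count-based tumbling window; the `window` parameter name, the three-element window, and the healthcare-style example vocabulary are illustrative assumptions, not syntax defined by the paper or implemented by Dydra.

```python
# Illustrative sketch (assumptions, not the paper's or Dydra's actual syntax):
# a data concentrator that turns a windowed RDF stream into a dynamic dataset
# and applies a CONSTRUCT rule whenever a window has been fully received.
from rdflib import Graph, Namespace, Literal, URIRef

EX = Namespace("http://example.org/")

# Hypothetical stream IRI extended with an assumed count-based window parameter.
stream_iri = "http://example.org/streams/vitals?window=3"
print("concentrator subscribed to:", stream_iri)
WINDOW_SIZE = 3  # tumbling window of three stream elements

# A rule in SPARQL CONSTRUCT/WHERE form: flag pulse readings above a threshold.
RULE = """
PREFIX ex: <http://example.org/>
CONSTRUCT { ?patient ex:alert ex:Tachycardia . }
WHERE     { ?obs ex:patient ?patient ; ex:pulse ?bpm . FILTER(?bpm > 120) }
"""

window = Graph()   # the dynamic dataset maintained by the concentrator
received = 0

def on_stream_element(element: Graph) -> None:
    """Add one stream element; when the window is complete, apply the rule."""
    global received, window
    window += element
    received += 1
    if received == WINDOW_SIZE:
        derived = window.query(RULE).graph      # new element of the output stream
        print(derived.serialize(format="turtle"))
        window, received = Graph(), 0           # tumble to the next window

# Feed a few made-up observations through the concentrator.
for i, bpm in enumerate([80, 135, 90, 70, 125, 60]):
    g = Graph()
    obs = URIRef(f"http://example.org/obs/{i}")
    g.add((obs, EX.patient, EX.patient1))
    g.add((obs, EX.pulse, Literal(bpm)))
    on_stream_element(g)
```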
Particular drivers in this setting are the following:
- develop a mechanism to augment assertions during query processing of dynamic RDF Graph Stores, at essentially the complexity of SPARQL CONSTRUCT
- create means to express actions that should happen when a specified integrity constraint is violated by a dynamic RDF Graph Store
- implement real-time processing of such features with good scaling properties in the size and rate of change of the Graph Store
- enable linked data to fulfil compliance requirements

[4] Because blank nodes are shared between elements of an RDF stream, the dynamic Graph Store must also share blank nodes between versions if this correspondence is to hold.

Due to the Open World Assumption, RDFS and OWL models cannot be used for validation of RDF according to a data model, except for the extreme case that results in inconsistency. SPARQL queries express validation and integrity constraints well, but the queries for these tasks are not intuitive, and thus are difficult to properly construct and maintain. It is imperative that extensions to incorporate rules and process data streams avoid compounding these difficulties. The challenge is to develop a language that best satisfies the following characteristics:
- a syntax that is both intuitive for the audience (close to RDF) and sufficiently expressive
- well-defined execution semantics
- integration into remote service control and data flows
- real-time performance, which requires efficient target data-set generation (whether from update diffs or from inherent stream structure) and an activation network implementation [6]

RDF-Stream Options
Numerous alternatives have been proposed. From the perspective of the business case outlined above, the preferred approach should:
- treat queries as autonomous views in order to promote reuse;
- separate query expressions from temporal attributes in order to permit combinations;
- permit temporal attributes to be computed as an aspect of query processing;
- remain compatible with standard SPARQL in order to permit sharing and simplify development;
- remain compatible with standard RDF data models;
- permit named graphs in order to fulfil application requirements and permit compatibility with standard encodings;
- restrict the processing model to just the essential components in order to limit deployment and management complexity.

This perspective eliminates those proposals which either reify statements in the data model or add a temporal term. It also argues against a processing model which involves query re-writing and delegation, and it eliminates language extensions which encode temporal attributes as static values in special clauses. It leads to the proposal, for Dydra, to combine REVISION and WINDOW specifications for temporal properties to designate the target dataset, to make the temporal state available through the function THEN (which, for example, returns the time when the processor had received all the elements in a stream window), and to permit the window or revision expression to contain variables, but otherwise to conform to SPARQL 1.1.

Rule-based Options
There are several existing languages for applying rules to RDF. SPARQL Inferencing Notation (SPIN), also called SPARQL Rules, is a W3C Member Submission as well as a "de facto industry standard" implemented by TopBraid. The Member Submission defines the SPIN syntax via a schema, but does not provide semantics. In particular, although the SPIN submission is claimed to be an alternative syntax for SPARQL, the transformation from SPIN to SPARQL has not been published.
Further, although SPIN uses an RDF-based syntax, this alone is not sufficient to achieve rule semantics; a sufficiently expressive entailment regime must be defined (e.g. based on the as-yet unpublished transformation to SPARQL) that asserts the constructed source jointly with the queried sources. Shape Expressions Language (ShEx) is specified in a W3C Member Submission. It is intended to fill the same role for RDF graphs that the schema language Relax NG fills for XML, and its design is inspired by Relax NG, particularly in its foundation on regular expressions. ShEx has both a compact and an RDF-based syntax, in the same way that Relax NG has a compact and an XML-based syntax. The execution semantics of ShEx is well-defined, but is of limited expressivity. In particular, ShEx does not support named graphs or RDF Datasets. ShEx precedents are strictly targeted at REST [19]; it is not clear whether ShEx is applicable to streams. ShEx does not support modularization, but this extension could be easily incorporated following the approach of Relax NG, or, better, a monotonic restriction of that approach.

Fig. 1. SPARQL grammar extension (excerpt):
[[56]]  GraphPatternNotTriples ::= OptionalGraphPattern | GroupOrUnionGraphPattern | MinusGraphPattern | GraphGraphPattern | RevisionGraphPattern | WindowsGraphPattern | ServiceGraphPattern
[[60a]] RevisionGraphPattern ::= ...

Due to the coverage of, e.g., relations and functions with arbitrary arity, the syntax is not close to RDF triples. Rule Interchange Format (RIF) [21] is a W3C Recommendation. It includes a standard mapping from RDF triples to RIF frames, allowing rules to be applied to RDF triples. This approach requires recasting RDF into RIF, violating the requirement to remain close to RDF. It is also not clear whether RIF is applicable to RDF Datasets and Graph Stores, as the mapping of RDF triples to RIF frames does not address this form. Each of the existing languages described above has drawbacks that make it unsatisfactory relative to the evaluation criteria of the business case. Instead, we consider a minor extension to SPARQL and the Graph Store Protocol. Figures 3 and 4 illustrate a query which combines static base data with a temporal clause (in this case in the "validity time" domain) with a clause for which the dataset constitutes a stream. For simplicity, in the concatenation above, the IRI of the RDF stream is assumed to not already have a query string. It is straightforward to accommodate this possibility in the query (a sketch of this handling appears after the conclusion below).

Implementation Approach
The proposed approach, to express temporal specifications in a SPARQL dataset designator, has its origin in the unique architecture of Dydra, Datagraph's SPARQL query service. This comprises two independent components, the RDF data store and the SPARQL query processor. These communicate through a limited interface, which allows the processor just to associate a specific dataset revision with an access transaction as the target and to use statement patterns to perform count, match, and scan operations on this target. The following changes will suffice to support streams based on this architecture:
- Extend the RDF store's SPARQL Graph Store Protocol implementation to interpret intra-request state as indicating transaction boundaries. These can be HTTP chunking boundaries, RDF graph boundaries, or some other marker.
  Based on current import rates, the store should be able to accept 10^5 statements per second per stream, with the effective rate for sensor readings dependent on the number of statements per reading.
- Extend the dataset designators to designate the dataset state which corresponds to some specific set of transactions. The specification will permit combinations of temporal and cardinality constraints.
- Extend the query processor control model from one which performs a single reduction of an algebra graph to one in which individual branches can be reiterated according to their temporal attributes, while unchanged intermediate results are cached and re-used.
- Extend the response encoding logic to permit alternative stream packaging options, among them repeated HTTP header/content encoding and HTTP chunking with and without boundary significance.

Conclusion and Future Work
We have presented the grammar for a SPARQL extension and described a related Graph Store Protocol extension that enables a sliding window operation on an RDF stream to be interpreted as a dynamic dataset. We illustrated the usage of this extension to describe a continuous query, with rules given in the SPARQL CONSTRUCT/WHERE form, on an RDF stream in the distinct time-series profile, with a sample query and a data propagation diagram. Future work includes generalizing the SPARQL and Graph Store Protocol extensions to cover a larger class of RDF streams, and demonstrating the feasibility and performance of the extension with a prototype implementation
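As noted above, the concatenation of the window field-value pair assumes, for simplicity, that the stream IRI does not already carry a query string. The sketch below shows one way a client might handle both cases; the parameter name `window` and its values are illustrative assumptions, not syntax defined by the paper or implemented by Dydra.

```python
# Illustrative sketch: append a window definition to a stream IRI whether or
# not the IRI already carries a query string. The "window" parameter name and
# its value format are assumptions for illustration only.
from urllib.parse import urlparse, urlunparse, parse_qsl, urlencode

def with_window(stream_iri: str, window_spec: str) -> str:
    """Return the stream IRI extended with a window field-value pair."""
    parts = urlparse(stream_iri)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("window", window_spec))
    return urlunparse(parts._replace(query=urlencode(query)))

# Both cases resolve to a well-formed extended IRI.
print(with_window("http://example.org/streams/vitals", "PT10S"))
print(with_window("http://example.org/streams/vitals?profile=sensor", "PT10S"))
```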