    A Typed Model for Linked Data

    The term Linked Data is used to describe ubiquitous and emerging semi-structured data formats on the Web. URIs in Linked Data allow diverse data sources to link to each other, forming a Web of Data. A calculus which models concurrent queries and updates over Linked Data is presented. The calculus exhibits operations essential for declaring rich atomic actions. The operations recover emergent structure in the loosely structured Web of Data. The calculus is executable due to its operational semantics. A light type system ensures that URIs with a distinguished role are used consistently. The main theorem verifies that the light type system and operational semantics work at the same level of granularity, and so are compatible. Examples show that a range of existing and emerging standards is captured. Data formats include RDF, named graphs and feeds. The primitives of the calculus model SPARQL Query and the Atom Publishing Protocol. The subtype system is based on RDFS, which improves interoperability. Examples focus on the SPARQL Update proposal, for which a fine-grained operational semantics is developed. Further potential high-level languages are outlined for exploiting Linked Data
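
    As a rough, concrete illustration of the data formats and update primitives the calculus models (a toy sketch, not the calculus itself), the snippet below uses the third-party rdflib Python package, which is an assumption of this example; all URIs and names are invented. It builds a small RDF graph with an explicit IRI identifier, standing in for a named graph, and applies one SPARQL Update request, the kind of atomic action whose semantics the work pins down.

```python
# Toy illustration (not the paper's calculus): RDF triples plus one SPARQL
# Update treated as a single atomic action. Assumes the rdflib package;
# the URIs and terms below are invented for illustration.
from rdflib import Graph, Namespace, URIRef, Literal

EX = Namespace("http://example.org/")
g = Graph(identifier=URIRef("http://example.org/graphs/people"))
g.add((EX.alice, EX.name, Literal("Alice")))
g.add((EX.alice, EX.knows, EX.bob))

# One SPARQL Update request: delete and insert in a single atomic action.
g.update("""
    PREFIX ex: <http://example.org/>
    DELETE { ex:alice ex:knows ex:bob }
    INSERT { ex:alice ex:knows ex:carol }
    WHERE  { ex:alice ex:knows ex:bob }
""")

for s, p, o in g:
    print(s, p, o)
```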

    Engineering Agile Big-Data Systems

    To be effective, data-intensive systems require extensive ongoing customisation to reflect changing user requirements, organisational policies, and the structure and interpretation of the data they hold. Manual customisation is expensive, time-consuming, and error-prone. In large complex systems, the value of the data can be such that exhaustive testing is necessary before any new feature can be added to the existing design. In most cases, the precise details of requirements, policies and data will change during the lifetime of the system, forcing a choice between expensive modification and continued operation with an inefficient design. Engineering Agile Big-Data Systems outlines an approach to dealing with these problems in software and data engineering, describing a methodology for aligning these processes throughout product lifecycles. It discusses tools which can be used to achieve these goals, and, in a number of case studies, shows how the tools and methodology have been used to improve a variety of academic and business systems

    Knowledge-driven Artificial Intelligence in Steelmaking: Towards Industry 4.0

    With the ongoing emergence of the Fourth Industrial Revolution, often referred to as Industry 4.0, new innovations, concepts, and standards are reshaping manufacturing processes and production, leading to intelligent cyber-physical systems and smart factories. Steel production is one important manufacturing process that is undergoing this digital transformation. Realising this vision in steel production comes with unique challenges, including the seamless interoperability between diverse and complex systems, the uniformity of heterogeneous data, and a need for standardised human-to-machine and machine-to-machine communication protocols. To address these challenges, international standards have been developed, and new technologies have been introduced and studied in both industry and academia. However, due to the vast quantity, scale, and heterogeneous nature of industrial data and systems, achieving interoperability among components within the context of Industry 4.0 remains a challenge, requiring formal knowledge representation capabilities to enhance the understanding of data and information. In response, semantic-based technologies have been proposed as a method to capture knowledge from data and resolve incompatibility conflicts within Industry 4.0 scenarios. We propose utilising fundamental Semantic Web concepts, such as ontologies and knowledge graphs, specifically to enhance semantic interoperability, improve data integration, and standardise data across heterogeneous systems within the context of steelmaking. Additionally, we investigate ongoing trends that involve the integration of Machine Learning (ML) techniques with semantic technologies, resulting in the creation of hybrid models. These models capitalise on the strengths derived from the intersection of these two AI approaches. Furthermore, we explore the need for continuous reasoning over data streams, presenting preliminary research that combines ML and semantic technologies in the context of data streams. In this thesis, we make four main contributions: (1) We discover that a clear understanding of semantic-based asset administration shells, an international standard within the RAMI 4.0 model, was lacking, and provide an extensive survey on semantic-based implementations of asset administration shells. We focus on literature that utilises semantic technologies to enhance the representation, integration, and exchange of information in an industrial setting. (2) The creation of an ontology, a semantic knowledge base, which specifically captures the cold rolling processes in steelmaking. We demonstrate use cases that leverage these semantic methodologies with real-world industrial data for data access, data integration, data querying, and condition-based maintenance purposes. (3) A framework demonstrating one approach for integrating machine learning models with semantic technologies to aid decision-making in the domain of steelmaking. We showcase a novel approach of applying random forest classification with rule-based reasoning, incorporating both metadata and external domain expert knowledge into the model, resulting in improved knowledge-guided assistance for the human-in-the-loop during steelmaking processes. (4) The groundwork for a continuous data stream reasoning framework, where both domain expert knowledge and random forest classification can be dynamically applied to data streams on the fly. This approach opens up possibilities for real-time condition-based monitoring and real-time decision support for predictive maintenance applications. We demonstrate the adaptability of the framework in the context of dynamic steel production processes. Our contributions have been validated on real-world data sets through peer-reviewed conference and journal publications, as well as through collaboration with domain experts from our industrial partners at Tata Steel
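
    Contribution (3) combines random forest classification with rule-based reasoning over metadata and expert knowledge. The following is a minimal, illustrative sketch of one way such a hybrid could be wired together, assuming scikit-learn; the feature names, thresholds, and rule are entirely invented and do not come from the thesis.

```python
# Illustrative hybrid of ML prediction and symbolic rules; feature names,
# thresholds, and the rule below are hypothetical, not from the thesis.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Toy training data: three sensor features for a cold-rolling stand.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 3))           # e.g. [roll_force, strip_temp, vibration]
y_train = (X_train[:, 2] > 0.5).astype(int)   # 1 = maintenance needed

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

def expert_rules(features):
    """Domain-expert rule that can override the ML prediction."""
    roll_force, strip_temp, vibration = features
    if strip_temp > 2.0:                      # hypothetical hard safety limit
        return 1, "rule: strip temperature above safety limit"
    return None, None

def hybrid_predict(features):
    rule_label, reason = expert_rules(features)
    if rule_label is not None:
        return rule_label, reason
    ml_label = int(model.predict([features])[0])
    return ml_label, "ml: random forest prediction"

print(hybrid_predict([0.1, 2.5, 0.0]))   # expert rule fires
print(hybrid_predict([0.1, 0.3, 0.9]))   # falls back to the model
```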

    Accurate sampling-based cardinality estimation for complex graph queries

    Accurately estimating the cardinality (i.e., the number of answers) of complex queries plays a central role in database systems. This problem is particularly difficult in graph databases, where queries often involve a large number of joins and self-joins. Recently, Park et al. [54] surveyed seven state-of-the-art cardinality estimation approaches for graph queries. The results of their extensive empirical evaluation show that a sampling method based on the WanderJoin online aggregation algorithm [46] consistently offers superior accuracy. We extended the framework by Park et al. [54] with three additional datasets and repeated their experiments. Our results showed that WanderJoin is indeed very accurate, but it can often take a large number of samples and thus be very slow. Moreover, when queries are complex and data distributions are skewed, it often fails to find valid samples and estimates the cardinality as zero. Finally, complex graph queries often go beyond simple graph matching and involve arbitrary nesting of relational operators such as disjunction, difference, and duplicate elimination. None of the methods considered by Park et al. [54] is applicable to such queries. In this paper we present a novel approach for estimating the cardinality of complex graph queries. Our approach is inspired by WanderJoin, but, unlike all approaches known to us, it can process complex queries with arbitrary operator nesting. Our estimator is strongly consistent, meaning that the average of repeated estimates converges with probability one to the actual cardinality. We present optimisations of the basic algorithm that aim to reduce the chance of producing zero estimates and improve accuracy. We show empirically that our approach is both accurate and fast on complex queries and large datasets. Finally, we discuss how to integrate our approach into a simple dynamic programming query planner, and we confirm empirically that our planner produces high-quality plans that can significantly reduce end-to-end query evaluation times
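
    As a rough illustration of the sampling idea behind WanderJoin-style estimators (a simplified sketch, not the paper's estimator or WanderJoin itself), the snippet below estimates the cardinality of a two-hop path query over a toy directed graph by sampling random walks and weighting each completed walk by the inverse of its sampling probability, a Horvitz-Thompson-style estimate. Walks that cannot be extended contribute zero, mirroring the "invalid sample" problem the abstract mentions.

```python
# Simplified sampling-based cardinality estimation for the 2-hop path query
# ?x -> ?y -> ?z over a toy directed graph. Each sampled walk is weighted by
# 1 / Pr(sampling that walk); failed walks count as zero.
import random
from collections import defaultdict

edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4), (3, 5)]
out = defaultdict(list)
for s, t in edges:
    out[s].append(t)

def sample_walk():
    """Sample one 2-hop walk; return its inverse sampling probability or 0."""
    s, t = random.choice(edges)             # first edge chosen uniformly
    prob = 1.0 / len(edges)
    nexts = out[t]
    if not nexts:                           # cannot extend: invalid sample
        return 0.0
    random.choice(nexts)                    # second edge among t's out-edges
    prob *= 1.0 / len(nexts)
    return 1.0 / prob                       # Horvitz-Thompson weight

def estimate(num_samples=10_000):
    return sum(sample_walk() for _ in range(num_samples)) / num_samples

exact = sum(len(out[t]) for _, t in edges)  # true number of 2-hop paths
print("exact:", exact, "estimate:", round(estimate(), 2))
```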

    EXCLAIM framework: a monitoring and analysis framework to support self-governance in Cloud Application Platforms

    The Platform-as-a-Service segment of Cloud Computing has been steadily growing over the past several years, with more and more software developers opting for cloud platforms as convenient ecosystems for developing, deploying, testing and maintaining their software. Such cloud platforms also play an important role in delivering an easily-accessible Internet of Services. They provide rich support for software development, and, following the principles of Service-Oriented Computing, offer their subscribers a wide selection of pre-existing, reliable and reusable basic services, available through a common platform marketplace and ready to be seamlessly integrated into users' applications. Such cloud ecosystems are becoming increasingly dynamic and complex, and one of the major challenges faced by cloud providers is to develop appropriate scalable and extensible mechanisms for governance and control based on run-time monitoring and analysis of (extreme amounts of) raw heterogeneous data. In this thesis we address this important research question: how can we support self-governance in cloud platforms delivering the Internet of Services in the presence of large amounts of heterogeneous and rapidly changing data? To address this research question and demonstrate our approach, we have created the Extensible Cloud Monitoring and Analysis (EXCLAIM) framework for service-based cloud platforms. The main idea underpinning our approach is to encode monitored heterogeneous data using Semantic Web languages, which then enables us to integrate these semantically enriched observation streams with static ontological knowledge and to apply intelligent reasoning. This has allowed us to create an extensible, modular, and declaratively defined architecture for performing run-time data monitoring and analysis with a view to detecting critical situations within cloud platforms. By addressing the main research question, our approach contributes to the domain of Cloud Computing, and in particular to the area of autonomic and self-managing capabilities of service-based cloud platforms. Our main contributions include the approach itself, which allows monitoring and analysing heterogeneous data in an extensible and scalable manner, the prototype of the EXCLAIM framework, and the Cloud Sensor Ontology. Our research also contributes to the state of the art in Software Engineering by demonstrating how existing techniques from several fields (i.e., Autonomic Computing, Service-Oriented Computing, Stream Processing, Semantic Sensor Web, and Big Data) can be combined in a novel way to create an extensible, scalable, modular, and declaratively defined monitoring and analysis solution
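
    The core idea of mapping heterogeneous monitoring data onto shared semantic terms and checking the resulting observation stream against static knowledge can be sketched roughly as follows. This is a toy illustration with invented metric names, thresholds, and rule format; it is not the EXCLAIM implementation or the Cloud Sensor Ontology.

```python
# Toy sketch: semantically annotated observations checked against simple
# threshold rules drawn from static knowledge. All names are hypothetical.
from dataclasses import dataclass

@dataclass
class Observation:
    service: str    # which platform service emitted the metric
    metric: str     # shared vocabulary term the raw value is mapped to
    value: float

# Static "ontological" knowledge: what counts as critical for each metric.
critical_thresholds = {"cpu_load": 0.9, "queue_length": 100}

def detect_critical(stream):
    """Yield (service, metric) pairs whose observations breach a threshold."""
    for obs in stream:
        limit = critical_thresholds.get(obs.metric)
        if limit is not None and obs.value > limit:
            yield obs.service, obs.metric

stream = [
    Observation("message-broker", "queue_length", 250),
    Observation("web-frontend", "cpu_load", 0.35),
]
for service, metric in detect_critical(stream):
    print(f"critical situation: {metric} on {service}")
```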

    Tuple-Generating Dependencies Capture Complex Values

    We formalise a variant of Datalog that allows complex values constructed by nesting elements of the input database in sets and tuples. We study its complexity and give a translation into sets of tuple-generating dependencies (TGDs) for which the standard chase terminates on any input database. We identify a fragment for which reasoning is tractable. As membership is undecidable for this fragment, we develop decidable sufficient conditions
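
    For orientation, a tuple-generating dependency has the standard textbook shape shown below: whenever the body holds, the head must hold for some witnesses. This is only the general form of a TGD, not the paper's specific translation of complex values.

```latex
% General shape of a tuple-generating dependency (TGD).
\forall \vec{x}\,\forall \vec{y}\;\bigl(\varphi(\vec{x},\vec{y}) \rightarrow \exists \vec{z}\;\psi(\vec{x},\vec{z})\bigr)
```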

    The Origin of Data: Enabling the Determination of Provenance in Multi-institutional Scientific Systems through the Documentation of Processes

    The Oxford English Dictionary defines provenance as (i) the fact of coming from some particular source or quarter; origin, derivation. (ii) the history or pedigree of a work of art, manuscript, rare book, etc.; concr., a record of the ultimate derivation and passage of an item through its various owners. In art, knowing the provenance of an artwork lends weight and authority to it while providing a context for curators and the public to understand and appreciate the work's value. Without such a documented history, the work may be misunderstood, unappreciated, or undervalued. In computer systems, knowing the provenance of digital objects would provide them with greater weight, authority, and context just as it does for works of art. Specifically, if the provenance of digital objects could be determined, then users could understand how documents were produced, how simulation results were generated, and why decisions were made. Provenance is of particular importance in science, where experimental results are reused, reproduced, and verified. However, science is increasingly being done through large-scale collaborations that span multiple institutions, which makes the problem of determining the provenance of scientific results significantly harder. Current approaches to this problem are not designed specifically for multi-institutional scientific systems and their evolution towards increasingly dynamic and peer-to-peer topologies. Therefore, this thesis advocates a new approach, namely, that through the autonomous creation, scalable recording, and principled organisation of documentation of systems' processes, the determination of the provenance of results produced by complex multi-institutional scientific systems is enabled. The dissertation makes four contributions to the state of the art. First is the idea that provenance is a query performed over documentation of a system's past process. Thus, the problem is one of how to collect and collate documentation from multiple distributed sources and organise it in a manner that enables the provenance of a digital object to be determined. Second is an open, generic, shared, principled data model for documentation of processes, which enables its collation so that it provides high-quality evidence that a system's processes occurred. Once documentation has been created, it is recorded into specialised repositories called provenance stores using a formally specified protocol, which ensures documentation has high-quality characteristics. Furthermore, patterns and techniques are given to permit the distributed deployment of provenance stores. The protocol and patterns are the third contribution. The fourth contribution is a characterisation of the use of documentation of process to answer questions related to the provenance of digital objects and the impact recording has on application performance. Specifically, in the context of a bioinformatics case study, it is shown that six different provenance use cases are answered given an overhead of 13% on experiment run-time. Beyond the case study, the solution has been applied to other applications including fault tolerance in service-oriented systems, aerospace engineering, and organ transplant management
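
    The thesis's first contribution, that provenance is a query over recorded documentation of past processes, can be illustrated roughly as follows. This is a toy sketch with an invented record structure and item names; it is not the thesis's data model, recording protocol, or the bioinformatics case study.

```python
# Toy provenance store: each record documents one process step, naming the
# items it consumed and produced. Determining provenance is then a query
# that walks backwards from an item through this documentation.
records = [
    {"process": "sequence-alignment", "inputs": ["raw_reads"], "outputs": ["alignment"]},
    {"process": "variant-calling",    "inputs": ["alignment"], "outputs": ["variants"]},
    {"process": "report-generation",  "inputs": ["variants"],  "outputs": ["report"]},
]

def provenance(item, store):
    """Return the process steps whose documentation explains how `item` arose."""
    history = []
    frontier = [item]
    while frontier:
        current = frontier.pop()
        for rec in store:
            if current in rec["outputs"]:
                history.append(rec["process"])
                frontier.extend(rec["inputs"])
    return history

print(provenance("report", records))
# ['report-generation', 'variant-calling', 'sequence-alignment']
```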