
    Temporal description logic for ontology-based data access

    Our aim is to investigate ontology-based data access over temporal data with validity time and ontologies capable of temporal conceptual modelling. To this end, we design a temporal description logic, TQL, that extends the standard ontology language OWL 2 QL, provides basic means for temporal conceptual modelling and ensures first-order rewritability of conjunctive queries for suitably defined data instances with validity time.
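
To give a rough feel for what first-order rewritability buys, the sketch below rewrites an atomic query under simple concept inclusions of the form B ⊑ A (an OWL 2 QL-like fragment that ignores roles and the temporal operators TQL adds) into a union of atoms, and then evaluates that union directly over timestamped facts. It is a minimal sketch under those assumptions, not TQL's rewriting algorithm; the TBox, concept names and data are hypothetical.

```python
from collections import defaultdict

# Hypothetical TBox: each pair (sub, sup) encodes a concept inclusion sub ⊑ sup.
TBOX = [("Turbine", "Equipment"), ("GasTurbine", "Turbine"), ("Sensor", "Equipment")]

# Hypothetical temporal data: (concept, individual, timestamp) facts with validity time.
DATA = [("GasTurbine", "t1", 3), ("Sensor", "s1", 3), ("Turbine", "t2", 5)]


def rewrite(query_concept: str, tbox) -> set[str]:
    """Return every concept whose instances are instances of query_concept.

    The result depends only on the TBox, never on the data: this is the
    essence of first-order rewritability for atomic queries.
    """
    subs = defaultdict(set)
    for sub, sup in tbox:
        subs[sup].add(sub)
    result, frontier = {query_concept}, [query_concept]
    while frontier:
        for sub in subs[frontier.pop()]:
            if sub not in result:
                result.add(sub)
                frontier.append(sub)
    return result


def answer(query_concept: str, time: int, tbox, data) -> set[str]:
    """Evaluate the rewritten union of atoms at one time point, with no further reasoning."""
    union = rewrite(query_concept, tbox)
    return {ind for concept, ind, t in data if concept in union and t == time}
```

For example, `answer("Equipment", 3, TBOX, DATA)` returns `{"t1", "s1"}`: the data never mentions `Equipment`, but the rewritten union covers `GasTurbine` and `Sensor`, so a plain database lookup suffices.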

    Semantically defined Analytics for Industrial Equipment Diagnostics

    In this age of digitalization, industries everywhere accumulate massive amounts of data, which has become the lifeblood of the global economy. This data may come from various heterogeneous systems, equipment, components, sensors and applications, in many varieties (diversity of sources), velocities (high rate of change) and volumes (sheer data size). Despite significant advances in the ability to collect, store, manage and filter data, the real value lies in the analytics. Raw data is meaningless unless it is properly processed into actionable (business) insights. Those who know how to harness data effectively have a decisive competitive advantage: they raise performance by making faster and smarter decisions, improve short- and long-term strategic planning, offer more user-centric products and services, and foster innovation. Two distinct paradigms can be discerned in analytics practice: semantic-driven (deductive) and data-driven (inductive). The first uses logic to represent domain knowledge encoded in rules or ontologies, which are often carefully curated and maintained. However, these models are often highly complex and require intensive knowledge processing capabilities. Data-driven analytics employ machine learning (ML) to learn a model directly from the data with minimal human intervention. However, these models are tuned to their training data and context, which makes them difficult to adapt. Industries that want to create value from data today must master both paradigms in combination. Accordingly, there is a great need in data analytics to combine semantic-driven and data-driven processing techniques seamlessly in an efficient and scalable architecture that allows extracting actionable insights from an extreme variety of data. 
In this thesis, we address these needs by providing:
• A unified representation of domain-specific and analytical semantics, in the form of ontology models called the TechOnto Ontology Stack. It is a highly expressive, platform-independent formalism that captures the conceptual semantics of industrial systems, such as technical system hierarchies and component partonomies, as well as their analytical functional semantics.
• A new ontology language, the Semantically defined Analytical Language (SAL), built on top of the ontology model, which extends the existing DatalogMTL (a Horn fragment of Metric Temporal Logic) with analytical functions as first-class citizens.
• A method for generating semantic workflows using SAL, which helps in authoring, reusing and maintaining complex analytical tasks and workflows in an abstract fashion.
• A multi-layer architecture that fuses knowledge- and data-driven analytics into a federated and distributed solution.
To our knowledge, this thesis is among the first works to introduce and investigate semantically defined analytics in an ontology-based data access setting for industrial analytical applications. We focus our work and evaluation on industrial data because of (i) the general adoption of semantic technology by industry, and (ii) the common need, in the literature and in practice, to let domain expertise drive data analytics on semantically interoperable sources while still harnessing the power of analytics to enable real-time data insights. Evaluation results from three use-case studies show that our approach surpasses state-of-the-art approaches in most application scenarios.
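
The idea of extending DatalogMTL with analytical functions can be pictured with a single hand-coded rule. The sketch below shows a metric "always in the past" check (the box-minus operator over a window [0, w]) and a hypothetical rule that derives `Overheating(x)` at time t when the mean temperature of sensor x over [t − w, t] exceeds a threshold. It is an illustration under assumed names and data, not SAL's actual syntax or evaluation engine.

```python
from statistics import mean

# Hypothetical readings: sensor -> list of (timestamp, temperature) pairs.
READINGS = {
    "s1": [(0, 70.0), (2, 85.0), (4, 95.0), (6, 99.0)],
    "s2": [(0, 60.0), (3, 62.0), (6, 61.0)],
}


def window_values(sensor, t, window, readings):
    """Values of all readings of `sensor` with timestamp in [t - window, t]."""
    return [v for ts, v in readings[sensor] if t - window <= ts <= t]


def holds_always(sensor, t, window, pred, readings):
    """Box-minus over [0, window]: pred holds at every reading in the window.
    Returns False when the window contains no reading at all."""
    pts = window_values(sensor, t, window, readings)
    return bool(pts) and all(pred(v) for v in pts)


def window_mean(sensor, t, window, readings):
    """Analytical function: mean of the readings in the window, or None if empty."""
    pts = window_values(sensor, t, window, readings)
    return mean(pts) if pts else None


def derive_overheating(t, window, threshold, readings):
    """Hypothetical rule: Overheating(x)@t <- mean(Temp(x), [t - window, t]) > threshold."""
    result = set()
    for sensor in readings:
        m = window_mean(sensor, t, window, readings)
        if m is not None and m > threshold:
            result.add(sensor)
    return result
```

With a window of 4 at t = 6, sensor `s1` has mean 93.0 and `s2` has mean 61.5, so `derive_overheating(6, 4, 90.0, READINGS)` yields `{"s1"}`. Treating the aggregate as an ordinary function inside a rule body is what "first-class" analytical functions suggest, though how SAL schedules such rules is not described here.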

    On decidability and tractability of querying in temporal EL

    We study access to temporal data with TEL, a temporal extension of the tractable description logic EL. Our aim is to establish a clear computational complexity landscape for the atomic query answering problem, in terms of both data and combined complexity. Atomic queries in full TEL turn out to be undecidable even in data complexity. Motivated by this negative result, we identify well-behaved yet expressive fragments of TEL. Our main contributions are a semantic condition and sufficient syntactic conditions for decidability, and three orthogonal tractable fragments, which are based on restricted use of rigid roles, temporal operators, and novel acyclicity conditions on the ontologies.
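
Atomic query answering in a Horn temporal logic can be illustrated by forward chaining to a fixpoint. The sketch below uses a drastically simplified, role-free fragment over a bounded timeline 0..horizon, with two hypothetical rule shapes: A ⊓ B ⊑ C at the same time point, and A ⊑ ○B (A at t implies B at t + 1). It omits the roles, rigid roles and unbounded time that make full TEL undecidable, so it is only an illustration of the query problem, not the decision procedures of the paper.

```python
# Hypothetical rules:
#   ("and", frozenset({A, B}), C)  encodes A ⊓ B ⊑ C at the same time point.
#   ("next", A, B)                 encodes A ⊑ ○B, i.e. A at t implies B at t + 1.
RULES = [
    ("next", "Overheated", "Damaged"),
    ("and", frozenset({"Damaged", "Running"}), "Alarm"),
]

# Hypothetical data: (individual, concept, time) facts.
FACTS = {("pump1", "Overheated", 0), ("pump1", "Running", 1)}


def saturate(facts, rules, horizon):
    """Apply all rules until no new fact appears; time is capped at `horizon`."""
    known = set(facts)
    while True:
        derived = set()
        for ind, concept, t in known:
            for kind, body, head in rules:
                if kind == "next":
                    if concept == body and t + 1 <= horizon:
                        derived.add((ind, head, t + 1))
                elif kind == "and":
                    if concept in body and all((ind, c, t) in known for c in body):
                        derived.add((ind, head, t))
        if derived <= known:  # fixpoint reached
            return known
        known |= derived


def entails(facts, rules, horizon, ind, concept, t):
    """Atomic query: does `ind` belong to `concept` at time `t`?"""
    return (ind, concept, t) in saturate(facts, rules, horizon)
```

Here `entails(FACTS, RULES, 3, "pump1", "Alarm", 1)` is true (Overheated at 0 gives Damaged at 1, which meets Running at 1), whereas Alarm at time 0 is not entailed. Termination is guaranteed only because the timeline and vocabulary are finite.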

    Efficient Management for Geospatial and Temporal Data using Ontology-based Data Access Techniques

    The RDF data model and the SPARQL query language have been widely used to integrate data coming from different sources. Because a growing number of geospatial datasets are becoming available as linked open data, much effort has focused on developing geospatial (and, accordingly, temporal) extensions of the RDF and SPARQL framework. Two highlights of these efforts are the query language GeoSPARQL, which is an OGC standard, and the stRDF/stSPARQL framework. Both can be used to represent and query linked geospatial data, and stSPARQL also includes a temporal dimension. Although many geospatial (and some temporal) RDF stores have emerged, converting geospatial data into RDF and then storing it in an RDF store is not always best practice, especially when the data already resides in a relational database that is fairly large and/or frequently updated. In this thesis, we propose an Ontology-based Data Access (OBDA) approach for accessing geospatial data stored in geospatial relational databases, using the OGC standard GeoSPARQL and R2RML or OBDA mappings. We extend an existing SPARQL-to-SQL translation method to support GeoSPARQL features, so that queries are evaluated entirely by the geospatial database management system. 
We describe the implementation of our approach in the system Ontop-spatial, an extension of the OBDA system Ontop for creating virtual geospatial RDF graphs on top of geospatial relational databases. Ontop-spatial is the first geospatial OBDA system and outperforms state-of-the-art geospatial RDF stores. We also show how to answer queries with temporal operators in the OBDA framework, using the stRDF framework and the stSPARQL query language, which we extend with some new features. Next, we extend the data sources supported by Ontop-spatial beyond relational database management systems, and we present our OBDA solutions for creating virtual RDF graphs on top of various web data sources (e.g., HTML tables, Web APIs) using ontologies and mappings. We compared the performance of our approach with a related implementation; the evaluation results showed that Ontop-spatial not only supports more functionality (e.g., more data sources, a simpler workflow) but also achieves better performance. Finally, we describe how the work in this thesis is applied in real-world application scenarios.
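
The core move of SPARQL-to-SQL translation for GeoSPARQL, pushing a spatial filter down to the database, can be sketched for one query shape: two classes mapped to tables, joined by a GeoSPARQL Simple Features topological function. The sketch maps `geof:sfIntersects`, `geof:sfWithin` and `geof:sfContains` to the PostGIS functions `ST_Intersects`, `ST_Within` and `ST_Contains`. The class names, tables, columns and the translator itself are hypothetical; this is not Ontop-spatial's implementation, which handles full SPARQL algebra and R2RML mappings.

```python
# GeoSPARQL function namespace (geof:).
GEOF = "http://www.opengis.net/def/function/geosparql/"

# Hypothetical mappings: ontology class -> (table, key column, geometry column).
MAPPINGS = {
    "Park": ("parks", "id", "geom"),
    "River": ("rivers", "gid", "shape"),
}

# GeoSPARQL Simple Features functions and their PostGIS counterparts.
SF_TO_POSTGIS = {
    GEOF + "sfIntersects": "ST_Intersects",
    GEOF + "sfWithin": "ST_Within",
    GEOF + "sfContains": "ST_Contains",
}


def translate_spatial_join(class_a, class_b, function_iri):
    """Translate the pattern
         ?a a :class_a . ?b a :class_b . FILTER(function_iri(?geomA, ?geomB))
    into one SQL query, so the spatial predicate runs inside the database."""
    table_a, key_a, geom_a = MAPPINGS[class_a]
    table_b, key_b, geom_b = MAPPINGS[class_b]
    sql_fn = SF_TO_POSTGIS[function_iri]
    return (
        f"SELECT a.{key_a} AS a, b.{key_b} AS b "
        f"FROM {table_a} a JOIN {table_b} b "
        f"ON {sql_fn}(a.{geom_a}, b.{geom_b})"
    )
```

For instance, `translate_spatial_join("Park", "River", GEOF + "sfIntersects")` produces `SELECT a.id AS a, b.gid AS b FROM parks a JOIN rivers b ON ST_Intersects(a.geom, b.shape)`. Keeping the argument order unchanged matters for asymmetric relations such as `sfWithin`.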

    Scalable integration of uncertainty reasoning and semantic web technologies

    In recent years, formal logical standards for knowledge representation, which model real-world knowledge and domains and make them accessible to computers, have gained a lot of traction. They provide an expressive logical framework for modeling, consistency checking, reasoning, and query answering, and have proven to be versatile methods for capturing knowledge in various fields. These formalisms and methods focus on specifying knowledge as precisely as possible. At the same time, many applications, in particular on the Semantic Web, have to deal with uncertainty in their data, and handling uncertain knowledge is crucial in many real-world domains. However, conventional logic is unable to capture the real world properly because of its inherent complexity and uncertainty, while handling uncertain or incomplete information is becoming increasingly important in applications such as expert systems, data integration and information extraction. The overall objective of this dissertation is to identify scenarios and datasets where methods that account for the inherent uncertainty of the data improve results, and to investigate approaches and tools suitable for the respective task. In summary, this work sets out to tackle the following objectives: 1. debugging uncertain knowledge bases in order to generate consistent knowledge graphs that are accessible to logical reasoning, 2. combining probabilistic query answering and logical reasoning, which in turn uses these consistent knowledge graphs to answer user queries, and 3. applying the aforementioned techniques to risk management in IT infrastructures, as a concrete real-world application. We show that in all these scenarios, users can benefit from incorporating uncertainty in the knowledge base. Furthermore, we conduct experiments that demonstrate the real-world scalability of the presented approaches. 
Overall, we argue that integrating uncertainty and logical reasoning, despite being theoretically intractable, is feasible in real-world applications and warrants further research.
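
Probabilistic query answering, the second objective above, can be illustrated with the textbook possible-worlds semantics for tuple-independent facts: every fact holds independently with its probability, and a query's probability is the total weight of the worlds in which it is true. The sketch enumerates all worlds, which is exponential and therefore only a reference semantics, not a scalable method like those the dissertation evaluates. The facts and the IT-risk query are hypothetical.

```python
from itertools import product

# Hypothetical tuple-independent facts: each holds independently with its probability.
PROB_FACTS = {
    ("fails", "db"): 0.1,
    ("fails", "web"): 0.2,
    ("dependsOn", "shop", "db"): 1.0,
    ("dependsOn", "shop", "web"): 1.0,
}


def service_down(world, service="shop"):
    """Boolean query: some component that `service` depends on fails in this world."""
    return any(
        f[0] == "dependsOn" and f[1] == service and ("fails", f[2]) in world
        for f in world
    )


def query_probability(facts, query):
    """Sum the probabilities of all possible worlds in which `query` is true."""
    items = list(facts.items())
    total = 0.0
    for choice in product([True, False], repeat=len(items)):
        world = {f for (f, _), keep in zip(items, choice) if keep}
        weight = 1.0
        for (_, p), keep in zip(items, choice):
            weight *= p if keep else 1.0 - p
        if query(world):
            total += weight
    return total
```

Since both dependencies are certain, `query_probability(PROB_FACTS, service_down)` equals 1 − 0.9 × 0.8 = 0.28 (up to floating-point rounding): the shop is down unless both components survive.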

    Semantic-guided predictive modeling and relational learning within industrial knowledge graphs

    The ubiquitous availability of data in today’s manufacturing environments, driven mainly by the extended use of software and built-in sensing capabilities in automation systems, enables companies to adopt more advanced predictive modeling and analysis in order to optimize processes and equipment usage. While the potential insight gained from such analysis is high, it often remains untapped, since integrating and analyzing data silos from different production domains requires high manual effort and is therefore uneconomical. To address these challenges, digital representations of production equipment, so-called digital twins, have emerged, leading the way to semantic interoperability across systems in different domains. From a data modeling point of view, digital twins can be seen as industrial knowledge graphs, which serve as the semantic backbone of manufacturing software systems and data analytics. Because the prevalent manufacturing software landscape has grown historically, is scattered and comprises numerous proprietary information models, data sources are highly heterogeneous. Therefore, there is an increasing need for semi-automatic support in data modeling, enabling end-user engineers to model their domain and maintain a unified semantic knowledge graph across the company. Once data modeling and integration are done, further challenges arise, since there has been little research on how knowledge graphs can contribute to the simplification and abstraction of statistical analysis and predictive modeling, especially in manufacturing. In this thesis, we present new approaches for modeling and maintaining industrial knowledge graphs, with a focus on the application of statistical models. 
First, concerning data modeling, we discuss requirements from several existing standard information models and analytic use cases in the manufacturing and automation system domains and derive a fragment of the OWL 2 language that is expressive enough to cover the required semantics for a broad range of use cases. The prototypical implementation enables domain end-users, i.e., engineers, to extend the base ontology model with intuitive semantics. Furthermore, it supports efficient reasoning and constraint checking via translation to rule-based representations. Based on these models, we propose an architecture for the end-user-facilitated application of statistical models using ontological concepts and ontology-based data access paradigms. In addition, we present an approach for domain-knowledge-driven preparation of predictive models in terms of feature selection and show how schema-level reasoning in the OWL 2 language can be employed for this task within knowledge graphs of industrial automation systems. A production cycle time prediction model in an example application scenario serves as a proof of concept and demonstrates that axiomatized domain knowledge about features can achieve performance competitive with purely data-driven approaches. For high-dimensional data with small sample sizes, we show that graph kernels of domain ontologies can provide additional information on the degree of variable dependence. Furthermore, we present a special application of feature selection in graph-structured data and develop a method that incorporates domain constraints derived from meta-paths in knowledge graphs into a branch-and-bound pattern enumeration algorithm. Lastly, we discuss the maintenance of facts in large-scale industrial knowledge graphs, focusing on latent variable models for the automated population and completion of missing facts. 
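
Schema-level feature selection of the kind mentioned above can be approximated by a subsumption check: keep only the data columns whose ontology class is a (reflexive, transitive) subclass of a target concept. The sketch below computes that closure over plain subclass axioms, a tiny slice of what an OWL 2 reasoner would infer. The class names, column names and target concepts are hypothetical and not taken from the thesis.

```python
# Hypothetical schema: subclass axioms (child, parent) and the class of each data column.
SUBCLASS_OF = [
    ("SpindleSpeed", "DriveVariable"),
    ("MotorCurrent", "DriveVariable"),
    ("DriveVariable", "ProcessVariable"),
    ("AmbientTemperature", "EnvironmentVariable"),
]
COLUMN_CLASS = {
    "spindle_rpm": "SpindleSpeed",
    "motor_amp": "MotorCurrent",
    "hall_temp": "AmbientTemperature",
}


def superclasses(cls, axioms):
    """Reflexive-transitive closure of the subclass relation, starting from cls."""
    parents = {}
    for child, parent in axioms:
        parents.setdefault(child, set()).add(parent)
    seen, stack = {cls}, [cls]
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def select_features(target, columns, axioms):
    """Keep (sorted) the columns whose ontology class is subsumed by `target`."""
    return sorted(col for col, cls in columns.items() if target in superclasses(cls, axioms))
```

Selecting for `ProcessVariable` returns `["motor_amp", "spindle_rpm"]` and drops `hall_temp`, even though no column is annotated with `ProcessVariable` directly; the selection follows from the axioms rather than from the data.
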
State-of-the-art approaches cannot deal with time-series data in the form of events, which naturally occur in industrial applications. Therefore, we present an extension of knowledge graph embedding learning that incorporates data in the form of event logs. Finally, we design several use case scenarios of missing information and evaluate our embedding approach on data from a real-world factory environment. We conclude that industrial knowledge graphs are a powerful tool that end-users in the manufacturing domain can use for data modeling and model validation. They are especially suitable for facilitating the application of statistical models in conjunction with background domain knowledge, since they provide information about features upfront. Furthermore, relational learning approaches showed great potential to semi-automatically infer missing facts and to provide recommendations to production operators on how to keep stored facts in sync with the real world.
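
For readers unfamiliar with knowledge graph embeddings, the sketch below shows how a learned embedding can complete a missing fact. It uses TransE, a standard translational model in which a triple (h, r, t) is plausible when h + r ≈ t, and ranks candidate tails by negative Euclidean distance. TransE is a swapped-in, well-known technique for illustration; it is not the event-log extension developed in the thesis, and the hand-set 2-d vectors and entity names are hypothetical.

```python
import math

# Hypothetical 2-d embeddings; in practice these vectors are learned from the graph.
ENTITY = {
    "robot1": (0.0, 0.0),
    "station_a": (1.0, 0.0),
    "station_b": (0.0, 1.0),
}
RELATION = {"locatedAt": (1.0, 0.0)}


def transe_score(head, relation, tail):
    """TransE plausibility: negative Euclidean distance ||h + r - t|| (higher is better)."""
    h, r, t = ENTITY[head], RELATION[relation], ENTITY[tail]
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def rank_tails(head, relation, candidates):
    """Order candidate tails from most to least plausible for the query (head, relation, ?)."""
    return sorted(candidates, key=lambda c: transe_score(head, relation, c), reverse=True)
```

Here `rank_tails("robot1", "locatedAt", ["station_a", "station_b"])` puts `station_a` first (distance 0) ahead of `station_b` (distance √2), which is how such a model would recommend a missing `locatedAt` fact to an operator.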