21 research outputs found

    SemML: Reusable ML for condition monitoring in discrete manufacturing

    Machine learning (ML) is gaining much attention for data analysis in manufacturing. Despite this success, a number of challenges remain in widening the scope of ML adoption. The main challenges include the exhausting effort of data integration and the lack of generalisability of developed ML pipelines to diverse data variants, sources, and domain processes. In this demo we present our SemML system, which addresses these challenges by enhancing machine learning with semantic technologies: capturing domain and ML knowledge in ontologies and ontology templates, and automating various ML steps using reasoning. During the demo, attendees will experience three cleverly designed scenarios based on real industrial applications of manufacturing condition monitoring at Bosch, and witness the power of ontologies and templates in enabling reusable ML pipelines.
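
    A minimal sketch of the template idea, assuming an invented vocabulary under http://example.org/ and a toy rule table standing in for reasoning; SemML's actual ontologies, templates, and reasoning are far richer.

```python
# Toy illustration of "ontologies + templates -> reusable ML pipeline steps".
# All terms under http://example.org/ are hypothetical, not SemML's vocabulary.
from rdflib import Graph, Namespace, RDF

EX = Namespace("http://example.org/")
g = Graph()

# "Ontology template" instantiation: describe a monitoring task declaratively.
g.add((EX.WeldingTask, RDF.type, EX.ConditionMonitoringTask))
g.add((EX.WeldingTask, EX.hasInputSignal, EX.VibrationSignal))
g.add((EX.VibrationSignal, EX.sampledAt, EX.HighFrequency))

# Stand-in for reasoning: derive a preprocessing step from the declared facts.
PREPROCESSING_RULES = {
    (EX.VibrationSignal, EX.HighFrequency): "fft_spectral_features",
    (EX.VibrationSignal, EX.LowFrequency): "rolling_statistics",
}

def plan_pipeline(task):
    """Read the task description from the graph and pick matching ML steps."""
    steps = []
    for signal in g.objects(task, EX.hasInputSignal):
        for rate in g.objects(signal, EX.sampledAt):
            step = PREPROCESSING_RULES.get((signal, rate))
            if step:
                steps.append(step)
    return steps

print(plan_pipeline(EX.WeldingTask))  # ['fft_spectral_features']
```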

    Ontology-Based Data Access to Big Data

    Recent approaches to ontology-based data access (OBDA) have extended the focus from relational database systems to other types of backends, such as cluster frameworks, in order to cope with the four Vs associated with big data: volume, veracity, variety, and velocity (stream processing). The abstraction that an ontology provides is a benefit from the end-user point of view, but it represents a challenge for developers because high-level queries must be transformed into queries executable at the backend level. In this paper, we discuss and evaluate an OBDA system that uses STARQL (Streaming and Temporal ontology Access with a Reasoning-based Query Language) as a high-level query language to access data stored in a SPARK cluster framework. The development of the STARQL-SPARK engine shows that there is a need for a homogeneous interface to access static, temporal, and streaming data, because cluster frameworks usually lack such an interface. The experimental evaluation shows that building a scalable OBDA system that runs on SPARK is more than plug-and-play: one needs to know the data formats and the data organisation in the cluster framework quite well.
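
    The interface gap the authors point to can be illustrated in plain PySpark: the same windowed aggregation has to be wired up differently for a static table and a stream. This is not the STARQL-SPARK engine, just Spark Structured Streaming with invented column names (ts, sensor, value) and a stand-in input path.

```python
# Plain PySpark sketch: the same windowed average, once over static data and
# once over a stream. Column names and the parquet path are made up.
from pyspark.sql import SparkSession
from pyspark.sql.functions import window, avg, col

spark = SparkSession.builder.appName("static-vs-stream").getOrCreate()

def windowed_avg(df):
    """One query definition shared by the static and the streaming path."""
    return (df.groupBy(window(col("ts"), "1 minute"), col("sensor"))
              .agg(avg("value").alias("avg_value")))

# Static path: a parquet table of historical measurements.
static_df = spark.read.parquet("measurements.parquet")
windowed_avg(static_df).show()

# Streaming path: same logic, but a different source/sink API is required --
# exactly the interface gap an OBDA layer has to paper over.
stream_df = (spark.readStream.format("rate").load()
                  .withColumnRenamed("timestamp", "ts")
                  .withColumn("sensor", col("value") % 3)
                  .withColumn("value", col("value") * 1.0))
query = (windowed_avg(stream_df).writeStream
         .outputMode("update").format("console").start())
query.awaitTermination(10)
```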

    Virtual Knowledge Graphs: An Overview of Systems and Use Cases

    In this paper, we present the virtual knowledge graph (VKG) paradigm for data integration and access, also known in the literature as Ontology-Based Data Access. Instead of structuring the integration layer as a collection of relational tables, the VKG paradigm replaces the rigid structure of tables with the flexibility of graphs that are kept virtual and embed domain knowledge. We explain the main notions of this paradigm, its tooling ecosystem, and significant use cases in a wide range of applications. Finally, we discuss future research directions.
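
    At its core, a VKG system rewrites queries over the virtual graph into SQL over the underlying tables. A deliberately tiny, hypothetical illustration of that rewriting follows; it is not the API of any real VKG system, and all table and column names are invented.

```python
# Minimal caricature of VKG query rewriting: a mapping states how rows of a
# relational table yield (virtual) graph nodes, and a single triple pattern
# over the graph is rewritten into SQL over the source tables.
MAPPINGS = {
    # class IRI -> (table, id column): each row becomes one virtual instance.
    "http://example.org/Sensor": ("sensors", "sensor_id"),
    "http://example.org/Turbine": ("turbines", "turbine_id"),
}

def rewrite_type_pattern(class_iri):
    """Rewrite the pattern `?x rdf:type <class_iri>` into SQL."""
    table, id_col = MAPPINGS[class_iri]
    return f"SELECT {id_col} AS x FROM {table}"

print(rewrite_type_pattern("http://example.org/Sensor"))
# SELECT sensor_id AS x FROM sensors
```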

    Two-dimensional rule language for querying sensor log data: a framework and use cases

    Motivated by two industrial use cases that involve detecting events of interest in (asynchronous) time series from sensors in manufacturing rigs and gas turbines, we design an expressive rule language DslD equipped with interval aggregate functions (such as weighted average over a time interval), Allen’s interval relations, and various metric constructs. We demonstrate how to model events in the use cases in terms of DslD programs. We show that answering DslD queries in our use cases can be reduced to evaluating SQL queries. Our experiments with the use cases, carried out on the Apache Spark system, show that such SQL queries scale well on large real-world datasets.
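
    As an illustration of the reduction to SQL, the interval aggregate "weighted average over a time interval" can be computed with a window function. The sketch below uses SQLite (window functions require SQLite ≥ 3.25) and invented table and column names; the paper itself targets Spark SQL.

```python
# Time-weighted average of an asynchronous sensor series, reduced to plain
# SQL with a window function -- the same reduction idea the paper applies to
# DslD programs (shown here in SQLite rather than Spark SQL).
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE readings (ts REAL, value REAL);
INSERT INTO readings VALUES (0.0, 10.0), (2.0, 20.0), (3.0, 30.0), (6.0, 30.0);
""")

# Each reading holds until the next one; its weight is that duration.
row = con.execute("""
WITH spans AS (
  SELECT value,
         LEAD(ts) OVER (ORDER BY ts) - ts AS duration
  FROM readings
)
SELECT SUM(value * duration) / SUM(duration) AS weighted_avg
FROM spans
WHERE duration IS NOT NULL
""").fetchone()

print(row[0])  # (10*2 + 20*1 + 30*3) / 6 = 130/6 ≈ 21.67
```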

    Building Semantic Knowledge Graphs from (Semi-)Structured Data: A Review

    Knowledge graphs have, for the past decade, been a hot topic in both public and private domains, typically used for large-scale integration and analysis of data using graph-based data models. One of the central concepts in this area is the Semantic Web, with the vision of providing a well-defined meaning to information and services on the Web through a set of standards. In particular, linked data and ontologies have been essential for data sharing, discovery, integration, and reuse. In this paper, we provide a systematic literature review on knowledge graph creation from structured and semi-structured data sources using Semantic Web technologies. The review takes into account four prominent publication venues, namely the Extended Semantic Web Conference, the International Semantic Web Conference, the Journal of Web Semantics, and the Semantic Web Journal. The review highlights the tools, methods, types of data sources, ontologies, and publication methods, together with the challenges, limitations, and lessons learned in the knowledge graph creation processes.
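
    A bare-bones example of the kind of step the reviewed tools automate: turning semi-structured records into RDF triples. The vocabulary under http://example.org/ is invented, and real pipelines typically drive this from declarative mappings such as R2RML or RML rather than hand-written loops.

```python
# Tiny "knowledge graph creation" example: semi-structured records -> RDF.
from rdflib import Graph, Literal, Namespace, RDF

EX = Namespace("http://example.org/")
records = [  # stand-in for rows parsed from CSV/JSON/XML
    {"id": "m1", "type": "Machine", "label": "Press 7", "site": "PlantA"},
    {"id": "m2", "type": "Machine", "label": "Lathe 2", "site": "PlantA"},
]

g = Graph()
g.bind("ex", EX)
for rec in records:
    subj = EX[rec["id"]]
    g.add((subj, RDF.type, EX[rec["type"]]))
    g.add((subj, EX.label, Literal(rec["label"])))
    g.add((subj, EX.locatedAt, EX[rec["site"]]))

print(g.serialize(format="turtle"))
```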

    View and Index Selection on Graph Databases

    One of the most important aspects of native graph-database systems is their index-free adjacency property, which ensures that nodes have direct physical RAM addresses and physically point to their adjacent nodes. The index-free adjacency property accelerates query answering for queries that are bound to one (or more) specific nodes within the graph, namely anchor nodes. The corresponding anchor node is used as the starting point for answering the query by examining its adjacent nodes instead of the whole graph. Nevertheless, non-anchored-node queries are much harder to answer, since the query planner must examine a large portion of the graph. In this work we study view and index selection techniques in order to accelerate the aforementioned class of queries. We analyze different index and view selection strategies for query answering and show that, depending on the characteristics of the query, the graph database, and the corresponding answer set, a different strategy may be optimal among the indexing and view materialization alternatives. Before selecting the views and indices, our system employs pattern mining techniques in order to predict the characteristics of future queries. Thus, the initial query workload is represented by a much smaller summary of the query patterns that are most likely to appear in future queries, with each pattern having a corresponding expected number of appearances. Our selection strategy is a greedy view-and-index selection strategy that at each step of its execution tries to maximize the ratio of the benefit of materializing a view/index to the corresponding cost of storing it. Our selection algorithm is inspired by the corresponding greedy algorithm for “Maximizing a Nondecreasing Submodular Set Function Subject to a Knapsack Constraint”. Our experimental evaluation shows that all the steps of the index selection process are completed in a few seconds, while the corresponding rewritings accelerate 15.44% of the queries in the DbPedia query workload. Those queries are executed in 1.63% of their initial time on average.
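
    A compact sketch of the greedy benefit/cost selection under a storage budget described above. The candidates, pattern counts, and sizes are fabricated, and the real system additionally mines the patterns and estimates benefits from the workload.

```python
# Greedy view/index selection under a storage budget: at each step pick the
# candidate maximizing (marginal benefit) / (storage cost), in the spirit of
# greedy maximization of a nondecreasing submodular set function subject to
# a knapsack constraint. All numbers below are fabricated for illustration.

# candidate -> (set of query patterns it accelerates, storage cost)
CANDIDATES = {
    "idx_person_name":   ({"p1", "p2"}, 3.0),
    "view_friends_2hop": ({"p2", "p3", "p4"}, 5.0),
    "idx_city_label":    ({"p4"}, 1.0),
}
# expected number of future appearances per pattern (from pattern mining)
APPEARANCES = {"p1": 120, "p2": 300, "p3": 40, "p4": 90}

def select(budget):
    chosen, covered = [], set()
    remaining = dict(CANDIDATES)
    while remaining:
        def ratio(item):
            patterns, cost = item[1]
            # marginal benefit: only patterns not yet covered count
            gain = sum(APPEARANCES[p] for p in patterns - covered)
            return gain / cost
        name, (patterns, cost) = max(remaining.items(), key=ratio)
        remaining.pop(name)
        if cost > budget or not (patterns - covered):
            continue  # too large for the remaining budget, or no new benefit
        chosen.append(name)
        covered |= patterns
        budget -= cost
    return chosen

print(select(budget=6.0))  # ['idx_person_name', 'idx_city_label']
```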

    Semantic Data Management in Data Lakes

    In recent years, data lakes emerged as a way to manage large amounts of heterogeneous data for modern data analytics. One way to prevent data lakes from turning into inoperable data swamps is semantic data management. Some approaches propose the linkage of metadata to knowledge graphs based on the Linked Data principles to provide more meaning and semantics to the data in the lake. Such a semantic layer may be utilized not only for data management but also to tackle the problem of data integration from heterogeneous sources, in order to make data access more expressive and interoperable. In this survey, we review recent approaches with a specific focus on their application within data lake systems and their scalability to Big Data. We classify the approaches into (i) basic semantic data management, (ii) semantic modeling approaches for enriching metadata in data lakes, and (iii) methods for ontology-based data access. In each category, we cover the main techniques and their background, and compare the latest research. Finally, we point out challenges for future work in this research area, which needs a closer integration of Big Data and Semantic Web technologies.
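
    A small sketch of the semantic-layer idea: dataset-level metadata in the lake is linked to a knowledge graph and queried with SPARQL to discover sources by meaning rather than by file name. All dataset paths and vocabulary terms here are invented.

```python
# Toy semantic metadata layer for a data lake: annotate raw files with
# concepts from an (invented) vocabulary, then discover datasets via SPARQL.
from rdflib import Graph, Literal, Namespace, RDF

EX = Namespace("http://example.org/lake/")
g = Graph()

for path, concept in [
    ("s3://lake/raw/turbine_vib_2023.parquet", EX.VibrationMeasurement),
    ("s3://lake/raw/hr_payroll.csv",           EX.PayrollRecord),
]:
    ds = EX[path.rsplit("/", 1)[-1]]   # dataset node named after the file
    g.add((ds, RDF.type, concept))
    g.add((ds, EX.storedAt, Literal(path)))

# "Which datasets contain vibration measurements?"
q = """
PREFIX ex: <http://example.org/lake/>
SELECT ?loc WHERE { ?ds a ex:VibrationMeasurement ; ex:storedAt ?loc . }
"""
for row in g.query(q):
    print(row.loc)  # s3://lake/raw/turbine_vib_2023.parquet
```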

    Semantically defined Analytics for Industrial Equipment Diagnostics

    In this age of digitalization, industries everywhere accumulate massive amounts of data, to the point that data has become the lifeblood of the global economy. This data may come from heterogeneous equipment, components, sensors, systems, and applications in many varieties (diversity of sources), velocities (high rate of change), and volumes (sheer data size). Despite significant advances in the ability to collect, store, manage, and filter data, the real value lies in the analytics: raw data is meaningless unless it is properly processed into actionable (business) insights. Those who know how to harness data effectively have a decisive competitive advantage, raising performance through faster and smarter decisions, improving short- and long-term strategic planning, offering more user-centric products and services, and fostering innovation. Two distinct paradigms can be discerned in analytics practice: semantic-driven (deductive) and data-driven (inductive). The first emphasizes logic as a way of representing domain knowledge encoded in rules or ontologies, which are often carefully curated and maintained; however, these models tend to be highly complex and require intensive knowledge-processing capabilities. Data-driven analytics employs machine learning (ML) to learn a model directly from the data with minimal human intervention; however, such models are tuned to the data and context they were trained on, making them difficult to adapt. Industries that want to create value from data must master these paradigms in combination, and there is a great need in data analytics to seamlessly combine semantic-driven and data-driven processing techniques in an efficient and scalable architecture that allows extracting actionable insights from an extreme variety of data. In this thesis, we address these needs by providing:
    • A unified representation of domain-specific and analytical semantics, in the form of ontology models called the TechOnto Ontology Stack: a highly expressive, platform-independent formalism that captures the conceptual semantics of industrial systems (such as technical system hierarchies and component partonomies) together with their analytical functional semantics.
    • A new ontology language, Semantically defined Analytical Language (SAL), on top of the ontology model, which extends DatalogMTL (a Horn fragment of Metric Temporal Logic) with analytical functions as first-class citizens (see the sketch after this abstract).
    • A method to generate semantic workflows using our SAL language, which helps in authoring, reusing, and maintaining complex analytical tasks and workflows in an abstract fashion.
    • A multi-layer architecture that fuses knowledge-driven and data-driven analytics into a federated and distributed solution.
    To our knowledge, the work in this thesis is among the first to introduce and investigate semantically defined analytics in an ontology-based data access setting for industrial analytical applications. We focus our work and evaluation on industrial data because of (i) the adoption of semantic technology by industry in general, and (ii) the common need, in the literature and in practice, to let domain expertise drive data analytics over semantically interoperable sources while still harnessing the power of analytics to enable real-time data insights. In the evaluation of three use-case studies, our approach surpasses state-of-the-art approaches in most application scenarios.
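
    To make the SAL idea concrete, here is a hypothetical rendering of one rule, a DatalogMTL-style metric window extended with an analytical function (avg) as a first-class citizen, together with a naive evaluator. The syntax and semantics are illustrative only, not actual SAL.

```python
# Hypothetical flavour of a SAL rule: a DatalogMTL-style metric window plus
# an analytical function. Illustrative only -- not actual SAL syntax.
#
#   overheats(Eq) <- avg[-10m, 0m](temperature(Eq)) > 80
#
# Below, a naive evaluator of that single rule over timestamped facts.
from datetime import datetime, timedelta

temperature = {  # temperature(eq1) readings: time -> value
    datetime(2024, 1, 1, 12, 0): 75.0,
    datetime(2024, 1, 1, 12, 4): 85.0,
    datetime(2024, 1, 1, 12, 8): 95.0,
}

def avg_window(readings, now, width=timedelta(minutes=10)):
    """Analytical function: average over the metric window [now-width, now]."""
    vals = [v for t, v in readings.items() if now - width <= t <= now]
    return sum(vals) / len(vals) if vals else None

def overheats(readings, now, threshold=80.0):
    """Rule head: holds at `now` iff the windowed average exceeds threshold."""
    a = avg_window(readings, now)
    return a is not None and a > threshold

now = datetime(2024, 1, 1, 12, 8)
print(overheats(temperature, now))  # avg(75, 85, 95) = 85 > 80 -> True
```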