61 research outputs found
Using Ontologies for Semantic Data Integration
While big data analytics is considered as one of the most important paths to competitive advantage of today’s enterprises, data scientists spend a comparatively large amount of time in the data preparation and data integration phase of a big data project. This shows that data integration is still a major challenge in IT applications. Over the past two decades, the idea of using semantics for data integration has become increasingly crucial, and has received much attention in the AI, database, web, and data mining communities. Here, we focus on a specific paradigm for semantic data integration, called Ontology-Based Data Access (OBDA). The goal of this paper is to provide an overview of OBDA, pointing out both the techniques that are at the basis of the paradigm, and the main challenges that remain to be addressed
Ontology-Based Data Access and Integration
An ontology-based data integration (OBDI) system is an information management system consisting of three components: an ontology, a set of data sources, and the mapping between the two. The ontology is a conceptual, formal description of the domain of interest to a given organization (or a community of users), expressed in terms of relevant concepts, attributes of concepts, relationships between concepts, and logical assertions characterizing the domain knowledge. The data sources are the repositories accessible by the organization where data concerning the domain are stored. In the general case, such repositories are numerous, heterogeneous, each one managed and maintained independently from the others. The mapping is a precise specification of the correspondence between the data contained in the data sources and the elements of the ontology. The main purpose of an OBDI system is to allow information consumers to query the data using the elements in the ontology as predicates.
In the special case where the organization manages a single data source, the term ontology-based data access (ODBA) system is used
FishMark: A Linked Data Application Benchmark
Abstract. FishBase is an important species data collection produced by the FishBase Information and Research Group Inc (FIN), a not-forprofit NGO with the aim of collecting comprehensive information (from the taxonomic to the ecological) about all the world’s finned fish species. FishBase is exposed as a MySQL backed website (supporting a range of canned, although complex queries) and serves over 33 million hits per month. FishDelish is a transformation of FishBase into LinkedData weighing in at 1.38 billion triples. We have ported a substantial number of FishBase SQL queries to FishDelish SPARQL query which form the basis of a new linked data application benchmark (using our derivative of the Berlin SPARQL Benchmark harness). We use this benchmarking framework to compare the performance of the native MySQL application, the Virtuoso RDF triple store, and the Quest OBDA system on a fishbase.org like application.
Ontology Based Data Access in Statoil
Ontology Based Data Access (OBDA) is a prominent approach to query databases which uses an ontology to expose data in a conceptually clear manner by abstracting away from the technical schema-level details of the underlying data. The ontology is ‘connected’ to the data via mappings that allow to automatically translate queries posed over the ontology into data-level queries that can be executed by the underlying database management system. Despite a lot of attention from the research community, there are still few instances of real world industrial use of OBDA systems. In this work we present data access challenges in the data-intensive petroleum company Statoil and our experience in addressing these challenges with OBDA technology. In particular, we have developed a deployment module to create ontologies and mappings from relational databases in a semi-automatic fashion; a query processing module to perform and optimise the process of translating ontological queries into data queries and their execution over either a single DB of federated DBs; and a query formulation module to support query construction for engineers with a limited IT background. Our modules have been integrated in one OBDA system, deployed at Statoil, integrated with Statoil’s infrastructure, and evaluated with Statoil’s engineers and data
Ontology-based data access: ontop of databases
We present the architecture and technologies underpinning the OBDA system Ontop and taking full advantage of storing data in relational databases. We discuss the theoretical foundations of Ontop: the tree-witness query rewriting, T-mappings and optimisations based on database integrity constraints and SQL features. We analyse the performance of Ontop in a series of experiments and demonstrate that, for standard ontologies, queries and data stored in relational databases, Ontop is fast, efficient and produces SQL rewritings of high quality
Implementing OBDA for an end-user query answering service on an educational ontology
In the age where productivity of society is no longer defined by the amount of information
generated, but from the quality and assertiveness that a set of data may potentially hold,
the right questions to do depends on the semantic awareness capability that an
information system could evolve into. To address this challenge, in the last decade,
exhaustive research has been done in the Ontology Based Data Access (OBDA)
paradigm.
A conspectus of the most promising technologies with data integration capabilities and
the foundations where they rely are documented in this memory as a point of reference
for choosing tools that supports the incorporation of a conceptual model under a OBDA
method. The present study provides a practical approach for implementing an ontology
based data access service, to educational context users of a Learning Analytics initiative,
by means of allowing them to formulate intuitive enquiries with a familiar domain
terminology on top of a Learning Management System. The ontology used was
completely transformed to semantic linked data standards and some data mappings for
testing were included. Semantic Linked Data technologies exposed in this document may
exert modernization to environments in which object oriented and relational paradigms
may propagate heterogeneous and contradictory requirements. Finally, to validate the
implementation, a set of queries were constructed emulating the most relevant dynamics
of the model regarding the dataset nature
Semantically defined Analytics for Industrial Equipment Diagnostics
In this age of digitalization, industries everywhere accumulate massive amount of data such that it has become the lifeblood of the global economy. This data may come from various heterogeneous systems, equipment, components, sensors, systems and applications in many varieties (diversity of sources), velocities (high rate of changes) and volumes (sheer data size).
Despite significant advances in the ability to collect, store, manage and filter data, the real value lies in the analytics. Raw data is meaningless, unless it is properly processed to actionable (business) insights. Those that know how to harness data effectively, have a decisive competitive advantage, through raising performance by making faster and smart decisions, improving short and long-term strategic planning, offering more user-centric products and services and fostering innovation. Two distinct paradigms in practice can be discerned within the field of analytics: semantic-driven (deductive) and data-driven (inductive).
The first emphasizes logic as a way of representing the domain knowledge encoded in rules or ontologies and are often carefully curated and maintained. However, these models are often highly complex, and require intensive knowledge processing capabilities. Data-driven analytics employ machine learning (ML) to directly learn a model from the data with minimal human intervention. However, these models are tuned to trained data and context, making it difficult to adapt.
Industries today that want to create value from data must master these paradigms in combination. However, there is great need in data analytics to seamlessly combine semantic-driven and data-driven processing techniques in an efficient and scalable architecture that allows extracting actionable insights from an extreme variety of data. In this thesis, we address these needs by providing:
• A unified representation of domain-specific and analytical semantics, in form of ontology models called TechOnto Ontology Stack. It is highly expressive, platform-independent formalism to capture conceptual semantics of industrial systems such as technical system hierarchies, component partonomies etc and its analytical functional semantics.
• A new ontology language Semantically defined Analytical Language (SAL) on top of the ontology model that extends existing DatalogMTL (a Horn fragment of Metric Temporal Logic) with analytical functions as first class citizens.
• A method to generate semantic workflows using our SAL language. It helps in authoring, reusing and maintaining complex analytical tasks and workflows in an abstract fashion.
• A multi-layer architecture that fuses knowledge- and data-driven analytics into a federated and distributed solution.
To our knowledge, the work in this thesis is one of the first works to introduce and investigate the use of the semantically defined analytics in an ontology-based data access setting for industrial analytical applications. The reason behind focusing our work and evaluation on industrial data is due to (i) the adoption of semantic technology by the industries in general, and (ii) the common need in literature and
in practice to allow domain expertise to drive the data analytics on semantically interoperable sources, while still harnessing the power of analytics to enable real-time data insights. Given the evaluation results of three use-case studies, our approach surpass state-of-the-art approaches for most application scenarios.Im Zeitalter der Digitalisierung sammeln die Industrien überall massive Daten-mengen, die zum Lebenselixier der Weltwirtschaft geworden sind. Diese Daten können aus verschiedenen heterogenen Systemen, Geräten, Komponenten, Sensoren, Systemen und Anwendungen in vielen Varianten (Vielfalt der Quellen), Geschwindigkeiten (hohe Änderungsrate) und Volumina (reine Datengröße) stammen.
Trotz erheblicher Fortschritte in der Fähigkeit, Daten zu sammeln, zu speichern, zu verwalten und zu filtern, liegt der eigentliche Wert in der Analytik. Rohdaten sind bedeutungslos, es sei denn, sie werden ordnungsgemäß zu verwertbaren (Geschäfts-)Erkenntnissen verarbeitet. Wer weiß, wie man Daten effektiv nutzt, hat einen entscheidenden Wettbewerbsvorteil, indem er die Leistung steigert, indem er schnellere und intelligentere Entscheidungen trifft, die kurz- und langfristige strategische Planung verbessert, mehr benutzerorientierte Produkte und Dienstleistungen anbietet und Innovationen fördert. In der Praxis lassen sich im Bereich der Analytik zwei unterschiedliche Paradigmen unterscheiden: semantisch (deduktiv) und Daten getrieben (induktiv).
Die erste betont die Logik als eine Möglichkeit, das in Regeln oder Ontologien kodierte Domänen-wissen darzustellen, und wird oft sorgfältig kuratiert und gepflegt. Diese Modelle sind jedoch oft sehr komplex und erfordern eine intensive Wissensverarbeitung. Datengesteuerte Analysen verwenden maschinelles Lernen (ML), um mit minimalem menschlichen Eingriff direkt ein Modell aus den Daten zu lernen. Diese Modelle sind jedoch auf trainierte Daten und Kontext abgestimmt, was die Anpassung erschwert.
Branchen, die heute Wert aus Daten schaffen wollen, müssen diese Paradigmen in Kombination meistern. Es besteht jedoch ein großer Bedarf in der Daten-analytik, semantisch und datengesteuerte Verarbeitungstechniken nahtlos in einer effizienten und skalierbaren Architektur zu kombinieren, die es ermöglicht, aus einer extremen Datenvielfalt verwertbare Erkenntnisse zu gewinnen. In dieser Arbeit, die wir auf diese Bedürfnisse durch die Bereitstellung:
• Eine einheitliche Darstellung der Domänen-spezifischen und analytischen Semantik in Form von Ontologie Modellen, genannt TechOnto Ontology Stack. Es ist ein hoch-expressiver, plattformunabhängiger Formalismus, die konzeptionelle Semantik industrieller Systeme wie technischer Systemhierarchien, Komponenten-partonomien usw. und deren analytische funktionale Semantik zu erfassen.
• Eine neue Ontologie-Sprache Semantically defined Analytical Language (SAL) auf Basis des Ontologie-Modells das bestehende DatalogMTL (ein Horn fragment der metrischen temporären Logik) um analytische Funktionen als erstklassige Bürger erweitert.
• Eine Methode zur Erzeugung semantischer workflows mit unserer SAL-Sprache. Es hilft bei der Erstellung, Wiederverwendung und Wartung komplexer analytischer Aufgaben und workflows auf abstrakte Weise.
• Eine mehrschichtige Architektur, die Wissens- und datengesteuerte Analysen zu einer föderierten und verteilten Lösung verschmilzt.
Nach unserem Wissen, die Arbeit in dieser Arbeit ist eines der ersten Werke zur Einführung und Untersuchung der Verwendung der semantisch definierten Analytik in einer Ontologie-basierten Datenzugriff Einstellung für industrielle analytische Anwendungen. Der Grund für die Fokussierung unserer Arbeit und Evaluierung auf industrielle Daten ist auf (i) die Übernahme semantischer Technologien durch die Industrie im Allgemeinen und (ii) den gemeinsamen Bedarf in der Literatur und in der Praxis zurückzuführen, der es der Fachkompetenz ermöglicht, die Datenanalyse auf semantisch inter-operablen Quellen voranzutreiben, und nutzen gleichzeitig die Leistungsfähigkeit der Analytik, um Echtzeit-Daten-einblicke zu ermöglichen. Aufgrund der Evaluierungsergebnisse von drei Anwendungsfällen Übertritt unser Ansatz für die meisten Anwendungsszenarien Modernste Ansätze
Recommended from our members
Finding Data Should be Easier than Finding Oil
The competitiveness of modern enterprises heavily depends on their ability to make the right business decisions by relying on efficient and timely analysis of the right business critical data. In large and data intensive companies such as Equinor, a Norwegian multinational oil and gas company with more than 20,000 employees, gathering such data is not a trivial task due to the growing size and complexity of corporate information sources. As a result, the data gathering task is often the most time-consuming part of the decision making process, in particular when it comes to the work processes of Equinor's exploration geologists that should find in a timely manner new exploitable accumulations of oil or gas in given areas by analysing data about these areas. In this work we present our experience in addressing this data challenge tast at Equinor. We have developed and deployed at Equinor a semantic data access system that relies on the Ontology Based Data Access (OBDA) approach. Our system is based on our solid theoretical contributions and has been extensively evaluated at Equinor
Experiencing OptiqueVQS: A Multi-paradigm and Ontology-based Visual Query System for End Users
This is author's post-print version, published version available on http://link.springer.com/article/10.1007%2Fs10209-015-0404-5Data access in an enterprise setting is a determining factor for value creation processes, such as sense-making, decision-making, and intelligence analysis. Particularly, in an enterprise setting, intuitive data access tools that directly engage domain experts with data could substantially increase competitiveness and profitability. In this respect, the use of ontologies as a natural communication medium between end users and computers has emerged as a prominent approach. To this end, this article introduces a novel ontology-based visual query system, named OptiqueVQS, for end users. OptiqueVQS is built on a powerful and scalable data access platform and has a user-centric design supported by a widget-based flexible and extensible architecture allowing multiple coordinated representation and interaction paradigms to be employed. The results of a usability experiment performed with non-expert users suggest that OptiqueVQS provides a decent level of expressivity and high usability and hence is quite promising
Virtual Knowledge Graphs: An Overview of Systems and Use Cases
In this paper, we present the virtual knowledge graph (VKG) paradigm for data integration and access, also known in the literature as Ontology-based Data Access. Instead of structuring the integration layer as a collection of relational tables, the VKG paradigm replaces the rigid structure of tables with the flexibility of graphs that are kept virtual and embed domain knowledge. We explain the main notions of this paradigm, its tooling ecosystem and significant use cases in a wide range of applications. Finally, we discuss future research directions
- …