
    Semantic Data Management in Data Lakes

    In recent years, data lakes have emerged as a way to manage large amounts of heterogeneous data for modern data analytics. One way to prevent data lakes from turning into inoperable data swamps is semantic data management. Some approaches propose linking metadata to knowledge graphs based on the Linked Data principles in order to provide more meaning and semantics to the data in the lake. Such a semantic layer can be utilized not only for data management but also to tackle the problem of data integration from heterogeneous sources, making data access more expressive and interoperable. In this survey, we review recent approaches with a specific focus on their application within data lake systems and their scalability to Big Data. We classify the approaches into (i) basic semantic data management, (ii) semantic modeling approaches for enriching metadata in data lakes, and (iii) methods for ontology-based data access. In each category, we cover the main techniques and their background, and compare the latest research. Finally, we point out challenges for future work in this research area, which needs a closer integration of Big Data and Semantic Web technologies.
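
    As a minimal illustration of the semantic metadata layer described above (not taken from the survey; all names, namespaces, and URIs below are hypothetical), a file in a data lake can be annotated with RDF triples that link it to knowledge graph entities following the Linked Data principles, for example with rdflib:

```python
# Hedged sketch: annotating a data lake file with RDF metadata that links it
# to external knowledge graph concepts (illustrative names and URIs only).
from rdflib import Graph, Literal, Namespace, RDF
from rdflib.namespace import DCTERMS

DCAT = Namespace("http://www.w3.org/ns/dcat#")
EX = Namespace("http://example.org/lake/")            # hypothetical lake namespace
DBPEDIA = Namespace("http://dbpedia.org/resource/")   # public knowledge graph

g = Graph()
g.bind("dcat", DCAT)
g.bind("dcterms", DCTERMS)

dataset = EX["sales_2023.csv"]
g.add((dataset, RDF.type, DCAT.Dataset))
g.add((dataset, DCTERMS.title, Literal("Raw sales records 2023")))
# Link the raw file to a knowledge graph entity to give it semantics.
g.add((dataset, DCTERMS.subject, DBPEDIA["Sales"]))

print(g.serialize(format="turtle"))
```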

    Reasoning in Many Dimensions : Uncertainty and Products of Modal Logics

    Probabilistic Description Logics (ProbDLs) are an extension of Description Logics designed to capture uncertainty. We study several problems related to these logics. First, we investigate the monodic fragment of probabilistic first-order logic, show that it has many nice properties, and are thereby able to explain the complexity results obtained for ProbDLs. Second, in order to identify well-behaved, in the best case tractable, ProbDLs, we study the complexity landscape for different fragments of ProbEL; amongst others, we are able to identify a tractable fragment. We then study the reasoning problem of ontological query answering, but apply it to probabilistic data. To this end, we define the framework of ontology-based access to probabilistic data and study its computational complexity. In the final part of the thesis, we study the complexity of the satisfiability problem in the two-dimensional modal logic KxK. We are able to close a gap that has been open for more than ten years.
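
    To give a flavor of such formalisms (an assumed Prob-EL style notation, not an example from the thesis), probabilistic description logics allow probability constructors on concepts, so that uncertain domain knowledge can be stated as concept inclusions such as:

```latex
% Illustrative Prob-EL style axiom (assumed syntax, not taken from the thesis):
% an individual with a finding that is an infection with probability >= 0.9
% needs treatment with probability >= 0.7.
\exists \mathit{hasFinding}.\,P_{\geq 0.9}\,\mathit{Infection}
  \sqsubseteq P_{\geq 0.7}\,\mathit{NeedsTreatment}
```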

    Scalable integration of uncertainty reasoning and semantic web technologies

    In recent years, formal logical standards for knowledge representation, which model real-world knowledge and domains and make them accessible to computers, have gained a lot of traction. They provide an expressive logical framework for modeling, consistency checking, reasoning, and query answering, and have proven to be versatile methods for capturing knowledge in various fields. These formalisms and methods focus on specifying knowledge as precisely as possible. At the same time, many applications, in particular on the Semantic Web, have to deal with uncertainty in their data, and handling uncertain knowledge is crucial in many real-world domains. Classical logic alone, however, cannot capture the real world adequately, given its inherent complexity and uncertainty, while handling uncertain or incomplete information is becoming more and more important in applications such as expert systems, data integration, or information extraction. The overall objective of this dissertation is to identify scenarios and datasets where methods that incorporate the inherent uncertainty of the data improve results, and to investigate approaches and tools that are suitable for the respective tasks. In summary, this work sets out to tackle the following objectives: 1. debugging uncertain knowledge bases in order to generate consistent knowledge graphs and make them accessible for logical reasoning, 2. combining probabilistic query answering and logical reasoning, which in turn uses these consistent knowledge graphs to answer user queries, and 3. applying the aforementioned techniques to the problem of risk management in IT infrastructures as a concrete real-world application. We show that in all of these scenarios, users can benefit from incorporating uncertainty into the knowledge base. Furthermore, we conduct experiments that demonstrate the real-world scalability of the presented approaches. Overall, we argue that integrating uncertainty and logical reasoning, despite being theoretically intractable, is feasible in real-world applications and warrants further research.
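
    A minimal sketch of the general idea behind probabilistic query answering over uncertain knowledge (invented facts, an invented rule, and an independence assumption for illustration; not the tooling used in the dissertation): each fact carries a probability, and the probability of a derived answer, here an IT risk, is computed from the facts it depends on.

```python
# Hedged sketch: probabilistic query answering over uncertain facts,
# assuming independent fact probabilities (illustrative data only).

# Uncertain ABox: (subject, predicate, object) -> probability of being true.
facts = {
    ("acme_server", "runs", "apache"): 0.9,
    ("apache", "has_vulnerability", "cve_x"): 0.7,
}

# Illustrative rule: ?s at_risk ?v  <-  ?s runs ?m  AND  ?m has_vulnerability ?v
def prob_at_risk(server: str, vuln: str) -> float:
    """Probability that `server` is at risk from `vuln`, assuming the
    supporting facts are independent (so derivation probabilities multiply)."""
    miss_all = 1.0
    for (s, p, m), p_runs in facts.items():
        if p != "runs" or s != server:
            continue
        p_vuln = facts.get((m, "has_vulnerability", vuln), 0.0)
        # Probability that this particular derivation does NOT hold.
        miss_all *= 1.0 - p_runs * p_vuln
    return 1.0 - miss_all

print(prob_at_risk("acme_server", "cve_x"))  # 0.63 under these toy assumptions
```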

    Semantic-guided predictive modeling and relational learning within industrial knowledge graphs

    The ubiquitous availability of data in today’s manufacturing environments, mainly driven by the extended usage of software and built-in sensing capabilities in automation systems, enables companies to embrace more advanced predictive modeling and analysis in order to optimize processes and the usage of equipment. While the potential insight gained from such analysis is high, it often remains untapped, since integration and analysis of data silos from different production domains require high manual effort and are therefore not economic. Addressing these challenges, digital representations of production equipment, so-called digital twins, have emerged, leading the way to semantic interoperability across systems in different domains. From a data modeling point of view, digital twins can be seen as industrial knowledge graphs, which are used as the semantic backbone of manufacturing software systems and data analytics. Due to the prevalent, historically grown and scattered manufacturing software system landscape comprising numerous proprietary information models, data sources are highly heterogeneous. Therefore, there is an increasing need for semi-automatic support in data modeling, enabling end-user engineers to model their domain and maintain a unified semantic knowledge graph across the company. Once data modeling and integration are done, further challenges arise, since there has been little research on how knowledge graphs can contribute to the simplification and abstraction of statistical analysis and predictive modeling, especially in manufacturing. In this thesis, new approaches for modeling and maintaining industrial knowledge graphs with a focus on the application of statistical models are presented. First, concerning data modeling, we discuss requirements from several existing standard information models and analytic use cases in the manufacturing and automation system domains and derive a fragment of the OWL 2 language that is expressive enough to cover the required semantics for a broad range of use cases. The prototypical implementation enables domain end-users, i.e. engineers, to extend the base ontology model with intuitive semantics. Furthermore, it supports efficient reasoning and constraint checking via translation to rule-based representations. Based on these models, we propose an architecture for the end-user-facilitated application of statistical models using ontological concepts and ontology-based data access paradigms. In addition, we present an approach for domain-knowledge-driven preparation of predictive models in terms of feature selection and show how schema-level reasoning in the OWL 2 language can be employed for this task within knowledge graphs of industrial automation systems. A production cycle time prediction model in an example application scenario serves as a proof of concept and demonstrates that axiomatized domain knowledge about features can give competitive performance compared to purely data-driven models. In the case of high-dimensional data with small sample sizes, we show that graph kernels of domain ontologies can provide additional information on the degree of variable dependence. Furthermore, a special application of feature selection in graph-structured data is presented, and we develop a method that incorporates domain constraints derived from meta-paths in knowledge graphs into a branch-and-bound pattern enumeration algorithm.
Lastly, we discuss the maintenance of facts in large-scale industrial knowledge graphs, focusing on latent variable models for the automated population and completion of missing facts. State-of-the-art approaches cannot deal with time-series data in the form of events, which naturally occur in industrial applications. Therefore, we present an extension of learning knowledge graph embeddings in conjunction with data in the form of event logs. Finally, we design several use case scenarios of missing information and evaluate our embedding approach on data coming from a real-world factory environment. We draw the conclusion that industrial knowledge graphs are a powerful tool that can be used by end-users in the manufacturing domain for data modeling and model validation. They are especially suitable for the facilitated application of statistical models in conjunction with background domain knowledge, by providing information about features upfront. Furthermore, relational learning approaches show great potential to semi-automatically infer missing facts and provide recommendations to production operators on how to keep stored facts in sync with the real world.
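
    As one concrete illustration of the latent variable models mentioned above (a generic TransE-style sketch with toy entities and random vectors, not the event-log extension developed in the thesis), knowledge graph embeddings score a triple (h, r, t) by how close h + r lies to t in vector space, and such scores can be used to suggest missing facts:

```python
# Hedged sketch of a TransE-style scoring function for knowledge graph triples,
# with toy entities and random initialization (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
dim = 16
entities = ["robot_arm_1", "conveyor_2", "line_A"]   # hypothetical plant assets
relations = ["part_of", "feeds"]

E = {e: rng.normal(size=dim) for e in entities}
R = {r: rng.normal(size=dim) for r in relations}

def score(h: str, r: str, t: str) -> float:
    """TransE plausibility: smaller ||h + r - t|| means a more plausible triple."""
    return -float(np.linalg.norm(E[h] + R[r] - E[t]))

# Rank candidate tails for ("robot_arm_1", "part_of", ?) to suggest missing facts.
candidates = sorted(entities, key=lambda t: score("robot_arm_1", "part_of", t),
                    reverse=True)
print(candidates)
```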

    An integration-oriented ontology to govern evolution in big data ecosystems

    Big Data architectures allow us to flexibly store and process heterogeneous data, from multiple sources, in their original format. The structure of these data, commonly supplied by means of REST APIs, is continuously evolving. Thus, data analysts need to adapt their analytical processes after each API release. This becomes more challenging when performing an integrated or historical analysis. To cope with such complexity, in this paper we present the Big Data Integration ontology, the core construct to govern the data integration process under schema evolution by systematically annotating it with information regarding the schema of the sources. We present a query rewriting algorithm that, using the annotated ontology, converts queries posed over the ontology into queries over the sources. To cope with syntactic evolution in the sources, we present an algorithm that semi-automatically adapts the ontology upon new releases. This guarantees that ontology-mediated queries correctly retrieve data from the most recent schema version, as well as correctness in historical queries. A functional and performance evaluation on real-world APIs is performed to validate our approach.
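
    A minimal sketch of the core idea (invented attribute names and release tags, not the paper's actual algorithm or ontology): the ontology records, per API release, which source field each ontology-level attribute maps to, so that queries posed over the ontology can be rewritten against any schema version.

```python
# Hedged sketch: rewriting an ontology-level query to source fields per schema
# version, using invented attribute names and API releases for illustration.
from typing import Dict, List

# Ontology attribute -> {API release -> source field name}
mappings: Dict[str, Dict[str, str]] = {
    "userName": {"v1": "name", "v2": "user_name"},
    "signupDate": {"v1": "created", "v2": "created_at"},
}

def rewrite(ontology_attrs: List[str], release: str) -> List[str]:
    """Translate ontology-level attributes into the field names of `release`."""
    return [mappings[a][release] for a in ontology_attrs]

# The same ontology-mediated query works against old and new releases.
print(rewrite(["userName", "signupDate"], "v1"))  # ['name', 'created']
print(rewrite(["userName", "signupDate"], "v2"))  # ['user_name', 'created_at']
```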

    Probabilistic techniques for bridging the semantic gap in schema alignment

    Connecting pieces of information from heterogeneous sources sharing the same domain is an open challenge in the Semantic Web, Big Data, and business communities. The main problem in this research area is to bridge the expressiveness gap between relational databases and ontologies. In general, an ontology is more expressive and captures more of the semantics behind data than a relational database does. On the other side, databases are the most commonly used persistent storage systems and grant benefits such as security and data integrity, but they need to be managed by expert users. The problem is particularly significant when enterprise or corporate ontologies are used to share information coming from different databases and where more efficient data management is desirable for interoperability purposes. The main motivations of this thesis relate to database access via ontology, as in the OBDA (Ontology-Based Data Access) scenario, which provides a formal specification of the domain close to the human view while hiding the technical details of the database from the end user, and to the persistent storage of ontologies in databases to facilitate search and retrieval while keeping the benefits of database management systems. In these cases the assertional component (A-Box) is usually stored in a database, while the terminological one (T-Box) is maintained in an ontology. It is therefore more necessary to align schemas than to match instances. The term alignment can be used to define the whole process comprising the mapping between two existing heterogeneous sources, such as an ontology and a relational database, and the transformation from one representation to the other, such as ontology-to-database and database-to-ontology. Defining mappings manually is a hard task, especially for large and complex data representations, and existing methodologies lose some content and leave several elements unaligned. This thesis discusses various aspects of alignment in all of these senses. The presented techniques are based on a probabilistic approach that fits the uncertain nature of the alignment process, which involves two representations with different levels of expressiveness. In the proposed methodology, ontologies and databases are described in terms of Web Ontology Language (OWL) and Entity-Relationship Diagram (ERD) lexical descriptions. The ontologies are represented by a set of OWL axioms, while a properly defined Context-Free Grammar (CFG) is used to represent ERDs as a set of sentences. Both the OWL → ERD transformation and the mapping rely on Hidden Markov Models (HMMs) to estimate the most likely sequence of ERD symbols given the observed OWL symbols. In the model definition, OWL constructs are the observable states, while the ERD symbols are the hidden states. The tools developed, OMEGA (Ontology → Markov → ERD Generator Application) for the OWL → ERD transformation and HOwErd (HMM OWL-ERD) for mapping OWL and ERD, each provide a GUI for showing the alignment results. Finally, HOwErd is compared with the most widespread tools in the reference literature.
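
    To make the HMM formulation concrete (a generic Viterbi decoding sketch with toy states and probabilities, not the trained models behind OMEGA or HOwErd): OWL constructs are the observations and ERD symbols are the hidden states, and decoding returns the most likely ERD sequence for an observed OWL sequence.

```python
# Hedged sketch: Viterbi decoding where OWL constructs are observations and
# ERD symbols are hidden states (toy probabilities, illustrative only).
import numpy as np

hidden = ["Entity", "Relationship", "Attribute"]          # ERD symbols
observed = ["owl:Class", "owl:ObjectProperty", "owl:DatatypeProperty"]

start = np.array([0.6, 0.2, 0.2])
trans = np.array([[0.3, 0.4, 0.3],     # P(next ERD symbol | current ERD symbol)
                  [0.6, 0.1, 0.3],
                  [0.5, 0.3, 0.2]])
emit = np.array([[0.8, 0.1, 0.1],      # P(OWL construct | ERD symbol)
                 [0.1, 0.8, 0.1],
                 [0.1, 0.2, 0.7]])

def viterbi(obs_seq):
    """Most likely hidden ERD sequence for the observed OWL constructs."""
    obs = [observed.index(o) for o in obs_seq]
    V = start * emit[:, obs[0]]
    back = []
    for o in obs[1:]:
        step = V[:, None] * trans * emit[None, :, o]
        back.append(step.argmax(axis=0))
        V = step.max(axis=0)
    path = [int(V.argmax())]
    for bp in reversed(back):
        path.append(int(bp[path[-1]]))
    return [hidden[i] for i in reversed(path)]

print(viterbi(["owl:Class", "owl:DatatypeProperty", "owl:ObjectProperty"]))
```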