250 research outputs found

    Probabilistic techniques for bridging the semantic gap in schema alignment

    Connecting pieces of information from heterogeneous sources sharing the same domain is an open challenge in the Semantic Web, Big Data and business communities. The main problem in this research area is to bridge the expressiveness gap between relational databases and ontologies. In general, an ontology is more expressive and captures more of the semantics behind data than a relational database does. On the other hand, databases are the most commonly used persistent storage systems and grant benefits such as security and data integrity, but they need to be managed by expert users. The problem is particularly significant when enterprise or corporate ontologies are used to share information coming from different databases and where more efficient data management is desirable for interoperability purposes. The main motivations of this thesis are database access via ontology, as in the OBDA (Ontology Based Data Access) scenario, which provides a formal specification of the domain close to the human view while hiding the technical details of the database from the end user, and the persistent storage of ontologies in databases to facilitate search and retrieval while keeping the benefits of database management systems. In these cases the assertional component (A-Box) is usually stored in a database and the terminological one (T-Box) is maintained in an ontology, so it is more necessary to align schemas than to match instances. The term alignment is used here for the whole process comprising both the mapping between two existing heterogeneous sources, such as an ontology and a relational database, and the transformation from one representation to the other, such as ontology-to-database and database-to-ontology. Defining mappings manually is a hard task, especially for large and complex data representations, and existing methodologies tend to lose some content and leave several elements unaligned. This thesis discusses various aspects of alignment in all these senses. The presented techniques are based on a probabilistic approach that fits the inherently uncertain alignment process well, since two representations with different levels of expressiveness are involved. In the methodology, ontologies and databases are described in terms of Web Ontology Language (OWL) and Entity-Relationship Diagram (ERD) lexical descriptions: ontologies are represented by a set of OWL axioms, while a purpose-built Context-Free Grammar (CFG) is used to represent ERDs as a set of sentences. Both the OWL → ERD transformation and the mapping rely on Hidden Markov Models (HMMs) to estimate the most likely sequence of ERD symbols given the observed OWL symbols. In the model definition, OWL constructs are the observable states, while the ERD symbols are the hidden states. Two tools have been developed: OMEGA (Ontology → Markov → ERD Generator Application) for the OWL → ERD transformation and HOwErd (HMM OWL-ERD) for mapping OWL and ERD; each provides its own GUI for showing the alignment results. Finally, HOwErd is compared with the most widespread tools in the reference literature.
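    As a rough illustration of the decoding step just described, the following sketch runs Viterbi decoding over a toy model in which OWL constructs are the observations and ERD symbols are the hidden states. The symbol sets and probabilities are invented for illustration and are not the trained OMEGA/HOwErd models from the thesis.

```python
# Minimal Viterbi sketch: decode the most likely ERD symbol sequence (hidden
# states) from a sequence of OWL constructs (observations). The symbol sets
# and probabilities below are illustrative assumptions, not a trained model.

ERD_STATES = ["Entity", "Relationship", "Attribute"]   # hidden ERD symbols

start_p = {"Entity": 0.6, "Relationship": 0.2, "Attribute": 0.2}
trans_p = {  # P(next ERD symbol | current ERD symbol)
    "Entity":       {"Entity": 0.3, "Relationship": 0.4, "Attribute": 0.3},
    "Relationship": {"Entity": 0.6, "Relationship": 0.1, "Attribute": 0.3},
    "Attribute":    {"Entity": 0.4, "Relationship": 0.3, "Attribute": 0.3},
}
emit_p = {   # P(observed OWL construct | ERD symbol)
    "Entity":       {"Class": 0.8, "ObjectProperty": 0.1, "DatatypeProperty": 0.1},
    "Relationship": {"Class": 0.1, "ObjectProperty": 0.8, "DatatypeProperty": 0.1},
    "Attribute":    {"Class": 0.1, "ObjectProperty": 0.1, "DatatypeProperty": 0.8},
}

def viterbi(observations):
    """Return (probability, ERD path) maximizing P(path, observations)."""
    V = [{s: (start_p[s] * emit_p[s][observations[0]], [s]) for s in ERD_STATES}]
    for obs in observations[1:]:
        layer = {}
        for s in ERD_STATES:
            layer[s] = max(
                (V[-1][prev][0] * trans_p[prev][s] * emit_p[s][obs],
                 V[-1][prev][1] + [s])
                for prev in ERD_STATES
            )
        V.append(layer)
    return max(V[-1].values())

prob, erd_sequence = viterbi(["Class", "ObjectProperty", "Class", "DatatypeProperty"])
print(erd_sequence)  # e.g. ['Entity', 'Relationship', 'Entity', 'Attribute']
```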

    Ontology completion using graph convolutional networks


    Knowledge extraction from unstructured data and classification through distributed ontologies

    The World Wide Web has changed the way humans use and share any kind of information. The Web removed several barriers to accessing published information and has become an enormous space where users can easily navigate through heterogeneous resources (such as linked documents) and can easily edit, modify, or produce them. Documents implicitly enclose information and relationships among them that are accessible only to human beings. Indeed, the Web of documents evolved towards a space of data silos, linked to each other only through untyped references (such as hypertext links) that only humans were able to understand. A growing desire to programmatically access the pieces of data implicitly enclosed in documents has characterized the recent efforts of the Web research community. Direct access means structured data, thus enabling computing machinery to easily exploit the linking of different data sources. It has become crucial for the Web community to provide a technology stack for easing data integration at large scale, first structuring the data using standard ontologies and afterwards linking it to external data. Ontologies became the best practice for defining axioms and relationships among classes, and the Resource Description Framework (RDF) became the basic data model chosen to represent ontology instances (i.e. an instance is a value of an axiom, class or attribute). Data has become the new oil; in particular, extracting information from semi-structured textual documents on the Web is key to realizing the Linked Data vision. In the literature these problems have been addressed with several proposals and standards, which mainly focus on technologies to access the data and on formats to represent the semantics of the data and their relationships. With the increasing volume of interconnected and serialized RDF data, RDF repositories may suffer from data overload and may become a single point of failure for the overall Linked Data vision. One of the goals of this dissertation is to propose a thorough approach to managing large-scale RDF repositories and to distributing them in a redundant and reliable peer-to-peer RDF architecture. The architecture consists of a logic to distribute and mine the knowledge and of a set of physical peer nodes organized in a ring topology based on a Distributed Hash Table (DHT). Each node shares the same logic and provides an entry point that enables clients to query the knowledge base using atomic, disjunctive and conjunctive SPARQL queries. The consistency of the results is increased using a data redundancy algorithm that replicates each RDF triple on multiple nodes so that, in the case of peer failure, other peers can retrieve the data needed to resolve the queries. Additionally, a distributed load balancing algorithm maintains a uniform distribution of the data among the participating peers by dynamically changing the key space assigned to each node in the DHT; a minimal sketch of this placement and replication scheme is given after this abstract. Recently, the process of data structuring has gained more and more attention when applied to the large volume of text information spread across the Web, such as legacy data, newspapers, scientific papers or (micro-)blog posts. This process mainly consists of three steps: i) the extraction from the text of atomic pieces of information, called named entities; ii) the classification of these pieces of information through ontologies; iii) the disambiguation of them through Uniform Resource Identifiers (URIs) identifying real world objects.
As a step towards interconnecting the Web with real world objects via named entities, different techniques have been proposed. The second objective of this work is to compare these approaches in order to highlight their strengths and weaknesses in different scenarios such as scientific papers, newspaper articles, or user-generated content. We created the Named Entity Recognition and Disambiguation (NERD) web framework, publicly accessible on the Web (through a REST API and a web User Interface), which unifies several named entity extraction technologies. Moreover, we proposed the NERD ontology, a reference ontology for comparing the results of these technologies; the second sketch after this abstract illustrates the idea of aligning extractor-specific types to such a reference ontology. Recently, the NERD ontology has been included in the NIF (Natural language processing Interchange Format) specification, part of the Creating Knowledge out of Interlinked Data (LOD2) project. Summarizing, this dissertation defines a framework for the extraction of knowledge from unstructured data and its classification via distributed ontologies. A detailed study of the Semantic Web and knowledge extraction fields is carried out to define the issues investigated in this work. The dissertation then proposes an architecture to tackle the single point of failure issue introduced by the RDF repositories spread across the Web. Although the use of ontologies enables a Web where data is structured and comprehensible by computing machinery, human users may take advantage of it especially for the annotation task. Hence, this work describes an annotation tool for web editing and audio and video annotation, with a web front-end User Interface built on top of a distributed ontology. Furthermore, this dissertation details a thorough comparison of the state of the art of named entity technologies. The NERD framework is presented as a technology to encompass existing solutions in the named entity extraction field, and the NERD ontology is presented as a reference ontology for the field. Finally, this work highlights three use cases aimed at reducing the number of data silos spread across the Web: a Linked Data approach to augment the automatic classification task in a Systematic Literature Review, an application to lift educational data stored in Sharable Content Object Reference Model (SCORM) data silos to the Web of Data, and a scientific conference venue enhancer plugged on top of several live data collectors. Significant research efforts have been devoted to combining the efficiency of a reliable data structure with the importance of data extraction techniques. This dissertation opens different research doors that mainly join two research communities: the Semantic Web and the Natural Language Processing communities. The Web provides a considerable amount of data on which NLP techniques may shed light. The use of the URI as a unique identifier may provide one milestone for the materialization of entities lifted from raw text into real world objects.
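    The DHT-based placement and replication outlined in the abstract above can be sketched roughly as follows. The peer names, the hash function and the replication factor of 3 are illustrative assumptions, not the dissertation's actual implementation.

```python
# Sketch of DHT-style placement of RDF triples on a ring of peers with
# replication. Peer names, hashing and the replication factor are assumptions.
import hashlib
from bisect import bisect_right

RING_BITS = 32
REPLICAS = 3  # each triple is stored on 3 successive peers on the ring

def ring_hash(value: str) -> int:
    return int(hashlib.sha1(value.encode()).hexdigest(), 16) % (2 ** RING_BITS)

class DhtRing:
    def __init__(self, node_ids):
        # each peer owns the key space between its predecessor and itself
        self.nodes = sorted((ring_hash(n), n) for n in node_ids)

    def responsible_peers(self, triple):
        """Return the REPLICAS peers that should store the given (s, p, o) triple."""
        key = ring_hash(" ".join(triple))
        positions = [pos for pos, _ in self.nodes]
        start = bisect_right(positions, key) % len(self.nodes)
        return [self.nodes[(start + i) % len(self.nodes)][1] for i in range(REPLICAS)]

ring = DhtRing(["peer-a", "peer-b", "peer-c", "peer-d", "peer-e"])
triple = ("ex:Rome", "rdf:type", "ex:City")
print(ring.responsible_peers(triple))  # e.g. ['peer-c', 'peer-d', 'peer-e']
```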
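    The three-step extraction, classification and disambiguation process, together with the idea of aligning extractor-specific types to a shared reference ontology, can be illustrated with the toy sketch below. The type mapping, entities and knowledge base lookups are invented stand-ins, not the actual NERD framework, its ontology, or its API.

```python
# Toy sketch of the three-step structuring process: i) named entity extraction,
# ii) classification against a common ontology, iii) disambiguation with a URI.
# All names and mappings are illustrative assumptions.
from dataclasses import dataclass
from typing import Optional

# map extractor-specific type labels onto one shared (NERD-style) class
TYPE_ALIGNMENT = {
    ("extractorA", "PERSON"): "nerd:Person",
    ("extractorB", "Human"):  "nerd:Person",
    ("extractorA", "GPE"):    "nerd:Location",
    ("extractorB", "Place"):  "nerd:Location",
}

@dataclass
class Entity:
    surface: str                  # text span, step i)
    source: str                   # which extractor produced it
    raw_type: str                 # extractor-specific type label
    uri: Optional[str] = None     # real-world identifier, step iii)

def classify(entity: Entity) -> str:
    """Step ii): align the extractor-specific type to the reference ontology."""
    return TYPE_ALIGNMENT.get((entity.source, entity.raw_type), "nerd:Thing")

def disambiguate(entity: Entity, knowledge_base: dict) -> Entity:
    """Step iii): attach a URI if the surface form is known, else leave it unlinked."""
    entity.uri = knowledge_base.get(entity.surface)
    return entity

kb = {"Torino": "http://dbpedia.org/resource/Turin"}
e = disambiguate(Entity("Torino", "extractorA", "GPE"), kb)
print(classify(e), e.uri)  # nerd:Location http://dbpedia.org/resource/Turin
```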

    10. Interuniversitäres Doktorandenseminar Wirtschaftsinformatik Juli 2009

    Begun in the year 2000, the Interuniversity Doctoral Seminar in Business Informatics has by now become a fine tradition. It started with the participation of the universities of Leipzig and Halle-Wittenberg; since 2003 the seminar has been held jointly with the University of Jena, and this year the Technische Universität Dresden and the TU Bergakademie Freiberg take part for the first time. The goal of the interuniversity doctoral seminars is an exchange of ideas on current research topics addressed in doctoral projects, reaching beyond the boundaries of the participants' own institutes. Because the talks also focus on the research design, all doctoral students have the opportunity to receive important hints and suggestions from a broad audience at an early stage of their work. The present research papers contain eleven contributions to this year's doctoral seminar in Jena. They cover a wide field, from data mining and knowledge management, through the support of business processes, to RFID technology. Business informatics, as a typical hyphenated informatics discipline, has a reputation for thematic breadth. The dissertation projects from five universities demonstrate this impressively.

    Intelligent Information Access to Linked Data - Weaving the Cultural Heritage Web

    The subject of the dissertation is an information alignment experiment between two cultural heritage information systems (ALAP): the Perseus Digital Library and Arachne. In modern societies, information integration is gaining importance for many tasks such as business decision making or even catastrophe management. It is beyond doubt that the information available in digital form can offer users new ways of interaction. Also, in the humanities and cultural heritage communities, more and more information is being published online. But in many situations the way that information has been made publicly available is disruptive to the research process due to its heterogeneity and distribution. Therefore, integrated information will be a key factor in pursuing successful research, and the need for information alignment is widely recognized. ALAP is an attempt to integrate information from Perseus and Arachne, not only at the schema level, but also by performing entity resolution. To that end, technical peculiarities and philosophical implications of the concepts of identity and co-reference are discussed. Multiple approaches to information integration and entity resolution are discussed and evaluated. The methodology used to implement ALAP is mainly rooted in the fields of information retrieval and knowledge discovery. First, an exploratory analysis was performed on both information systems to get a first impression of the data. After that, (semi-)structured information from both systems was extracted and normalized. Then, a clustering algorithm was used to reduce the number of needed entity comparisons. Finally, a thorough matching was performed within the different clusters. ALAP helped with identifying challenges and highlighted the opportunities that arise when attempting to align cultural heritage information systems.
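    The cluster-then-match step described above corresponds to the standard blocking strategy in entity resolution. The sketch below shows it on toy records; the blocking key, the similarity measure and the threshold are illustrative assumptions rather than ALAP's actual configuration.

```python
# Blocking-then-matching sketch: records are grouped into blocks by a cheap key
# so the expensive pairwise comparison only runs inside each block.
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

def blocking_key(record):
    """Cheap key: first three letters of the normalized title."""
    return record["title"].lower().strip()[:3]

def similarity(a, b):
    return SequenceMatcher(None, a["title"].lower(), b["title"].lower()).ratio()

def resolve(records, threshold=0.85):
    blocks = defaultdict(list)
    for r in records:
        blocks[blocking_key(r)].append(r)
    matches = []
    for block in blocks.values():
        for a, b in combinations(block, 2):        # compare only within a block
            if a["source"] != b["source"] and similarity(a, b) >= threshold:
                matches.append((a["id"], b["id"]))
    return matches

records = [
    {"id": "arachne:1", "source": "Arachne", "title": "Statue of Augustus"},
    {"id": "perseus:9", "source": "Perseus", "title": "Statue of Augustus "},
    {"id": "arachne:2", "source": "Arachne", "title": "Amphora fragment"},
]
print(resolve(records))  # [('arachne:1', 'perseus:9')]
```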

    Scalable integration of uncertainty reasoning and semantic web technologies

    In recent years, formal logical standards for knowledge representation that model real-world knowledge and domains and make them accessible to computers have gained a lot of traction. They provide an expressive logical framework for modeling, consistency checking, reasoning, and query answering, and have proven to be versatile methods for capturing knowledge of various fields. Those formalisms and methods focus on specifying knowledge as precisely as possible. At the same time, many applications, in particular on the Semantic Web, have to deal with uncertainty in their data, and handling uncertain knowledge is crucial in many real-world domains. However, regular logic is unable to capture the real world properly due to its inherent complexity and uncertainty, while handling uncertain or incomplete information is getting more and more important in applications like expert systems, data integration or information extraction. The overall objective of this dissertation is to identify scenarios and datasets where methods that incorporate the inherent uncertainty improve results, and to investigate approaches and tools suitable for the respective task. In summary, this work sets out to tackle the following objectives: 1. debugging uncertain knowledge bases in order to generate consistent knowledge graphs and make them accessible for logical reasoning, 2. combining probabilistic query answering and logical reasoning, which in turn uses these consistent knowledge graphs to answer user queries, and 3. employing the aforementioned techniques for the problem of risk management in IT infrastructures, as a concrete real-world application. We show that in all those scenarios users can benefit from incorporating uncertainty in the knowledge base. Furthermore, we conduct experiments that demonstrate the real-world scalability of the presented approaches. Overall, we argue that integrating uncertainty and logical reasoning, despite being theoretically intractable, is feasible in real-world applications and warrants further research.
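    A toy sketch of the first two objectives, under strong simplifying assumptions (a single disjointness constraint, per-fact confidence scores, and a greedy repair), is given below. It only illustrates the general idea of debugging an uncertain knowledge base and then answering queries over the repaired graph; it is not the dissertation's actual method.

```python
# 1) "Debug" an uncertain knowledge base by dropping the lower-confidence fact
#    from every pair that violates a disjointness constraint.
# 2) Answer a simple query over the remaining facts, reporting the confidence
#    of the supporting fact. Facts, constraint and scores are invented examples.

facts = {  # (subject, predicate, object) -> confidence from an extractor
    ("ex:server1", "rdf:type", "ex:WebServer"): 0.9,
    ("ex:server1", "rdf:type", "ex:Router"):    0.4,
    ("ex:server1", "ex:hosts", "ex:app42"):     0.8,
}
disjoint_classes = {("ex:WebServer", "ex:Router")}  # T-Box style constraint

def debug(facts):
    """Resolve disjointness conflicts by keeping the higher-confidence typing."""
    kept = dict(facts)
    for (c1, c2) in disjoint_classes:
        for s in {s for (s, p, o) in facts if p == "rdf:type"}:
            t1, t2 = (s, "rdf:type", c1), (s, "rdf:type", c2)
            if t1 in kept and t2 in kept:
                kept.pop(min((t1, t2), key=lambda t: kept[t]))
    return kept

def query(kept, s, p):
    """Return (object, confidence) bindings for a simple (s, p, ?o) pattern."""
    return [(o, conf) for (s2, p2, o), conf in kept.items() if (s2, p2) == (s, p)]

consistent = debug(facts)
print(query(consistent, "ex:server1", "rdf:type"))  # [('ex:WebServer', 0.9)]
```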