
    Probabilistic techniques for bridging the semantic gap in schema alignment

    Connecting pieces of information from heterogeneous sources that share the same domain is an open challenge in the Semantic Web, Big Data and business communities. The main problem in this research area is to bridge the expressiveness gap between relational databases and ontologies. In general, an ontology is more expressive and captures more of the semantic information behind the data than a relational database does. On the other hand, databases are the most commonly used persistent storage systems and they provide benefits such as security and data integrity, but they need to be managed by expert users. The problem is especially significant when enterprise or corporate ontologies are used to share information coming from different databases and where more efficient data management is desirable for interoperability purposes. The main motivations of this thesis are database access via an ontology, as in the OBDA (Ontology-Based Data Access) scenario, which provides a formal specification of the domain close to the human view while hiding the technical details of the database from the end user, and the persistent storage of ontologies in databases to facilitate search and retrieval while keeping the benefits of database management systems. In these cases the assertional component (A-Box) is usually stored in a database, and the terminological one (T-Box) is maintained in an ontology. It is therefore more necessary to align schemas than to match instances. The term alignment can be used to denote the whole process, comprising the mapping between two existing heterogeneous sources, such as an ontology and a relational database, and the transformation from one representation to the other, such as ontology-to-database and database-to-ontology. Defining mappings manually is a hard task, especially for large and complex data representations, and existing methodologies lose some content and leave several elements unaligned.
This thesis discusses various aspects of alignment in all these senses. The presented techniques are based on a probabilistic approach that fits the uncertain alignment process well, since it involves two representations with different levels of expressiveness. In the methodology, ontologies and databases are described in terms of Web Ontology Language (OWL) and Entity-Relationship Diagram (ERD) lexical descriptions. Ontologies are represented by a set of OWL axioms, while a properly defined Context-Free Grammar (CFG) is used to represent ERDs as a set of sentences. Both the OWL → ERD transformation and the mapping rely on Hidden Markov Models (HMMs) to estimate the most likely sequence of ERD symbols given the observed OWL symbols. In the model definition, OWL constructs are the observations, while the ERD symbols are the hidden states. Two tools were developed: OMEGA (Ontology → Markov → ERD Generator Application), for the OWL → ERD transformation, and HOwErd (HMM OWL-ERD), for mapping OWL and ERD. Each has its own GUI for showing the alignment results. Finally, HOwErd is compared with the most widespread tools in the reference literature.
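As an illustration of how an HMM can estimate the most likely sequence of hidden ERD symbols from observed OWL constructs, the following is a minimal sketch of Viterbi decoding. The abstract does not say which decoding algorithm or parameters the thesis uses, so the choice of Viterbi, the three-symbol vocabularies, and every probability below are assumptions invented for the example.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state sequence for the observations."""
    # best[t][s] = (log-probability of the best path ending in s at step t,
    #               predecessor state on that path)
    first = observations[0]
    best = [{s: (math.log(start_p[s]) + math.log(emit_p[s][first]), None)
             for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            score, prev = max(
                (best[-1][p][0] + math.log(trans_p[p][s]), p) for p in states
            )
            column[s] = (score + math.log(emit_p[s][obs]), prev)
        best.append(column)
    # Backtrack from the best final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return path[::-1]

# Toy model (all values are hypothetical, not taken from the thesis).
STATES = ["Entity", "Attribute", "Relationship"]          # hidden ERD symbols
START_P = {"Entity": 0.6, "Attribute": 0.2, "Relationship": 0.2}
ROW = {"Entity": 0.3, "Attribute": 0.4, "Relationship": 0.3}
TRANS_P = {s: dict(ROW) for s in STATES}
EMIT_P = {
    "Entity": {"owl:Class": 0.8, "owl:DatatypeProperty": 0.1, "owl:ObjectProperty": 0.1},
    "Attribute": {"owl:Class": 0.1, "owl:DatatypeProperty": 0.8, "owl:ObjectProperty": 0.1},
    "Relationship": {"owl:Class": 0.1, "owl:DatatypeProperty": 0.1, "owl:ObjectProperty": 0.8},
}
OBSERVED = ["owl:Class", "owl:DatatypeProperty", "owl:ObjectProperty"]

if __name__ == "__main__":
    print(viterbi(OBSERVED, STATES, START_P, TRANS_P, EMIT_P))
```

With these toy parameters the decoded sequence is Entity, Attribute, Relationship. The sketch works in log space to avoid numerical underflow on longer sequences and assumes every probability is non-zero.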

    Ontology completion using graph convolutional networks


    Knowledge extraction from unstructured data and classification through distributed ontologies

    The World Wide Web has changed the way humans use and share information of any kind. The Web removed several barriers to accessing published information and has become an enormous space where users can easily navigate through heterogeneous resources (such as linked documents) and can easily edit, modify, or produce them. Documents implicitly enclose information and relationships that are accessible only to human beings. Indeed, the Web of documents evolved towards a space of data silos, linked to each other only through untyped references (such as hypertext references) that only humans were able to understand. A growing desire to programmatically access the pieces of data implicitly enclosed in documents has characterized the recent efforts of the Web research community. Direct access requires structured data, which enables computing machinery to easily exploit the linking of different data sources. It has become crucial for the Web community to provide a technology stack that eases data integration at large scale, first structuring the data using standard ontologies and afterwards linking them to external data. Ontologies became the best practice for defining axioms and relationships among classes, and the Resource Description Framework (RDF) became the basic data model chosen to represent the ontology instances (i.e. an instance is a value of an axiom, class or attribute). Data has become the new oil; in particular, extracting information from semi-structured textual documents on the Web is key to realizing the Linked Data vision. In the literature these problems have been addressed with several proposals and standards that mainly focus on technologies to access the data and on formats to represent the semantics of the data and their relationships. With the increasing volume of interconnected and serialized RDF data, RDF repositories may suffer from data overloading and may become a single point of failure for the overall Linked Data vision.
One of the goals of this dissertation is to propose a thorough approach to managing large-scale RDF repositories and to distributing them in a redundant and reliable peer-to-peer RDF architecture. The architecture consists of a logic to distribute and mine the knowledge and of a set of physical peer nodes organized in a ring topology based on a Distributed Hash Table (DHT). Each node shares the same logic and provides an entry point that enables clients to query the knowledge base using atomic, disjunctive and conjunctive SPARQL queries. The consistency of the results is increased by a data redundancy algorithm that replicates each RDF triple on multiple nodes so that, in the case of peer failure, other peers can retrieve the data needed to resolve the queries. Additionally, a distributed load balancing algorithm maintains a uniform distribution of the data among the participating peers by dynamically changing the key space assigned to each node in the DHT. Recently, the process of data structuring has gained more and more attention when applied to the large volume of textual information spread across the Web, such as legacy data, newspapers, scientific papers or (micro-)blog posts. This process mainly consists of three steps: i) the extraction from the text of atomic pieces of information, called named entities; ii) the classification of these pieces of information through ontologies; iii) their disambiguation through Uniform Resource Identifiers (URIs) identifying real-world objects. As a step towards interconnecting the Web to real-world objects via named entities, different techniques have been proposed. The second objective of this work is to propose a comparison of these approaches in order to highlight strengths and weaknesses in different scenarios such as scientific papers and newspapers, or user-generated content.
We created the Named Entity Recognition and Disambiguation (NERD) web framework, publicly accessible on the Web (through a REST API and a web user interface), which unifies several named entity extraction technologies. Moreover, we proposed the NERD ontology, a reference ontology for comparing the results of these technologies. Recently, the NERD ontology has been included in the NIF (Natural Language Processing Interchange Format) specification, part of the Creating Knowledge out of Interlinked Data (LOD2) project. In summary, this dissertation defines a framework for the extraction of knowledge from unstructured data and its classification via distributed ontologies. A detailed study of the Semantic Web and knowledge extraction fields is presented to define the issues investigated in this work. Then, it proposes an architecture to tackle the single-point-of-failure issue introduced by the RDF repositories spread across the Web. Although the use of ontologies enables a Web where data is structured and comprehensible to computing machinery, human users may also take advantage of it, especially for the annotation task. Hence, this work describes an annotation tool for web editing and audio and video annotation, with a web front-end user interface built on top of a distributed ontology. Furthermore, this dissertation details a thorough comparison of the state of the art in named entity technologies. The NERD framework is presented as a technology that encompasses existing solutions in the named entity extraction field, and the NERD ontology is presented as a reference ontology in the field.
Finally, this work highlights three use cases aimed at reducing the number of data silos spread across the Web: a Linked Data approach to augment the automatic classification task in a Systematic Literature Review, an application to lift educational data stored in Sharable Content Object Reference Model (SCORM) data silos to the Web of data, and a scientific conference venue enhancer plugged on top of several live data collectors. Significant research efforts have been devoted to combining the efficiency of a reliable data structure with the importance of data extraction techniques. This dissertation opens different research doors, which mainly join two research communities: the Semantic Web and the Natural Language Processing communities. The Web provides a considerable amount of data on which NLP techniques may shed light. The use of the URI as a unique identifier may provide one milestone for the materialization of entities lifted from raw text to real-world objects.
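The ring-based DHT with replicated RDF triples described in this abstract can be illustrated with a small consistent-hashing sketch. This is not the dissertation's implementation: keying triples by their subject, the SHA-1 based ring position, the peer names, and the `Ring` class are all assumptions made for the example, and the dynamic load balancing of the key space is left out.

```python
import bisect
import hashlib

def ring_position(key: str, bits: int = 32) -> int:
    """Map a string to a position on a ring of size 2**bits."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << bits)

class Ring:
    """Hypothetical in-memory stand-in for a ring of peers."""

    def __init__(self, node_names, replicas=2):
        self.replicas = replicas
        # Peers sorted by their position on the ring.
        self.nodes = sorted((ring_position(n), n) for n in node_names)
        self.store = {n: set() for n in node_names}

    def responsible_nodes(self, key):
        """First peer at or after the key's position, plus its successors."""
        positions = [p for p, _ in self.nodes]
        start = bisect.bisect_left(positions, ring_position(key)) % len(self.nodes)
        count = min(self.replicas, len(self.nodes))
        return [self.nodes[(start + i) % len(self.nodes)][1] for i in range(count)]

    def insert(self, triple):
        # Assumption: a triple is placed according to its subject.
        for node in self.responsible_nodes(triple[0]):
            self.store[node].add(triple)

    def lookup(self, subject, failed=()):
        """Answer from the first responsible peer that has not failed."""
        for node in self.responsible_nodes(subject):
            if node not in failed:
                return {t for t in self.store[node] if t[0] == subject}
        return set()

if __name__ == "__main__":
    ring = Ring(["peer-a", "peer-b", "peer-c", "peer-d"], replicas=2)
    triple = ("ex:Paper1", "dc:creator", "ex:Alice")
    ring.insert(triple)
    primary = ring.responsible_nodes("ex:Paper1")[0]
    # The triple is still retrievable when its primary peer is down.
    print(ring.lookup("ex:Paper1", failed={primary}))
```

Replicating on ring successors means a lookup can fall through to the next peer without any extra directory, which is one common way to obtain the failure tolerance the abstract describes.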

    10. Interuniversitäres Doktorandenseminar Wirtschaftsinformatik Juli 2009

    Begun in 2000, the Interuniversitäres Wirtschaftsinformatik-Doktorandenseminar (inter-university doctoral seminar in business informatics) has by now become a fine tradition. It started with the participation of the universities of Leipzig and Halle-Wittenberg. Since 2003 the seminar has been held together with the University of Jena, and this year the Technische Universität Dresden and the TU Bergakademie Freiberg are taking part for the first time. The aim of the inter-university doctoral seminars is an exchange of ideas, reaching beyond the boundaries of one's own institute, on current research topics addressed in doctoral projects. Because the talks also emphasize research design, all doctoral candidates have the opportunity to receive important pointers and suggestions from a broad audience at an early stage of their work. The present Research Papers contain eleven contributions to this year's doctoral seminar in Jena. They cover a wide field, from data mining and knowledge management through the support of business processes to RFID technology. Business informatics, as a typical "hyphenated" informatics discipline, has a reputation for thematic breadth. The dissertation projects from five universities demonstrate this impressively.

    Intelligent Information Access to Linked Data - Weaving the Cultural Heritage Web

    The subject of the dissertation is an information alignment experiment of two cultural heritage information systems (ALAP): The Perseus Digital Library and Arachne. In modern societies, information integration is gaining importance for many tasks such as business decision making or even catastrophe management. It is beyond doubt that the information available in digital form can offer users new ways of interaction. Also, in the humanities and cultural heritage communities, more and more information is being published online. But in many situations the way that information has been made publicly available is disruptive to the research process due to its heterogeneity and distribution. Therefore integrated information will be a key factor to pursue successful research, and the need for information alignment is widely recognized. ALAP is an attempt to integrate information from Perseus and Arachne, not only on a schema level, but to also perform entity resolution. To that end, technical peculiarities and philosophical implications of the concepts of identity and co-reference are discussed. Multiple approaches to information integration and entity resolution are discussed and evaluated. The methodology that is used to implement ALAP is mainly rooted in the fields of information retrieval and knowledge discovery. First, an exploratory analysis was performed on both information systems to get a first impression of the data. After that, (semi-)structured information from both systems was extracted and normalized. Then, a clustering algorithm was used to reduce the number of needed entity comparisons. Finally, a thorough matching was performed on the different clusters. ALAP helped with identifying challenges and highlighted the opportunities that arise during the attempt to align cultural heritage information systems
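The last two steps above, grouping records to cut down the number of entity comparisons and then matching within each group, can be sketched as follows. The abstract does not name the clustering algorithm or the similarity measure used in ALAP, so this example swaps in simple key-based blocking and the standard-library `difflib.SequenceMatcher`; the block key, the threshold, and the records are invented for illustration.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import product

def block_key(record):
    """Hypothetical blocking key: first character of the normalized label."""
    return record["label"].strip().lower()[:1]

def candidate_pairs(left, right):
    """Yield only the cross-source pairs that share a block."""
    blocks = defaultdict(lambda: ([], []))
    for record in left:
        blocks[block_key(record)][0].append(record)
    for record in right:
        blocks[block_key(record)][1].append(record)
    for left_group, right_group in blocks.values():
        yield from product(left_group, right_group)

def match(left, right, threshold=0.8):
    """Return id pairs whose labels are at least `threshold` similar."""
    matches = []
    for a, b in candidate_pairs(left, right):
        score = SequenceMatcher(None, a["label"].lower(), b["label"].lower()).ratio()
        if score >= threshold:
            matches.append((a["id"], b["id"]))
    return matches

# Invented toy records standing in for the two information systems.
LEFT = [
    {"id": "p1", "label": "Temple of Zeus"},
    {"id": "p2", "label": "Parthenon"},
]
RIGHT = [
    {"id": "a1", "label": "Temple of Zeus, Olympia"},
    {"id": "a2", "label": "Parthenon"},
    {"id": "a3", "label": "Pantheon"},
]

if __name__ == "__main__":
    pairs = list(candidate_pairs(LEFT, RIGHT))
    print(len(pairs), "candidate pairs instead of", len(LEFT) * len(RIGHT))
    print(match(LEFT, RIGHT))
```

On the toy data, blocking reduces the six possible cross-source pairs to three candidates. A real pipeline would need a blocking key that tolerates spelling variants, since records placed in different blocks are never compared.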

    Scalable integration of uncertainty reasoning and semantic web technologies

    In recent years, formal logical standards for knowledge representation, which model real-world knowledge and domains and make them accessible to computers, have gained a lot of traction. They provide an expressive logical framework for modeling, consistency checking, reasoning, and query answering, and have proven to be versatile methods for capturing knowledge of various fields. Those formalisms and methods focus on specifying knowledge as precisely as possible. At the same time, many applications, in particular on the Semantic Web, have to deal with uncertainty in their data, and handling uncertain knowledge is crucial in many real-world domains. However, regular logic is unable to capture the real world properly due to its inherent complexity and uncertainty, while handling uncertain or incomplete information is becoming more and more important in applications like expert systems, data integration or information extraction. The overall objective of this dissertation is to identify scenarios and datasets where methods that incorporate their inherent uncertainty improve results, and to investigate approaches and tools that are suitable for the respective task. In summary, this work sets out to tackle the following objectives: 1. debugging uncertain knowledge bases in order to generate consistent knowledge graphs and make them accessible for logical reasoning, 2. combining probabilistic query answering and logical reasoning, which in turn uses these consistent knowledge graphs to answer user queries, and 3. applying the aforementioned techniques to the problem of risk management in IT infrastructures, as a concrete real-world application. We show that in all those scenarios, users can benefit from incorporating uncertainty in the knowledge base. Furthermore, we conduct experiments that demonstrate the real-world scalability of the presented approaches.
Overall, we argue that integrating uncertainty and logical reasoning, despite being theoretically intractable, is feasible in real-world applications and warrants further research.