1,373 research outputs found

    PlanetOnto: from news publishing to integrated knowledge management support

    Get PDF
    Given a scenario in which members of an academic community collaboratively construct and share an archive of news items, several knowledge management challenges arise. The authors' integrated suite of tools, called PlanetOnto, supports a speedy but high quality publishing process, allows ontology-driven document formalization and augments standard browsing and search facilities with deductive knowledge retrieva

    Ontology-driven document enrichment: principles, tools and applications

    Get PDF
    In this paper, we present an approach to document enrichment, which consists of developing and integrating formal knowledge models with archives of documents, to provide intelligent knowledge retrieval and (possibly) additional knowledge-intensive services, beyond what is currently available using ā€œstandardā€ information retrieval and search facilities. Our approach is ontology-driven, in the sense that the construction of the knowledge model is carried out in a top-down fashion, by populating a given ontology, rather than in a bottom-up fashion, by annotating a particular document. In this paper, we give an overview of the approach and we examine the various types of issues (e.g. modelling, organizational and user interface issues) which need to be tackled to effectively deploy our approach in the workplace. In addition, we also discuss a number of technologies we have developed to support ontology-driven document enrichment and we illustrate our ideas in the domains of electronic news publishing, scholarly discourse and medical guidelines

    CHORUS Deliverable 2.2: Second report - identification of multi-disciplinary key issues for gap analysis toward EU multimedia search engines roadmap

    Get PDF
    After addressing the state-of-the-art during the first year of Chorus and establishing the existing landscape in multimedia search engines, we have identified and analyzed gaps within European research effort during our second year. In this period we focused on three directions, notably technological issues, user-centred issues and use-cases and socio- economic and legal aspects. These were assessed by two central studies: firstly, a concerted vision of functional breakdown of generic multimedia search engine, and secondly, a representative use-cases descriptions with the related discussion on requirement for technological challenges. Both studies have been carried out in cooperation and consultation with the community at large through EC concertation meetings (multimedia search engines cluster), several meetings with our Think-Tank, presentations in international conferences, and surveys addressed to EU projects coordinators as well as National initiatives coordinators. Based on the obtained feedback we identified two types of gaps, namely core technological gaps that involve research challenges, and ā€œenablersā€, which are not necessarily technical research challenges, but have impact on innovation progress. New socio-economic trends are presented as well as emerging legal challenges

    Interoperability of semantics in news production

    Get PDF

    Access to recorded interviews: A research agenda

    Get PDF
    Recorded interviews form a rich basis for scholarly inquiry. Examples include oral histories, community memory projects, and interviews conducted for broadcast media. Emerging technologies offer the potential to radically transform the way in which recorded interviews are made accessible, but this vision will demand substantial investments from a broad range of research communities. This article reviews the present state of practice for making recorded interviews available and the state-of-the-art for key component technologies. A large number of important research issues are identified, and from that set of issues, a coherent research agenda is proposed

    Knowledge extraction from unstructured data and classification through distributed ontologies

    Get PDF
    The World Wide Web has changed the way humans use and share any kind of information. The Web removed several access barriers to the information published and has became an enormous space where users can easily navigate through heterogeneous resources (such as linked documents) and can easily edit, modify, or produce them. Documents implicitly enclose information and relationships among them which become only accessible to human beings. Indeed, the Web of documents evolved towards a space of data silos, linked each other only through untyped references (such as hypertext references) where only humans were able to understand. A growing desire to programmatically access to pieces of data implicitly enclosed in documents has characterized the last efforts of the Web research community. Direct access means structured data, thus enabling computing machinery to easily exploit the linking of different data sources. It has became crucial for the Web community to provide a technology stack for easing data integration at large scale, first structuring the data using standard ontologies and afterwards linking them to external data. Ontologies became the best practices to define axioms and relationships among classes and the Resource Description Framework (RDF) became the basic data model chosen to represent the ontology instances (i.e. an instance is a value of an axiom, class or attribute). Data becomes the new oil, in particular, extracting information from semi-structured textual documents on the Web is key to realize the Linked Data vision. In the literature these problems have been addressed with several proposals and standards, that mainly focus on technologies to access the data and on formats to represent the semantics of the data and their relationships. With the increasing of the volume of interconnected and serialized RDF data, RDF repositories may suffer from data overloading and may become a single point of failure for the overall Linked Data vision. One of the goals of this dissertation is to propose a thorough approach to manage the large scale RDF repositories, and to distribute them in a redundant and reliable peer-to-peer RDF architecture. The architecture consists of a logic to distribute and mine the knowledge and of a set of physical peer nodes organized in a ring topology based on a Distributed Hash Table (DHT). Each node shares the same logic and provides an entry point that enables clients to query the knowledge base using atomic, disjunctive and conjunctive SPARQL queries. The consistency of the results is increased using data redundancy algorithm that replicates each RDF triple in multiple nodes so that, in the case of peer failure, other peers can retrieve the data needed to resolve the queries. Additionally, a distributed load balancing algorithm is used to maintain a uniform distribution of the data among the participating peers by dynamically changing the key space assigned to each node in the DHT. Recently, the process of data structuring has gained more and more attention when applied to the large volume of text information spread on the Web, such as legacy data, news papers, scientific papers or (micro-)blog posts. This process mainly consists in three steps: \emph{i)} the extraction from the text of atomic pieces of information, called named entities; \emph{ii)} the classification of these pieces of information through ontologies; \emph{iii)} the disambigation of them through Uniform Resource Identifiers (URIs) identifying real world objects. As a step towards interconnecting the web to real world objects via named entities, different techniques have been proposed. The second objective of this work is to propose a comparison of these approaches in order to highlight strengths and weaknesses in different scenarios such as scientific and news papers, or user generated contents. We created the Named Entity Recognition and Disambiguation (NERD) web framework, publicly accessible on the Web (through REST API and web User Interface), which unifies several named entity extraction technologies. Moreover, we proposed the NERD ontology, a reference ontology for comparing the results of these technologies. Recently, the NERD ontology has been included in the NIF (Natural language processing Interchange Format) specification, part of the Creating Knowledge out of Interlinked Data (LOD2) project. Summarizing, this dissertation defines a framework for the extraction of knowledge from unstructured data and its classification via distributed ontologies. A detailed study of the Semantic Web and knowledge extraction fields is proposed to define the issues taken under investigation in this work. Then, it proposes an architecture to tackle the single point of failure issue introduced by the RDF repositories spread within the Web. Although the use of ontologies enables a Web where data is structured and comprehensible by computing machinery, human users may take advantage of it especially for the annotation task. Hence, this work describes an annotation tool for web editing, audio and video annotation in a web front end User Interface powered on the top of a distributed ontology. Furthermore, this dissertation details a thorough comparison of the state of the art of named entity technologies. The NERD framework is presented as technology to encompass existing solutions in the named entity extraction field and the NERD ontology is presented as reference ontology in the field. Finally, this work highlights three use cases with the purpose to reduce the amount of data silos spread within the Web: a Linked Data approach to augment the automatic classification task in a Systematic Literature Review, an application to lift educational data stored in Sharable Content Object Reference Model (SCORM) data silos to the Web of data and a scientific conference venue enhancer plug on the top of several data live collectors. Significant research efforts have been devoted to combine the efficiency of a reliable data structure and the importance of data extraction techniques. This dissertation opens different research doors which mainly join two different research communities: the Semantic Web and the Natural Language Processing community. The Web provides a considerable amount of data where NLP techniques may shed the light within it. The use of the URI as a unique identifier may provide one milestone for the materialization of entities lifted from a raw text to real world object

    Converting a Controlled Vocabulary into an Ontology: the Case of GEM

    Get PDF
    The prevalance of digital information raised issues regarding the suitability of conventional library tools for organizing information. The multi-dimensionality of digital resources requires a more versatile and flexible representation to accommodate intelligent information representation and retrieval. Ontologies are used as a solution to such issues in many application domains, mainly due to their ability explicitly to specify the semantics and relations and to express them in a computer understandable language. Conventional knowledge organization tools such as classifications and thesauri resemble ontologies in a way that they define concepts and relationships in a systematic manner, but they are less expressive than ontologies when it comes to machine language. This paper used the controlled vocabulary at the Gateway to Educational Materials (GEM) as an example to address the issues in representing digital resources. The theoretical and methodological framework in this paper serves as the rationale and guideline for converting the GEM controlled vocabulary into an ontology. Compared to the original semantic model of GEM controlled vocabulary, the major difference between the two models lies in the values added through deeper semantics in describing digital objects, both conceptually and relationally

    Uma visĆ£o geral sobre ontologias: pesquisa sobre definiƧƵes, tipos, aplicaƧƵes, mĆ©todos de avaliaĆ§Ć£o e de construĆ§Ć£o

    Get PDF
    Os estudos sobre a organizaĆ§Ć£o da informaĆ§Ć£o tem recebido cada vez mais importĆ¢ncia Ć  medida que o nĆŗmero crescente de fontes de dados disponĆ­veis dificulta a recuperaĆ§Ć£o da informaĆ§Ć£o. Nos Ćŗltimos anos, vĆ”rios trabalhos tĆŖm destacado o uso de ontologias como alternativa para a organizaĆ§Ć£o da informaĆ§Ć£o. Encontram-se na literatura abordagens das mais variadas sobre o assunto. Esse artigo objetiva proporcionar uma visĆ£o geral sobre o estado-da-arte no estudo de ontologias. Apresentam-se definiƧƵes para o termo, uma breve discussĆ£o sobre seu significado, tipos de ontologias, propostas para aplicaƧƵes em diferentes domĆ­nios de conhecimento e propostas para a construĆ§Ć£o de ontologias (metodologias, ferramentas e linguagens)
    • ā€¦
    corecore