
    BlogForever D2.6: Data Extraction Methodology

    This report outlines an inquiry into the area of web data extraction, conducted within the context of blog preservation. The report reviews theoretical advances and practical developments for implementing data extraction. The inquiry is extended through an experiment that demonstrates the effectiveness and feasibility of implementing some of the suggested approaches. More specifically, the report discusses an approach based on unsupervised machine learning that employs the RSS feeds and HTML representations of blogs. It outlines the possibilities of extracting semantics available in blogs and demonstrates the benefits of exploiting available standards such as microformats and microdata. The report proceeds to propose a methodology for extracting and processing blog data to further inform the design and development of the BlogForever platform.

    Knowledge Organization Systems (KOS) in the Semantic Web: A Multi-Dimensional Review

    Since the Simple Knowledge Organization System (SKOS) specification and its SKOS eXtension for Labels (SKOS-XL) became formal W3C recommendations in 2009, a significant number of conventional knowledge organization systems (KOS) (including thesauri, classification schemes, name authorities, and lists of codes and terms, produced before the arrival of the ontology wave) have made their journeys to join the Semantic Web mainstream. This paper uses "LOD KOS" as an umbrella term to refer to all of the value vocabularies and lightweight ontologies within the Semantic Web framework. The paper provides an overview of what the LOD KOS movement has brought to various communities and users. These are not limited to the colonies of the value vocabulary constructors and providers, nor the catalogers and indexers who have a long history of applying the vocabularies to their products. The LOD dataset producers and LOD service providers, the information architects and interface designers, and researchers in sciences and humanities are also direct beneficiaries of LOD KOS. The paper examines a set of collected cases (experimental or in real applications) and aims to identify the usages of LOD KOS in order to share practices and ideas among communities and users. Through the viewpoints of a number of different user groups, the functions of LOD KOS are examined from multiple dimensions. This paper focuses on the LOD dataset producers, vocabulary producers, and researchers (as end-users of KOS). Comment: 31 pages, 12 figures; accepted paper in the International Journal on Digital Libraries.

    Application of Semantics to Solve Problems in Life Sciences

    Thesis defense date: 10 December 2018. The amount of information generated on the Web has increased in recent years. Most of this information is accessible as text, with humans being the main users of the Web. However, despite all the advances in the area of natural language processing, computers have trouble processing this textual information. In this context, there are application domains, such as the Life Sciences, in which large amounts of information are being published as structured data. The analysis of these data is of vital importance not only for the advancement of science but also for producing advances in health care. However, these data are located in different repositories and stored in different formats, which makes their integration difficult. In this setting, the Linked Data paradigm arises as a technology that includes the application of some standards proposed by the W3C community, such as HTTP URIs and the RDF and OWL standards. Making use of this technology, this doctoral thesis pursues the following main objectives: 1) to promote the use of Linked Data within the Life Sciences user community; 2) to facilitate the design of SPARQL queries by discovering the underlying model of RDF repositories; 3) to create a collaborative environment that facilitates the consumption of Linked Data by end users; 4) to develop an algorithm that automatically discovers the OWL semantic model of an RDF repository; 5) to develop an OWL representation of ICD-10-CM, called Dione, that offers an automatic methodology for classifying patients' diseases and subsequently validating the classification with an OWL reasoner.

    prototypical implementations

    In this technical report, we present prototypical implementations of innovative tools and methods developed according to the working plan outlined in Technical Report TR-B-09-05 [23]. We present an ontology modularization and integration framework and the SVoNt server, the server-side end of an SVN-based versioning system for ontologies in the Corporate Ontology Engineering pillar. For the Corporate Semantic Collaboration pillar, we present the prototypical implementation of a light-weight ontology editor for non-experts and an ontology-based expert finder system. For the Corporate Semantic Search pillar, we present a prototype for algorithmic extraction of relations in folksonomies, a tool for trend detection using a semantic analyzer, a tool for automatic classification of web documents using Hidden Markov models, a personalized semantic recommender for multimedia content, and a semantic search assistant developed in co-operation with the Museumsportal Berlin. The prototypes complete the next milestone on the path to an integral Corporate Semantic Web architecture based on the three pillars Corporate Ontology Engineering, Corporate Semantic Collaboration, and Corporate Semantic Search, as envisioned in [23].

    The genesis and emergence of Web 3.0: a study in the integration of artificial intelligence and the semantic web in knowledge creation

    The web as we know it has evolved rapidly over the last decade. We have gone from a phase of rapid growth, as seen with the dot-com boom where business was king, to the current Web 2.0 phase, where social networking, wikis, blogs and other related tools flood the bandwidth of the World Wide Web. The empowerment of the web user with Web 2.0 technologies has led to the exponential growth of data, information and knowledge on the web. With this rapid change, there is a need to logically categorise this information and knowledge so it can be fully utilised by all. It can be argued that the power of the knowledge held on the web is not fully exposed under its current structure, and that to improve this we need to explore the foundations of the web. This dissertation explores the evolution of the web from its early days to the present day. It examines the way web content is stored and discusses the new semantic technologies now available to represent this content. The research aims to demonstrate the possibilities of efficient knowledge extraction from a knowledge portal such as a wiki or SharePoint portal using these semantic technologies. This generation of dynamic knowledge content within a limited domain is intended to demonstrate the benefits of the Semantic Web to the knowledge age.

    A matter of words: NLP for quality evaluation of Wikipedia medical articles

    Automatic quality evaluation of Web information is a task with many fields of application and of great relevance, especially in critical domains like the medical one. We start from the intuition that the quality of the content of medical Web documents is affected by features related to the specific domain: first, the usage of a specific vocabulary (Domain Informativeness); then, the adoption of specific codes (like those used in the infoboxes of Wikipedia articles) and the type of document (e.g., historical and technical ones). In this paper, we propose to leverage specific domain features to improve the results of the evaluation of Wikipedia medical articles. In particular, we evaluate the articles adopting an "actionable" model, whose features are related to the content of the articles, so that the model can also directly suggest strategies for improving the quality of a given article. We rely on Natural Language Processing (NLP) and dictionary-based techniques in order to extract the bio-medical concepts in a text. We demonstrate the effectiveness of our approach by classifying the medical articles of the Wikipedia Medicine Portal, which have been previously manually labeled by the Wiki Project team. The results of our experiments confirm that, by considering domain-oriented features, it is possible to obtain appreciable improvements with respect to existing solutions, mainly for those articles that other approaches have classified less correctly. Besides being interesting in their own right, the results call for further research in the area of domain-specific features suitable for Web data quality assessment.

    Emergent Capabilities for Collaborative Teams in the Evolving Web Environment

    This paper reports on our investigation of the latest advances for the Social Web, Web 2.0 and the Linked Data Web. These advances are discussed in terms of the latest capabilities that are available (or being made available) on the Web at the time of writing this paper. Such capabilities can be of significant benefit to teams, especially those composed of multinational, geographically dispersed team members. The specific context of coalition members in a rapidly formed diverse military context such as disaster relief or humanitarian aid is considered, where close working between non-government organisations and non-military teams will help to achieve results as quickly and efficiently as possible. The heterogeneity one finds in such teams, coupled with a lack of dedicated private network infrastructure, poses a number of challenges for collaboration, and the current paper represents an attempt to assess whether nascent Web-based capabilities can support such teams in terms of both their collaborative activities and their access to (and sharing of) information resources.