BlogForever D2.6: Data Extraction Methodology
This report outlines an inquiry into the area of web data extraction, conducted within the context of blog preservation. The report reviews theoretical advances and practical developments for implementing data extraction. The inquiry is extended through an experiment that demonstrates the effectiveness and feasibility of implementing some of the suggested approaches. More specifically, the report discusses an approach based on unsupervised machine learning that employs the RSS feeds and HTML representations of blogs. It outlines the possibilities of extracting semantics available in blogs and demonstrates the benefits of exploiting available standards such as microformats and microdata. The report proceeds to propose a methodology for extracting and processing blog data to further inform the design and development of the BlogForever platform.
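The pairing of RSS feeds with HTML pages and the harvesting of microdata annotations described above can be illustrated with a minimal sketch. This is not the BlogForever implementation; the feed, page, and property names are invented for the example, and only the Python standard library is used.

```python
# Sketch: pair an RSS item's title with its link, and extract microdata
# (itemprop attributes) from the corresponding HTML page.
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Invented sample feed and page for illustration.
RSS = """<rss version="2.0"><channel>
  <item><title>Post A</title><link>http://blog.example/a</link></item>
</channel></rss>"""

HTML = """<article itemscope itemtype="http://schema.org/BlogPosting">
  <h1 itemprop="headline">Post A</h1>
  <span itemprop="author">Alice</span>
</article>"""

def rss_items(xml_text):
    """Extract (title, link) pairs from an RSS 2.0 feed."""
    root = ET.fromstring(xml_text)
    return [(i.findtext("title"), i.findtext("link"))
            for i in root.iter("item")]

class MicrodataParser(HTMLParser):
    """Collect itemprop -> text pairs from microdata-annotated HTML."""
    def __init__(self):
        super().__init__()
        self.pending = None   # itemprop name awaiting its text node
        self.props = {}
    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemprop" in attrs:
            self.pending = attrs["itemprop"]
    def handle_data(self, data):
        if self.pending and data.strip():
            self.props[self.pending] = data.strip()
            self.pending = None

items = rss_items(RSS)
parser = MicrodataParser()
parser.feed(HTML)
print(items)          # [('Post A', 'http://blog.example/a')]
print(parser.props)   # {'headline': 'Post A', 'author': 'Alice'}
```

The feed supplies reliable post boundaries and metadata, while the microdata scan recovers semantics that plain HTML scraping would miss, which is the benefit of exploiting such standards that the report points to.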
Knowledge Organization Systems (KOS) in the Semantic Web: A Multi-Dimensional Review
Since the Simple Knowledge Organization System (SKOS) specification and its SKOS eXtension for Labels (SKOS-XL) became formal W3C recommendations in 2009, a significant number of conventional knowledge organization systems (KOS), including thesauri, classification schemes, name authorities, and lists of codes and terms produced before the arrival of the ontology wave, have made their journey to join the Semantic Web mainstream. This paper uses "LOD KOS" as an umbrella term to refer to all of the value vocabularies and lightweight ontologies within the Semantic Web framework. The paper provides an overview of what the LOD KOS movement has brought to various communities and users. These are not limited to the communities of value vocabulary constructors and providers, nor the catalogers and indexers who have a long history of applying the vocabularies to their products. The LOD dataset producers and LOD service providers, the information architects and interface designers, and researchers in the sciences and humanities are also direct beneficiaries of LOD KOS. The paper examines a set of collected cases (experimental or in real applications) and aims to identify the usages of LOD KOS in order to share practices and ideas among communities and users. Through the viewpoints of a number of different user groups, the functions of LOD KOS are examined from multiple dimensions. This paper focuses on the LOD dataset producers, vocabulary producers, and researchers (as end-users of KOS).
Comment: 31 pages, 12 figures, accepted paper in International Journal on Digital Libraries
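To make the notion of a value vocabulary on the Semantic Web concrete, the following sketch shows an invented thesaurus fragment expressed as SKOS-style triples (skos:prefLabel and skos:broader are real SKOS properties; the concepts and the tiny in-memory store are assumptions for illustration).

```python
# Sketch: a thesaurus entry as SKOS triples, and a consumer walking the
# skos:broader hierarchy upward to the top of the scheme.
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Invented example concepts in (subject, predicate, object) form.
triples = [
    ("ex:dogs",    SKOS + "prefLabel", "Dogs"),
    ("ex:dogs",    SKOS + "broader",   "ex:mammals"),
    ("ex:mammals", SKOS + "prefLabel", "Mammals"),
    ("ex:mammals", SKOS + "broader",   "ex:animals"),
    ("ex:animals", SKOS + "prefLabel", "Animals"),
]

def value(subject, predicate):
    """Return the first object for (subject, predicate), if any."""
    return next((o for s, p, o in triples
                 if s == subject and p == predicate), None)

def broader_path(concept):
    """Follow skos:broader links from a concept to the top concept."""
    labels = []
    while concept is not None:
        labels.append(value(concept, SKOS + "prefLabel"))
        concept = value(concept, SKOS + "broader")
    return labels

print(broader_path("ex:dogs"))  # ['Dogs', 'Mammals', 'Animals']
```

Once a thesaurus is published this way, any of the user groups the paper surveys (indexers, LOD service providers, interface designers) can traverse the same hierarchy with generic RDF tooling rather than vocabulary-specific software.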
Application of Semantics to Solve Problems in Life Sciences
Thesis defense date: 10 December 2018. The amount of information generated on the Web has increased in recent years. Most of this information is available as text, with humans being the Web's primary users. However, despite all the advances in natural language processing, computers still have trouble processing this textual information. In this context, there are application domains, such as the Life Sciences, in which large amounts of information are being published as structured data. The analysis of these data is vitally important not only for the advancement of science but also for progress in healthcare. However, these data are located in different repositories and stored in different formats, which makes their integration difficult. Here, the Linked Data paradigm emerges as a technology built on standards proposed by the W3C community, such as HTTP URIs and the RDF and OWL standards. Using this technology, this doctoral thesis was developed around the following main objectives: 1) to promote the use of Linked Data by the Life Sciences user community; 2) to ease the design of SPARQL queries through discovery of the model underlying RDF repositories; 3) to create a collaborative environment that facilitates the consumption of Linked Data by end users; 4) to develop an algorithm that automatically discovers the OWL semantic model of an RDF repository; and 5) to develop an OWL representation of ICD-10-CM, called Dione, that offers an automatic methodology for classifying patients' diseases, with subsequent validation using an OWL reasoner.
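The idea behind objectives 2 and 4, discovering the model underlying an RDF repository so that users can write SPARQL queries without prior schema knowledge, can be sketched as follows. This is not the thesis algorithm; the triples and the domain/range heuristic are assumptions for illustration.

```python
# Sketch: infer a lightweight model (classes, and the properties linking
# them) from instance-level RDF triples.
RDF_TYPE = "rdf:type"

# Invented instance data.
triples = [
    ("ex:p1", RDF_TYPE, "ex:Patient"),
    ("ex:d1", RDF_TYPE, "ex:Disease"),
    ("ex:p1", "ex:diagnosedWith", "ex:d1"),
]

def infer_model(triples):
    """Approximate the schema from the data: classes are the rdf:type
    objects; a property's domain/range are the classes of the subjects
    and objects it actually connects."""
    types = {s: o for s, p, o in triples if p == RDF_TYPE}
    classes = set(types.values())
    links = {(types.get(s), p, types.get(o))
             for s, p, o in triples
             if p != RDF_TYPE and s in types and o in types}
    return classes, links

classes, links = infer_model(triples)
print(sorted(classes))  # ['ex:Disease', 'ex:Patient']
print(sorted(links))    # [('ex:Patient', 'ex:diagnosedWith', 'ex:Disease')]
```

Knowing that ex:diagnosedWith links Patient to Disease is exactly the kind of summary that lets a user draft a SPARQL query such as `SELECT ?d WHERE { ?p ex:diagnosedWith ?d }` without inspecting the raw data first.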
Prototypical implementations
In this technical report, we present prototypical implementations of innovative tools and methods developed according to the working plan outlined in Technical Report TR-B-09-05 [23]. We present an ontology modularization and integration framework and the SVoNt server, the server-side end of an SVN-based versioning system for ontologies, in the Corporate Ontology Engineering pillar. For the Corporate Semantic Collaboration pillar, we present the prototypical implementation of a lightweight ontology editor for non-experts and an ontology-based expert finder system. For the Corporate Semantic Search pillar, we present a prototype for algorithmic extraction of relations in folksonomies, a tool for trend detection using a semantic analyzer, a tool for automatic classification of web documents using Hidden Markov models, a personalized semantic recommender for multimedia content, and a semantic search assistant developed in co-operation with the Museumsportal Berlin. The prototypes complete the next milestone on the path to an integral Corporate Semantic Web architecture based on the three pillars Corporate Ontology Engineering, Corporate Semantic Collaboration, and Corporate Semantic Search, as envisioned in [23].
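The core operation of an SVN-style versioning system for ontologies, such as the SVoNt server mentioned above, is reducing a commit to the triples added and removed between two ontology versions. A minimal sketch of that diff step, with invented example axioms and no claim about SVoNt's actual internals:

```python
# Sketch: diff two ontology versions represented as sets of triples.
def diff(old, new):
    """Return (added, removed) triples between two ontology versions."""
    old, new = set(old), set(new)
    return sorted(new - old), sorted(old - new)

# Invented example: a commit that inserts an intermediate class.
v1 = [("ex:Dog", "rdfs:subClassOf", "ex:Animal")]
v2 = [("ex:Dog", "rdfs:subClassOf", "ex:Mammal"),
      ("ex:Mammal", "rdfs:subClassOf", "ex:Animal")]

added, removed = diff(v1, v2)
print(added)    # two new subclass axioms
print(removed)  # [('ex:Dog', 'rdfs:subClassOf', 'ex:Animal')]
```

Working at the level of triples rather than text lines is what makes such a system ontology-aware: reordering a serialization produces an empty diff, while a real modelling change surfaces as explicit added and removed axioms.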
A framework for feeding Linked Data to Complex Event Processing engines
A huge volume of Linked Data has been published on the Web, yet it is not processable by Complex Event Processing (CEP) or Event Stream Processing (ESP) engines. This paper presents a framework to bridge this gap, under which Linked Data are first translated into events conforming to a lightweight ontology and then fed to CEP engines. The event processing results are also published back onto the Web of Data. In this way, CEP engines are connected to the Web of Data, and ontological reasoning is integrated with event processing. Finally, the implementation method and a case study of the framework are presented.
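The translation step the paper describes, turning RDF triples into flat events a CEP engine can consume, can be sketched as follows. The sensor-reading data, property names, and the toy threshold rule are all assumptions for illustration, not the paper's case study.

```python
# Sketch: group RDF triples by subject into event dicts, then run a toy
# CEP rule over the resulting event stream.

# Invented triples describing three sensor readings.
triples = [
    ("ex:r1", "ex:sensor", "ex:s1"), ("ex:r1", "ex:value", 70),
    ("ex:r2", "ex:sensor", "ex:s1"), ("ex:r2", "ex:value", 85),
    ("ex:r3", "ex:sensor", "ex:s1"), ("ex:r3", "ex:value", 90),
]

def to_events(triples):
    """Translate triples into events: one dict per subject, keyed by
    predicate (the assumed 'lightweight ontology' of the example)."""
    events = {}
    for s, p, o in triples:
        events.setdefault(s, {})[p] = o
    return [events[k] for k in sorted(events)]

def detect(events, threshold=80):
    """Toy CEP pattern: two consecutive readings above the threshold."""
    alerts = []
    for prev, cur in zip(events, events[1:]):
        if prev["ex:value"] > threshold and cur["ex:value"] > threshold:
            alerts.append((prev["ex:value"], cur["ex:value"]))
    return alerts

print(detect(to_events(triples)))  # [(85, 90)]
```

In the framework proper, the alert tuple would itself be re-expressed as RDF and published back onto the Web of Data, closing the loop between event processing and ontological reasoning.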
The genesis and emergence of Web 3.0: a study in the integration of artificial intelligence and the semantic web in knowledge creation
The Web as we know it has evolved rapidly over the last decade. We have gone from a phase of rapid growth, as seen in the dot-com boom where business was king, to the current Web 2.0 phase, where social networking, wikis, blogs and other related tools flood the bandwidth of the World Wide Web. The empowerment of the web user with Web 2.0 technologies has led to the exponential growth of data, information and knowledge on the web. With this rapid change, there is a need to logically categorise this information and knowledge so it can be fully utilised by all. It can be argued that the power of the knowledge held on the web is not fully exposed under its current structure, and to improve this we need to explore the foundations of the web. This dissertation will explore the evolution of the web from its early days to the present day. It will examine the way web content is stored and discuss the new semantic technologies now available to represent this content. The research aims to demonstrate the possibilities of efficient knowledge extraction from a knowledge portal, such as a Wiki or SharePoint portal, using these semantic technologies. This generation of dynamic knowledge content within a limited domain will attempt to demonstrate the benefits of the Semantic Web to the knowledge age.
A matter of words: NLP for quality evaluation of Wikipedia medical articles
Automatic quality evaluation of Web information is a task with many fields of
applications and of great relevance, especially in critical domains like the
medical one. We move from the intuition that the quality of content of medical
Web documents is affected by features related with the specific domain. First,
the usage of a specific vocabulary (Domain Informativeness); then, the adoption
of specific codes (like those used in the infoboxes of Wikipedia articles) and
the type of document (e.g., historical and technical ones). In this paper, we
propose to leverage specific domain features to improve the results of the
evaluation of Wikipedia medical articles. In particular, we evaluate the
articles adopting an "actionable" model, whose features are related to the
content of the articles, so that the model can also directly suggest strategies
for improving a given article quality. We rely on Natural Language Processing
(NLP) and dictionaries-based techniques in order to extract the bio-medical
concepts in a text. We prove the effectiveness of our approach by classifying
the medical articles of the Wikipedia Medicine Portal, which have been
previously manually labeled by the Wiki Project team. The results of our
experiments confirm that, by considering domain-oriented features, it is
possible to obtain sensible improvements with respect to existing solutions,
mainly for those articles that other approaches have less correctly classified.
Other than being interesting by their own, the results call for further
research in the area of domain specific features suitable for Web data quality
assessment
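A dictionary-based feature in the spirit of the Domain Informativeness idea above can be sketched very simply: score a text by the share of its tokens found in a domain dictionary. The tiny term list and the tokenization are assumptions for illustration, not the paper's actual dictionary or pipeline.

```python
# Sketch: fraction of an article's tokens found in a (toy) medical
# dictionary, as one domain-oriented quality feature.
import re

# Invented miniature dictionary; a real one would hold thousands of terms.
MEDICAL_TERMS = {"aspirin", "dosage", "hypertension", "therapy"}

def domain_informativeness(text):
    """Share of alphabetic tokens that are known domain terms."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in MEDICAL_TERMS)
    return hits / len(tokens)

score = domain_informativeness("Aspirin dosage in hypertension therapy")
print(score)  # 0.8
```

Because the feature is tied directly to article content, it is "actionable" in the paper's sense: a low score points an editor to a concrete remedy, namely using more precise domain vocabulary.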
Emergent Capabilities for Collaborative Teams in the Evolving Web Environment
This paper reports on our investigation of the latest advances for the Social Web, Web 2.0 and the Linked Data Web. These advances are discussed in terms of the latest capabilities that are available (or being made available) on the Web at the time of writing this paper. Such capabilities can be of significant benefit to teams, especially those comprised of multinational, geographically-dispersed team members. The specific context of coalition members in a rapidly formed diverse military context such as disaster relief or humanitarian aid is considered, where close working between non-government organisations and non-military teams will help to achieve results as quickly and efficiently as possible. The heterogeneity one finds in such teams, coupled with a lack of dedicated private network infrastructure, poses a number of challenges for collaboration, and the current paper represents an attempt to assess whether nascent Web-based capabilities can support such teams in terms of both their collaborative activities and their access to (and sharing of) information resources.
- âŠ