BlogForever D2.6: Data Extraction Methodology
This report outlines an inquiry into the area of web data extraction, conducted within the context of blog preservation. The report reviews theoretical advances and practical developments for implementing data extraction. The inquiry is extended through an experiment that demonstrates the effectiveness and feasibility of implementing some of the suggested approaches. More specifically, the report discusses an approach based on unsupervised machine learning that employs the RSS feeds and HTML representations of blogs. It outlines the possibilities of extracting semantics available in blogs and demonstrates the benefits of exploiting available standards such as microformats and microdata. The report proceeds to propose a methodology for extracting and processing blog data to further inform the design and development of the BlogForever platform.
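The pairing of RSS feeds with HTML pages and the harvesting of microdata annotations described above can be illustrated with a minimal sketch. This is not the BlogForever implementation; the feed, page, and property names are invented for the example, and only the Python standard library is used.

```python
# Sketch: pair an RSS item's title with its link, and extract microdata
# (itemprop attributes) from the corresponding HTML page.
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Invented sample feed and page for illustration.
RSS = """<rss version="2.0"><channel>
  <item><title>Post A</title><link>http://blog.example/a</link></item>
</channel></rss>"""

HTML = """<article itemscope itemtype="http://schema.org/BlogPosting">
  <h1 itemprop="headline">Post A</h1>
  <span itemprop="author">Alice</span>
</article>"""

def rss_items(xml_text):
    """Extract (title, link) pairs from an RSS 2.0 feed."""
    root = ET.fromstring(xml_text)
    return [(i.findtext("title"), i.findtext("link"))
            for i in root.iter("item")]

class MicrodataParser(HTMLParser):
    """Collect itemprop -> text pairs from microdata-annotated HTML."""
    def __init__(self):
        super().__init__()
        self.pending = None   # itemprop name awaiting its text node
        self.props = {}
    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemprop" in attrs:
            self.pending = attrs["itemprop"]
    def handle_data(self, data):
        if self.pending and data.strip():
            self.props[self.pending] = data.strip()
            self.pending = None

items = rss_items(RSS)
parser = MicrodataParser()
parser.feed(HTML)
print(items)          # [('Post A', 'http://blog.example/a')]
print(parser.props)   # {'headline': 'Post A', 'author': 'Alice'}
```

The feed supplies reliable post boundaries and metadata, while the microdata scan recovers semantics that plain HTML scraping would miss, which is the benefit of exploiting such standards that the report points to.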
Knowledge Organization Systems (KOS) in the Semantic Web: A Multi-Dimensional Review
Since the Simple Knowledge Organization System (SKOS) specification and its SKOS eXtension for Labels (SKOS-XL) became formal W3C recommendations in 2009, a significant number of conventional knowledge organization systems (KOS), including thesauri, classification schemes, name authorities, and lists of codes and terms produced before the arrival of the ontology wave, have made their journey to join the Semantic Web mainstream. This paper uses "LOD KOS" as an umbrella term to refer to all of the value vocabularies and lightweight ontologies within the Semantic Web framework. The paper provides an overview of what the LOD KOS movement has brought to various communities and users. These are not limited to the communities of value vocabulary constructors and providers, nor the catalogers and indexers who have a long history of applying the vocabularies to their products. The LOD dataset producers and LOD service providers, the information architects and interface designers, and researchers in the sciences and humanities are also direct beneficiaries of LOD KOS. The paper examines a set of collected cases (experimental or in real applications) and aims to identify the usages of LOD KOS in order to share practices and ideas among communities and users. Through the viewpoints of a number of different user groups, the functions of LOD KOS are examined from multiple dimensions. This paper focuses on the LOD dataset producers, vocabulary producers, and researchers (as end-users of KOS).
Comment: 31 pages, 12 figures, accepted paper in International Journal on Digital Libraries
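To make the notion of a value vocabulary on the Semantic Web concrete, the following sketch shows an invented thesaurus fragment expressed as SKOS-style triples (skos:prefLabel and skos:broader are real SKOS properties; the concepts and the tiny in-memory store are assumptions for illustration).

```python
# Sketch: a thesaurus entry as SKOS triples, and a consumer walking the
# skos:broader hierarchy upward to the top of the scheme.
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Invented example concepts in (subject, predicate, object) form.
triples = [
    ("ex:dogs",    SKOS + "prefLabel", "Dogs"),
    ("ex:dogs",    SKOS + "broader",   "ex:mammals"),
    ("ex:mammals", SKOS + "prefLabel", "Mammals"),
    ("ex:mammals", SKOS + "broader",   "ex:animals"),
    ("ex:animals", SKOS + "prefLabel", "Animals"),
]

def value(subject, predicate):
    """Return the first object for (subject, predicate), if any."""
    return next((o for s, p, o in triples
                 if s == subject and p == predicate), None)

def broader_path(concept):
    """Follow skos:broader links from a concept to the top concept."""
    labels = []
    while concept is not None:
        labels.append(value(concept, SKOS + "prefLabel"))
        concept = value(concept, SKOS + "broader")
    return labels

print(broader_path("ex:dogs"))  # ['Dogs', 'Mammals', 'Animals']
```

Once a thesaurus is published this way, any of the user groups the paper surveys (indexers, LOD service providers, interface designers) can traverse the same hierarchy with generic RDF tooling rather than vocabulary-specific software.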
Application of Semantics to Solve Problems in Life Sciences
Thesis defense date: 10 December 2018. The amount of information generated on the Web has increased in recent years. Most of this information is available as text, with humans being the Web's primary users. However, despite all the advances in natural language processing, computers still have trouble processing this textual information. In this context, there are application domains, such as the Life Sciences, in which large amounts of information are being published as structured data. The analysis of these data is vitally important not only for the advancement of science but also for progress in healthcare. However, these data are located in different repositories and stored in different formats, which makes their integration difficult. Here, the Linked Data paradigm emerges as a technology built on standards proposed by the W3C community, such as HTTP URIs and the RDF and OWL standards. Using this technology, this doctoral thesis was developed around the following main objectives: 1) to promote the use of Linked Data by the Life Sciences user community; 2) to ease the design of SPARQL queries through discovery of the model underlying RDF repositories; 3) to create a collaborative environment that facilitates the consumption of Linked Data by end users; 4) to develop an algorithm that automatically discovers the OWL semantic model of an RDF repository; and 5) to develop an OWL representation of ICD-10-CM, called Dione, that offers an automatic methodology for classifying patients' diseases, with subsequent validation using an OWL reasoner.
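The idea behind objectives 2 and 4, discovering the model underlying an RDF repository so that users can write SPARQL queries without prior schema knowledge, can be sketched as follows. This is not the thesis algorithm; the triples and the domain/range heuristic are assumptions for illustration.

```python
# Sketch: infer a lightweight model (classes, and the properties linking
# them) from instance-level RDF triples.
RDF_TYPE = "rdf:type"

# Invented instance data.
triples = [
    ("ex:p1", RDF_TYPE, "ex:Patient"),
    ("ex:d1", RDF_TYPE, "ex:Disease"),
    ("ex:p1", "ex:diagnosedWith", "ex:d1"),
]

def infer_model(triples):
    """Approximate the schema from the data: classes are the rdf:type
    objects; a property's domain/range are the classes of the subjects
    and objects it actually connects."""
    types = {s: o for s, p, o in triples if p == RDF_TYPE}
    classes = set(types.values())
    links = {(types.get(s), p, types.get(o))
             for s, p, o in triples
             if p != RDF_TYPE and s in types and o in types}
    return classes, links

classes, links = infer_model(triples)
print(sorted(classes))  # ['ex:Disease', 'ex:Patient']
print(sorted(links))    # [('ex:Patient', 'ex:diagnosedWith', 'ex:Disease')]
```

Knowing that ex:diagnosedWith links Patient to Disease is exactly the kind of summary that lets a user draft a SPARQL query such as `SELECT ?d WHERE { ?p ex:diagnosedWith ?d }` without inspecting the raw data first.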
Prototypical implementations
In this technical report, we present prototypical implementations of innovative tools and methods developed according to the working plan outlined in Technical Report TR-B-09-05 [23]. We present an ontology modularization and integration framework and the SVoNt server, the server-side end of an SVN-based versioning system for ontologies, in the Corporate Ontology Engineering pillar. For the Corporate Semantic Collaboration pillar, we present the prototypical implementation of a lightweight ontology editor for non-experts and an ontology-based expert finder system. For the Corporate Semantic Search pillar, we present a prototype for algorithmic extraction of relations in folksonomies, a tool for trend detection using a semantic analyzer, a tool for automatic classification of web documents using Hidden Markov models, a personalized semantic recommender for multimedia content, and a semantic search assistant developed in co-operation with the Museumsportal Berlin. The prototypes complete the next milestone on the path to an integral Corporate Semantic Web architecture based on the three pillars Corporate Ontology Engineering, Corporate Semantic Collaboration, and Corporate Semantic Search, as envisioned in [23].
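The core operation of an SVN-style versioning system for ontologies, such as the SVoNt server mentioned above, is reducing a commit to the triples added and removed between two ontology versions. A minimal sketch of that diff step, with invented example axioms and no claim about SVoNt's actual internals:

```python
# Sketch: diff two ontology versions represented as sets of triples.
def diff(old, new):
    """Return (added, removed) triples between two ontology versions."""
    old, new = set(old), set(new)
    return sorted(new - old), sorted(old - new)

# Invented example: a commit that inserts an intermediate class.
v1 = [("ex:Dog", "rdfs:subClassOf", "ex:Animal")]
v2 = [("ex:Dog", "rdfs:subClassOf", "ex:Mammal"),
      ("ex:Mammal", "rdfs:subClassOf", "ex:Animal")]

added, removed = diff(v1, v2)
print(added)    # two new subclass axioms
print(removed)  # [('ex:Dog', 'rdfs:subClassOf', 'ex:Animal')]
```

Working at the level of triples rather than text lines is what makes such a system ontology-aware: reordering a serialization produces an empty diff, while a real modelling change surfaces as explicit added and removed axioms.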
A framework for feeding Linked Data to Complex Event Processing engines
A huge volume of Linked Data has been published on the Web, yet it is not processable by Complex Event Processing (CEP) or Event Stream Processing (ESP) engines. This paper presents a framework to bridge this gap, under which Linked Data are first translated into events conforming to a lightweight ontology and then fed to CEP engines. The event processing results are also published back onto the Web of Data. In this way, CEP engines are connected to the Web of Data, and ontological reasoning is integrated with event processing. Finally, the implementation method and a case study of the framework are presented.
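The translation step the paper describes, turning RDF triples into flat events a CEP engine can consume, can be sketched as follows. The sensor-reading data, property names, and the toy threshold rule are all assumptions for illustration, not the paper's case study.

```python
# Sketch: group RDF triples by subject into event dicts, then run a toy
# CEP rule over the resulting event stream.

# Invented triples describing three sensor readings.
triples = [
    ("ex:r1", "ex:sensor", "ex:s1"), ("ex:r1", "ex:value", 70),
    ("ex:r2", "ex:sensor", "ex:s1"), ("ex:r2", "ex:value", 85),
    ("ex:r3", "ex:sensor", "ex:s1"), ("ex:r3", "ex:value", 90),
]

def to_events(triples):
    """Translate triples into events: one dict per subject, keyed by
    predicate (the assumed 'lightweight ontology' of the example)."""
    events = {}
    for s, p, o in triples:
        events.setdefault(s, {})[p] = o
    return [events[k] for k in sorted(events)]

def detect(events, threshold=80):
    """Toy CEP pattern: two consecutive readings above the threshold."""
    alerts = []
    for prev, cur in zip(events, events[1:]):
        if prev["ex:value"] > threshold and cur["ex:value"] > threshold:
            alerts.append((prev["ex:value"], cur["ex:value"]))
    return alerts

print(detect(to_events(triples)))  # [(85, 90)]
```

In the framework proper, the alert tuple would itself be re-expressed as RDF and published back onto the Web of Data, closing the loop between event processing and ontological reasoning.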
The genesis and emergence of Web 3.0: a study in the integration of artificial intelligence and the semantic web in knowledge creation
The Web as we know it has evolved rapidly over the last decade. We have gone from a phase of rapid growth, as seen in the dot-com boom where business was king, to the current Web 2.0 phase, where social networking, wikis, blogs and other related tools flood the bandwidth of the World Wide Web. The empowerment of the web user with Web 2.0 technologies has led to the exponential growth of data, information and knowledge on the web. With this rapid change, there is a need to logically categorise this information and knowledge so it can be fully utilised by all. It can be argued that the power of the knowledge held on the web is not fully exposed under its current structure, and to improve this we need to explore the foundations of the web. This dissertation will explore the evolution of the web from its early days to the present day. It will examine the way web content is stored and discuss the new semantic technologies now available to represent this content. The research aims to demonstrate the possibilities of efficient knowledge extraction from a knowledge portal, such as a Wiki or SharePoint portal, using these semantic technologies. This generation of dynamic knowledge content within a limited domain will attempt to demonstrate the benefits of the Semantic Web to the knowledge age.
A matter of words: NLP for quality evaluation of Wikipedia medical articles
Automatic quality evaluation of Web information is a task with many fields of
applications and of great relevance, especially in critical domains like the
medical one. We move from the intuition that the quality of content of medical
Web documents is affected by features related with the specific domain. First,
the usage of a specific vocabulary (Domain Informativeness); then, the adoption
of specific codes (like those used in the infoboxes of Wikipedia articles) and
the type of document (e.g., historical and technical ones). In this paper, we
propose to leverage specific domain features to improve the results of the
evaluation of Wikipedia medical articles. In particular, we evaluate the
articles adopting an "actionable" model, whose features are related to the
content of the articles, so that the model can also directly suggest strategies
for improving a given article quality. We rely on Natural Language Processing
(NLP) and dictionaries-based techniques in order to extract the bio-medical
concepts in a text. We prove the effectiveness of our approach by classifying
the medical articles of the Wikipedia Medicine Portal, which have been
previously manually labeled by the Wiki Project team. The results of our
experiments confirm that, by considering domain-oriented features, it is
possible to obtain sensible improvements with respect to existing solutions,
mainly for those articles that other approaches have less correctly classified.
Other than being interesting by their own, the results call for further
research in the area of domain specific features suitable for Web data quality
assessment
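A dictionary-based feature in the spirit of the Domain Informativeness idea above can be sketched very simply: score a text by the share of its tokens found in a domain dictionary. The tiny term list and the tokenization are assumptions for illustration, not the paper's actual dictionary or pipeline.

```python
# Sketch: fraction of an article's tokens found in a (toy) medical
# dictionary, as one domain-oriented quality feature.
import re

# Invented miniature dictionary; a real one would hold thousands of terms.
MEDICAL_TERMS = {"aspirin", "dosage", "hypertension", "therapy"}

def domain_informativeness(text):
    """Share of alphabetic tokens that are known domain terms."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in MEDICAL_TERMS)
    return hits / len(tokens)

score = domain_informativeness("Aspirin dosage in hypertension therapy")
print(score)  # 0.8
```

Because the feature is tied directly to article content, it is "actionable" in the paper's sense: a low score points an editor to a concrete remedy, namely using more precise domain vocabulary.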
Emergent Capabilities for Collaborative Teams in the Evolving Web Environment
This paper reports on our investigation of the latest advances for the Social Web, Web 2.0 and the Linked Data Web. These advances are discussed in terms of the latest capabilities that are available (or being made available) on the Web at the time of writing this paper. Such capabilities can be of significant benefit to teams, especially those comprised of multinational, geographically-dispersed team members. The specific context of coalition members in a rapidly formed diverse military context such as disaster relief or humanitarian aid is considered, where close working between non-government organisations and non-military teams will help to achieve results as quickly and efficiently as possible. The heterogeneity one finds in such teams, coupled with a lack of dedicated private network infrastructure, poses a number of challenges for collaboration, and the current paper represents an attempt to assess whether nascent Web-based capabilities can support such teams in terms of both their collaborative activities and their access to (and sharing of) information resources.
- âŠ