1,821 research outputs found
Knowledge Organization Systems (KOS) in the Semantic Web: A Multi-Dimensional Review
Since the Simple Knowledge Organization System (SKOS) specification and its
SKOS eXtension for Labels (SKOS-XL) became formal W3C recommendations in 2009, a
significant number of conventional knowledge organization systems (KOS)
(including thesauri, classification schemes, name authorities, and lists of
codes and terms, produced before the arrival of the ontology-wave) have made
their journeys to join the Semantic Web mainstream. This paper uses "LOD KOS"
as an umbrella term to refer to all of the value vocabularies and lightweight
ontologies within the Semantic Web framework. The paper provides an overview of
what the LOD KOS movement has brought to various communities and users. These
are not limited to the colonies of the value vocabulary constructors and
providers, nor the catalogers and indexers who have a long history of applying
the vocabularies to their products. The LOD dataset producers and LOD service
providers, the information architects and interface designers, and researchers
in sciences and humanities, are also direct beneficiaries of LOD KOS. The paper
examines a set of the collected cases (experimental or in real applications)
and aims to find the usages of LOD KOS in order to share the practices and
ideas among communities and users. Through the viewpoints of a number of
different user groups, the functions of LOD KOS are examined from multiple
dimensions. This paper focuses on the LOD dataset producers, vocabulary
producers, and researchers (as end-users of KOS).
Comment: 31 pages, 12 figures, accepted paper in International Journal on
Digital Libraries
Advanced Knowledge Technologies at the Midterm: Tools and Methods for the Semantic Web
The University of Edinburgh and research sponsors are authorised to reproduce and distribute reprints and on-line copies for their purposes notwithstanding any copyright annotation hereon. The views and conclusions contained herein are the author's and shouldn't be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of other parties.
In a celebrated essay on the new electronic media, Marshall McLuhan wrote in 1962:
Our private senses are not closed systems but are endlessly translated into each other in that experience which we call consciousness. Our extended senses, tools, technologies, through the ages, have been closed systems incapable of interplay or collective awareness. Now, in the electric age, the very instantaneous nature of co-existence among our technological instruments has created a crisis quite new in human history. Our extended faculties and senses now constitute a single field of experience which demands that they become collectively conscious. Our technologies, like our private senses, now demand an interplay and ratio that makes rational co-existence possible. As long as our technologies were as slow as the wheel or the alphabet or money, the fact that they were separate, closed systems was socially and psychically supportable. This is not true now when sight and sound and movement are simultaneous and global in extent. (McLuhan 1962, p.5, emphasis in original)
Over forty years later, the seamless interplay that McLuhan demanded between our
technologies is still barely visible. McLuhan's predictions of the spread, and increased importance, of electronic media have of course been borne out, and the worlds of business, science and knowledge storage and transfer have been revolutionised. Yet the integration of electronic systems as open systems remains in its infancy.
Advanced Knowledge Technologies (AKT) aims to address this problem, to create a view of knowledge and its management across its lifecycle, to research and create the services and technologies that such unification will require. Halfway through its six-year span, the results are beginning to come through, and this paper will explore some of the services, technologies and methodologies that have been developed. We hope to give a sense in this paper of the potential for the next three years, to discuss the insights and lessons learnt in the first phase of the project, to articulate the challenges and issues that remain.
The WWW provided the original context that made the AKT approach to knowledge
management (KM) possible. AKT was initially proposed in 1999; it brought together an interdisciplinary consortium with the technological breadth and complementarity to create the conditions for a unified approach to knowledge across its lifecycle. The combination of this expertise, and the time and space afforded the consortium by the IRC structure, suggested the opportunity for a concerted effort to develop an approach to advanced knowledge technologies, based on the WWW as a basic infrastructure.
The technological context of AKT altered for the better in the short period between the development of the proposal and the beginning of the project itself with the development of the semantic web (SW), which foresaw much more intelligent manipulation and querying of knowledge. The opportunities that the SW provided for, e.g., more intelligent retrieval put AKT in the centre of information technology innovation and knowledge management services; the AKT skill set would clearly be central for the exploitation of those opportunities.
The SW, as an extension of the WWW, provides an interesting set of constraints to
the knowledge management services AKT tries to provide. As a medium for the semantically-informed coordination of information, it has suggested a number of ways in which the objectives of AKT can be achieved, most obviously through the provision of knowledge management services delivered over the web as opposed to the creation and provision of technologies to manage knowledge.
AKT is working on the assumption that many web services will be developed and provided for users. The KM problem in the near future will be one of deciding which services are needed and of coordinating them. Many of these services will be largely or entirely legacies of the WWW, and so the capabilities of the services will vary. As well as providing useful KM services in their own right, AKT will be aiming to exploit this opportunity, by reasoning over services, brokering between them, and providing essential meta-services for SW knowledge service management.
Ontologies will be a crucial tool for the SW. The AKT consortium brings a lot of expertise on ontologies together, and ontologies were always going to be a key part of the strategy. All kinds of knowledge sharing and transfer activities will be mediated by ontologies, and ontology management will be an important enabling task. Different
applications will need to cope with inconsistent ontologies, or with the problems that will follow the automatic creation of ontologies (e.g. merging of pre-existing ontologies to create a third). Ontology mapping, and the elimination of conflicts of reference, will be important tasks. All of these issues are discussed along with our proposed technologies.
Similarly, specifications of tasks will be used for the deployment of knowledge services over the SW, but in general it cannot be expected that in the medium term there will be standards for task (or service) specifications. The brokering meta-services that are envisaged will have to deal with this heterogeneity.
The emerging picture of the SW is one of great opportunity but it will not be a well-ordered, certain or consistent environment. It will comprise many repositories of legacy data, outdated and inconsistent stores, and requirements for common understandings across divergent formalisms. There is clearly a role for standards to play to bring much of this context together; AKT is playing a significant role in these efforts. But standards take time to emerge, they take political power to enforce, and they have been known to stifle innovation (in the short term). AKT is keen to understand the balance between principled inference and statistical processing of web content. Logical inference on the Web is tough. Complex queries using traditional AI inference methods bring most distributed computer systems to their knees. Do we set up semantically well-behaved areas of the Web? Is any part of the Web in which
semantic hygiene prevails interesting enough to reason in? These and many other
questions need to be addressed if we are to provide effective knowledge technologies
for our content on the web
Completeness and Consistency Analysis for Evolving Knowledge Bases
Assessing the quality of an evolving knowledge base is a challenging task, as
it often requires identifying the correct quality assessment procedures.
Since data is often derived from autonomous, and increasingly large data
sources, it is impractical to manually curate the data, and challenging to
continuously and automatically assess their quality.
In this paper, we explore two main areas of quality assessment related to
evolving knowledge bases: (i) identification of completeness issues using
knowledge base evolution analysis, and (ii) identification of consistency
issues based on integrity constraints, such as minimum and maximum cardinality,
and range constraints.
For completeness analysis, we use data profiling information from consecutive
knowledge base releases to estimate completeness measures that allow predicting
quality issues. Then, we perform consistency checks to validate the results of
the completeness analysis using integrity constraints and learning models.
The approach has been tested both quantitatively and qualitatively by using a
subset of datasets from both DBpedia and 3cixty knowledge bases. The
performance of the approach is evaluated using precision, recall, and F1 score.
From completeness analysis, we observe a 94% precision for the English DBpedia
KB and 95% precision for the 3cixty Nice KB. We also assessed the performance
of our consistency analysis by using five learning models over three sub-tasks,
namely minimum cardinality, maximum cardinality, and range constraint. We
observed that the best performing model in our experimental setup is the Random
Forest, reaching an F1 score greater than 90% for minimum and maximum
cardinality and 84% for range constraints.
Comment: Accepted for Journal of Web Semantics
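The precision, recall, and F1 figures quoted in this abstract follow the standard definitions. As a minimal sketch, assuming that detected issues and manually confirmed issues are both available as sets of identifiers (the function name and the sample identifiers are illustrative and not taken from the paper), the three scores could be computed like this:

```python
def precision_recall_f1(predicted, actual):
    """Return (precision, recall, F1) for two collections of flagged items.

    predicted: items the automatic analysis flagged as quality issues
    actual:    items a manual review confirmed as real issues
    Both arguments are hypothetical inputs chosen for illustration.
    """
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    # Guard against empty sets so the function never divides by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Four properties flagged, three truly problematic, two in common.
scores = precision_recall_f1({"p1", "p2", "p3", "p4"}, {"p1", "p2", "p5"})
```

F1 is the harmonic mean of precision and recall, so it stays low unless both are high.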
Design and implementation of a filter engine for semantic web documents
This report describes our project, which addresses the challenge of changes in the semantic web. Some studies have already been done for the so-called adaptive semantic web, such as applying inference rules. In this study, we apply Event Notification System (ENS) technology. Treating changes as events, we
developed a notification system for such events
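The idea of treating changes as events can be sketched as follows. This is only an illustration under stated assumptions, not the report's implementation: a document is modelled as a set of (subject, predicate, object) tuples, and the names `ChangeEvent`, `detect_changes`, and `notify` are hypothetical.

```python
from typing import NamedTuple


class ChangeEvent(NamedTuple):
    kind: str      # "added" or "removed"
    triple: tuple  # (subject, predicate, object)


def detect_changes(old_triples, new_triples):
    """Compare two versions of a document and emit one event per changed triple."""
    old_set, new_set = set(old_triples), set(new_triples)
    events = [ChangeEvent("removed", t) for t in sorted(old_set - new_set)]
    events += [ChangeEvent("added", t) for t in sorted(new_set - old_set)]
    return events


def notify(events, subscriptions):
    """Filter events per subscriber.

    subscriptions maps a subscriber name to a function that returns True
    for the events that subscriber wants to receive (its filter profile).
    """
    return {name: [e for e in events if wants(e)]
            for name, wants in subscriptions.items()}


old = {("ex:Alice", "ex:worksAt", "ex:LabA")}
new = {("ex:Alice", "ex:worksAt", "ex:LabB")}
events = detect_changes(old, new)
inbox = notify(events, {"additions_only": lambda e: e.kind == "added"})
```

Here the filter step is a plain predicate function per subscriber; a real filter engine would typically index the profiles rather than test each one against every event.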
Semantic web approach for Italian graduates' surveys: the AlmaLaurea ontology proposal
The growing development and promotion of data transparency in public
administration covers many areas, including university education. Numerous
datasets are now released in Linked Open Data format at both the national and
international level. Among the publicly available information, concepts
concerning the employment and the number of graduates stand out. Despite this
progress, the lack of a standard methodology for describing statistical
information about graduates makes it difficult to compare particular facts
across different data sources. At the national level, the AlmaLaurea surveys
fill the information gap caused by the heterogeneity of sources by offering
centralized, annually updated statistics on graduates' profiles and their
employment status. The aim of this thesis project is to build a domain
ontology that describes various characteristics of graduates, while promoting
the structured definition of AlmaLaurea data and its subsequent publication as
Linked Open Data. Finally, the project, carried out with Semantic Web
technologies, proposes the creation of a SPARQL endpoint and a web interface
for querying and visualizing the structured data
Automated Knowledge Base Quality Assessment and Validation based on Evolution Analysis
In recent years, numerous efforts have been put towards sharing Knowledge Bases (KB) in the Linked Open Data (LOD) cloud. These KBs are being used for various tasks, including performing data analytics or building question answering systems. Such KBs evolve continuously: their data (instances) and schemas can be updated, extended, revised and refactored. However, unlike in more controlled types of knowledge bases, the evolution of KBs exposed in the LOD cloud is usually unrestrained, which may cause data to suffer from a variety of quality issues, both at a semantic level and at a pragmatic level. This situation negatively affects data stakeholders (consumers, curators, etc.). Data quality is commonly related to the perception of fitness for use, for a certain application or use case. Therefore, ensuring the quality of the data of a knowledge base that evolves is vital. Since data is derived from autonomous, evolving, and increasingly large data providers, it is impractical to do manual data curation, and at the same time, it is very challenging to do a continuous automatic assessment of data quality. Ensuring the quality of a KB is a non-trivial task since it is based on a combination of structured information supported by models, ontologies, and vocabularies, as well as queryable endpoints, links, and mappings. Thus, in this thesis, we explored two main areas in assessing KB quality: (i) quality assessment using KB evolution analysis, and (ii) validation using machine learning models. The evolution of a KB can be analyzed using fine-grained "change" detection at low-level or using "dynamics" of a dataset at high-level. In this thesis, we present a novel knowledge base quality assessment approach using evolution analysis. The proposed approach uses data profiling on consecutive knowledge base releases to compute quality measures that allow detecting quality issues.
However, the first step in building the quality assessment approach was to identify the quality characteristics. Using high-level change detection as measurement functions, in this thesis we present four quality characteristics: Persistency, Historical Persistency, Consistency and Completeness. Persistency and historical persistency measures concern the degree of changes and lifespan of any entity type. Consistency and completeness measures identify properties with incomplete information and contradictory facts. The approach has been assessed both quantitatively and qualitatively on a series of releases from two knowledge bases, eleven releases of DBpedia and eight releases of 3cixty Nice. However, high-level changes, being coarse-grained, cannot capture all possible quality issues. In this context, we present a validation strategy whose rationale is twofold: first, use manual validation from qualitative analysis to identify causes of quality issues; then, use RDF data profiling information to generate integrity constraints. The validation approach relies on the idea of inducing RDF shapes by exploiting SHACL constraint components. In particular, this approach learns which integrity constraints can be applied to a large KB by instructing a process of statistical analysis, which is followed by a learning model. We illustrate the performance of our validation approach by using five learning models over three sub-tasks, namely minimum cardinality, maximum cardinality, and range constraint. The techniques of quality assessment and validation developed during this work are automatic and can be applied to different knowledge bases independently of the domain. Furthermore, the measures are based on simple statistical operations that make the solution both flexible and scalable
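To make the minimum and maximum cardinality constraints mentioned in this abstract concrete, here is a minimal sketch of checking one such constraint over a small set of triples. It is not the thesis's method and does not use a SHACL engine: the function name, the tuple representation, and the sample data are assumptions made for illustration.

```python
from collections import Counter


def cardinality_violations(triples, prop, min_count=0, max_count=None):
    """Return {subject: value_count} for subjects that break the constraint.

    triples:   iterable of (subject, predicate, object) tuples
    prop:      the property whose number of values is constrained
    min_count: minimum number of values each subject must have
    max_count: maximum number of values allowed (None means unbounded)
    Every subject in the data is checked; a real validator would usually
    restrict the check to instances of one target class.
    """
    triples = list(triples)
    subjects = {s for s, _, _ in triples}
    counts = Counter(s for s, p, _ in triples if p == prop)
    violations = {}
    for subject in subjects:
        n = counts.get(subject, 0)
        if n < min_count or (max_count is not None and n > max_count):
            violations[subject] = n
    return violations


data = [
    ("ex:Event1", "ex:startDate", "2016-05-01"),
    ("ex:Event2", "ex:startDate", "2016-06-01"),
    ("ex:Event2", "ex:startDate", "2016-06-02"),
    ("ex:Event3", "ex:label", "Concert"),
]
# Require exactly one start date per subject.
broken = cardinality_violations(data, "ex:startDate", min_count=1, max_count=1)
```

In this sample, `ex:Event2` has two start dates and `ex:Event3` has none, so both are reported, while `ex:Event1` passes.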
Survey over Existing Query and Transformation Languages
A widely acknowledged obstacle for realizing the vision of the Semantic Web is the inability
of many current Semantic Web approaches to cope with data available in such diverging
representation formalisms as XML, RDF, or Topic Maps. A common query language is the first
step to allow transparent access to data in any of these formats. To further the understanding
of the requirements and approaches proposed for query languages in the conventional as well
as the Semantic Web, this report surveys a large number of query languages for accessing
XML, RDF, or Topic Maps. This is the first systematic survey to consider query languages from
all these areas. From the detailed survey of these query languages, a common classification
scheme is derived that is useful for understanding and differentiating languages within and
among all three areas
Augmenting applications with hyper media, functionality and meta-information
The Dynamic Hypermedia Engine (DHE) enhances analytical applications by adding relationships, semantics and other metadata to the application's output and user interface. DHE also provides additional hypermedia navigational, structural and annotation functionality. These features allow application developers and users to add guided tours, personal links and sharable annotations, among other features, into applications. DHE runs as middleware between the application user interface and its business logic and processes, in an n-tier architecture, supporting the extra functionalities without altering the original systems by means of application wrappers.
DHE automatically generates links at run-time for each of those elements having relationships and metadata. Such elements are previously identified using a Relation Navigation Analysis. DHE also constructs more sophisticated navigation techniques not often found on the Web on top of these links. The metadata, links, navigation and annotation features supplement the application's primary functionality.
This research identifies element types, or classes, in the application displays. A mapping rule encodes each relationship found between two elements of interest at the class level. When the user selects a particular element, DHE instantiates the commands included in the rules with the actual instance selected and sends them to the appropriate destination system, which then dynamically generates the resulting virtual (i.e. not previously stored) page. DHE executes concurrently with these applications, providing automated link generation and other hypermedia functionality. DHE uses the Extensible Markup Language (XML) and related World Wide Web Consortium (W3C) sets of XML recommendations, like XLink, XML Schema, and RDF, to encode the semantic information required for the operation of the extra hypermedia features, and for the transmission of messages between the engine modules and applications.
DHE is the only approach we know of that provides automated linking and metadata services in a generic manner, based on the application semantics, without altering the applications. DHE will also work with non-Web systems.
The results of this work could also be extended to other research areas, such as link ranking and filtering, automatic link generation as the result of a search query, metadata collection and support, virtual document management, hypermedia functionality on the Web, adaptive and collaborative hypermedia, web engineering, and the semantic Web