Ontology Ranking: Finding the Right Ontologies on the Web
Ontology search, the process of finding ontologies or
ontological terms in an ontology collection that match
user-defined queries, is an important task for facilitating ontology
reuse in ontology engineering. Ontology reuse is desirable to avoid the
tedious process of building an ontology from scratch and to limit
the design of several competing ontologies that represent similar
knowledge. Since many organisations in both the private and
public sectors are publishing their data in RDF, they
increasingly need to find or design ontologies for data
annotation and/or integration. In general, multiple ontologies
represent a given domain; finding the best-matching
ontologies or their terms is therefore required to facilitate
manual or dynamic ontology selection for both ontology design and
data annotation.
Ranking is a crucial component of the ontology retrieval
process; it aims to list the ‘relevant’ ontologies or
their terms as high as possible in the search results to reduce
human intervention. Most existing ontology ranking techniques
inherit one or more information retrieval ranking parameter(s).
They linearly combine the values of these parameters for each
ontology to compute the relevance score against a user query and
rank the results in descending order of the relevance score. A
significant aspect of achieving an effective ontology ranking
model is to develop novel metrics and dynamic techniques that can
optimise the relevance score of the most relevant ontology for a
user query.
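The linear ranking scheme described above can be illustrated with a minimal sketch. The feature names, weights, and ontology identifiers below are hypothetical and chosen only for illustration; they are not taken from the thesis.

```python
from typing import Dict, List, Tuple

# Hypothetical ranking parameters and weights (assumptions for illustration).
WEIGHTS: Dict[str, float] = {"text_match": 0.5, "popularity": 0.3, "coverage": 0.2}


def relevance_score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Linearly combine the ranking parameter values of one ontology."""
    return sum(w * features.get(name, 0.0) for name, w in weights.items())


def rank_ontologies(
    candidates: Dict[str, Dict[str, float]],
    weights: Dict[str, float] = WEIGHTS,
) -> List[Tuple[str, float]]:
    """Return (ontology, score) pairs in descending order of relevance score."""
    scored = [(name, relevance_score(f, weights)) for name, f in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical per-ontology feature values for a single user query.
candidates = {
    "onto_a": {"text_match": 0.9, "popularity": 0.2, "coverage": 0.5},
    "onto_b": {"text_match": 0.6, "popularity": 0.9, "coverage": 0.7},
    "onto_c": {"text_match": 0.3, "popularity": 0.4, "coverage": 0.1},
}
ranking = rank_ontologies(candidates)
```

With fixed weights such as these, the ordering depends entirely on the hand-chosen coefficients, which is the kind of limitation that learned ranking models aim to remove.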
In this thesis, we present extensive research in ontology
retrieval and ranking, where several research gaps in the
existing literature are identified and addressed. First, we begin
the thesis with a review of the literature and propose a taxonomy
of Semantic Web data (i.e., ontologies and linked data) retrieval
approaches. That allows us to identify potential research
directions in the field. In the remainder of the thesis, we
address several of the identified shortcomings in the ontology
retrieval domain. We develop a framework for the empirical and
comparative evaluation of different ontology ranking solutions,
which has not been studied in the literature so far. Second, we
propose an effective relationship-based concept retrieval
framework and a concept ranking model based on a
learning-to-rank approach, which addresses the limitations of existing
linear ranking models. Third, we propose RecOn, a framework that
helps users find the best-matching ontologies for a
multi-keyword query. There, the relevance score of an ontology to
the query is computed by formulating and solving the ontology
recommendation problem as a linear and an optimisation problem.
Finally, the thesis also reports on an extensive comparative
evaluation of our proposed solutions with several other
state-of-the-art techniques using real-world ontologies. This
thesis will be useful for researchers and practitioners
interested in ontology search, both for its methods and for its
performance benchmarks of ranking approaches to ontology search.
Automatically assembling a full census of an academic field
The composition of the scientific workforce shapes the direction of
scientific research, directly through the selection of questions to
investigate, and indirectly through its influence on the training of future
scientists. In most fields, however, complete census information is difficult
to obtain, complicating efforts to study workforce dynamics and the effects of
policy. This is particularly true in computer science, which lacks a single,
all-encompassing directory or professional organization. A full census of
computer science would serve many purposes, not the least of which is a better
understanding of the trends and causes of unequal representation in computing.
Previous academic census efforts have relied on narrow or biased samples, or on
professional society membership rolls. A full census can be constructed
directly from online departmental faculty directories, but doing so by hand is
prohibitively expensive and time-consuming. Here, we introduce a topical web
crawler for automating the collection of faculty information from web-based
department rosters, and demonstrate the resulting system on the 205
PhD-granting computer science departments in the U.S. and Canada. This method
constructs a complete census of the field within a few minutes, and achieves
over 99% precision and recall. We conclude by comparing the resulting 2017
census to a hand-curated 2011 census to quantify turnover and retention in
computer science, in general and for female faculty in particular,
demonstrating the types of analysis made possible by automated census
construction.
Comment: 11 pages, 6 figures, 2 tables
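The precision and recall figures quoted in this abstract compare an automatically extracted roster against a hand-curated one. A minimal sketch of that comparison follows, assuming each faculty member is represented by a normalised name string; the names and the helper function are hypothetical and are not part of the authors' system.

```python
from typing import Set, Tuple


def precision_recall(extracted: Set[str], reference: Set[str]) -> Tuple[float, float]:
    """Compare an extracted roster against a hand-curated reference roster."""
    true_positives = len(extracted & reference)
    # Precision: share of extracted entries that are real faculty members.
    precision = true_positives / len(extracted) if extracted else 0.0
    # Recall: share of real faculty members that the crawler found.
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical rosters for one department.
reference = {"person_1", "person_2", "person_3", "person_4"}
extracted = {"person_1", "person_2", "person_3", "webmaster"}
precision, recall = precision_recall(extracted, reference)
```

Here one spurious entry and one missed person each cost a quarter of the score, so both values come out at 0.75.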
A structural and quantitative analysis of the web of linked data and its components to perform retrieval data
This research consists of a quantitative and structural analysis of the Web of Linked Data, with the aim of improving data search across different sources. Statistical techniques will be applied to obtain quantitative metrics of the Web of Linked Data. For the structural analysis, we will carry out a Social Network Analysis (SNA).
To get an overview of the Web of Linked Data for this analysis, we rely on the Linking Open Data (LOD) cloud diagram. This is an online catalogue of datasets whose information has been published using Linked Data techniques. The datasets are published in a language called Resource Description Framework (RDF), which creates links between them so that the information can be reused.
The goal of obtaining a quantitative and structural analysis of the Web of Linked Data is to improve data search. To that end, we take advantage of the Schema.org markup language and the Linked Open Vocabularies (LOV) project.
Schema.org is a set of tags that lets webmasters mark up their own web pages with microdata. Microdata is used to help search engines and other web tools better understand the information these pages contain. LOV is a catalogue that registers the vocabularies used by the datasets of the Web of Linked Data. Its aim is to provide easy access to those vocabularies.
In this research, we develop a study for obtaining data from the Web of Linked Data using the sources mentioned above together with ontology matching techniques. In our case, we first map Schema.org to LOV, and then LOV to the Web of Linked Data. An SNA of LOV has also been carried out. The aim of that analysis is to obtain a quantitative and qualitative picture of LOV. With this knowledge we can draw conclusions such as which vocabularies are the most used, or whether they are specialised in a particular field. These findings can be used to filter datasets or to reuse information.
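One simple way to see which vocabularies are the most used is to treat vocabulary reuse as a directed network and count incoming links. The sketch below assumes in-degree as the measure and uses hypothetical vocabulary names and edges; the abstract does not say which network measures the study applies.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical edges: (vocabulary, vocabulary that it reuses).
edges: List[Tuple[str, str]] = [
    ("voc_a", "voc_core"),
    ("voc_b", "voc_core"),
    ("voc_b", "voc_a"),
    ("voc_c", "voc_core"),
]


def in_degree(reuse_edges: List[Tuple[str, str]]) -> Counter:
    """Count how many vocabularies reuse each target vocabulary."""
    return Counter(target for _, target in reuse_edges)


# Vocabularies ordered from most reused to least reused.
most_used = in_degree(edges).most_common()
```

A vocabulary with a high in-degree is reused by many others, which makes it a natural candidate when filtering datasets by the vocabularies they share.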