2,072 research outputs found
Discovering Beaten Paths in Collaborative Ontology-Engineering Projects using Markov Chains
Biomedical taxonomies, thesauri and ontologies in the form of the
International Classification of Diseases (ICD) as a taxonomy or the National
Cancer Institute Thesaurus as an OWL-based ontology, play a critical role in
acquiring, representing and processing information about human health. With
increasing adoption and relevance, biomedical ontologies have also
significantly increased in size. For example, the 11th revision of the ICD,
which is currently under active development by the WHO contains nearly 50,000
classes representing a vast variety of different diseases and causes of death.
This evolution in terms of size was accompanied by an evolution in the way
ontologies are engineered. Because no single individual has the expertise to
develop such large-scale ontologies, ontology-engineering projects have evolved
from small-scale efforts involving just a few domain experts to large-scale
projects that require effective collaboration between dozens or even hundreds
of experts, practitioners and other stakeholders. Understanding how these
stakeholders collaborate will enable us to improve editing environments that
support such collaborations. We uncover how large ontology-engineering
projects, such as the ICD in its 11th revision, unfold by analyzing usage logs
of five different biomedical ontology-engineering projects of varying sizes and
scopes using Markov chains. We discover intriguing interaction patterns (e.g.,
which properties users subsequently change) that suggest that large
collaborative ontology-engineering projects are governed by a few general
principles that determine and drive development. From our analysis, we identify
commonalities and differences between different projects that have implications
for project managers, ontology editors, developers and contributors working on
collaborative ontology-engineering projects and tools in the biomedical domain.Comment: Published in the Journal of Biomedical Informatic
Onto.PT: Automatic Construction of a Lexical Ontology for Portuguese
This ongoing research presents an alternative to the man-
ual creation of lexical resources and proposes an approach towards
the automatic construction of a lexical ontology for Portuguese. Tex-
tual sources are exploited in order to obtain a lexical network based
on terms and, after clustering and mapping, a wordnet-like lexical on-
tology is created. At the end of the paper, current results are shown
Tags Are Related: Measurement of Semantic Relatedness Based on Folksonomy Network
Folksonomy and tagging systems, which allow users to interactively annotate a pool of shared resources using descriptive tags, have enjoyed phenomenal success in recent years. The concepts are organized as a map in human mind, however, the tags in folksonomy, which reflect users' collaborative cognition on information, are isolated with current approach. What we do in this paper is to estimate the semantic relatedness among tags in folksonomy: whether tags are related from semantic view, rather than isolated? We introduce different algorithms to form networks of folksonomy, connecting tags by users collaborative tagging, or by resource context. Then we perform multiple measures of semantic relatedness on folksonomy networks to investigate semantic information within them. The result shows that the connections between tags have relatively strong semantic relatedness, and the relatedness decreases dramatically as the distance between tags increases. What we find in this paper could provide useful visions in designing future folksonomy-based systems, constructing semantic web in current state of the Internet, and developing natural language processing applications
Web 2.0, language resources and standards to automatically build a multilingual named entity lexicon
This paper proposes to advance in the current state-of-the-art of automatic Language Resource (LR) building by taking into consideration three elements: (i) the knowledge available in existing LRs, (ii) the vast amount of information available from the collaborative paradigm that has emerged from the Web 2.0 and (iii) the use of standards to improve interoperability. We present a case study in which a set of LRs for diïŹerent languages (WordNet for English and Spanish and Parole-Simple-Clips for Italian) are
extended with Named Entities (NE) by exploiting Wikipedia and the aforementioned LRs. The practical result is a multilingual NE lexicon connected to these LRs and to two ontologies: SUMO and SIMPLE. Furthermore, the paper addresses an important problem which aïŹects the Computational Linguistics area in the present, interoperability, by making use of the ISO LMF standard to encode this lexicon. The diïŹerent steps of the procedure (mapping, disambiguation, extraction, NE identiïŹcation and postprocessing) are comprehensively explained and evaluated. The resulting resource contains 974,567, 137,583 and 125,806 NEs for English, Spanish and Italian respectively. Finally, in order to check the usefulness of the constructed resource, we apply it into a state-of-the-art Question Answering system and evaluate its impact; the NE lexicon improves the systemâs accuracy by 28.1%. Compared to previous approaches to build NE repositories, the current proposal represents a step forward in terms of automation, language independence, amount of NEs acquired and richness of the information represented
Mining Meaning from Wikipedia
Wikipedia is a goldmine of information; not just for its many readers, but
also for the growing community of researchers who recognize it as a resource of
exceptional scale and utility. It represents a vast investment of manual effort
and judgment: a huge, constantly evolving tapestry of concepts and relations
that is being applied to a host of tasks.
This article provides a comprehensive description of this work. It focuses on
research that extracts and makes use of the concepts, relations, facts and
descriptions found in Wikipedia, and organizes the work into four broad
categories: applying Wikipedia to natural language processing; using it to
facilitate information retrieval and information extraction; and as a resource
for ontology building. The article addresses how Wikipedia is being used as is,
how it is being improved and adapted, and how it is being combined with other
structures to create entirely new resources. We identify the research groups
and individuals involved, and how their work has developed in the last few
years. We provide a comprehensive list of the open-source software they have
produced.Comment: An extensive survey of re-using information in Wikipedia in natural
language processing, information retrieval and extraction and ontology
building. Accepted for publication in International Journal of Human-Computer
Studie
- âŠ