Discovering Beaten Paths in Collaborative Ontology-Engineering Projects using Markov Chains
Biomedical taxonomies, thesauri and ontologies in the form of the
International Classification of Diseases (ICD) as a taxonomy or the National
Cancer Institute Thesaurus as an OWL-based ontology, play a critical role in
acquiring, representing and processing information about human health. With
increasing adoption and relevance, biomedical ontologies have also
significantly increased in size. For example, the 11th revision of the ICD,
which is currently under active development by the WHO, contains nearly 50,000
classes representing a vast variety of diseases and causes of death.
This evolution in terms of size was accompanied by an evolution in the way
ontologies are engineered. Because no single individual has the expertise to
develop such large-scale ontologies, ontology-engineering projects have evolved
from small-scale efforts involving just a few domain experts to large-scale
projects that require effective collaboration between dozens or even hundreds
of experts, practitioners and other stakeholders. Understanding how these
stakeholders collaborate will enable us to improve editing environments that
support such collaborations. We uncover how large ontology-engineering
projects, such as the ICD in its 11th revision, unfold by analyzing usage logs
of five different biomedical ontology-engineering projects of varying sizes and
scopes using Markov chains. We discover intriguing interaction patterns (e.g.,
which properties users subsequently change) that suggest that large
collaborative ontology-engineering projects are governed by a few general
principles that determine and drive development. From our analysis, we identify
commonalities and differences between different projects that have implications
for project managers, ontology editors, developers and contributors working on
collaborative ontology-engineering projects and tools in the biomedical domain.

Comment: Published in the Journal of Biomedical Informatics
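The core technique the abstract describes, modelling sequences of editing actions as a first-order Markov chain estimated from usage logs, can be sketched as follows. The action names and log format here are hypothetical illustrations, not taken from the projects studied:

```python
from collections import Counter, defaultdict

def transition_matrix(sequences):
    """Estimate first-order Markov transition probabilities from
    sequences of user actions (e.g. which property a user edits next)."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):  # count each observed bigram a -> b
            counts[a][b] += 1
    probs = {}
    for a, nxt in counts.items():
        total = sum(nxt.values())
        probs[a] = {b: n / total for b, n in nxt.items()}  # row-normalise
    return probs

# Hypothetical usage logs: each inner list is one editing session.
logs = [
    ["edit_title", "edit_definition", "edit_synonym"],
    ["edit_title", "edit_definition", "edit_definition"],
    ["edit_definition", "edit_synonym"],
]
P = transition_matrix(logs)
print(P["edit_title"])  # → {'edit_definition': 1.0}
```

Interaction patterns such as "which properties users subsequently change" then correspond to the high-probability entries in each row of the estimated matrix.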
Mining Meaning from Wikipedia
Wikipedia is a goldmine of information; not just for its many readers, but
also for the growing community of researchers who recognize it as a resource of
exceptional scale and utility. It represents a vast investment of manual effort
and judgment: a huge, constantly evolving tapestry of concepts and relations
that is being applied to a host of tasks.
This article provides a comprehensive description of this work. It focuses on
research that extracts and makes use of the concepts, relations, facts and
descriptions found in Wikipedia, and organizes the work into four broad
categories: applying Wikipedia to natural language processing; using it to
facilitate information retrieval; mining it for information extraction; and as a resource
for ontology building. The article addresses how Wikipedia is being used as is,
how it is being improved and adapted, and how it is being combined with other
structures to create entirely new resources. We identify the research groups
and individuals involved, and how their work has developed in the last few
years. We provide a comprehensive list of the open-source software they have
produced.

Comment: An extensive survey of re-using information in Wikipedia in natural
language processing, information retrieval and extraction and ontology
building. Accepted for publication in International Journal of Human-Computer
Studies
A Study on Candidate Retrieval and Ranking Methods for Entity Linking
Tohoku University, Kentaro Inui
A domain categorisation of vocabularies based on a deep learning classifier.
The publication of large amounts of open data has become a major trend nowadays. This is a consequence of projects like the Linked Open Data (LOD) community, which publishes and integrates datasets using techniques like Linked Data. Linked Data publishers should follow a set of principles for dataset design. These principles are described in a 2011 document that covers tasks such as considering the reuse of existing vocabularies. With regard to the latter, another project called Linked Open Vocabularies (LOV) attempts to compile the vocabularies used in LOD. These vocabularies have been classified by domain following the subjective criteria of LOV members, which carries the inherent risk of introducing personal biases. In this paper, we present an automatic classifier of vocabularies based on the main categories of the well-known knowledge source Wikipedia. For this purpose, word-embedding models were used, in combination with Deep Learning techniques. Results show that with a hybrid model of regular Deep Neural Network (DNN), Recurrent Neural Network (RNN) and Convolutional Neural Network (CNN), vocabularies could be classified with an accuracy of 93.57 per cent. Specifically, 36.25 per cent of the vocabularies belong to the Culture category.
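The paper's classifier is a hybrid DNN/RNN/CNN over word embeddings. As a much simpler sketch of the underlying idea, representing a vocabulary by the average of its term embeddings and assigning it to the nearest category, here is a toy nearest-centroid illustration; the embedding table, terms, and category labels are invented for the example and are not from the paper:

```python
import numpy as np

# Toy 2-d word-embedding table; real work would use pretrained vectors.
EMB = {
    "disease": np.array([1.0, 0.0]), "gene": np.array([0.9, 0.1]),
    "painting": np.array([0.0, 1.0]), "museum": np.array([0.1, 0.9]),
}

def embed(terms):
    """Represent a vocabulary by the mean of its term embeddings."""
    return np.mean([EMB[t] for t in terms], axis=0)

# Hypothetical labelled training vocabularies: category -> term lists.
train = {"Health": [["disease", "gene"]], "Culture": [["painting", "museum"]]}
centroids = {c: np.mean([embed(v) for v in vs], axis=0)
             for c, vs in train.items()}

def classify(terms):
    """Assign a vocabulary to the category with the closest centroid."""
    v = embed(terms)
    return min(centroids, key=lambda c: np.linalg.norm(v - centroids[c]))

print(classify(["painting", "museum"]))  # → Culture
```

The hybrid network in the paper replaces the averaging and nearest-centroid steps with learned layers, but the input representation (word embeddings of vocabulary terms) and the output (a Wikipedia-derived category) follow the same shape.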
Creating a Semantic Graph from Wikipedia
With the continued need to organize and automate the use of data, solutions are needed to transform unstructured text into structured information. By treating dependency grammar functions as programming language functions, this process produces property maps which connect entities (people, places, events) with snippets of information. These maps are used to construct a semantic graph. By inputting Wikipedia, a large graph of information is produced representing a section of history. The resulting graph allows a user to quickly browse a topic and view the interconnections between entities across history.
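The property-map idea, connecting entities to snippets of information and merging those links into a graph, might be sketched as follows; the triples and relation names are hypothetical examples, not output of the described system:

```python
from collections import defaultdict

def build_graph(property_maps):
    """Merge (entity, relation, value) property maps into an
    adjacency structure: entity -> {relation: value}."""
    graph = defaultdict(dict)
    for entity, relation, value in property_maps:
        graph[entity][relation] = value
    return dict(graph)

# Hypothetical property maps extracted from text.
triples = [
    ("Ada Lovelace", "born", "1815"),
    ("Ada Lovelace", "collaborated_with", "Charles Babbage"),
    ("Charles Babbage", "designed", "Analytical Engine"),
]
g = build_graph(triples)
print(g["Ada Lovelace"]["collaborated_with"])  # → Charles Babbage
```

Browsing a topic then amounts to following the relation edges from one entity's map into the maps of the entities it links to.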
Information technology: gateway to direct democracy in China and the world
The world watches as China moves towards greater democracy. The question in everyone's mind, including the Chinese themselves, is "what model will China arrive at, at the journey's end?" There are many lessons to be learnt from other countries, some positive (Tanzania) and some negative (Laos). The United States has no doubts about the "goodness" of its own model, but its unthinking belief in the superiority of that model should not be accepted at face value. The Chinese government and people will understandably be considering various different models very carefully, so that they can choose the best possible model for their country and their own context. In this paper we consider why current Western models of constitution should be viewed with caution by China as it attempts to move towards an improved socialist democracy. The paper considers the electronic voting system used in the US presidential elections, and draws attention to the opportunities for vote rigging that this type of electronic voting facilitates. It also looks at models of democracy used in the ancient world, and compares these with modern systems. Finally, it presents a secure and anonymous mechanism for electronic voting on issues of concern to the population. We conclude by sounding a note of caution about the dangers of plebiscites being used as rubber stamps by dictators if there are inadequate controls over who puts issues to the vote.