Mining Domain-Specific Thesauri from Wikipedia: A case study
Domain-specific thesauri are high-cost, high-maintenance, high-value knowledge structures. We show how the classic thesaurus structure of terms and links can be mined automatically from Wikipedia. In a comparison with a professional thesaurus for agriculture, we find that Wikipedia contains a substantial proportion of its concepts and semantic relations; furthermore, it has impressive coverage of contemporary documents in the domain. Thesauri derived using our techniques capitalize on existing public efforts and tend to reflect contemporary language usage better than their costly, painstakingly constructed manual counterparts.
Taxonomy and clustering in collaborative systems: the case of the on-line encyclopedia Wikipedia
In this paper we investigate the nature and structure of the relation between
imposed classifications and real clustering in a particular case of a
scale-free network given by the on-line encyclopedia Wikipedia. We find a
statistical similarity in the distributions of community sizes both by using
the top-down approach of the categories division present in the archive and in
the bottom-up procedure of community detection given by an algorithm based on
the spectral properties of the graph. Regardless of the statistically similar
behaviour, the two methods provide a rather different division of the articles,
thereby signaling that the nature and presence of power laws is a general
feature for these systems and cannot be used as a benchmark to evaluate the
suitability of a clustering method.
Comment: 5 pages, 3 figures, epl2 styl
Towards Better Understanding Researcher Strategies in Cross-Lingual Event Analytics
With an increasing amount of information on globally important events, there
is a growing demand for efficient analytics of multilingual event-centric
information. Such analytics is particularly challenging due to the large amount
of content, the event dynamics and the language barrier. Although memory
institutions increasingly collect event-centric Web content in different
languages, very little is known about the strategies of researchers who conduct
analytics of such content. In this paper we present researchers' strategies for
the content, method and feature selection in the context of cross-lingual
event-centric analytics observed in two case studies on multilingual Wikipedia.
We discuss the factors influencing these strategies, the findings enabled by
the adopted methods, and the current limitations, and provide
recommendations for services supporting researchers in cross-lingual
event-centric analytics.
Comment: In Proceedings of the International Conference on Theory and Practice
of Digital Libraries 201
Vision of a Visipedia
The web is not perfect: while text is easily
searched and organized, pictures (the vast majority of the bits
that one can find online) are not. In order to see how one could
improve the web and make pictures first-class citizens of the
web, I explore the idea of Visipedia, a visual interface for
Wikipedia that is able to answer visual queries and enables
experts to contribute and organize visual knowledge. Five
distinct groups of humans would interact through Visipedia:
users, experts, editors, visual workers, and machine vision
scientists. The latter would gradually build automata able to
interpret images. I explore some of the technical challenges
involved in making Visipedia happen. I argue that Visipedia will
likely grow organically, combining state-of-the-art machine
vision with human labor.