Similarities, challenges and opportunities of Wikipedia content and open source projects
Copyright © 2012 John Wiley & Sons, Ltd. Several years of research and evidence have demonstrated that Open Source Software (OSS) portals often contain a large number of software projects that simply do not evolve: they are developed by relatively small communities that struggle to attract a sustained number of contributors. These portals have increasingly started to act as storage for abandoned projects, and researchers and practitioners should try to point out how to take advantage of such content. Similarly, other online content portals (like Wikipedia) could be harvested for valuable content. In this paper we argue that, even with differences in the required expertise, many projects reliant on content and contributions by users undergo a similar evolution and follow similar patterns: when a project fails to attract contributors, it appears not to be evolving, or to be abandoned. Far from being a negative finding, this means that even such projects could provide valuable content, which should be harvested and identified based on common characteristics: using the attributes of "usefulness" and "modularity", we isolate valuable content in both Wikipedia pages and OSS projects.
Mining Domain-Specific Thesauri from Wikipedia: A case study
Domain-specific thesauri are high-cost, high-maintenance, high-value knowledge structures. We show how the classic thesaurus structure of terms and links can be mined automatically from Wikipedia. In a comparison with a professional thesaurus for agriculture, we find that Wikipedia contains a substantial proportion of its concepts and semantic relations; furthermore, it has impressive coverage of contemporary documents in the domain. Thesauri derived using our techniques capitalize on existing public efforts and tend to reflect contemporary language usage better than their costly, painstakingly constructed manual counterparts.
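The abstract does not spell out the mining procedure, so the following Python is only a minimal sketch under stated assumptions: it maps Wikipedia redirects to equivalence ("used for", UF) relations, category membership to broader-term (BT) relations, and mutual article links to related-term (RT) relations. The toy data and the function name `build_thesaurus` are hypothetical, and this mapping is an illustration rather than the authors' exact method.

```python
from collections import defaultdict

# Hypothetical toy data standing in for structures extracted from a Wikipedia dump.
REDIRECTS = {"Maize": "Corn", "Zea mays": "Corn"}          # redirect title -> target article
CATEGORIES = {"Corn": ["Cereals"], "Wheat": ["Cereals"]}   # article -> parent categories
LINKS = {                                                  # article -> articles it links to
    "Corn": {"Wheat", "Irrigation"},
    "Wheat": {"Corn"},
    "Irrigation": set(),
}


def build_thesaurus(redirects, categories, links):
    """Map Wikipedia structures onto thesaurus relations (an assumed mapping)."""
    thesaurus = defaultdict(lambda: {"UF": set(), "BT": set(), "RT": set()})
    # Equivalence: the redirect target is the preferred term; each redirect
    # title becomes a non-preferred "used for" (UF) entry.
    for alias, target in redirects.items():
        thesaurus[target]["UF"].add(alias)
    # Hierarchy: each parent category becomes a broader term (BT).
    for article, parents in categories.items():
        thesaurus[article]["BT"].update(parents)
    # Association: only mutual links (A -> B and B -> A) become related terms (RT).
    for a, targets in links.items():
        for b in targets:
            if a in links.get(b, set()):
                thesaurus[a]["RT"].add(b)
    return thesaurus


thesaurus = build_thesaurus(REDIRECTS, CATEGORIES, LINKS)
for term in sorted(thesaurus):
    print(term, {rel: sorted(vals) for rel, vals in thesaurus[term].items()})
```

Requiring links in both directions is a deliberately conservative choice in this sketch: it discards one-way links such as "Corn" to "Irrigation", which are often incidental rather than true semantic relations.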
Topic Similarity Networks: Visual Analytics for Large Document Sets
We investigate ways in which to improve the interpretability of LDA topic
models by better analyzing and visualizing their outputs. We focus on examining
what we refer to as topic similarity networks: graphs in which nodes represent
latent topics in text collections and links represent similarity among topics.
We describe efficient and effective approaches to both building and labeling
such networks. Visualizations of topic models based on these networks are shown
to be a powerful means of exploring, characterizing, and summarizing large
collections of unstructured text documents. They help to "tease out"
non-obvious connections among different sets of documents and provide insights
into how topics form larger themes. We demonstrate the efficacy and
practicality of these approaches through two case studies: 1) NSF grants for
basic research spanning a 14-year period and 2) the entire English portion of
Wikipedia.
Comment: 9 pages; 2014 IEEE International Conference on Big Data (IEEE BigData 2014)
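As an illustration only, a topic similarity network can be sketched in Python from LDA topic-word distributions. This sketch assumes cosine similarity with a hypothetical threshold of 0.5, labels each node by its two most probable words, and treats connected components as candidate "themes"; the paper's actual similarity measure and labeling approach may differ, and all data and function names below are hypothetical.

```python
import math
from itertools import combinations

# Hypothetical topic-word distributions (each row sums to 1) over a shared vocabulary.
VOCAB = ["gene", "protein", "cell", "galaxy", "star", "orbit"]
TOPICS = [
    [0.40, 0.30, 0.25, 0.02, 0.02, 0.01],  # topic 0
    [0.30, 0.40, 0.20, 0.05, 0.03, 0.02],  # topic 1
    [0.01, 0.02, 0.02, 0.45, 0.35, 0.15],  # topic 2
]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def topic_similarity_network(topics, threshold=0.5):
    """Return edges (i, j, sim) for topic pairs whose similarity exceeds threshold."""
    edges = []
    for i, j in combinations(range(len(topics)), 2):
        sim = cosine(topics[i], topics[j])
        if sim > threshold:
            edges.append((i, j, sim))
    return edges


def label_topic(dist, vocab, k=2):
    """Label a topic node by its k most probable words (a simple heuristic)."""
    top = sorted(range(len(vocab)), key=lambda w: dist[w], reverse=True)[:k]
    return " / ".join(vocab[w] for w in top)


def components(n, edges):
    """Group topics into connected components ("themes") using union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j, _ in edges:
        parent[find(i)] = find(j)
    groups = {}
    for t in range(n):
        groups.setdefault(find(t), []).append(t)
    return sorted(groups.values())


edges = topic_similarity_network(TOPICS)
labels = [label_topic(t, VOCAB) for t in TOPICS]
themes = components(len(TOPICS), edges)
print(edges, labels, themes)
```

On this toy input, topics 0 and 1 (both dominated by "gene" and "protein") are linked and form one theme, while topic 2 ("galaxy / star") stays isolated. The threshold trades graph density against readability: a lower value links more topics into larger themes.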
Temporal characterization of the requests to Wikipedia
This paper presents an empirical study of the temporal patterns
characterizing the requests submitted by users to Wikipedia.
The study is based on the analysis of the log lines registered by the
Wikimedia Foundation Squid servers after they sent the appropriate
content in response to users' requests. The analysis covers the ten most
visited editions of Wikipedia and involves more than 14,000 million log lines
corresponding to the traffic of the entire year 2009. The methodology
mainly consisted of parsing and filtering users' requests according to the
study directives. The relevant information fields were then stored in a
database for persistence and further characterization. In this way, we first
assessed whether the traffic to Wikipedia could serve as a reliable estimator
of the overall traffic to all the Wikimedia Foundation projects. Our subsequent
analysis of the temporal evolution of the different types of requests to
Wikipedia revealed interesting differences and similarities among them that
can be related to users' attention to the encyclopedia. In addition, we
performed separate characterizations of each Wikipedia edition to compare
their respective evolutions over time.
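The parse, filter, store, and aggregate pipeline described above can be sketched in Python with the standard library. This is a minimal sketch under loudly stated assumptions: the four-field log format, the filtering rule (successful GET requests to `*.wikipedia.org` hosts), and the URL-based request-type rules are all hypothetical simplifications, not the format of the real Wikimedia Squid logs or the study's actual directives.

```python
import sqlite3
from urllib.parse import urlsplit, parse_qs

# Hypothetical, simplified log format: "<unix_ts> <status> <method> <url>".
# Real Wikimedia Squid log lines carry more fields; this format is an assumption.
LOG_LINES = [
    "1230768000 200 GET http://en.wikipedia.org/wiki/Main_Page",
    "1230768060 200 GET http://en.wikipedia.org/w/index.php?title=Foo&action=edit",
    "1230771600 200 GET http://de.wikipedia.org/w/index.php?search=x",
    "1230771660 404 GET http://en.wikipedia.org/wiki/Missing",
    "1230771700 200 GET http://commons.wikimedia.org/wiki/File:X.jpg",
]


def parse(line):
    """Split a log line into typed fields, or return None if it is malformed."""
    fields = line.split()
    if len(fields) != 4:
        return None
    ts, status, method, url = fields
    return int(ts), int(status), method, url


def keep(record):
    """Hypothetical filter: successful GET requests to a *.wikipedia.org edition."""
    _, status, method, url = record
    host = urlsplit(url).hostname or ""
    return status == 200 and method == "GET" and host.endswith(".wikipedia.org")


def classify(url):
    """Assign a request type from the URL (hypothetical rules for illustration)."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    if query.get("action", [""])[0] in ("edit", "submit"):
        return "edit"
    if "search" in query or parts.path.startswith("/wiki/Special:Search"):
        return "search"
    if parts.path.startswith("/wiki/"):
        return "article"
    return "other"


# Persist the relevant fields in a database for further characterization.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE requests (ts INTEGER, edition TEXT, kind TEXT)")
for line in LOG_LINES:
    rec = parse(line)
    if rec is None or not keep(rec):
        continue
    ts, _, _, url = rec
    edition = urlsplit(url).hostname.split(".")[0]  # e.g. "en" from en.wikipedia.org
    db.execute("INSERT INTO requests VALUES (?, ?, ?)", (ts, edition, classify(url)))

# Temporal characterization: request counts per UTC hour, edition, and request type.
hourly = db.execute(
    "SELECT strftime('%H', ts, 'unixepoch') AS hour, edition, kind, COUNT(*) "
    "FROM requests GROUP BY hour, edition, kind ORDER BY hour, edition, kind"
).fetchall()
print(hourly)
```

With these sample lines, the 404 response and the commons.wikimedia.org request are filtered out, leaving one English article view and one English edit request in hour 00 and one German search in hour 01. Dropping the `host.endswith(".wikipedia.org")` condition would keep other Wikimedia projects, which is the kind of comparison needed to judge Wikipedia traffic as an estimator of overall Wikimedia traffic.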