1,025 research outputs found
Evolution of Wikipedia's Category Structure
Wikipedia, as a social phenomenon of collaborative knowledge creating, has
been studied extensively from various points of views. The category system of
Wikipedia, introduced in 2004, has attracted relatively little attention. In
this study, we focus on the documentation of knowledge, and the transformation
of this documentation with time. We take Wikipedia as a proxy for knowledge in
general and its category system as an aspect of the structure of this
knowledge. We investigate the evolution of the category structure of the
English Wikipedia from its birth in 2004 to 2008. We treat the category system
as if it is a hierarchical Knowledge Organization System, capturing the changes
in the distributions of the top categories. We investigate how the clustering
of articles, defined by the category system, matches the direct link network
between the articles and show how it changes over time. We find the Wikipedia
category network mostly stable, but with occasional reorganization. We show
that the clustering matches the link structure quite well, except short periods
preceding the reorganizations.Comment: Preprint of an article submitted for consideration in Advances in
Complex Systems (2012) http://www.worldscinet.com/acs/, 19 pages, 7 figure
Early Prediction of Movie Box Office Success based on Wikipedia Activity Big Data
Use of socially generated "big data" to access information about collective
states of the minds in human societies has become a new paradigm in the
emerging field of computational social science. A natural application of this
would be the prediction of the society's reaction to a new product in the sense
of popularity and adoption rate. However, bridging the gap between "real time
monitoring" and "early predicting" remains a big challenge. Here we report on
an endeavor to build a minimalistic predictive model for the financial success
of movies based on collective activity data of online users. We show that the
popularity of a movie can be predicted much before its release by measuring and
analyzing the activity level of editors and viewers of the corresponding entry
to the movie in Wikipedia, the well-known online encyclopedia.Comment: 13 pages, Including Supporting Information, 7 Figures, Download the
dataset from: http://wwm.phy.bme.hu/SupplementaryDataS1.zi
Changing Higher Education Learning with Web 2.0 and Open Education Citation, Annotation, and Thematic Coding Appendices
Appendices of citations, annotations and themes for research conducted on four websites: Delicious, Wikipedia, YouTube, and Facebook
Creating a Semantic Graph from Wikipedia
With the continued need to organize and automate the use of data, solutions are needed to transform unstructred text into structred information. By treating dependency grammar functions as programming language functions, this process produces \property maps which connect entities (people, places, events) with snippets of information. These maps are used to construct a semantic graph. By inputting Wikipedia, a large graph of information is produced representing a section of history. The resulting graph allows a user to quickly browse a topic and view the interconnections between entities across history
Document controversy classification based on the Wikipedia category structure
Dispute and controversy are parts of our culture and cannot be omitted on the Internet (where it becomes more anonymous). There have been many studies on controversy, especially on social networks such as Wikipedia. This free on-line encyclopedia has become a very popular data source among many researchers studying behavior or natural language processing. This paper presents using the category structure of Wikipedia to determine the controversy of a single article. This is the first part of the proposed system for classification of topic controversy score for any given text
The Computer Science Ontology: A Comprehensive Automatically-Generated Taxonomy of Research Areas
Ontologies of research areas are important tools for characterising, exploring, and analysing the research landscape. Some fields of research are comprehensively described by large-scale taxonomies, e.g., MeSH in Biology and PhySH in Physics. Conversely, current Computer Science taxonomies are coarse-grained and tend to evolve slowly. For instance, the ACM classification scheme contains only about 2K research topics and the last version dates back to 2012. In this paper, we introduce the Computer Science Ontology (CSO), a large-scale, automatically generated ontology of research areas, which includes about 14K topics and 162K semantic relationships. It was created by applying the Klink-2 algorithm on a very large dataset of 16M scientific articles. CSO presents two main advantages over the alternatives: i) it includes a very large number of topics that do not appear in other classifications, and ii) it can be updated automatically by running Klink-2 on recent corpora of publications. CSO powers several tools adopted by the editorial team at Springer Nature and has been used to enable a variety of solutions, such as classifying research publications, detecting research communities, and predicting research trends. To facilitate the uptake of CSO, we have also released the CSO Classifier, a tool for automatically classifying research papers, and the CSO Portal, a web application that enables users to download, explore, and provide granular feedback on CSO. Users can use the portal to navigate and visualise sections of the ontology, rate topics and relationships, and suggest missing ones. The portal will support the publication of and access to regular new releases of CSO, with the aim of providing a comprehensive resource to the various research communities engaged with scholarly data
- …