Wikipedias: Collaborative web-based encyclopedias as complex networks
Wikipedia is a popular web-based encyclopedia edited freely and
collaboratively by its users. In this paper we present an analysis of
Wikipedias in several languages as complex networks. The hyperlinks pointing
from one Wikipedia article to another are treated as directed links while the
articles represent the nodes of the network. We show that many network
characteristics are common to different language versions of Wikipedia, such as
their degree distributions, growth, topology, reciprocity, clustering,
assortativity, path lengths and triad significance profiles. These
regularities, found in the ensemble of Wikipedias in different languages and of
different sizes, point to the existence of a unique growth process. We also
compare Wikipedias to other previously studied networks.
Comment: v3: 9 pages, 12 figures, Change of title, few paragraphs and two
figures. Accepted for publication in Phys. Rev.
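To illustrate the directed-network view described in this abstract, here is a minimal Python sketch that computes in- and out-degrees and a simple link reciprocity. The toy edge list and the function names are hypothetical, and reciprocity is taken here as the plain fraction of links whose reverse link also exists, which may differ from the measure used in the paper.

```python
from collections import Counter


def in_out_degrees(edges):
    """Return (in_degree, out_degree) counters for a directed edge list."""
    in_deg, out_deg = Counter(), Counter()
    for source, target in edges:
        out_deg[source] += 1
        in_deg[target] += 1
    return in_deg, out_deg


def reciprocity(edges):
    """Fraction of distinct directed links whose reverse link also exists."""
    links = {(u, v) for u, v in edges if u != v}  # drop self-links and duplicates
    if not links:
        return 0.0
    mutual = sum(1 for (u, v) in links if (v, u) in links)
    return mutual / len(links)


# Hypothetical toy network: articles are nodes, hyperlinks are directed edges.
toy_edges = [("A", "B"), ("B", "A"), ("A", "C"), ("C", "D")]
in_deg, out_deg = in_out_degrees(toy_edges)
toy_reciprocity = reciprocity(toy_edges)  # 2 of the 4 links are reciprocated
```

On real data the edge list would come from a Wikipedia link dump; the sketch only shows how the quantities are defined.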
Interactions of cultures and top people of Wikipedia from ranking of 24 language editions
Wikipedia is a huge global repository of human knowledge that can be
leveraged to investigate the intertwining of cultures. With this aim, we
apply Markov chain and Google matrix methods to analyze the
hyperlink networks of 24 Wikipedia language editions, and rank all their
articles with the PageRank, 2DRank and CheiRank algorithms. Using automatic
extraction of people's names, we obtain the top 100 historical figures for each
edition and for each algorithm. We investigate their spatial, temporal, and
gender distributions as a function of their cultural origins. Our study
demonstrates not only a skew toward local figures, mainly
recognized only in their own cultures, but also the existence of global
historical figures appearing in a large number of editions. By determining the
birth time and place of these persons, we perform an analysis of the evolution
of such figures through 35 centuries of human history for each language, thus
recovering interactions and entanglement of cultures over time. We also obtain
the distributions of historical figures over world countries, highlighting
geographical aspects of cross-cultural links. Considering historical figures
who appear in multiple editions as interactions between cultures, we construct
a network of cultures and identify the most influential cultures according to
this network.
Comment: 32 pages. 10 figures. Submitted for publication. Supporting
information is available on
http://www.quantware.ups-tlse.fr/QWLIB/topwikipeople
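As a rough illustration of the ranking step named in this abstract, the sketch below computes PageRank by power iteration and obtains CheiRank by running the same routine on the network with all links reversed. The damping factor 0.85, the uniform treatment of dangling nodes, the toy graph and the function names are assumptions made for the example, not details taken from the paper, and 2DRank is not covered.

```python
def pagerank(nodes, edges, alpha=0.85, iters=100):
    """Power-iteration PageRank on a directed graph given as (source, target) pairs."""
    n = len(nodes)
    out_links = {u: [] for u in nodes}
    for source, target in edges:
        out_links[source].append(target)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        # Rank held by nodes without out-links is spread uniformly (assumption).
        dangling = sum(rank[u] for u in nodes if not out_links[u])
        new_rank = {u: (1.0 - alpha) / n + alpha * dangling / n for u in nodes}
        for u in nodes:
            if out_links[u]:
                share = alpha * rank[u] / len(out_links[u])
                for v in out_links[u]:
                    new_rank[v] += share
        rank = new_rank
    return rank


def cheirank(nodes, edges, alpha=0.85, iters=100):
    """CheiRank: PageRank of the network with every link direction inverted."""
    reversed_edges = [(target, source) for source, target in edges]
    return pagerank(nodes, reversed_edges, alpha, iters)


# Hypothetical toy graph: A collects in-links, D emits the most out-links.
toy_nodes = ["A", "B", "C", "D"]
toy_edges = [("B", "A"), ("C", "A"), ("D", "A"), ("D", "B"), ("D", "C")]
pr = pagerank(toy_nodes, toy_edges)
cr = cheirank(toy_nodes, toy_edges)
top_by_pagerank = max(pr, key=pr.get)  # node favoured by incoming links
top_by_cheirank = max(cr, key=cr.get)  # node favoured by outgoing links
```

In this toy graph PageRank favours the heavily linked-to node and CheiRank the node with many outgoing links, which is the qualitative contrast between the two rankings.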
Weakly Supervised Cross-Lingual Named Entity Recognition via Effective Annotation and Representation Projection
The state-of-the-art named entity recognition (NER) systems are supervised
machine learning models that require large amounts of manually annotated data
to achieve high accuracy. However, human annotation of NER data is expensive
and time-consuming, and can be quite difficult for a new language. In this
paper, we present two weakly supervised approaches for cross-lingual NER with
no human annotation in a target language. The first approach is to create
automatically labeled NER data for a target language via annotation projection
on comparable corpora, where we develop a heuristic scheme that effectively
selects good-quality projection-labeled data from noisy data. The second
approach is to project distributed representations of words (word embeddings)
from a target language to a source language, so that the source-language NER
system can be applied to the target language without re-training. We also
design two co-decoding schemes that effectively combine the outputs of the two
projection-based approaches. We evaluate the performance of the proposed
approaches on both in-house and open NER data for several target languages. The
results show that the combined systems outperform three other weakly supervised
approaches on the CoNLL data.
Comment: 11 pages, The 55th Annual Meeting of the Association for
Computational Linguistics (ACL), 201
Highlighting Entanglement of Cultures via Ranking of Multilingual Wikipedia Articles
How do different cultures evaluate a person? Is an important person in one
culture also important in another culture? We address these questions via
ranking of multilingual Wikipedia articles. With three ranking algorithms based
on the network structure of Wikipedia, we rank all articles in 9
language editions of Wikipedia and investigate the general ranking structure of
PageRank, CheiRank and 2DRank. In particular, we focus on articles related to
persons, identify the top 30 persons for each rank across editions and
analyze differences in their distributions over activity fields such as
politics, art, science, religion and sport for each edition. We find that local
heroes are dominant, but global heroes also exist and create an effective
network representing the entanglement of cultures. The Google matrix analysis of
the network of cultures shows signs of a Zipf law distribution. This approach
makes it possible to examine the diversity and shared characteristics of knowledge
organization between cultures. The developed computational, data-driven
approach highlights cultural interconnections from a new perspective.
Comment: Published in PLoS ONE
(http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0074554).
Supporting information is available on the same webpage