WikiLinkGraphs: A Complete, Longitudinal and Multi-Language Dataset of the Wikipedia Link Networks
Wikipedia articles contain multiple links connecting a subject to other pages
of the encyclopedia. In Wikipedia parlance, these links are called internal
links or wikilinks. We present a complete dataset of the network of internal
Wikipedia links for the largest language editions. The dataset contains
yearly snapshots of the network and spans the period from the creation of
Wikipedia in 2001 to March 1st, 2018. While previous work has mostly focused on
the complete hyperlink graph, which also includes links automatically generated
by templates, we parsed each revision of each article to track links appearing
in the main text. In this way we obtained a cleaner network, discarding more
than half of the links and representing all and only the links intentionally
added by editors. We describe in detail how the Wikipedia dumps have been
processed and the challenges we have encountered, including the need to handle
special pages such as redirects, i.e., alternative article titles. We present
descriptive statistics of several snapshots of this network. Finally, we
propose several research opportunities that can be explored using this new
dataset. Comment: 10 pages, 3 figures, 7 tables, LaTeX. Final camera-ready version
accepted at the 13th International AAAI Conference on Web and Social Media
(ICWSM 2019) - Munich (Germany), 11-14 June 2019
Not all paths lead to Rome: Analysing the network of sister cities
This work analyses the practice of sister city pairing. We investigate
structural properties of the resulting city and country networks and present
rankings of the most central nodes in these networks. We identify different
country clusters and find that the practice of sister city pairing is not
influenced by geographical proximity but results in highly assortative
networks. Comment: 7 pages, 4 figures
Wikipedia Cultural Diversity Dataset: A Complete Cartography for 300 Language Editions
In this paper we present the Wikipedia Cultural Diversity dataset. For each
existing Wikipedia language edition, the dataset contains a classification of
the articles that represent its associated cultural context, i.e. all concepts
and entities related to the language and to the territories where it is spoken.
We describe the methodology we employed to classify articles, and the rich set
of features that we defined to feed the classifier, and that are released as
part of the dataset. We present several purposes for which we envision the use
of this dataset, including detecting, measuring and countering content gaps in
the Wikipedia project, and encouraging cross-cultural research in the field of
digital humanities. Comment: 10 pages, 2 figures
Jointly they edit: examining the impact of community identification on political interaction in Wikipedia
In their 2005 study, Adamic and Glance coined the memorable phrase "divided
they blog", referring to a trend of cyberbalkanization in the political
blogosphere, with liberal and conservative blogs tending to link to other blogs
with a similar political slant, and not to one another. As political discussion
and activity increasingly move online, the power of framing political
discourses is shifting from mass media to social media. Continued examination
of political interactions online is critical, and we extend this line of
research by examining the activities of political users within the Wikipedia
community. First, we examined how users in Wikipedia choose to display (or not
to display) their political affiliation. Next, we more closely examined the
patterns of cross-party interaction and community participation among those
users proclaiming a political affiliation. In contrast to previous analyses of
other social media, we did not find strong trends indicating a preference to
interact with members of the same political party within the Wikipedia
community. Our results indicate that users who proclaim their political
affiliation within the community tend to proclaim their identity as a
"Wikipedian" even more loudly. It seems that the shared identity of "being
Wikipedian" may be strong enough to triumph over other potentially divisive
facets of personal identity, such as political affiliation. Comment: 33 pages, 5 figures
Interactions of cultures and top people of Wikipedia from ranking of 24 language editions
Wikipedia is a huge global repository of human knowledge that can be
leveraged to investigate the intertwining of cultures. With this aim, we
apply Markov chain and Google matrix methods to the analysis of the
hyperlink networks of 24 Wikipedia language editions, and rank all their
articles using the PageRank, 2DRank and CheiRank algorithms. Using automatic
extraction of people's names, we obtain the top 100 historical figures for each
edition and for each algorithm. We investigate their spatial, temporal, and
gender distributions as a function of their cultural origins. Our study
demonstrates not only a skew toward local figures, recognized mainly
in their own cultures, but also the existence of global
historical figures appearing in a large number of editions. By determining the
birth time and place of these persons, we perform an analysis of the evolution
of such figures through 35 centuries of human history for each language, thus
recovering interactions and entanglement of cultures over time. We also obtain
the distributions of historical figures over world countries, highlighting
geographical aspects of cross-cultural links. Considering historical figures
who appear in multiple editions as interactions between cultures, we construct
a network of cultures and identify the most influential cultures according to
this network. Comment: 32 pages, 10 figures. Submitted for publication. Supporting
information is available at
http://www.quantware.ups-tlse.fr/QWLIB/topwikipeople
The NiceTag ontology: tags as named graphs
Current tag modelling does not fully take into account the rich and diverse nature that tags, as signs, can take on. We propose an ontology of tags in which tags are modelled as named graphs. Each named graph consists, at a minimum, of a resource linked to a “sign”, which can be any resource reachable on the Web (an ontology concept, an image, etc.). The purpose of our model is to describe tags in a very general and flexible manner and, as an immediate consequence, to describe tags as modelled by other tag models (SCOT, CommonTag, etc.).
Molecular and Cellular Biology of Prostate Cancer
Prostate cancer is an enigmatic disease. Although prostatic intraepithelial neoplasia appears as early as the third decade of life and as many as 80% of 80-year-old men have epithelial cells in their prostate that fit the morphological criteria for cancer, only about 10% of men will ever have the clinical disease and fewer than 3% will die from it. No proven intervention has significantly altered the natural history of the disease since hormone downregulation was introduced in the 1940s, and new research has been poorly supported. There is, however, an urgent need to develop new criteria to distinguish patients with localised disease who will benefit from intervention from those who do not require it or who will have occult extra-prostatic metastases. Similarly, there is an urgent need to develop new treatments for those in whom the disease is extra-prostatic and therefore incurable by conventional treatments. This review covers the latest developments in epidemiology and in cellular and molecular biology, including new areas such as ion channels, in the field of prostate cancer.