
    WikiLinkGraphs: A Complete, Longitudinal and Multi-Language Dataset of the Wikipedia Link Networks

    Wikipedia articles contain multiple links connecting a subject to other pages of the encyclopedia. In Wikipedia parlance, these links are called internal links or wikilinks. We present a complete dataset of the network of internal Wikipedia links for the 9 largest language editions. The dataset contains yearly snapshots of the network and spans 17 years, from the creation of Wikipedia in 2001 to March 1st, 2018. While previous work has mostly focused on the complete hyperlink graph, which also includes links automatically generated by templates, we parsed each revision of each article to track links appearing in the main text. In this way we obtained a cleaner network, discarding more than half of the links and representing all and only the links intentionally added by editors. We describe in detail how the Wikipedia dumps have been processed and the challenges we have encountered, including the need to handle special pages such as redirects, i.e., alternative article titles. We present descriptive statistics of several snapshots of this network. Finally, we propose several research opportunities that can be explored using this new dataset. Comment: 10 pages, 3 figures, 7 tables, LaTeX. Final camera-ready version accepted at the 13th International AAAI Conference on Web and Social Media (ICWSM 2019) - Munich (Germany), 11-14 June 2019
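The idea of keeping only links written in the main text can be illustrated with a minimal sketch. It assumes simplified wikitext and is not the authors' pipeline: the names `strip_templates` and `extract_wikilinks` and both regular expressions are hypothetical, and dropping links that appear inside template calls is a choice made here for illustration only.

```python
import re

# Hypothetical patterns for illustration; real wikitext has many more cases
# (files, categories, nowiki sections, comments) that are ignored here.
TEMPLATE_RE = re.compile(r"\{\{[^{}]*\}\}")
WIKILINK_RE = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")


def strip_templates(wikitext):
    """Remove {{...}} template calls, innermost first, until none remain."""
    previous = None
    while previous != wikitext:
        previous = wikitext
        wikitext = TEMPLATE_RE.sub("", wikitext)
    return wikitext


def extract_wikilinks(wikitext):
    """Return target titles of [[...]] links found outside template calls."""
    text = strip_templates(wikitext)
    # group(1) is the page title, without any #section or |label part
    return [match.group(1).strip() for match in WIKILINK_RE.finditer(text)]


sample = (
    "Rome is the capital of [[Italy]]. {{Infobox|link=[[Europe]]}} "
    "See [[Tiber|the river]] and [[Latium#History]]."
)
print(extract_wikilinks(sample))
```

Resolving redirects, which the abstract lists among the challenges, would be a separate step that maps each extracted title to its canonical article.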

    Not all paths lead to Rome: Analysing the network of sister cities

    This work analyses the practice of sister city pairing. We investigate structural properties of the resulting city and country networks and present rankings of the most central nodes in these networks. We identify different country clusters and find that the practice of sister city pairing is not influenced by geographical proximity but results in highly assortative networks. Comment: 7 pages, 4 figures

    Wikipedia Cultural Diversity Dataset: A Complete Cartography for 300 Language Editions

    In this paper we present the Wikipedia Cultural Diversity dataset. For each existing Wikipedia language edition, the dataset contains a classification of the articles that represent its associated cultural context, i.e. all concepts and entities related to the language and to the territories where it is spoken. We describe the methodology we employed to classify articles and the rich set of features that we defined to feed the classifier, which are released as part of the dataset. We present several purposes for which we envision the use of this dataset, including detecting, measuring and countering content gaps in the Wikipedia project, and encouraging cross-cultural research in the field of digital humanities. Comment: 10 pages, 2 figures

    Jointly they edit: examining the impact of community identification on political interaction in Wikipedia

    In their 2005 study, Adamic and Glance coined the memorable phrase "divided they blog", referring to a trend of cyberbalkanization in the political blogosphere, with liberal and conservative blogs tending to link to other blogs with a similar political slant, and not to one another. As political discussion and activity increasingly moves online, the power of framing political discourses is shifting from mass media to social media. Continued examination of political interactions online is critical, and we extend this line of research by examining the activities of political users within the Wikipedia community. First, we examined how users in Wikipedia choose to display (or not to display) their political affiliation. Next, we more closely examined the patterns of cross-party interaction and community participation among those users proclaiming a political affiliation. In contrast to previous analyses of other social media, we did not find strong trends indicating a preference to interact with members of the same political party within the Wikipedia community. Our results indicate that users who proclaim their political affiliation within the community tend to proclaim their identity as a "Wikipedian" even more loudly. It seems that the shared identity of "being Wikipedian" may be strong enough to triumph over other potentially divisive facets of personal identity, such as political affiliation. Comment: 33 pages, 5 figures

    Interactions of cultures and top people of Wikipedia from ranking of 24 language editions

    Wikipedia is a huge global repository of human knowledge that can be leveraged to investigate intertwinements between cultures. With this aim, we apply methods of Markov chains and the Google matrix to the analysis of the hyperlink networks of 24 Wikipedia language editions, and rank all their articles by the PageRank, 2DRank and CheiRank algorithms. Using automatic extraction of people's names, we obtain the top 100 historical figures for each edition and for each algorithm. We investigate their spatial, temporal, and gender distributions as a function of their cultural origins. Our study demonstrates not only a skew towards local figures, mainly recognized only in their own cultures, but also the existence of global historical figures appearing in a large number of editions. By determining the birth time and place of these persons, we perform an analysis of the evolution of such figures through 35 centuries of human history for each language, thus recovering interactions and entanglement of cultures over time. We also obtain the distributions of historical figures over world countries, highlighting geographical aspects of cross-cultural links. Considering historical figures who appear in multiple editions as interactions between cultures, we construct a network of cultures and identify the most influential cultures according to this network. Comment: 32 pages, 10 figures. Submitted for publication. Supporting information is available at http://www.quantware.ups-tlse.fr/QWLIB/topwikipeople
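For readers unfamiliar with the ranking methods named in this abstract, the following is a minimal sketch on a toy graph, not the authors' code. PageRank is computed by power iteration on a Google-matrix-style random walk, and CheiRank is obtained by running the same procedure on the graph with all links reversed. The damping value 0.85, the iteration count, the toy graph, and all function names are assumptions made for illustration; 2DRank is not covered.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power iteration for PageRank; `links` maps a node to its out-links."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Teleportation term: every node receives (1 - damping) / n
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank uniformly over all nodes
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


def reverse_links(links):
    """Return the same graph with every link direction inverted."""
    reversed_links = {}
    for source, targets in links.items():
        reversed_links.setdefault(source, [])
        for target in targets:
            reversed_links.setdefault(target, []).append(source)
    return reversed_links


def cheirank(links, **kwargs):
    """CheiRank is PageRank computed on the link-reversed graph."""
    return pagerank(reverse_links(links), **kwargs)


# Toy graph (hypothetical): C is linked to most often, D links out the most.
toy = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["A", "B", "C"]}
pr = pagerank(toy)
cr = cheirank(toy)
print(max(pr, key=pr.get), max(cr, key=cr.get))
```

On this toy graph PageRank favours the node with the most incoming links, while CheiRank favours the node with the most outgoing links, which is the intuition behind using the two rankings together.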

    L'ontologie NiceTag : les tags en tant que graphes nommés

    Current tag models do not fully take into account the rich and diverse nature that tags, as signs, can take on. We propose an ontology of tags in which tags are modelled as named graphs. These named graphs consist, at a minimum, of a resource linked to a “sign”, which can be any resource reachable on the Web (an ontology concept, an image, etc.). The purpose of our model is to describe tags in a very general and flexible manner and, as an immediate consequence, to describe tags as modelled by other tag models (SCOT, CommonTag, etc.).

    Molecular and Cellular Biology of Prostate Cancer

    Prostate cancer is an enigmatic disease. Although prostatic intraepithelial neoplasia appears as early as the third decade and as many as 80% of 80-year-old men have epithelial cells in their prostate that fit the morphological criteria for cancer, only about 10% of men will ever have the clinical disease and fewer than 3% will die from it. No significant proven interventions have altered the natural history of the disease since hormone down-regulation was introduced in the 1940s, and new research has been poorly supported. There is, however, an urgent need to develop new criteria to distinguish those patients with localised disease who will benefit from intervention from those who do not require it or who will have occult extra-prostatic metastases. Similarly, there is an urgent need to develop new treatments for those in whom the disease is extra-prostatic and therefore incurable by conventional treatments. This review covers the latest developments in epidemiology and in cellular and molecular biology, including new areas such as ion channels, in the field of prostate cancer.