19 research outputs found

    The detection and effect of social events on Wikipedia data-set for studying human preferences

    Several studies have used the Wikipedia (WP) data-set to analyse worldwide human preferences by language. However, those studies could suffer from bias related to exceptional social circumstances. Any massive event that prompts exceptional editing of WP can be defined as a source of bias. In this article, we follow a procedure for detecting outliers. Our study is based on 12 languages and 13 different categories. Our methodology defines a parameter that is language-dependent instead of being externally fixed. We also study the presence of cyclic human behavior to evaluate apparent outliers. After our analysis, we found that the outliers in our data-set do not significantly affect the analysis of preferences by category among different WP languages. While investigating the possibility of bias related to exceptional social circumstances is always a safe measure before doing any analysis on Big Data, we found that, in the case of the first ten years of the Wikipedia data-set, outliers do not significantly affect the use of the Wikipedia data-set as a digital footprint to analyse worldwide human preferences.

    Bounded confidence models generate more secondary clusters when the number of agents is growing

    We study the bounded confidence model on a growing population. We compare simulations of the agent model and of its continuous-density version, using either the standard influence function or a smoother influence function. We find that the model on a growing population generates bigger secondary clusters, and does so more systematically, than when the population is fixed. Moreover, our tests with the smooth influence function suggest that these secondary clusters can be generated by a different mechanism when the population is growing than when it is fixed. Comment: 16 pages, 8 figures

    Measuring the effect of node aggregation on community detection

    The nodes of a complex network are often aggregated, whether deliberately or not, because of technical, ethical or legal limitations, or for privacy reasons. A common example is geographic position: one may uncover communities in a network of places, or of individuals identified with their typical geographical position, and then aggregate these places into larger entities, such as municipalities, thus obtaining another network. The communities found in the networks obtained at various levels of aggregation may exhibit various degrees of similarity, from full alignment to perfect independence. This is akin to the problem of ecological and atomic fallacies in statistics, or to the Modifiable Areal Unit Problem in geography. We identify the class of community detection algorithms most suitable to cope with node aggregation, and develop an index of aggregability that captures the extent to which aggregation preserves the community structure. We illustrate its relevance on real-world examples (mobile phone and Twitter reply-to networks). Our main message is that any node-partitioning analysis performed on aggregated networks should be interpreted with caution, as the outcome may be strongly influenced by the level of aggregation. Comment: 12 pages, 5 figures

    Continuous opinion model in small world directed networks

    In the compromise model of continuous opinions proposed by Deffuant et al., the states of two agents in a network can start to converge if they are neighbors and if their opinions are sufficiently close to each other, that is, if they differ by less than a given tolerance threshold $\epsilon$. In directed networks, if agent i is a neighbor of agent j, j need not be a neighbor of i. We performed simulations in Watts-Strogatz networks to find the averaged number of final opinions and their distribution as a function of $\epsilon$ and of the network's structural disorder. In directed networks, the averaged number of final opinions exhibits a rich structure, being larger than in undirected networks for higher values of $\epsilon$, and smaller for lower values of $\epsilon$. Comment: 15 pages, 6 figures
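
    The update rule described in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names, the convergence rate `mu`, the choice to update both agents on a directed link, and the `gap` used to count final opinions are all assumptions made for illustration. A directed network is given as a list where `neighbors[i]` holds the agents that i can pick as partners.

```python
import random


def deffuant(opinions, neighbors, epsilon, mu=0.5, steps=10000, seed=0):
    """Run pairwise compromise updates and return the final opinions.

    Assumptions (not from the abstract): mu is the convergence rate,
    and both agents move toward each other when a link i -> j is used.
    """
    rng = random.Random(seed)
    x = list(opinions)
    n = len(x)
    for _ in range(steps):
        i = rng.randrange(n)
        if not neighbors[i]:
            continue  # agent i has no out-neighbor to talk to
        j = rng.choice(neighbors[i])
        diff = x[j] - x[i]
        # Interaction happens only if opinions differ by less than epsilon.
        if abs(diff) < epsilon:
            x[i] += mu * diff
            x[j] -= mu * diff
    return x


def count_clusters(opinions, gap=1e-3):
    """Count groups of opinions separated by more than `gap` (hypothetical criterion)."""
    xs = sorted(opinions)
    if not xs:
        return 0
    return 1 + sum(1 for a, b in zip(xs, xs[1:]) if b - a > gap)


if __name__ == "__main__":
    n = 20
    start = [k / (n - 1) for k in range(n)]
    ring = [[(i + 1) % n] for i in range(n)]  # directed ring: i -> i+1
    final = deffuant(start, ring, epsilon=0.2, steps=20000)
    print(count_clusters(final))
```

    Because both agents move by the same amount in opposite directions, the mean opinion is conserved; a one-sided update on directed links would be an equally plausible variant and would not conserve it.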

    The Detection and effect of social events on Wikipedia data-set for studying human preferences

    Several studies have used the Wikipedia (WP) data-set to analyse worldwide human preferences by language. However, those studies could suffer from bias related to exceptional social circumstances. Any massive event that prompts exceptional editing of WP can be defined as a source of bias. In this article, we follow a procedure for detecting outliers. Our study is based on 12 languages and 13 different categories. Our methodology defines a parameter that is language-dependent instead of being externally fixed. We also study the presence of cyclic human behaviour to evaluate apparent outliers. After our analysis, we found that the outliers in our data-set do not significantly affect the use of the whole Wikipedia data-set as a digital footprint to analyse worldwide human preferences. Comment: 8 pages, 4 figures

    A multilevel analysis to systemic exposure: insights from local and system-wide information

    In the aftermath of the financial crisis, the growing literature on financial networks has widely documented the predictive power of topological characteristics (e.g. degree centrality measures) to explain the systemic impact or systemic vulnerability of financial institutions. In this work, we show that considering alternative topological measures based on the local sub-network environment improves our ability to identify systemic institutions. To provide empirical evidence, we apply a two-step procedure. First, we recover network communities (i.e. the close-peer environment) on a spillover network of financial institutions. Second, we regress alternative measures of vulnerability on three levels of topological measures: the global level (i.e. firm topological characteristics computed over the whole system), the local level (i.e. firm topological characteristics computed over the community) and the aggregated level (i.e. individual characteristics averaged over the community). The sample includes 46 financial institutions (banks, broker-dealers, insurance and real-estate companies) listed in the Standard & Poor's 500 index. Our results confirm the informational content of topological metrics based on the close-peer environment. Such information is different from that embedded in traditional system-wide topological metrics and is shown to be a predictor of distress for financial institutions in times of crisis. Comment: 12 pages, 3 figures and 3 tables

    What Can Wikipedia Tell Us About the Global or Local Character of Burstiness?

    In this communication we take advantage of the global coverage of the Wikipedia dataset to analyze how the usual coefficients used to measure burstiness depend on language. Analyzing separately the patterns of single editors over several pages, we show several characteristics of the super-editors in the WPs written in English, Spanish, French and Portuguese. We report for the first time the Burstiness and Memory effect coefficients, separately for the 4 WPs, showing similarities and differences between all users and the super-editors, the exponent for their averaged inter-event activity and, finally, some statistical traces of their averaged monthly activity.
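
    The abstract does not define the two coefficients it reports. The sketch below assumes the commonly used definitions: burstiness B = (sigma - mu) / (sigma + mu), where mu and sigma are the mean and standard deviation of the inter-event times, and memory M as the Pearson correlation between consecutive inter-event times. The function names and the sample timestamps are hypothetical and not taken from the paper.

```python
import math
from statistics import mean, pstdev


def inter_event_times(timestamps):
    """Return the gaps between consecutive events, after sorting."""
    ts = sorted(timestamps)
    return [b - a for a, b in zip(ts, ts[1:])]


def burstiness(taus):
    """B = (sigma - mu) / (sigma + mu); -1 is regular, 0 Poisson-like, near 1 bursty."""
    m = mean(taus)
    s = pstdev(taus)
    if s + m == 0:
        return math.nan  # all gaps are zero: coefficient undefined
    return (s - m) / (s + m)


def memory(taus):
    """Pearson correlation between each gap and the next one."""
    a, b = taus[:-1], taus[1:]
    ma, mb = mean(a), mean(b)
    sa, sb = pstdev(a), pstdev(b)
    if sa == 0 or sb == 0:
        return math.nan  # constant gaps: correlation undefined
    cov = mean([(x - ma) * (y - mb) for x, y in zip(a, b)])
    return cov / (sa * sb)


if __name__ == "__main__":
    # Hypothetical edit timestamps (arbitrary time units) for one editor.
    edits = [0, 1, 2, 3, 50, 51, 52, 120, 121, 300]
    taus = inter_event_times(edits)
    print(burstiness(taus), memory(taus))
```

    Under these definitions, a perfectly periodic editor gives B = -1, and gaps that strictly alternate between short and long give M = -1.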