
    How does correlation structure differ between real and fabricated data-sets?

    BACKGROUND: Misconduct in medical research has been the subject of many papers in recent years. Among the different types of misconduct, data fabrication might be considered one of the most severe. It has been argued that correlation coefficients in fabricated data-sets are usually greater than those found in real data-sets. We aim to study the differences between real and fabricated data-sets in terms of the association between two variables. METHOD: Three examples are presented in which outcomes from made-up (fabricated) data-sets are compared with the results from three real data-sets and with appropriate simulated data-sets. The data-sets were made up by faculty members in three universities. The first two examples are devoted to the correlation structure between continuous variables in two different settings: first, when there is a high correlation coefficient between the variables; second, when the variables are not correlated. In the third example, the differences between the real data-set and the fabricated data-sets are studied using the independent t-test for comparing two means. RESULTS: In general, higher correlation coefficients are seen in the made-up data-sets than in the real data-sets. This occurs even when the participants are aware that the correlation coefficient for the corresponding real data-set is zero. The findings from the third example, a comparison between means in two groups, show that many people tend to make up data with smaller or no differences between groups, even when they know how and to what extent the groups differ. CONCLUSION: This study indicates that high correlation coefficients can be considered a leading sign of data fabrication, as more than 40% of the participants generated variables with correlation coefficients greater than 0.70. However, when inspecting the differences between means in different groups, the same rule may not apply, as we observed smaller differences between groups in the made-up data-sets than in the real data-set. We also showed that inspecting the scatter-plot of two variables can be a useful tool for uncovering fabricated data.
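The correlation check described in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names `pearson_r`, `exceeds_threshold`, and `simulated_null_rs` are hypothetical, and the choice of independent standard-normal variables for the simulated reference sets is an assumption made for illustration. Only the 0.70 cut-off comes from the abstract.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def exceeds_threshold(xs, ys, threshold=0.70):
    """True if r is above the cut-off (0.70 is the value quoted in the abstract)."""
    return pearson_r(xs, ys) > threshold


def simulated_null_rs(n, trials=1000, seed=0):
    """Correlations of simulated uncorrelated pairs of size n.

    Assumption: independent standard-normal variables stand in for the
    'appropriate simulated data-sets'; the abstract does not specify them.
    """
    rng = random.Random(seed)
    rs = []
    for _ in range(trials):
        xs = [rng.gauss(0, 1) for _ in range(n)]
        ys = [rng.gauss(0, 1) for _ in range(n)]
        rs.append(pearson_r(xs, ys))
    return rs
```

One possible use is to compute `pearson_r` for a suspect pair of variables and see where it falls among the values returned by `simulated_null_rs` for the same sample size. A value far in the upper tail is a reason to look at the scatter-plot, not proof of fabrication.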

    Embedding Complex Networks

    Graph embedding is a transformation of the nodes of a graph into a set of vectors. A good embedding should capture the graph topology, node-to-node relationships, and other relevant information about the graph. The main challenge is to ensure that embeddings describe the properties of the graph well. As a result, selecting the best embedding is a difficult task and very often requires domain experts. In this thesis, we run a series of extensive experiments with selected graph embedding algorithms, on both real-world and artificial networks. We conclude from these experiments that Node2Vec is generally the best choice of algorithm, but that there is no single winner across all tests. Therefore, our main recommendation for practitioners is, if possible, to generate several embeddings for the problem at hand and to use a general framework that provides a tool for unsupervised comparison of graph embeddings.