6,573 research outputs found

    Topology comparison of Twitter diffusion networks effectively reveals misleading information

    Full text link
    In recent years, malicious information had an explosive growth in social media, with serious social and political backlashes. Recent important studies, featuring large-scale analyses, have produced deeper knowledge about this phenomenon, showing that misleading information spreads faster, deeper and more broadly than factual information on social media, where echo chambers, algorithmic and human biases play an important role in diffusion networks. Following these directions, we explore the possibility of classifying news articles circulating on social media based exclusively on a topological analysis of their diffusion networks. To this aim we collected a large dataset of diffusion networks on Twitter pertaining to news articles published on two distinct classes of sources, namely outlets that convey mainstream, reliable and objective information and those that fabricate and disseminate various kinds of misleading articles, including false news intended to harm, satire intended to make people laugh, click-bait news that may be entirely factual or rumors that are unproven. We carried out an extensive comparison of these networks using several alignment-free approaches including basic network properties, centrality measures distributions, and network distances. We accordingly evaluated to what extent these techniques allow to discriminate between the networks associated to the aforementioned news domains. Our results highlight that the communities of users spreading mainstream news, compared to those sharing misleading news, tend to shape diffusion networks with subtle yet systematic differences which might be effectively employed to identify misleading and harmful information.Comment: A revised new version is available on Scientific Report

    Quantifying biogenic bias in screening libraries.

    Get PDF
    In lead discovery, libraries of 10(6) molecules are screened for biological activity. Given the over 10(60) drug-like molecules thought possible, such screens might never succeed. The fact that they do, even occasionally, implies a biased selection of library molecules. We have developed a method to quantify the bias in screening libraries toward biogenic molecules. With this approach, we consider what is missing from screening libraries and how they can be optimized

    Characteristics of WAP traffic

    Get PDF
    This paper considers the characteristics of Wireless Application Protocol (WAP) traffic. We start by constructing a WAP traffic model by analysing the behaviour of users accessing public WAP sites via a monitoring system. A wide range of different traffic scenarios were considered, but most of these scenarios resolve to one of two basic types. The paper then uses this traffic model to consider the effects of large quantities of WAP traffic on the core network. One traffic characteristic which is of particular interest in network dimensioning is the degree of self-similarity, so the paper looks at the characteristics of aggregated traffic with WAP, Web and packet speech components to estimate its self-similarity. The results indicate that, while WAP traffic alone does not exhibit a significant degree of self-similarity, a combined load from various traffic sources retains almost the same degree of self-similarity as the most self-similar individual source

    Detecting Policy Preferences and Dynamics in the UN General Debate with Neural Word Embeddings

    Get PDF
    Foreign policy analysis has been struggling to find ways to measure policy preferences and paradigm shifts in international political systems. This paper presents a novel, potential solution to this challenge, through the application of a neural word embedding (Word2vec) model on a dataset featuring speeches by heads of state or government in the United Nations General Debate. The paper provides three key contributions based on the output of the Word2vec model. First, it presents a set of policy attention indices, synthesizing the semantic proximity of political speeches to specific policy themes. Second, it introduces country-specific semantic centrality indices, based on topological analyses of countries' semantic positions with respect to each other. Third, it tests the hypothesis that there exists a statistical relation between the semantic content of political speeches and UN voting behavior, falsifying it and suggesting that political speeches contain information of different nature then the one behind voting outcomes. The paper concludes with a discussion of the practical use of its results and consequences for foreign policy analysis, public accountability, and transparency

    Large scale homophily analysis in twitter using a twixonomy

    Get PDF
    In this paper we perform a large-scale homophily analysis on Twitter using a hierarchical representation of users' interests which we call a Twixonomy. In order to build a population, community, or single-user Twixonomy we first associate "topical" friends in users' friendship lists (i.e. friends representing an interest rather than a social relation between peers) with Wikipedia categories. A wordsense disambiguation algorithm is used to select the appropriate wikipage for each topical friend. Starting from the set of wikipages representing "primitive" interests, we extract all paths connecting these pages with topmost Wikipedia category nodes, and we then prune the resulting graph G efficiently so as to induce a direct acyclic graph. This graph is the Twixonomy. Then, to analyze homophily, we compare different methods to detect communities in a peer friends Twitter network, and then for each community we compute the degree of homophily on the basis of a measure of pairwise semantic similarity. We show that the Twixonomy provides a means for describing users' interests in a compact and readable way and allows for a fine-grained homophily analysis. Furthermore, we show that midlow level categories in the Twixonomy represent the best balance between informativeness and compactness of the representation

    Characterizing the impact of geometric properties of word embeddings on task performance

    Get PDF
    Analysis of word embedding properties to inform their use in downstream NLP tasks has largely been studied by assessing nearest neighbors. However, geometric properties of the continuous feature space contribute directly to the use of embedding features in downstream models, and are largely unexplored. We consider four properties of word embedding geometry, namely: position relative to the origin, distribution of features in the vector space, global pairwise distances, and local pairwise distances. We define a sequence of transformations to generate new embeddings that expose subsets of these properties to downstream models and evaluate change in task performance to understand the contribution of each property to NLP models. We transform publicly available pretrained embeddings from three popular toolkits (word2vec, GloVe, and FastText) and evaluate on a variety of intrinsic tasks, which model linguistic information in the vector space, and extrinsic tasks, which use vectors as input to machine learning models. We find that intrinsic evaluations are highly sensitive to absolute position, while extrinsic tasks rely primarily on local similarity. Our findings suggest that future embedding models and post-processing techniques should focus primarily on similarity to nearby points in vector space.Comment: Appearing in the Third Workshop on Evaluating Vector Space Representations for NLP (RepEval 2019). 7 pages + reference
    • 

    corecore