Topology comparison of Twitter diffusion networks effectively reveals misleading information
In recent years, malicious information has grown explosively on social
media, with serious social and political backlashes. Recent important studies,
featuring large-scale analyses, have produced deeper knowledge about this
phenomenon, showing that misleading information spreads faster, deeper and more
broadly than factual information on social media, where echo chambers,
algorithmic and human biases play an important role in diffusion networks.
Following these directions, we explore the possibility of classifying news
articles circulating on social media based exclusively on a topological
analysis of their diffusion networks. To this aim we collected a large dataset
of diffusion networks on Twitter pertaining to news articles published on two
distinct classes of sources, namely outlets that convey mainstream, reliable
and objective information and those that fabricate and disseminate various
kinds of misleading articles, including false news intended to harm, satire
intended to make people laugh, click-bait news that may be entirely factual or
rumors that are unproven. We carried out an extensive comparison of these
networks using several alignment-free approaches including basic network
properties, centrality measures distributions, and network distances. We
accordingly evaluated to what extent these techniques allow us to discriminate
between the networks associated with the aforementioned news domains. Our results
highlight that the communities of users spreading mainstream news, compared to
those sharing misleading news, tend to shape diffusion networks with subtle yet
systematic differences which might be effectively employed to identify
misleading and harmful information.
Comment: A revised new version is available in Scientific Reports
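The abstract above mentions comparing networks through distributions of centrality measures. As a minimal sketch of that idea, the snippet below compares the degree distributions of two toy networks. The edge lists are made up, and the choice of total variation distance is an assumption made for illustration; the abstract does not say which distances the authors used.

```python
from collections import Counter


def degree_distribution(edges):
    """Return {degree: fraction of nodes} for an undirected edge list."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    counts = Counter(degree.values())
    n = len(degree)
    return {k: c / n for k, c in counts.items()}


def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


# Hypothetical toy networks: a star (one hub) and a chain.
star = [("a", "b"), ("a", "c"), ("a", "d"), ("a", "e")]
chain = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]

distance = total_variation(degree_distribution(star), degree_distribution(chain))
print(f"degree-distribution distance: {distance:.2f}")
```

A distance near 0 would indicate similar degree profiles; a classifier could take such distances, computed for several centrality measures, as input features.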
Quantifying biogenic bias in screening libraries.
In lead discovery, libraries of 10^6 molecules are screened for biological activity. Given the over 10^60 drug-like molecules thought possible, such screens might never succeed. The fact that they do, even occasionally, implies a biased selection of library molecules. We have developed a method to quantify the bias in screening libraries toward biogenic molecules. With this approach, we consider what is missing from screening libraries and how they can be optimized.
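To make the scale argument concrete, the short calculation below works out what fraction of the estimated drug-like space a single library covers. It uses only the two orders of magnitude quoted in the abstract; the variable names are illustrative.

```python
from fractions import Fraction

library_size = 10**6      # molecules screened, per the abstract
chemical_space = 10**60   # lower bound on possible drug-like molecules, per the abstract

# Exact fraction of chemical space covered by one library.
coverage = Fraction(library_size, chemical_space)
print(f"coverage = 1 in 10^{len(str(coverage.denominator)) - 1}")
```

A library therefore samples about one molecule in 10^54, which is why any appreciable hit rate points to a non-uniform, biased selection.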
Characteristics of WAP traffic
This paper considers the characteristics of Wireless Application Protocol (WAP) traffic. We start by constructing a WAP traffic model by analysing the behaviour of users accessing public WAP sites via a monitoring system. A wide range of different traffic scenarios were considered, but most of these scenarios resolve to one of two basic types. The paper then uses this traffic model to consider the effects of large quantities of WAP traffic on the core network. One traffic characteristic which is of particular interest in network dimensioning is the degree of self-similarity, so the paper looks at the characteristics of aggregated traffic with WAP, Web and packet speech components to estimate its self-similarity. The results indicate that, while WAP traffic alone does not exhibit a significant degree of self-similarity, a combined load from various traffic sources retains almost the same degree of self-similarity as the most self-similar individual source.
Detecting Policy Preferences and Dynamics in the UN General Debate with Neural Word Embeddings
Foreign policy analysis has been struggling to find ways to measure policy
preferences and paradigm shifts in international political systems. This paper
presents a novel, potential solution to this challenge, through the application
of a neural word embedding (Word2vec) model on a dataset featuring speeches by
heads of state or government in the United Nations General Debate. The paper
provides three key contributions based on the output of the Word2vec model.
First, it presents a set of policy attention indices, synthesizing the semantic
proximity of political speeches to specific policy themes. Second, it
introduces country-specific semantic centrality indices, based on topological
analyses of countries' semantic positions with respect to each other. Third, it
tests the hypothesis that there exists a statistical relation between the
semantic content of political speeches and UN voting behavior, falsifying it
and suggesting that political speeches contain information of a different nature
than that behind voting outcomes. The paper concludes with a discussion of
the practical use of its results and consequences for foreign policy analysis,
public accountability, and transparency.
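One simple way to read "semantic proximity of a speech to a policy theme" is as the cosine similarity between an averaged speech vector and a theme vector. The sketch below illustrates that reading only. The three-dimensional vectors, the word lists, and the averaging step are all assumptions made for the example, and the abstract does not specify how the paper's indices are actually constructed.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_vector(vectors):
    """Component-wise mean of a list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


# Made-up embeddings standing in for the output of a trained Word2vec model.
embeddings = {
    "climate": [1.0, 0.0, 0.0],
    "emissions": [0.8, 0.6, 0.0],
    "trade": [0.0, 1.0, 0.0],
}

speech_tokens = ["climate", "emissions"]  # hypothetical speech
speech_vector = mean_vector([embeddings[t] for t in speech_tokens])

attention_climate = cosine(speech_vector, embeddings["climate"])
attention_trade = cosine(speech_vector, embeddings["trade"])
print(f"climate: {attention_climate:.3f}, trade: {attention_trade:.3f}")
```

In this toy case the speech scores higher on the climate theme than on the trade theme, which is the kind of ranking an attention index is meant to expose.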
Large scale homophily analysis in twitter using a twixonomy
In this paper we perform a large-scale homophily analysis on Twitter using a hierarchical representation of users' interests which we call a Twixonomy. In order to build a population, community, or single-user Twixonomy we first associate "topical" friends in users' friendship lists (i.e. friends representing an interest rather than a social relation between peers) with Wikipedia categories. A word-sense disambiguation algorithm is used to select the appropriate wikipage for each topical friend. Starting from the set of wikipages representing "primitive" interests, we extract all paths connecting these pages with topmost Wikipedia category nodes, and we then prune the resulting graph G efficiently so as to induce a directed acyclic graph. This graph is the Twixonomy. Then, to analyze homophily, we compare different methods to detect communities in a peer friends Twitter network, and then for each community we compute the degree of homophily on the basis of a measure of pairwise semantic similarity. We show that the Twixonomy provides a means for describing users' interests in a compact and readable way and allows for a fine-grained homophily analysis. Furthermore, we show that mid-low-level categories in the Twixonomy represent the best balance between informativeness and compactness of the representation.
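The last step described above, scoring a community by pairwise similarity of its members' interests, can be sketched as follows. Jaccard overlap of category sets is used here purely as a stand-in, since the abstract does not define the paper's semantic similarity measure. The users and categories are invented.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def homophily(community, interests):
    """Mean pairwise interest similarity over all member pairs."""
    pairs = list(combinations(community, 2))
    if not pairs:
        return 0.0
    return sum(jaccard(interests[u], interests[v]) for u, v in pairs) / len(pairs)


# Hypothetical users mapped to interest categories.
interests = {
    "u1": {"Music", "Jazz"},
    "u2": {"Music", "Jazz"},
    "u3": {"Music", "Rock"},
    "u4": {"Politics"},
}

music_fans = homophily(["u1", "u2", "u3"], interests)
mixed_pair = homophily(["u1", "u4"], interests)
print(f"music community: {music_fans:.3f}, mixed pair: {mixed_pair:.3f}")
```

Choosing coarser or finer categories changes the sets being compared, which is one way to see why the level of the hierarchy matters for the analysis.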
Characterizing the impact of geometric properties of word embeddings on task performance
Analysis of word embedding properties to inform their use in downstream NLP
tasks has largely been studied by assessing nearest neighbors. However,
geometric properties of the continuous feature space contribute directly to the
use of embedding features in downstream models, and are largely unexplored. We
consider four properties of word embedding geometry, namely: position relative
to the origin, distribution of features in the vector space, global pairwise
distances, and local pairwise distances. We define a sequence of
transformations to generate new embeddings that expose subsets of these
properties to downstream models and evaluate change in task performance to
understand the contribution of each property to NLP models. We transform
publicly available pretrained embeddings from three popular toolkits (word2vec,
GloVe, and FastText) and evaluate on a variety of intrinsic tasks, which model
linguistic information in the vector space, and extrinsic tasks, which use
vectors as input to machine learning models. We find that intrinsic evaluations
are highly sensitive to absolute position, while extrinsic tasks rely primarily
on local similarity. Our findings suggest that future embedding models and
post-processing techniques should focus primarily on similarity to nearby
points in vector space.
Comment: Appearing in the Third Workshop on Evaluating Vector Space
Representations for NLP (RepEval 2019). 7 pages + reference
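To illustrate how a transformation can isolate one geometric property, the sketch below translates a set of vectors by a constant offset: this changes their position relative to the origin but leaves all pairwise Euclidean distances intact. The vectors and the offset are made up, and this is only one example of such a transformation, not necessarily one used in the paper.

```python
import math


def translate(vectors, offset):
    """Shift every vector by the same offset."""
    return [[x + o for x, o in zip(v, offset)] for v in vectors]


def euclidean(u, v):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def pairwise_distances(vectors):
    """Distances for all unordered pairs, in a fixed order."""
    return [
        euclidean(vectors[i], vectors[j])
        for i in range(len(vectors))
        for j in range(i + 1, len(vectors))
    ]


def norms(vectors):
    """Distance of each vector from the origin."""
    return [euclidean(v, [0.0] * len(v)) for v in vectors]


original = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]  # toy 2-d embeddings
shifted = translate(original, [5.0, -3.0])        # arbitrary offset

same_distances = all(
    math.isclose(a, b)
    for a, b in zip(pairwise_distances(original), pairwise_distances(shifted))
)
print("pairwise distances preserved:", same_distances)
print("norms before:", norms(original), "after:", norms(shifted))
```

Evaluating a task before and after such a shift indicates whether the task depends on absolute position or only on relative distances.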