A customisable pipeline for continuously harvesting socially-minded Twitter users
On social media platforms and Twitter in particular, specific classes of
users such as influencers have been given satisfactory operational definitions
in terms of network and content metrics.
Others, for instance online activists, are no less important, but their
characterisation still requires experimentation.
We hypothesise that such interesting users can be found within
temporally and spatially localised contexts, i.e., small but topical fragments
of the network containing interactions about social events or campaigns with a
significant footprint on Twitter.
To explore this hypothesis, we have designed a continuous user profile
discovery pipeline that produces an ever-growing dataset of user profiles by
harvesting and analysing contexts from the Twitter stream.
The profiles dataset includes key network and content-based user metrics,
enabling experimentation with user-defined score functions that characterise
specific classes of online users.
The paper describes the design and implementation of the pipeline and its
empirical evaluation on a case study consisting of healthcare-related campaigns
in the UK, showing how it supports the operational definitions of online
activism, by comparing three experimental ranking functions. The code is
publicly available.
Comment: Procs. ICWE 2019, June 2019, Kore
Local Ranking Problem on the BrowseGraph
The "Local Ranking Problem" (LRP) is related to the computation of a
centrality-like rank on a local graph, where the scores of the nodes could
significantly differ from the ones computed on the global graph. Previous work
has studied LRP on the hyperlink graph but never on the BrowseGraph, namely a
graph where nodes are webpages and edges are browsing transitions. Recently,
this graph has received more and more attention in many different tasks such as
ranking, prediction and recommendation. However, a web-server has only the
browsing traffic performed on its pages (local BrowseGraph) and, as a
consequence, the local computation can lead to estimation errors, which hinders
the increasing number of applications in the state of the art. Also, although
the divergence between the local and global ranks has been measured, the
possibility of estimating such divergence using only local knowledge has been
mainly overlooked. These aspects are of great interest for online service
providers who want to: (i) gauge their ability to correctly assess the
importance of their resources only based on their local knowledge, and (ii)
take into account real user browsing fluxes that better capture the actual user
interest than the static hyperlink network. We study the LRP problem on a
BrowseGraph from a large news provider, considering as subgraphs the
aggregations of browsing traces of users coming from different domains. We show
that the distance between rankings can be accurately predicted based only on
structural information of the local graph, being able to achieve an average
rank correlation as high as 0.8.
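The divergence between local and global rankings described above can be illustrated with a minimal sketch. It assumes PageRank as the centrality-like rank and Spearman correlation as the comparison measure; the page names, transition counts, and the helper functions `pagerank` and `spearman` are hypothetical and are not taken from the paper. The sketch only measures the divergence when both graphs are known; it does not reproduce the paper's prediction of that divergence from local structural features.

```python
def pagerank(edges, damping=0.85, iters=100):
    """Power-iteration PageRank over weighted directed edges {(u, v): count}."""
    nodes = sorted({n for edge in edges for n in edge})
    out_weight = {u: 0.0 for u in nodes}
    for (u, _v), w in edges.items():
        out_weight[u] += w
    rank = {u: 1.0 / len(nodes) for u in nodes}
    for _ in range(iters):
        # Rank held by pages with no outgoing transitions is spread uniformly.
        dangling = sum(rank[u] for u in nodes if out_weight[u] == 0.0)
        base = (1.0 - damping + damping * dangling) / len(nodes)
        new_rank = {u: base for u in nodes}
        for (u, v), w in edges.items():
            new_rank[v] += damping * rank[u] * w / out_weight[u]
        rank = new_rank
    return rank


def spearman(xs, ys):
    """Spearman rank correlation (simple formula, assumes no tied values)."""
    def ranks(values):
        order = sorted(range(len(values)), key=lambda i: values[i])
        result = [0] * len(values)
        for position, i in enumerate(order):
            result[i] = position
        return result
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


# Hypothetical global BrowseGraph: browsing transitions with their counts.
global_edges = {
    ("home", "politics"): 40, ("home", "sport"): 30, ("politics", "opinion"): 25,
    ("sport", "opinion"): 5, ("opinion", "home"): 20, ("politics", "home"): 10,
}
# Hypothetical local BrowseGraph: traces of users arriving from one domain.
local_edges = {
    ("home", "sport"): 12, ("sport", "opinion"): 4, ("opinion", "home"): 3,
    ("home", "politics"): 2, ("politics", "home"): 1,
}

global_rank = pagerank(global_edges)
local_rank = pagerank(local_edges)
shared = sorted(set(global_rank) & set(local_rank))
rho = spearman([global_rank[p] for p in shared], [local_rank[p] for p in shared])
print(f"Spearman correlation between local and global ranks: {rho:.2f}")
```

A correlation close to 1 would mean the local traffic ranks pages much as the global traffic does; lower values indicate the estimation error the abstract refers to.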
Uncovering nodes that spread information between communities in social networks
From many datasets gathered in online social networks, well-defined community
structures have been observed. A large number of users participate in these
networks and the size of the resulting graphs poses computational challenges.
There is a particular demand for identifying the nodes responsible for
information flow between communities; for example, in temporal Twitter networks
edges between communities play a key role in propagating spikes of activity
when the connectivity between communities is sparse and few edges exist between
different clusters of nodes. The new algorithm proposed here is aimed at
revealing these key connections by measuring a node's vicinity to nodes of
another community. We look at the nodes which have edges in more than one
community and the locality of nodes around them which influence the information
received and broadcast to them. The method relies on independent random walks
of a chosen fixed number of steps, originating from nodes with edges in more
than one community. For the large networks that we have in mind, existing
measures such as betweenness centrality are difficult to compute, even with
recent methods that approximate the large number of operations required. We
therefore design an algorithm that scales up to the demand of current big data
requirements and has the ability to harness parallel processing capabilities.
The new algorithm is illustrated on synthetic data, where results can be judged
carefully, and also on real, large-scale Twitter activity data, where new
insights can be gained.
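The random-walk idea in this abstract can be sketched as follows, under stated assumptions. Walks of a fixed number of steps start from every node that has a neighbour in another community, and each node is scored by how often the walks visit it. The visit-count score, the toy graph, and the names `boundary_nodes` and `walk_visit_counts` are illustrative assumptions; the abstract does not specify the paper's actual measure. Because each walk is independent, the outer loops could be distributed across workers.

```python
import random
from collections import Counter


def boundary_nodes(adj, community):
    """Nodes with at least one neighbour assigned to a different community."""
    return [u for u in adj if any(community[v] != community[u] for v in adj[u])]


def walk_visit_counts(adj, community, steps=3, walks_per_node=200, seed=0):
    """Count visits made by fixed-length random walks started at boundary nodes."""
    rng = random.Random(seed)
    visits = Counter()
    for start in boundary_nodes(adj, community):
        for _ in range(walks_per_node):
            node = start
            for _ in range(steps):
                node = rng.choice(adj[node])  # uniform step to a neighbour
                visits[node] += 1
    return visits


# Hypothetical toy network: two triangles joined by the single edge c-d.
adj = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
    "d": ["c", "e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}
community = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}

starts = boundary_nodes(adj, community)
visits = walk_visit_counts(adj, community)
for node, count in visits.most_common():
    print(node, count)
```

In this toy graph only `c` and `d` qualify as starting points, so the visit counts concentrate around the one edge that connects the two communities.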
Viewpoint Discovery and Understanding in Social Networks
The Web has evolved to a dominant platform where everyone has the opportunity
to express their opinions, to interact with other users, and to debate on
emerging events happening around the world. On the one hand, this has enabled
the presence of different viewpoints and opinions about a - usually
controversial - topic (like Brexit), but at the same time, it has led to
phenomena like media bias, echo chambers and filter bubbles, where users are
exposed to only one point of view on the same topic. Therefore, there is the
need for methods that are able to detect and explain the different viewpoints.
In this paper, we propose a graph partitioning method that exploits social
interactions to enable the discovery of different communities (representing
different viewpoints) discussing a controversial topic in a social
network like Twitter. To explain the discovered viewpoints, we describe a
method, called Iterative Rank Difference (IRD), which allows detecting
descriptive terms that characterize the different viewpoints as well as
understanding how a specific term is related to a viewpoint (by detecting other
related descriptive terms). The results of an experimental evaluation showed
that our approach outperforms state-of-the-art methods on viewpoint discovery,
while a qualitative analysis of the proposed IRD method on three different
controversial topics showed that IRD provides comprehensive and deep
representations of the different viewpoints.
An Email Attachment is Worth a Thousand Words, or Is It?
There is an extensive body of research on Social Network Analysis (SNA) based
on the email archive. The network used in the analysis is generally extracted
either by capturing the email communication in From, To, Cc and Bcc email
header fields or by the entities contained in the email message. In the latter
case, the entities could be, for instance, the bag of words, URLs, names,
phone numbers, etc. They could also include the textual content of attachments, for
instance Microsoft Word documents, Excel spreadsheets, or Adobe PDFs. The nodes
in this network represent users and entities. The edges represent communication
between users and relations to the entities. We suggest taking a different
approach to the network extraction and use attachments shared between users as
the edges. The motivation for this is two-fold. First, attachments represent
the "intimacy" manifestation of the relation's strength. Second, the
statistical analysis of private email archives that we collected and the Enron
email corpus shows that attachments contribute on average around 80-90% to
the archive's disk-space usage, which means that most of the data is presently
ignored in the SNA of email archives. Consequently, we hypothesize that this
approach might provide more insight into the social structure of the email
archive. We extract the communication and shared attachments networks from
the Enron email corpus. We further analyze degree, betweenness, closeness, and
eigenvector centrality measures in both networks and review the differences and
what can be learned from them. We use a nearest neighbor algorithm to generate
similarity groups for five Enron employees. The groups are consistent with
Enron's organizational chart, which validates our approach.
Comment: 12 pages, 4 figures, 7 tables, IML'17, Liverpool, U
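A minimal sketch of the shared-attachment network described in this abstract is given below. It assumes that two users are linked when the same attachment appears in messages either of them sent or received, and that the edge weight is the number of distinct attachments they share. The message record layout, the file names, and the functions `shared_attachment_network` and `degree_centrality` are hypothetical; they do not reflect the Enron corpus format or the authors' extraction code.

```python
from collections import defaultdict
from itertools import combinations


def shared_attachment_network(messages):
    """Return {(user_a, user_b): number of distinct attachments both handled}."""
    holders = defaultdict(set)  # attachment id -> users who sent or received it
    for msg in messages:
        people = {msg["sender"], *msg["recipients"]}
        for attachment in msg["attachments"]:
            holders[attachment].update(people)
    weights = defaultdict(int)
    for users in holders.values():
        for u, v in combinations(sorted(users), 2):
            weights[(u, v)] += 1
    return dict(weights)


def degree_centrality(weights):
    """Fraction of the other nodes in the network each user is linked to."""
    neighbours = defaultdict(set)
    for u, v in weights:
        neighbours[u].add(v)
        neighbours[v].add(u)
    n = len(neighbours)
    if n < 2:
        return {}
    return {u: len(vs) / (n - 1) for u, vs in neighbours.items()}


# Hypothetical messages; attachments are identified by file name for brevity.
messages = [
    {"sender": "alice", "recipients": ["bob"], "attachments": ["report.doc"]},
    {"sender": "bob", "recipients": ["carol"], "attachments": ["report.doc"]},
    {"sender": "alice", "recipients": ["dave"], "attachments": ["budget.xls"]},
    {"sender": "carol", "recipients": ["dave"], "attachments": []},
]

network = shared_attachment_network(messages)
centrality = degree_centrality(network)
for user, score in sorted(centrality.items()):
    print(user, round(score, 2))
```

Note that the message from `carol` to `dave` carries no attachment, so it creates no edge here even though it would appear in a header-based communication network. In practice a content hash would be a safer attachment identifier than a file name.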