A Data-driven Study of Influences in Twitter Communities
This paper presents a quantitative study of Twitter, one of the most popular
micro-blogging services, from the perspective of user influence. We crawl
several datasets from the most active communities on Twitter and obtain 20.5
million user profiles, along with 420.2 million directed relations and 105
million tweets among the users. User influence scores are obtained from
influence measurement services, Klout and PeerIndex. Our analysis reveals
interesting findings, including non-power-law influence distribution, strong
reciprocity among users in a community, the existence of homophily and
hierarchical relationships in social influences. Most importantly, we observe
that whether a user retweets a message is strongly influenced by the first of
his followees who posted that message. To capture such an effect, we propose
the first influencer (FI) information diffusion model and show through
extensive evaluation that compared to the widely adopted independent cascade
model, the FI model is more stable and more accurate in predicting influence
spreads in Twitter communities.
Comment: 11 page
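For readers unfamiliar with the baseline named in this abstract, the following is a minimal sketch of one run of the standard independent cascade model. It is not the FI model, whose details the abstract does not give. The `followers` mapping, the single uniform activation probability `p`, and the function name are illustrative assumptions, not taken from the paper.

```python
import random


def independent_cascade(followers, seeds, p, rng):
    """Simulate one run of the independent cascade model.

    followers: dict mapping a user to the list of users who follow them
               (hypothetical input format).
    seeds:     users who post the message initially.
    p:         assumed uniform probability that a newly active user
               activates each of their followers.
    rng:       a random.Random instance, passed in for reproducibility.
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for user in frontier:
            for follower in followers.get(user, ()):
                # Each newly active user gets exactly one chance
                # to activate each inactive follower.
                if follower not in active and rng.random() < p:
                    active.add(follower)
                    next_frontier.append(follower)
        frontier = next_frontier
    return active
```

Averaging `len(independent_cascade(...))` over many runs gives an estimate of the influence spread of a seed set under this baseline.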
QDEE: Question Difficulty and Expertise Estimation in Community Question Answering Sites
In this paper, we present a framework for Question Difficulty and Expertise
Estimation (QDEE) in Community Question Answering sites (CQAs) such as Yahoo!
Answers and Stack Overflow, which tackles a fundamental challenge in
crowdsourcing: how to appropriately route and assign questions to users with
the suitable expertise. This problem domain has been the subject of much
research and includes both language-agnostic and language-conscious
solutions. We bring to bear a key language-agnostic insight: that users gain
expertise and therefore tend to ask as well as answer more difficult questions
over time. We use this insight within the popular competition (directed) graph
model to estimate question difficulty and user expertise by identifying key
hierarchical structure within said model. An important and novel contribution
here is the application of "social agony" to this problem domain. Difficulty
levels of newly posted questions (the cold-start problem) are estimated by
using our QDEE framework and additional textual features. We also propose a
model to route newly posted questions to appropriate users based on the
difficulty level of the question and the expertise of the user. Extensive
experiments on data from real-world CQAs such as Yahoo! Answers and Stack Overflow
demonstrate the improved efficacy of our approach over contemporary
state-of-the-art models. The QDEE framework also allows us to characterize user
expertise in novel ways by identifying interesting patterns and roles played by
different users in such CQAs.
Comment: Accepted in the Proceedings of the 12th International AAAI Conference
on Web and Social Media (ICWSM 2018). June 2018. Stanford, CA, US
Organizational Chart Inference
Nowadays, to facilitate communication and cooperation among employees, many
companies have adopted a new family of online social networks called
"enterprise social networks" (ESNs). ESNs can provide employees
with various professional services to help them deal with daily work issues.
Meanwhile, employees in companies are usually organized into different
hierarchies according to the relative ranks of their positions. A company's
internal management structure can be outlined visually with an organizational
chart, which is normally kept confidential out of privacy and
security concerns. In this paper, we study the IOC (Inference of
Organizational Chart) problem: identifying a company's internal organizational
chart from the heterogeneous online ESN launched in it. IOC is very challenging
to address because, to guarantee smooth operations, the internal organizational
charts of companies need to meet certain structural requirements (about their
depth and width). To solve the IOC problem, a novel unsupervised method Create
(ChArT REcovEr) is proposed in this paper, which consists of 3 steps: (1)
social stratification of ESN users into different social classes, (2)
supervision link inference from managers to subordinates, and (3) consecutive
social classes matching to prune the redundant supervision links. Extensive
experiments conducted on a real-world online ESN dataset demonstrate that Create
can perform very well in addressing the IOC problem.
Comment: 10 pages, 9 figures, 1 table. The paper is accepted by KDD 201
Soccer Team Vectors
In this work we present STEVE - Soccer TEam VEctors, a principled approach
for learning real-valued vectors for soccer teams where similar teams are close
to each other in the resulting vector space. STEVE only relies on freely
available information about the matches teams played in the past. These vectors
can serve as input to various machine learning tasks. Evaluating on the task of
team market value estimation, STEVE outperforms all its competitors. Moreover,
we use STEVE for similarity search and to rank soccer teams.
Comment: 11 pages, 1 figure; This paper was presented at the 6th Workshop on
Machine Learning and Data Mining for Sports Analytics at ECML/PKDD 2019,
Würzburg, Germany, 201
Resolution of ranking hierarchies in directed networks
Identifying hierarchies and rankings of nodes in directed graphs is
fundamental in many applications such as social network analysis, biology,
economics, and finance. A recently proposed method identifies the hierarchy by
finding the ordered partition of nodes which minimises a score function, termed
agony. This function penalises the links violating the hierarchy in a way
depending on the strength of the violation. To investigate the resolution of
ranking hierarchies we introduce an ensemble of random graphs, the Ranked
Stochastic Block Model. We find that agony may fail to identify hierarchies
when the structure is not strong enough and the size of the classes is small
with respect to the whole network. We analytically characterise the resolution
threshold and we show that an iterated version of agony can partly overcome
this resolution limit.
Comment: 27 pages, 9 figure
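The score function described in this abstract can be illustrated with a short sketch. It assumes the commonly used definition of agony, in which an edge from u to v costs max(rank[u] - rank[v] + 1, 0), so that only edges that fail to point strictly up the ranking are penalised, in proportion to how far they point the wrong way. The function name and input format are illustrative, and the search for the ranking that minimises the score is not shown.

```python
def agony(edges, rank):
    """Total agony of a given ranking of a directed graph.

    edges: iterable of (u, v) pairs, each meaning a directed edge u -> v.
    rank:  dict mapping each node to an integer rank (hypothetical format).

    Assumed definition: an edge u -> v costs max(rank[u] - rank[v] + 1, 0).
    Edges that go from a lower rank to a strictly higher rank cost nothing.
    """
    return sum(max(rank[u] - rank[v] + 1, 0) for u, v in edges)


# A chain a -> b -> c is perfectly hierarchical under ranks 0, 1, 2.
chain = [("a", "b"), ("b", "c")]
print(agony(chain, {"a": 0, "b": 1, "c": 2}))  # 0
# Putting every node in one class costs 1 per edge.
print(agony(chain, {"a": 0, "b": 0, "c": 0}))  # 2
```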
Corporate payments networks and credit risk rating
Aggregate and systemic risk in complex systems are emergent phenomena
depending on two properties: the idiosyncratic risks of the elements and the
topology of the network of interactions among them. While significant
attention has been given to aggregate risk assessment and risk propagation once
the above two properties are given, less is known about how the risk is
distributed in the network and its relations with the topology. We study this
problem by investigating a large proprietary dataset of payments among 2.4M
Italian firms, whose credit risk rating is known. We document significant
correlations between local topological properties of a node (firm) and its
risk. Moreover, we show the existence of a homophily of risk, i.e. the tendency
of firms with similar risk profile to be statistically more connected among
themselves. This effect is observed when considering both pairs of firms and
communities or hierarchies identified in the network. We leverage this
knowledge to show the predictability of the missing rating of a firm using only
the network properties of the associated node.
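One simple way to illustrate the idea of risk homophily at the level of pairs of firms is to compare the share of links joining firms with the same rating against the same share after the ratings are shuffled at random. This is a generic sketch, not the statistical procedure used in the paper; the function names, the `risk` mapping and the shuffling baseline are all assumptions made for illustration.

```python
import random


def same_risk_fraction(edges, risk):
    """Fraction of edges whose two endpoints share the same risk rating."""
    return sum(risk[u] == risk[v] for u, v in edges) / len(edges)


def shuffled_baseline(edges, risk, runs, rng):
    """Average same-rating fraction when ratings are randomly reassigned.

    Keeps the network and the overall rating counts fixed, and only
    permutes which firm holds which rating (an assumed null model).
    """
    nodes = list(risk)
    labels = [risk[n] for n in nodes]
    total = 0.0
    for _ in range(runs):
        rng.shuffle(labels)
        total += same_risk_fraction(edges, dict(zip(nodes, labels)))
    return total / runs
```

An observed fraction well above the shuffled baseline would be consistent with homophily; a proper analysis would also need a significance test.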