Statistical Inference in a Directed Network Model with Covariates
Networks are often characterized by node heterogeneity, whereby nodes
exhibit different degrees of interaction, and link homophily, whereby nodes
sharing common features tend to associate with each other. In this paper, we
propose a new directed network model to capture the former via node-specific
parametrization and the latter by incorporating covariates. In particular, this
model quantifies the extent of heterogeneity in terms of outgoingness and
incomingness of each node by different parameters, thus allowing the number of
heterogeneity parameters to be twice the number of nodes. We study the maximum
likelihood estimation of the model and establish the uniform consistency and
asymptotic normality of the resulting estimators. Numerical studies demonstrate
our theoretical findings and a data analysis confirms the usefulness of our
model.
Comment: 29 pages, minor revision.
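As a rough illustration of the parametrization described in this abstract, the sketch below gives each node one outgoingness parameter and one incomingness parameter (so 2n heterogeneity parameters for n nodes) and adds a covariate term for homophily. The logistic link, the function names, and the argument layout are assumptions made for this example; the abstract does not state the exact functional form.

```python
import math
import random


def edge_probability(alpha_i, beta_j, gamma, z_ij):
    """Probability of a directed edge i -> j.

    ASSUMPTION: a logistic link on alpha_i + beta_j + gamma . z_ij.
    alpha_i is the outgoingness of the sender, beta_j the incomingness
    of the receiver, z_ij the covariate vector of the pair (i, j).
    """
    logit = alpha_i + beta_j + sum(g * z for g, z in zip(gamma, z_ij))
    return 1.0 / (1.0 + math.exp(-logit))


def sample_network(alpha, beta, gamma, Z, seed=0):
    """Sample a directed adjacency matrix with independent edges.

    alpha, beta: lists of length n (2n heterogeneity parameters in total).
    Z[i][j]: covariate vector for the ordered pair (i, j).
    """
    rng = random.Random(seed)
    n = len(alpha)
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # no self-loops
            p = edge_probability(alpha[i], beta[j], gamma, Z[i][j])
            if rng.random() < p:
                A[i][j] = 1
    return A
```

With a positive covariate coefficient and a covariate that equals 1 when two nodes share a feature, pairs with a shared feature receive a higher edge probability, which is one simple way to encode homophily.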
A Local-Global LDA Model for Discovering Geographical Topics from Social Media
Micro-blogging services can track users' geo-locations when users check in
at places or use geo-tagging, which implicitly reveals locations. This "geo
tracking" can help to find topics triggered by events in certain regions.
However, discovering such topics is very challenging because of the large
amount of noisy messages (e.g. daily conversations). This paper proposes a
method to model geographical topics, which can filter out irrelevant words by
different weights in the local and global contexts. Our method is based on the
Latent Dirichlet Allocation (LDA) model but each word is generated from either
a local or a global topic distribution by its generation probabilities. We
evaluated our model with data collected from Weibo, which is currently the most
popular micro-blogging service for Chinese users. The evaluation results demonstrate
that our method outperforms the baseline methods on several metrics, such as
model perplexity, two kinds of entropy, and the KL-divergence of discovered
topics.
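The generative step mentioned in this abstract, where each word comes from either a local or a global topic distribution, can be sketched roughly as follows. The switch probability `p_local`, the dictionary layout, and the function name are hypothetical choices for illustration, not the paper's specification.

```python
import random


def generate_word(local_theta, global_theta, topic_words, p_local, rng):
    """Draw one word from a local/global topic mixture.

    ASSUMPTIONS (not from the abstract): a Bernoulli switch with
    probability p_local picks the local distribution; local_theta and
    global_theta map topic -> probability; topic_words maps
    topic -> {word: probability}.
    """
    # Step 1: decide whether this word is local or global.
    source = "local" if rng.random() < p_local else "global"
    theta = local_theta if source == "local" else global_theta

    # Step 2: draw a topic from the chosen topic distribution.
    topics = list(theta)
    topic = rng.choices(topics, weights=[theta[t] for t in topics])[0]

    # Step 3: draw a word from that topic's word distribution.
    words = list(topic_words[topic])
    weights = [topic_words[topic][w] for w in words]
    word = rng.choices(words, weights=weights)[0]
    return source, topic, word
```

In a sketch like this, everyday chatter would mostly be absorbed by the global topics, leaving the local topics to capture region-specific events.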
Emoticon-based Ambivalent Expression: A Hidden Indicator for Unusual Behaviors in Weibo
Recent decades have seen online social media become a big-data window for
quantitatively testing conventional social theories and exploring
detailed human behavioral patterns. In this paper, by tracing emoticon use
in Weibo, a group of hidden "ambivalent users" is revealed who frequently
post ambivalent tweets containing both positive and negative emotions.
Further investigation reveals that this ambivalent expression could be a novel
indicator of many unusual social behaviors. For instance, ambivalent users, the
majority of whom are female, tend to post at midnight or at weekends.
They mention their close friends frequently in ambivalent tweets, which attract
more replies and thus serve as a more private channel of communication. Ambivalent
users also respond differently from others to public affairs and show
more interest in entertainment and sports events. Moreover, the sentiment
shift of words adopted in ambivalent tweets is more evident than usual and
exhibits a clear "negative to positive" pattern. The above observations, though
seemingly miscellaneous, actually point to the self-regulation of negative
mood in Weibo, which is grounded in the emotion management theories
of sociology but extends them to the online environment.
Finally, as an interesting corollary, ambivalent users are found to be connected with
compulsive buyers and turn out to be perfect targets for online marketing.
Comment: Data sets can be downloaded freely from www.datatang.com/data/47207
or http://pan.baidu.com/s/1mg67cbm. For any issues, feel free to contact
[email protected]
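The notion of an ambivalent tweet used in this abstract (a post containing both positive and negative emoticons) can be sketched as a simple filter. The emoticon sets, the `min_count` threshold for "frequently", and the function names below are hypothetical placeholders; the abstract does not give the actual lexicon or cutoff.

```python
from collections import Counter

# HYPOTHETICAL emoticon lexicons, used only for illustration.
POSITIVE = {":)", ":D", "^_^"}
NEGATIVE = {":(", ":'(", ">_<"}


def is_ambivalent(text, positive=POSITIVE, negative=NEGATIVE):
    """True if the text contains at least one positive and one negative emoticon."""
    has_positive = any(e in text for e in positive)
    has_negative = any(e in text for e in negative)
    return has_positive and has_negative


def ambivalent_users(tweets, min_count=2):
    """Return users with at least min_count ambivalent tweets.

    tweets: iterable of (user, text) pairs.
    min_count: ASSUMED threshold standing in for "frequently".
    """
    counts = Counter(user for user, text in tweets if is_ambivalent(text))
    return {user for user, count in counts.items() if count >= min_count}
```

Substring matching is crude (one emoticon can be a prefix of another), so a real pipeline would more likely tokenize emoticons first.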
Exploring demographic information in social media for product recommendation
In many e-commerce Web sites, product recommendation is essential to improve user experience and boost sales. Most existing product recommender systems rely on consumers’ historical transaction records or Web-site-browsing history in order to accurately predict online users’ preferences for product recommendation. As such, they are constrained by the limited information available on specific e-commerce Web sites. With the prolific use of social media platforms, it now becomes possible to extract product demographics from online product reviews and social networks built from microblogs. Moreover, users’ public profiles available on social media often reveal their demographic attributes such as age, gender, and education. In this paper, we propose to leverage the demographic information of both products and users extracted from social media for product recommendation. Specifically, we frame recommendation as a learning-to-rank problem which takes as input the features derived from both product and user demographics. An ensemble method based on gradient-boosting regression trees is extended to make it suitable for our recommendation task. We have conducted extensive experiments to obtain both quantitative and qualitative evaluation results. Moreover, we have also conducted a user study to gauge the performance of our proposed recommender system in a real-world deployment. All the results show that our system is more effective than the competitive baselines in generating recommendation results that better match users’ preferences.
Look behind the Censorship: Reposting-User Characterization and Muted-Topic Restoration
The emergence of social media has largely eased the way people receive
information and participate in public discussions. However, in countries with
strict regulations on discussions in the public space, social media is no
exception. To limit the degree of dissent or inhibit the spread of "harmful"
information, a common approach is to impose information operations such as
censorship/suspension on social media. In this paper, we focus on a study of
censorship on Weibo, the counterpart of Twitter in China. Specifically, we 1)
create a web-scraping pipeline and collect a large dataset focused solely on the
reposts from Weibo; 2) discover the characteristics of users whose reposts
contain censored information, in terms of gender, device, and account type; and
3) conduct a thematic analysis by extracting and analyzing topic information.
Note that although the original posts are no longer visible, we can use
comments users wrote when reposting the original post to infer the topic of the
original content. We find that such efforts can recover the discussions around
social events that triggered massive discussions but were later muted. Further,
we show the variations of inferred topics across different user groups and time
frames.
Comment: Accepted for publication in Proceedings of the International Workshop
on Social Sensing (SocialSens 2022): Special Edition on Belief Dynamics, 2022
Probabilistic Matching: Causal Inference under Measurement Errors
The abundance of data produced daily from a large variety of sources has
boosted the need for novel approaches to causal inference analysis from
observational data. Observational data often contain noisy or missing entries.
Moreover, causal inference studies may require unobserved high-level
information which needs to be inferred from other observed attributes. In such
cases, inaccuracies of the applied inference methods will result in noisy
outputs. In this study, we propose a novel approach for causal inference when
one or more key variables are noisy. Our method utilizes the knowledge about
the uncertainty of the real values of key variables in order to reduce the bias
induced by noisy measurements. We evaluate our approach in comparison with
existing methods both on simulated and real scenarios and we demonstrate that
our method reduces the bias and avoids false causal inference conclusions in
most cases.
Comment: In Proceedings of the International Joint Conference on Neural Networks
(IJCNN) 201