451 research outputs found

    Statistical Inference in a Directed Network Model with Covariates

    Networks are often characterized by node heterogeneity, in which nodes exhibit different degrees of interaction, and link homophily, in which nodes sharing common features tend to associate with each other. In this paper, we propose a new directed network model that captures the former via node-specific parametrization and the latter by incorporating covariates. In particular, the model quantifies the heterogeneity of each node with separate outgoingness and incomingness parameters, so the number of heterogeneity parameters is twice the number of nodes. We study maximum likelihood estimation of the model and establish the uniform consistency and asymptotic normality of the resulting estimators. Numerical studies support our theoretical findings, and a data analysis confirms the usefulness of our model. Comment: 29 pages. minor revisio
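
The abstract above does not give the model's functional form. As a minimal sketch, assume a logistic link in which the probability of a directed edge from node i to node j depends on an outgoingness parameter for i, an incomingness parameter for j, and a covariate term. The names `link_probability`, `log_likelihood`, `alpha`, `beta`, and `gamma` are illustrative assumptions, not taken from the paper.

```python
import math
from typing import Sequence


def link_probability(alpha_i: float, beta_j: float,
                     z_ij: Sequence[float], gamma: Sequence[float]) -> float:
    """Probability of a directed edge i -> j under an ASSUMED logistic link.

    alpha_i: outgoingness of the sending node (hypothetical name)
    beta_j:  incomingness of the receiving node (hypothetical name)
    z_ij:    covariate vector for the ordered pair (i, j)
    gamma:   covariate coefficients shared by all pairs
    """
    score = alpha_i + beta_j + sum(z * g for z, g in zip(z_ij, gamma))
    return 1.0 / (1.0 + math.exp(-score))


def log_likelihood(adj, alpha, beta, covariates, gamma) -> float:
    """Bernoulli log-likelihood summed over all ordered pairs i != j."""
    n = len(adj)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # no self-loops in this sketch
            p = link_probability(alpha[i], beta[j], covariates[i][j], gamma)
            total += math.log(p) if adj[i][j] else math.log(1.0 - p)
    return total
```

With n nodes this sketch carries 2n node-specific parameters (one `alpha` and one `beta` per node) plus the shared `gamma`, which matches the parameter count described in the abstract. Maximizing `log_likelihood` over these parameters would be the maximum likelihood step.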

    A Local-Global LDA Model for Discovering Geographical Topics from Social Media

    Micro-blogging services can track users' geo-locations when users check in at places or use geo-tagging, which implicitly reveals locations. This "geo tracking" can help to find topics triggered by events in certain regions. However, discovering such topics is very challenging because of the large number of noisy messages (e.g. daily conversations). This paper proposes a method to model geographical topics that filters out irrelevant words by weighting them differently in the local and global contexts. Our method is based on the Latent Dirichlet Allocation (LDA) model, but each word is generated from either a local or a global topic distribution according to its generation probabilities. We evaluated our model with data collected from Weibo, which is currently the most popular micro-blogging service among Chinese users. The evaluation results demonstrate that our method outperforms other baseline methods on several metrics, such as model perplexity, two kinds of entropies, and KL-divergence of discovered topics.
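
The generative step described above, where each word comes from either a local or a global topic distribution, can be sketched as follows. This is only an illustration of the switching idea under stated assumptions: the function names, the single switch probability `p_local`, and the toy vocabularies are hypothetical, and inference (the hard part of any LDA variant) is not shown.

```python
import random
from typing import List, Tuple

# Each topic is a (vocabulary, word probabilities) pair; purely illustrative.
Topic = Tuple[List[str], List[float]]


def sample_index(rng: random.Random, weights: List[float]) -> int:
    """Draw an index with probability proportional to its weight."""
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


def generate_word(rng: random.Random, p_local: float,
                  local_topics: List[float], global_topics: List[float],
                  topic_words: List[Topic]) -> Tuple[str, int, str]:
    """Generate one word: pick a source, then a topic, then a word.

    p_local is an ASSUMED switch probability for using the local
    (region-specific) topic distribution instead of the global one.
    """
    source = "local" if rng.random() < p_local else "global"
    topic_dist = local_topics if source == "local" else global_topics
    topic = sample_index(rng, topic_dist)
    vocab, probs = topic_words[topic]
    word = rng.choices(vocab, weights=probs, k=1)[0]
    return source, topic, word
```

For example, calling `generate_word(random.Random(0), 0.7, [0.9, 0.1], [0.2, 0.8], topics)` draws a word that is attributed to the local context about 70% of the time, so noisy everyday words can be absorbed by the global distribution.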

    Emoticon-based Ambivalent Expression: A Hidden Indicator for Unusual Behaviors in Weibo

    Recent decades have seen online social media become a big-data window for quantitatively testing conventional social theories and exploring human behavioral patterns in much greater detail. In this paper, by tracing emoticon use in Weibo, we disclose a group of hidden "ambivalent users" who frequently post ambivalent tweets containing both positive and negative emotions. Further investigation reveals that this ambivalent expression could be a novel indicator of many unusual social behaviors. For instance, ambivalent users, the majority of whom are female, tend to post at midnight or on weekends. They frequently mention their close friends in ambivalent tweets, which attract more replies and thus serve as a more private channel of communication. Ambivalent users also respond differently to public affairs than others do and show more interest in entertainment and sports events. Moreover, the sentiment shift of words adopted in ambivalent tweets is more evident than usual and exhibits a clear "negative to positive" pattern. The above observations, though seemingly miscellaneous, actually point to the self-regulation of negative mood in Weibo, which is grounded in the emotion management theories of sociology but extends them to the online environment. Finally, as an interesting corollary, ambivalent users are found to be connected with compulsive buyers and turn out to be perfect targets for online marketing. Comment: Data sets can be downloaded freely from www.datatang.com/data/47207 or http://pan.baidu.com/s/1mg67cbm. For any issues, feel free to contact [email protected]
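
The abstract defines an ambivalent tweet as one containing both positive and negative emotions, identified through emoticons. A minimal sketch of that check follows; the emoticon lexicons and function names are hypothetical placeholders (Weibo has its own emoticon set), and the paper's actual labeling procedure may differ.

```python
from typing import FrozenSet, Iterable

# Hypothetical emoticon lexicons, used only for illustration.
POSITIVE: FrozenSet[str] = frozenset({":)", ":D"})
NEGATIVE: FrozenSet[str] = frozenset({":(", ":'("})


def is_ambivalent(text: str,
                  positive: FrozenSet[str] = POSITIVE,
                  negative: FrozenSet[str] = NEGATIVE) -> bool:
    """True if the text contains at least one positive and one negative emoticon."""
    has_positive = any(emoticon in text for emoticon in positive)
    has_negative = any(emoticon in text for emoticon in negative)
    return has_positive and has_negative


def ambivalence_rate(tweets: Iterable[str]) -> float:
    """Share of a user's tweets that are ambivalent (0.0 for no tweets)."""
    tweets = list(tweets)
    if not tweets:
        return 0.0
    return sum(is_ambivalent(t) for t in tweets) / len(tweets)
```

A user could then be flagged as "ambivalent" when `ambivalence_rate` exceeds some threshold; the threshold itself is not stated in the abstract and would have to be chosen.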

    Exploring demographic information in social media for product recommendation

    On many e-commerce Web sites, product recommendation is essential to improving user experience and boosting sales. Most existing product recommender systems rely on consumers' historical transaction records or Web-site browsing history to predict online users' preferences. As such, they are constrained by the limited information available on specific e-commerce Web sites. With the prolific use of social media platforms, it is now possible to extract product demographics from online product reviews and social networks built from microblogs. Moreover, users' public profiles on social media often reveal their demographic attributes such as age, gender, and education. In this paper, we propose to leverage the demographic information of both products and users extracted from social media for product recommendation. Specifically, we frame recommendation as a learning-to-rank problem that takes as input features derived from both product and user demographics. An ensemble method based on gradient-boosted regression trees is extended to make it suitable for our recommendation task. We have conducted extensive experiments to obtain both quantitative and qualitative evaluation results. Moreover, we have conducted a user study to gauge the performance of our proposed recommender system in a real-world deployment. All the results show that our system generates recommendations that match users' preferences better than the competitive baselines.
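
To make the feature idea concrete, here is a minimal sketch of how user and product demographics might be combined into ranking features. It is an assumption-laden illustration: the representation of product demographics as shares of buyers per attribute value, the names `match_features` and `rank_products`, and the placeholder `score` callable are all hypothetical. The gradient-boosted tree ranker mentioned in the abstract is not implemented here; `score` stands in for it.

```python
from typing import Callable, Dict, List

# attribute -> value -> share of the product's buyers (assumed representation)
Demographics = Dict[str, Dict[str, float]]


def match_features(user: Dict[str, str], product: Demographics,
                   attributes: List[str]) -> List[float]:
    """One feature per attribute: the share of buyers matching the user's value."""
    return [product.get(attr, {}).get(user.get(attr, ""), 0.0)
            for attr in attributes]


def rank_products(user: Dict[str, str], catalog: Dict[str, Demographics],
                  attributes: List[str],
                  score: Callable[[List[float]], float]) -> List[str]:
    """Order product names by a scoring function over the match features.

    `score` is a placeholder for a learned ranking model.
    """
    return sorted(
        catalog,
        key=lambda name: score(match_features(user, catalog[name], attributes)),
        reverse=True,
    )
```

In a learning-to-rank setup, feature vectors like these would be paired with relevance labels to train the ranker, which then replaces the placeholder `score`.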

    Look behind the Censorship: Reposting-User Characterization and Muted-Topic Restoration

    The emergence of social media has greatly eased the way people receive information and participate in public discussions. However, in countries with strict regulations on discussions in the public space, social media is no exception. To limit the degree of dissent or inhibit the spread of "harmful" information, a common approach is to impose information operations such as censorship or suspension on social media. In this paper, we study censorship on Weibo, the counterpart of Twitter in China. Specifically, we 1) create a web-scraping pipeline and collect a large dataset focused solely on reposts from Weibo; 2) discover the characteristics of users whose reposts contain censored information, in terms of gender, device, and account type; and 3) conduct a thematic analysis by extracting and analyzing topic information. Note that although the original posts are no longer visible, we can use the comments users wrote when reposting to infer the topic of the original content. We find that such efforts can recover the discussions around social events that triggered massive discussions but were later muted. Further, we show how the inferred topics vary across user groups and time frames. Comment: Accepted for publication in Proceedings of the International Workshop on Social Sensing (SocialSens 2022): Special Edition on Belief Dynamics, 202

    Probabilistic Matching: Causal Inference under Measurement Errors

    The abundance of data produced daily from a large variety of sources has boosted the need for novel approaches to causal inference from observational data. Observational data often contain noisy or missing entries. Moreover, causal inference studies may require unobserved high-level information that needs to be inferred from other observed attributes. In such cases, inaccuracies in the applied inference methods will result in noisy outputs. In this study, we propose a novel approach for causal inference when one or more key variables are noisy. Our method uses knowledge about the uncertainty of the real values of key variables to reduce the bias induced by noisy measurements. We evaluate our approach against existing methods in both simulated and real scenarios, and we demonstrate that our method reduces the bias and avoids false causal inference conclusions in most cases. Comment: In Proceedings of International Joint Conference Of Neural Networks (IJCNN) 201
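
The abstract does not spell out the matching procedure, so the following is not the paper's algorithm. It is a simpler, swapped-in illustration of the general idea of using known uncertainty about a key variable: a probability-weighted difference in means, where each unit carries a probability of truly being treated instead of a hard, possibly wrong label. The function name and inputs are hypothetical.

```python
from typing import Sequence


def weighted_effect(outcomes: Sequence[float],
                    p_treated: Sequence[float]) -> float:
    """Probability-weighted difference in mean outcomes.

    p_treated[i] is the ASSUMED probability that unit i was truly treated.
    Each unit contributes to the treated mean with weight p and to the
    control mean with weight 1 - p.
    """
    treated_weight = sum(p_treated)
    control_weight = sum(1.0 - p for p in p_treated)
    if treated_weight == 0 or control_weight == 0:
        raise ValueError("need non-zero weight in both groups")
    mean_treated = sum(y * p for y, p in zip(outcomes, p_treated)) / treated_weight
    mean_control = sum(y * (1.0 - p) for y, p in zip(outcomes, p_treated)) / control_weight
    return mean_treated - mean_control
```

When every probability is 0 or 1 this reduces to the ordinary difference in group means; intermediate probabilities let uncertain units count partially toward both groups. A full treatment would also adjust for confounders, which this sketch omits.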