Collaborative Inference of Coexisting Information Diffusions
Recently, diffusion history inference has become an emerging
research topic due to its great benefits for various applications, whose
purpose is to reconstruct the missing histories of information diffusion traces
according to incomplete observations. The existing methods, however, often
focus only on a single information diffusion trace, while in a real-world social
network, there often coexist multiple information diffusions over the same
network. In this paper, we propose a novel approach called Collaborative
Inference Model (CIM) for the problem of the inference of coexisting
information diffusions. By exploiting the synergism between the coexisting
information diffusions, CIM holistically models multiple information diffusions
as a sparse 4th-order tensor called Coexisting Diffusions Tensor (CDT) without
any prior assumption of diffusion models, and collaboratively infers the
histories of the coexisting information diffusions via a low-rank approximation
of CDT with a fusion of heterogeneous constraints generated from additional
data sources. To improve the efficiency, we further propose an optimal
algorithm called Time Window based Parallel Decomposition Algorithm (TWPDA),
which can speed up the inference without compromise on the accuracy by
utilizing the temporal locality of information diffusions. The extensive
experiments conducted on real-world datasets and synthetic datasets verify the
effectiveness and efficiency of CIM and TWPDA.
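The abstract above does not give the CIM or TWPDA algorithms, so the following is only a minimal sketch of the general idea that missing entries of a tensor can be inferred from low-rank structure. It is not the authors' method: it assumes a dense toy 4th-order tensor, uses no heterogeneous constraints, and swaps in iterative truncated SVD on one unfolding as a simple stand-in for a tensor decomposition. The function name `impute_low_rank` and all data are hypothetical.

```python
import numpy as np


def impute_low_rank(tensor, observed, rank=1, n_iter=200):
    """Fill unobserved entries via repeated truncated SVD of the mode-0 unfolding.

    Illustrative stand-in only; this is NOT the CIM algorithm from the abstract.
    """
    shape = tensor.shape
    matrix = tensor.reshape(shape[0], -1)   # mode-0 unfolding
    mask = observed.reshape(shape[0], -1)   # True where the entry is known
    estimate = np.where(mask, matrix, 0.0)  # start with zeros in the gaps
    for _ in range(n_iter):
        u, s, vt = np.linalg.svd(estimate, full_matrices=False)
        low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank]
        # Keep observed entries fixed, refresh only the missing ones.
        estimate = np.where(mask, matrix, low_rank)
    return estimate.reshape(shape)


# Toy rank-1 4th-order tensor (the axis meanings are assumptions, not from the paper).
rng = np.random.default_rng(0)
a, b, c, d = (rng.uniform(1.0, 2.0, n) for n in (3, 3, 2, 2))
truth = np.einsum("i,j,k,l->ijkl", a, b, c, d)

# Hide three entries to mimic missing diffusion history.
observed = np.ones(truth.shape, dtype=bool)
for idx in [(0, 0, 0, 0), (1, 1, 1, 0), (2, 2, 0, 1)]:
    observed[idx] = False

recovered = impute_low_rank(truth, observed)
max_error = float(np.abs(recovered - truth).max())
```

On this toy input the hidden entries are recovered because the unfolding of a rank-1 tensor is itself rank 1; real diffusion data would be sparse and noisy, which is where the constraints mentioned in the abstract would matter.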
Comparison of home detection algorithms using smartphone GPS data
Estimation of people's home locations using location-based services data from
smartphones is a common task in human mobility assessment. However, commonly
used home detection algorithms (HDAs) are often arbitrary and unexamined. In
this study, we review existing HDAs and examine five HDAs using eight
high-quality mobile phone geolocation datasets. These include four commonly
used HDAs as well as an HDA proposed in this work. To make quantitative
comparisons, we propose three novel metrics to assess the quality of detected
home locations and test them on eight datasets across four U.S. cities. We find
that all three metrics show a consistent rank of HDAs' performances, with the
proposed HDA outperforming the others. We infer that the temporal and spatial
continuity of the geolocation data points matters more than the overall size of
the data for accurate home detection. We also find that HDAs with high (and
similar) performance metrics tend to create results with better consistency and
closer to common expectations. Further, the performance deteriorates with
decreasing data quality of the devices, though the patterns of relative
performance persist. Finally, we show how the differences in home detection can
lead to substantial differences in subsequent inferences using two case studies
- (i) hurricane evacuation estimation, and (ii) correlation of mobility
patterns with socioeconomic status. Our work contributes to improving the
transparency of large-scale human mobility assessment applications.
Comment: Paper currently under review in the journal "EPJ Data Science" (ISSN:
2193-1127); Manuscript: 24 pages (including 68 references, 7 figures, 3
tables); Supplementary material document not included
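The abstract does not specify the five HDAs it compares. As a rough illustration of what a home detection algorithm can look like, here is a minimal sketch of one generic heuristic: take the most frequently visited grid cell among night-time GPS points. The night window, the rounding precision, and the name `detect_home` are assumptions made for this example, not details from the paper.

```python
from collections import Counter
from datetime import datetime


def detect_home(pings, night_start=22, night_end=6, precision=3):
    """Return the most visited rounded (lat, lon) cell among night-time pings.

    `pings` is an iterable of (timestamp, lat, lon). Returns None when no
    ping falls inside the assumed night window.
    """
    cells = Counter()
    for ts, lat, lon in pings:
        if ts.hour >= night_start or ts.hour < night_end:
            cells[(round(lat, precision), round(lon, precision))] += 1
    return cells.most_common(1)[0][0] if cells else None


# Hypothetical pings for one device: two at night, two during the day.
pings = [
    (datetime(2021, 8, 1, 23, 15), 29.7602, -95.3698),
    (datetime(2021, 8, 2, 2, 40), 29.7603, -95.3701),
    (datetime(2021, 8, 2, 10, 5), 29.7499, -95.3601),
    (datetime(2021, 8, 2, 14, 30), 29.7499, -95.3601),
]
home = detect_home(pings)
```

A heuristic like this depends on having night-time points at all, which is consistent with the abstract's finding that temporal and spatial continuity matters more than the overall size of the data.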
Spiteful, one-off, and kind: Predicting customer feedback behavior on Twitter
National Research Foundation (NRF) Singapore under International Research Centres in Singapore Funding Initiative
Detecting Home Locations from CDR Data: Introducing Spatial Uncertainty to the State-of-the-Art
Non-continuous location traces inferred from Call Detail Records (CDR) at population scale are increasingly becoming available for research and show great potential for automated detection of meaningful places. Yet, a majority of Home Detection Algorithms (HDAs) suffer from “blind” deployment of criteria to define homes and from limited possibilities for validation. In this paper, we investigate the performance and capabilities of five popular criteria for home detection based on a very large mobile phone dataset from France (~18 million users, 6 months). Furthermore, we construct a data-driven framework to assess the spatial uncertainty related to the application of HDAs. Our findings appropriate spatial uncertainty in HDA and, by extension, for detection of meaningful places. We show how spatial uncertainties on the individuals’ level can be assessed in the absence of ground truth annotation, how they relate to traditional, high-level validation practices and how they can be used to improve results for, e.g., nation-wide population estimation.
Analyzing the Overturning of Roe vs Wade on Twitter using Natural Language Processing Techniques
In 1973, the historic U.S. Supreme Court (SCOTUS) case of Roe vs. Wade provided the constitutional right to abortion. However, on May 2, 2022, Politico magazine leaked the draft opinion in Dobbs v. Jackson Women’s Health Organization. The leak prompted a surge of users posting their opinions on the case that would eliminate abortion as a constitutional right. Then, on June 24, 2022, SCOTUS overturned Roe vs. Wade. In this thesis, we aim to investigate the public opinion and reaction towards the overturning of Roe vs. Wade. We collected 20,640,166 tweets using Twitter API for Academic Research and an open-sourced dataset published during two periods. The first period was a week before Politico magazine leaked the SCOTUS decision and the week after. The second period was a week before and over a week after the overturning of Roe vs. Wade. Using natural language processing techniques, including sentiment analysis, emotion recognition, topic modeling, and bi-grams, we could develop insight into public opinion based on the posted tweets. Our research investigates if there is a change in sentiment over time, a change in the emotion expressed within the text over time, and which topics are most common within the collection of tweets. The results demonstrate a significant increase on the day of the Politico leak, which showed that most of the tweets published on that day expressed a positive sentiment. However, in the weeks before and after the overturning of Roe vs. Wade, we witness a decrease from the beginning of the period up to the day of the overturn. Regarding emotion recognition, there is a significant decrease in the proportion of tweets classified as expressing optimism. There is also an increase in the proportion of tweets expressing anger when comparing the day of the Politico leak and the day of the overturn. The topic model we applied to the tweets published on the day of the Politico leak revealed that states’ rights and children were discussed. Using bigrams of the most negative tweets, we witnessed gun control and healthcare as words that frequently occurred within the collection.
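The thesis abstract does not describe its tooling, so the following is only a minimal sketch of how frequent bigrams could be counted in a collection of short texts. The tokenizer, the function name `top_bigrams`, and the three toy strings are assumptions made for illustration; they are not data from the study.

```python
import re
from collections import Counter


def top_bigrams(texts, n=3):
    """Return the n most common adjacent word pairs across all texts."""
    counts = Counter()
    for text in texts:
        # Simple lowercase word tokenizer; a real pipeline would likely
        # also handle hashtags, mentions, URLs, and stop words.
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(zip(tokens, tokens[1:]))
    return counts.most_common(n)


# Invented toy strings standing in for tweets.
texts = [
    "gun control now",
    "we need gun control",
    "healthcare access matters",
]
most_common = top_bigrams(texts, n=1)
```

Bigrams are counted within each text only, so the last word of one tweet is never paired with the first word of the next.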
Assessing the quality of home detection from mobile phone data for official statistics
Mobile phone data are an interesting new data source for official statistics. However, multiple problems and uncertainties need to be solved before these data can inform, support or even become an integral part of statistical production processes. In this paper, we focus on arguably the most important problem hindering the application of mobile phone data in official statistics: detecting home locations. We argue that current efforts to detect home locations suffer from a blind deployment of criteria to define a place of residence and from limited validation possibilities. We support our argument by analysing the performance of five home detection algorithms (HDAs) that have been applied to a large, French, Call Detail Record (CDR) dataset (~18 million users, 5 months). Our results show that criteria choice in HDAs influences the detection of home locations for up to about 40% of users, that HDAs perform poorly when compared with a validation dataset (the 35°-gap), and that their performance is sensitive to the time period and the duration of observation. Based on our findings and experiences, we offer several recommendations for official statistics. If adopted, our recommendations would help in ensuring a more reliable use of mobile phone data vis-à-vis official statistics.
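To make the claim about criteria choice concrete, here is a minimal sketch of how the share of users whose detected home changes between two criteria could be measured. The two criteria (all activity versus night-time activity only), the 19:00 to 07:00 window, the function names, and the toy records are assumptions for illustration; the abstract does not list the five HDAs it evaluates.

```python
from collections import Counter, defaultdict


def home_by_criterion(records, keep):
    """Map each user to the tower with the most records passing `keep(hour)`."""
    per_user = defaultdict(Counter)
    for user, tower, hour in records:
        if keep(hour):
            per_user[user][tower] += 1
    return {user: c.most_common(1)[0][0] for user, c in per_user.items()}


def disagreement_rate(homes_a, homes_b):
    """Share of users, present under both criteria, assigned different homes."""
    shared = homes_a.keys() & homes_b.keys()
    if not shared:
        return 0.0
    return sum(homes_a[u] != homes_b[u] for u in shared) / len(shared)


# Hypothetical CDR-like records: (user, cell tower, hour of day).
records = [
    ("u1", "T1", 23), ("u1", "T1", 2), ("u1", "T2", 14),
    ("u2", "T3", 10), ("u2", "T3", 11), ("u2", "T3", 15), ("u2", "T4", 1),
]
all_day = home_by_criterion(records, lambda hour: True)
night = home_by_criterion(records, lambda hour: hour >= 19 or hour < 7)
rate = disagreement_rate(all_day, night)
```

In this toy input the two criteria agree for `u1` and disagree for `u2`, so half of the users receive a different home depending on the criterion.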