
    Collaborative Inference of Coexisting Information Diffusions

    Recently, diffusion history inference, which aims to reconstruct the missing histories of information diffusion traces from incomplete observations, has become an emerging research topic because of its benefits for many applications. Existing methods, however, often focus on a single information diffusion trace, whereas in a real-world social network multiple information diffusions often coexist over the same network. In this paper, we propose a novel approach called the Collaborative Inference Model (CIM) for inferring coexisting information diffusions. By exploiting the synergy between coexisting information diffusions, CIM holistically models multiple information diffusions as a sparse 4th-order tensor, called the Coexisting Diffusions Tensor (CDT), without any prior assumption about the diffusion models. It then collaboratively infers the histories of the coexisting information diffusions via a low-rank approximation of CDT fused with heterogeneous constraints generated from additional data sources. To improve efficiency, we further propose an optimal algorithm called the Time Window based Parallel Decomposition Algorithm (TWPDA), which speeds up inference without compromising accuracy by exploiting the temporal locality of information diffusions. Extensive experiments on real-world and synthetic datasets verify the effectiveness and efficiency of CIM and TWPDA.

    Comparison of home detection algorithms using smartphone GPS data

    Estimating people's home locations from smartphone location-based services data is a common task in human mobility assessment. However, commonly used home detection algorithms (HDAs) are often arbitrary and unexamined. In this study, we review existing HDAs and examine five of them using eight high-quality mobile phone geolocation datasets: four commonly used HDAs and one HDA proposed in this work. To make quantitative comparisons, we propose three novel metrics for assessing the quality of detected home locations and test them on eight datasets across four U.S. cities. All three metrics yield a consistent ranking of HDA performance, with the proposed HDA outperforming the others. We infer that the temporal and spatial continuity of the geolocation data points matters more than the overall size of the data for accurate home detection. We also find that HDAs with high (and similar) performance metrics tend to produce results that are more consistent and closer to common expectations. Further, performance deteriorates as the data quality of the devices decreases, although the patterns of relative performance persist. Finally, using two case studies, (i) hurricane evacuation estimation and (ii) the correlation of mobility patterns with socioeconomic status, we show how differences in home detection can lead to substantial differences in subsequent inferences. Our work contributes to improving the transparency of large-scale human mobility assessment applications.
    Comment: Paper currently under review in the journal "EPJ Data Science" (ISSN: 2193-1127); manuscript: 24 pages (including 68 references, 7 figures, 3 tables); supplementary material document not included.

    Spiteful, one-off, and kind: Predicting customer feedback behavior on Twitter

    National Research Foundation (NRF) Singapore under International Research Centres in Singapore Funding Initiative

    Detecting Home Locations from CDR Data: Introducing Spatial Uncertainty to the State-of-the-Art

    Non-continuous location traces inferred from Call Detail Records (CDR) at population scale are increasingly becoming available for research and show great potential for the automated detection of meaningful places. Yet a majority of Home Detection Algorithms (HDAs) suffer from a "blind" deployment of criteria to define homes and from limited possibilities for validation. In this paper, we investigate the performance and capabilities of five popular criteria for home detection on a very large mobile phone dataset from France (~18 million users, 6 months). Furthermore, we construct a data-driven framework to assess the spatial uncertainty related to the application of HDAs. Our findings highlight the relevance of spatial uncertainty for HDAs and, by extension, for the detection of meaningful places. We show how spatial uncertainties at the individual level can be assessed in the absence of ground-truth annotation, how they relate to traditional, high-level validation practices, and how they can be used to improve results for, e.g., nationwide population estimation.
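    To make "criteria for home detection" concrete, the sketch below implements three criteria that are common in the HDA literature: most events overall, most distinct days with events, and most night-time events. It is not the method of either paper listed here. The record layout (user_id, location_id, timestamp), the function names, and the 19:00-07:00 night window are all hypothetical choices for illustration. The toy data show how two criteria can disagree for the same user.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical record layout: (user_id, location_id, timestamp).
Record = tuple[str, str, datetime]

# Assumed night-time window (19:00-07:00); real studies choose their own.
NIGHT_START_HOUR = 19
NIGHT_END_HOUR = 7


def _argmax_per_user(scores: dict[str, Counter]) -> dict[str, str]:
    """For each user, pick the location with the highest score."""
    return {user: c.most_common(1)[0][0] for user, c in scores.items() if c}


def home_by_max_activity(records: list[Record]) -> dict[str, str]:
    """Criterion: home = location with the most events overall."""
    scores: dict[str, Counter] = defaultdict(Counter)
    for user, loc, _ts in records:
        scores[user][loc] += 1
    return _argmax_per_user(scores)


def home_by_distinct_days(records: list[Record]) -> dict[str, str]:
    """Criterion: home = location observed on the most distinct calendar days."""
    days: dict[tuple[str, str], set] = defaultdict(set)
    for user, loc, ts in records:
        days[(user, loc)].add(ts.date())
    scores: dict[str, Counter] = defaultdict(Counter)
    for (user, loc), seen_days in days.items():
        scores[user][loc] = len(seen_days)
    return _argmax_per_user(scores)


def home_by_night_activity(records: list[Record]) -> dict[str, str]:
    """Criterion: home = location with the most events inside the night window."""
    scores: dict[str, Counter] = defaultdict(Counter)
    for user, loc, ts in records:
        if ts.hour >= NIGHT_START_HOUR or ts.hour < NIGHT_END_HOUR:
            scores[user][loc] += 1
    return _argmax_per_user(scores)


# Toy data for one hypothetical user: four daytime events at "A" on a single
# day, and three night-time events at "B" spread over three days.
sample: list[Record] = [
    ("u1", "A", datetime(2024, 1, 1, 10)),
    ("u1", "A", datetime(2024, 1, 1, 11)),
    ("u1", "A", datetime(2024, 1, 1, 12)),
    ("u1", "A", datetime(2024, 1, 1, 13)),
    ("u1", "B", datetime(2024, 1, 1, 22)),
    ("u1", "B", datetime(2024, 1, 2, 23)),
    ("u1", "B", datetime(2024, 1, 3, 21)),
]

print(home_by_max_activity(sample))    # {'u1': 'A'}
print(home_by_distinct_days(sample))   # {'u1': 'B'}
print(home_by_night_activity(sample))  # {'u1': 'B'}
```

    Each criterion reduces to "score every (user, location) pair, then take the argmax per user"; only the scoring rule changes, which is why the choice of criterion alone can move a user's detected home.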

    Analyzing the Overturning of Roe vs Wade on Twitter using Natural Language Processing Techniques

    In 1973, the historic U.S. Supreme Court (SCOTUS) case of Roe v. Wade established a constitutional right to abortion. However, on May 2, 2022, Politico magazine leaked the draft opinion in Dobbs v. Jackson Women's Health Organization. The leak prompted a surge of users posting their opinions on a decision that would eliminate abortion as a constitutional right. Then, on June 24, 2022, SCOTUS overturned Roe v. Wade. In this thesis, we investigate public opinion and reactions towards the overturning of Roe v. Wade. We collected 20,640,166 tweets published during two periods, using the Twitter API for Academic Research and an open-source dataset. The first period spanned the week before and the week after Politico magazine leaked the draft SCOTUS decision. The second period spanned the week before and just over a week after the overturning of Roe v. Wade. Using natural language processing techniques, including sentiment analysis, emotion recognition, topic modeling, and bigrams, we gained insight into public opinion from the posted tweets. Our research investigates whether sentiment changes over time, whether the emotions expressed in the text change over time, and which topics are most common in the collection of tweets. The results show a significant increase in sentiment on the day of the Politico leak, with most of the tweets published that day expressing positive sentiment. However, in the weeks before and after the overturning of Roe v. Wade, we observe a decrease in sentiment from the beginning of the period up to the day of the overturning. Regarding emotion recognition, there is a significant decrease in the proportion of tweets classified as expressing optimism. There is also an increase in the proportion of tweets expressing anger when comparing the day of the Politico leak with the day of the overturning. The topic model we applied to the tweets published on the day of the Politico leak revealed that states' rights and children were discussed. A bigram analysis of the most negative tweets showed that gun control and healthcare occurred frequently within the collection.
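    The bigram step described above (count adjacent word pairs among the most negative tweets) can be sketched as follows. This is an illustrative assumption, not the thesis's pipeline: sentiment scores are taken as already computed (lower means more negative), and the tokenizer, the small stopword list, and the function name are hypothetical. Stopwords are removed before pairing, so a pair can span a removed word.

```python
import re
from collections import Counter

# Hypothetical tokenizer and stopword list, chosen only for illustration.
TOKEN_RE = re.compile(r"[a-z']+")
STOPWORDS = frozenset({"a", "an", "and", "the", "of", "to", "is", "in", "on", "we", "for"})


def tokenize(text: str) -> list[str]:
    """Lowercase, split into word tokens, and drop stopwords."""
    return [tok for tok in TOKEN_RE.findall(text.lower()) if tok not in STOPWORDS]


def top_bigrams_of_most_negative(
    scored_tweets: list[tuple[str, float]], k: int, n: int
) -> list[tuple[tuple[str, str], int]]:
    """Count bigrams in the k lowest-scored tweets and return the n most common."""
    most_negative = sorted(scored_tweets, key=lambda pair: pair[1])[:k]
    counts: Counter = Counter()
    for text, _score in most_negative:
        tokens = tokenize(text)
        counts.update(zip(tokens, tokens[1:]))  # adjacent word pairs
    return counts.most_common(n)


# Toy tweets with made-up sentiment scores in [-1, 1].
tweets = [
    ("We need gun control now", -0.9),
    ("Gun control and healthcare matter", -0.8),
    ("Healthcare is a right", -0.1),
    ("What a great day", 0.8),
]

print(top_bigrams_of_most_negative(tweets, k=2, n=1))  # [(('gun', 'control'), 2)]
```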

    Assessing the quality of home detection from mobile phone data for official statistics

    Mobile phone data are an interesting new data source for official statistics. However, multiple problems and uncertainties need to be resolved before these data can inform, support or even become an integral part of statistical production processes. In this paper, we focus on arguably the most important problem hindering the application of mobile phone data in official statistics: detecting home locations. We argue that current efforts to detect home locations suffer from a blind deployment of criteria to define a place of residence and from limited validation possibilities. We support our argument by analysing the performance of five home detection algorithms (HDAs) applied to a large French Call Detail Record (CDR) dataset (~18 million users, 5 months). Our results show that the choice of criteria in HDAs influences the detected home location for up to about 40% of users, that HDAs perform poorly when compared with a validation dataset (the 35°-gap), and that their performance is sensitive to the time period and the duration of observation. Based on our findings and experiences, we offer several recommendations for official statistics. If adopted, our recommendations would help ensure a more reliable use of mobile phone data vis-à-vis official statistics.
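    A figure such as "criteria choice influences the detected home for up to about 40% of users" can be read as a pairwise disagreement rate between HDA outputs. The sketch below is one hypothetical way to compute it, assuming each HDA returns a mapping from user to detected home location; it is not the paper's exact procedure. Only users detected by both HDAs are compared, and the maximum is taken over all pairs of HDAs.

```python
from itertools import combinations


def disagreement_share(homes_a: dict[str, str], homes_b: dict[str, str]) -> float:
    """Share of users, detected by both HDAs, whose detected homes differ."""
    common = homes_a.keys() & homes_b.keys()
    if not common:
        return 0.0
    differing = sum(1 for user in common if homes_a[user] != homes_b[user])
    return differing / len(common)


def max_pairwise_disagreement(results: dict[str, dict[str, str]]) -> float:
    """Largest disagreement share over every pair of HDA outputs."""
    return max(
        (disagreement_share(a, b) for a, b in combinations(results.values(), 2)),
        default=0.0,
    )


# Toy outputs of two hypothetical HDAs; "u5" is only detected by the second one.
hda_x = {"u1": "A", "u2": "C", "u3": "D", "u4": "E"}
hda_y = {"u1": "B", "u2": "C", "u3": "D", "u4": "E", "u5": "F"}

print(disagreement_share(hda_x, hda_y))                         # 0.25
print(max_pairwise_disagreement({"x": hda_x, "y": hda_y, "z": hda_x}))  # 0.25
```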