
    Analysis of Home Location Estimation with Iteration on Twitter Following Relationship

    Users' home locations are used by numerous social media applications, such as social media analysis. However, since a user's home location is generally not public, many researchers have attempted to develop more accurate home location estimation methods. A social network that expresses relationships between users is used to estimate the users' home locations. The network-based home location estimation method with iteration, which propagates the estimated locations, is used to estimate more users' home locations. In this study, we analyze the function of network-based home location estimation with iteration while using the social network based on following relationships on Twitter. The results indicate that the function that selects the most frequent location among the friends' locations has the best accuracy. Our analysis also shows that 88% of users in the social network based on following relationships have at least one correct home location within one hop (friends and friends of friends). According to this characteristic of the social network, we indicate that two iterations are sufficient.
    Comment: The 2016 International Conference on Advanced Informatics: Concepts, Theory and Application (ICAICTA2016)
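The selection function described in this abstract (take the most frequent location among a user's friends, then propagate the estimates for a fixed number of rounds) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `propagate_home_locations`, the dictionary-based input format, and the toy data are all assumptions made for illustration.

```python
from collections import Counter


def propagate_home_locations(friends, known, iterations=2):
    """Estimate home locations by majority vote over friends' locations.

    friends: dict mapping a user to a list of friend ids (hypothetical format).
    known:   dict mapping a user to a known home location label.
    Returns a dict containing the known and the newly estimated locations.
    """
    estimated = dict(known)
    for _ in range(iterations):
        updates = {}
        for user, neighbours in friends.items():
            if user in estimated:
                continue  # keep known or already estimated locations
            # Count only friends whose location is currently available.
            votes = Counter(estimated[f] for f in neighbours if f in estimated)
            if votes:
                updates[user] = votes.most_common(1)[0][0]
        if not updates:
            break  # nothing new to propagate
        # Apply the updates after the pass so each round uses the
        # previous round's estimates only.
        estimated.update(updates)
    return estimated


# Toy data (hypothetical): erin can only be located through alice.
friends = {
    "alice": ["bob", "carol", "dave"],
    "erin": ["alice"],
    "frank": [],
}
known = {"bob": "Tokyo", "carol": "Tokyo", "dave": "Osaka"}
result = propagate_home_locations(friends, known)
```

In this toy graph the first round locates alice from her friends and the second round locates erin through alice, which mirrors why a second iteration extends coverage. Users with no located friends, such as frank, stay unestimated.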

    Determine the User Country of a Tweet

    On the widely used messaging platform Twitter, about 2% of tweets contain a geographical location in the form of exact GPS coordinates (latitude and longitude). Knowing the location of a tweet is useful for many data analytics questions. This research looks at determining a location for tweets that do not contain GPS coordinates. An accuracy of 82% was achieved using a Naive Bayes model trained on features such as the user's timezone, the user's language, and the parsed user location. The classifier performs well on active Twitter countries such as the Netherlands and the United Kingdom. An analysis of errors made by the classifier shows that mistakes were made due to limited information and properties shared between countries, such as a shared timezone. A feature analysis was performed in order to see the effect of different features. The timezone and parsed user location were the most informative features.
    Comment: CTIT Technical Report, University of Twente
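As a rough illustration of the kind of model this abstract describes, the sketch below trains a categorical Naive Bayes classifier with Laplace smoothing on three profile features. It is an assumption-laden toy, not the report's classifier: the class `CategoricalNaiveBayes`, the feature names `timezone`, `language`, and `parsed_location`, and the four training rows are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class CategoricalNaiveBayes:
    """Naive Bayes over categorical features with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing strength (assumed value)

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        # counts[label][feature][value] = number of training rows
        self.counts = defaultdict(lambda: defaultdict(Counter))
        self.vocab = defaultdict(set)  # distinct values seen per feature
        for row, label in zip(rows, labels):
            for name, value in row.items():
                self.counts[label][name][value] += 1
                self.vocab[name].add(value)
        return self

    def predict(self, row):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / total)  # log prior
            for name, value in row.items():
                seen = self.counts[label][name]
                # The extra +1 reserves probability mass for unseen values.
                denom = sum(seen.values()) + self.alpha * (len(self.vocab[name]) + 1)
                score += math.log((seen[value] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training rows: profile features labelled with a country code.
rows = [
    {"timezone": "Amsterdam", "language": "nl", "parsed_location": "Netherlands"},
    {"timezone": "Amsterdam", "language": "nl", "parsed_location": "unknown"},
    {"timezone": "London", "language": "en", "parsed_location": "United Kingdom"},
    {"timezone": "London", "language": "en", "parsed_location": "unknown"},
]
labels = ["NL", "NL", "GB", "GB"]
model = CategoricalNaiveBayes().fit(rows, labels)

# An English-language profile whose timezone and parsed location point to NL.
mixed = model.predict(
    {"timezone": "Amsterdam", "language": "en", "parsed_location": "Netherlands"}
)
```

In this toy setup the language feature alone would favour GB for the `mixed` profile, while timezone and parsed location together outweigh it. That is only a small-scale analogue of the feature analysis reported in the abstract, not a reproduction of it.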

    Understanding Citizen Reactions and Ebola-Related Information Propagation on Social Media

    In severe outbreaks such as Ebola, bird flu and SARS, people share news, as well as their thoughts and responses regarding the outbreaks, on social media. Understanding how people perceive severe outbreaks, what their responses are, and what factors affect these responses becomes important. In this paper, we conduct a comprehensive study of understanding and mining the spread of Ebola-related information on social media. In particular, we (i) conduct a large-scale data-driven analysis of geotagged social media messages to understand citizen reactions regarding Ebola; (ii) build information propagation models which measure locality of information; and (iii) analyze spatial, temporal and social properties of Ebola-related information. Our work provides new insights into the Ebola outbreak by understanding citizen reactions and topic-based information propagation, as well as providing a foundation for analysis of and response to future public health crises.
    Comment: 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2016)

    Data Quality Challenges in Twitter Content Analysis for Informing Policy Making in Health Care

    Social media platforms and microblogs have become popular fora where the general public expresses opinions and concerns on a variety of matters. As a result, private and public organizations have been looking into ways of finding, understanding and communicating insights extracted from this massive amount of text-based interconnected data. There are, however, important difficulties associated with the noisiness and reliability of the content that hinder the analysis of the data. This paper reports the main challenges found in a real-world experience with social media used as a source of data to support policy making and assessment. We also propose a set of strategies for the precise retrieval of data, the profiling of social media users, and the involvement of policy makers in the analytical process.

    Inferring the Origin Locations of Tweets with Quantitative Confidence

    Social Internet content plays an increasingly critical role in many domains, including public health, disaster management, and politics. However, its utility is limited by missing geographic information; for example, fewer than 1.6% of Twitter messages (tweets) contain a geotag. We propose a scalable, content-based approach to estimate the location of tweets using a novel yet simple variant of Gaussian mixture models. Further, because real-world applications depend on quantified uncertainty for such estimates, we propose novel metrics of accuracy, precision, and calibration, and we evaluate our approach accordingly. Experiments on 13 million global, comprehensively multi-lingual tweets show that our approach yields reliable, well-calibrated results competitive with previous computationally intensive methods. We also show that a relatively small amount of training data is required for good estimates (roughly 30,000 tweets) and that models are quite time-invariant (effective on tweets many weeks newer than the training set). Finally, we show that toponyms and languages with a small geographic footprint provide the most useful location signals.
    Comment: 14 pages, 6 figures. Version 2: Move mathematics to appendix, 2 new references, various other presentation improvements. Version 3: Various presentation improvements, accepted at ACM CSCW 201

    An Analysis on the Spatial Characteristics of Satisfaction on the Residential Environment Using Tweets

    The purpose of this study is to analyze regional differences in the spatial distribution of residential satisfaction by extracting the elements of residential satisfaction from the text of tweet data. We defined three themes (“safety”, “amenity” and “convenience”) and base search terms for each theme, and then expanded each base search term into detailed search terms in order to retrieve tweets related to satisfaction with residential environments. We analyzed the selected tweets, visualized the results on a map, and then investigated satisfaction with residential environments through an index analysis, where the index is the proportion of the tweet ratio of a theme to the whole tweet ratio by region. This study shows that the analysis of tweets on SNS may replace the offline survey method in investigating satisfaction with residential environments by region in South Korea.