1,291 research outputs found

    Inferring the Origin Locations of Tweets with Quantitative Confidence

    Social Internet content plays an increasingly critical role in many domains, including public health, disaster management, and politics. However, its utility is limited by missing geographic information; for example, fewer than 1.6% of Twitter messages (tweets) contain a geotag. We propose a scalable, content-based approach to estimate the location of tweets using a novel yet simple variant of Gaussian mixture models. Further, because real-world applications depend on quantified uncertainty for such estimates, we propose novel metrics of accuracy, precision, and calibration, and we evaluate our approach accordingly. Experiments on 13 million global, comprehensively multi-lingual tweets show that our approach yields reliable, well-calibrated results competitive with previous computationally intensive methods. We also show that a relatively small amount of training data is required for good estimates (roughly 30,000 tweets) and that models are quite time-invariant (effective on tweets many weeks newer than the training set). Finally, we show that toponyms and languages with small geographic footprint provide the most useful location signals. Comment: 14 pages, 6 figures. Version 2: Move mathematics to appendix, 2 new references, various other presentation improvements. Version 3: Various presentation improvements, accepted at ACM CSCW 201

    Assessing the validity of location-based social media in the study of spatial processes

    The advent of big spatial data has created new opportunities for studying geographic phenomena. Open mapping projects, citizen science initiatives, and location-based social media all fall under the umbrella of volunteered geographic information and are now frequently used spatial data sources. The fact that these sources are user-contributed as opposed to gathered by experts has raised significant concerns over data quality. While data accuracy, particularly in open mapping projects (e.g., OpenStreetMap), has been given considerable attention, far less attention has been paid to data validity, specifically on location-based social media. In this three-article dissertation, I explore the validity of location-based social media in the study of spatial processes. In the first article, I implement a survey on the Oklahoma State campus to explore college students' behaviors and perceptions of location-based social media and note differences in terms of gender, race, and academic standing. The second and third articles are empirical studies utilizing geolocated data from Twitter, a popular social media platform. The second article makes use of precise location data (e.g., latitude-longitude) and uses geographically weighted regression to explore the patterns of non-English Twitter usage in Houston, Texas. The third article uses general location data (e.g., city) to explore the patterns of #BlackLivesMatter and counter-protest content across the states of Louisiana and Texas. The results of these studies collectively provide an optimistic, though cautionary, outlook on the use of location-based social media data in geography.

    Spatial characteristics of Twitter users : toward the understanding of geosocial media production

    Social media is a rich source of spatial data, but it also has many flaws and well-known limitations, especially in regard to representation and representativeness, since very little is known about the demographics of the user population. At the same time, the use of locational services is, in fact, dependent on those characteristics. We address this gap in knowledge by exploring divides between Twitter users, based on the spatial and temporal distribution of the content they produce. We chose five cities and data from 2015 to represent different socio-spatial contexts. Users were classified according to spatial and non-spatial measures: home range estimation; standard distance; nearest neighbor index; and a proposed localness index. There are distinct groups of geosocial media producers, which suggests that such datasets cannot be treated as uniform representations. We found a positive correlation between spatial behavior and posting activity. It is suggested that there are universal patterns of behavior that are conditioned by software services, an example of Foucauldian "technologies of self". They can also represent the dominance of the most prolific users over the whole data stream. Results are discussed in the context of the importance and role of user location in social media.

    Linking geosocial sensing with the socio-demographic fabric of smart cities

    Technological advances have enabled new sources of geoinformation, such as geosocial media, and have supported the propagation of the concept of smart cities. This paper argues that a city cannot be smart without citizens in the loop, and that a geosocial sensor might be one component to achieve that. First, we need to better understand which facets of urban life could be detected by a geosocial sensor, and how to calibrate it. This requires replicable studies that foster longitudinal and comparative research. Consequently, this paper examines the relationship between geosocial media content and socio-demographic census data for a global city, London, at two administrative levels. It aims for a transparent study design to encourage replication, using Term Frequency-Inverse Document Frequency of keywords, rule-based and word-embedding sentiment analysis, and local cluster analysis. The findings of limited links between geosocial media content and socio-demographic characteristics support earlier critiques on the utility of geosocial media for smart city planning purposes. The paper concludes that passive listening to publicly available geosocial media, in contrast to pro-active engagement with citizens, seems of limited use to understand and improve urban quality of life.

    Urban Forest Tweeting: Social Media as More-Than-Human Communication in Tokyo’s Rinshinomori Park

    Urban parks are places that have a significant impact on the physical and mental health of citizens, but they also serve to safeguard biodiversity and thus foster human–nature interactions in the everyday landscape. The exploration of these spaces through social media represents a novel field of research that is contributing to revealing patterns of visitor behavior. However, there is a lack of comparable research from a non-anthropocentric perspective. What if we could use social media as a more-than-human communication medium? This research aims to reveal the possibility of communicating the urban forest’s voice through the examination of the official Twitter account of a metropolitan park in Tokyo. To this end, an analysis of the content of the messages is carried out, focusing on the narrative voice from which the message is told, the protagonists, the action performed, the network of actors deployed, and the place where it occurs. It is found that the majority of these messages are delivered from a non-human perspective, where plants, animals, or meteorological agents act by deploying complex networks of more-than-human interaction. The current study reveals the latent potential of non-humans as possible agents within the realm of social media, which can mediate the relationships between humans and their environment. It introduces a layer that can be incorporated into future lines of research and provides a model case that illustrates good practice in the management and communication of urban green spaces. This research was funded by the European Union-Next Generation EU Margarita Salas Grant and by the project LABPA-CM: CONTEMPORARY CRITERIA, METHODS and TECHNIQUES FOR LANDSCAPE KNOWLEDGE AND CONSERVATION (H2019/HUM5692), funded by the European Social Fund and the Madrid regional government.

    VaxInsight: an artificial intelligence system to access large-scale public perceptions of vaccination from social media

    Vaccination is considered one of the greatest public health achievements of the 20th century. A high vaccination rate is required to reduce the prevalence and incidence of vaccine-preventable diseases. However, in the last two decades, there has been a significant and increasing number of people who refuse or delay getting vaccinated and who prohibit their children from receiving vaccinations. Importantly, under-vaccination is associated with infectious disease outbreaks. A good understanding of public perceptions regarding vaccinations is important if we are to develop effective vaccination promotion strategies. Traditional methods of research, such as surveys, suffer limitations that impede our understanding of public perceptions, including resource costs and delays in data collection and analysis, especially in large samples. The popularity of social media (e.g. Twitter), combined with advances in artificial intelligence algorithms (e.g. natural language processing, deep learning), opens up new avenues for accessing large-scale data on public perceptions related to vaccinations. This dissertation reports on an original and systematic effort to develop artificial intelligence algorithms that will increase our ability to use Twitter discussions to understand vaccine-related perceptions and intentions. The research is framed within the perspectives offered by grounded behavior change theories. Tweets concerning the human papillomavirus (HPV) vaccine were used to accomplish three major aims: 1) Develop a deep learning-based system to better understand public perceptions of the HPV vaccine, using Twitter data and behavior change theories; 2) Develop a deep learning-based system to infer Twitter users’ demographic characteristics (e.g. gender and home location) and investigate demographic differences in public perceptions of the HPV vaccine; 3) Develop a web-based interactive visualization system to monitor real-time Twitter discussions of the HPV vaccine.
For Aim 1, the bi-directional long short-term memory (LSTM) network with attention mechanism outperformed traditional machine learning and competitive deep learning algorithms in mapping Twitter discussions to the theoretical constructs of behavior change theories. Domain-specific embedding trained on HPV vaccine-related Twitter corpus by fastText algorithms further improved performance on some tasks. Time series analyses revealed evolving trends of public perceptions regarding the HPV vaccine. For Aim 2, the character-based convolutional neural network model achieved favorable state-of-the-art performance in Twitter gender inference on a Public Author Profiling challenge. The trained models were then applied to the Twitter corpus, where they identified gender differences in public perceptions of the HPV vaccine. The findings on gender differences were largely consistent with previous survey-based studies. For the Twitter users’ home location inference, geo-tagging was framed as text classification tasks that resulted in a character-based recurrent neural network model. The model outperformed machine learning and deep learning baselines on home location tagging. Interstate variations in public perceptions of the HPV vaccine were also identified. For Aim 3, a prototype web-based interactive dashboard, VaxInsight, was built to synthesize HPV vaccine-related Twitter discussions in a comprehensible format. The usability test of VaxInsight showed high usability of the system. Notably, this may be the first study to use deep learning algorithms to understand Twitter discussions of the HPV vaccine within the perspective of grounded behavior change theories. VaxInsight is also the first system that allows users to explore public health beliefs of vaccine-related topics from Twitter. Thus, the present research makes original and systematic contributions to medical informatics by combining cutting-edge artificial intelligence algorithms and grounded behavior change theories.
This work also builds a foundation for the next generation of real-time public health surveillance and research.

    The Miami Heart Study (MiHeart) at Baptist Health South Florida, A Prospective Study of Subclinical Cardiovascular Disease and Emerging Cardiovascular Risk Factors in Asymptomatic Young and Middle-Aged Adults: The Miami Heart Study: Rationale and Design

    Objective: The Miami Heart Study (MiHeart) at Baptist Health South Florida is an ongoing, community-based, prospective cohort study aimed at characterizing the prevalence, characteristics, and prognostic value of diverse markers of early subclinical coronary atherosclerosis and of various potential demographic, psychosocial, and metabolic risk factors. We present the study objectives, detailed research methods, and preliminary baseline results of MiHeart. Methods: MiHeart enrolled 2,459 middle-aged male and female participants from the general population of the Greater Miami Area. Enrollment occurred between May 2015 and September 2018 and was restricted to participants aged 40–65 years free of clinical cardiovascular disease (CVD). The baseline examination included assessment of demographics, lifestyles, medical history, and a detailed evaluation of psychosocial characteristics; a comprehensive physical exam; measurement of multiple blood biomarkers including measures of inflammation, advanced lipid testing, and genomics; assessment of subclinical coronary atherosclerotic plaque and vascular function using coronary computed tomography angiography, the coronary artery calcium score, carotid intima-media thickness, pulse wave velocity, and peripheral arterial tonometry; and other tests including 12-lead electrocardiography and assessment of pulmonary function. Blood samples were biobanked to facilitate future ancillary research. Results: MiHeart enrolled 1,261 men (51.3%) and 1,198 women (48.7%). Mean age was 53 years, 85.6% of participants were White, and 47.4% were of Hispanic/Latino ethnicity. At baseline, 7% of participants had diabetes, 33% had hypertension, and 15% used statin therapy. Overweight or obese participants comprised 72% of the population and 3% were smokers. Median 10-year estimated atherosclerotic CVD risk using the Pooled Cohort Equations was 4%.
Conclusion: MiHeart will provide important, novel insights into the pathophysiology of early subclinical atherosclerosis and further our understanding of its role in the genesis of clinical CVD. The study findings will have important implications, further refining current cardiovascular prevention paradigms and risk assessment and management approaches moving forward. KEYWORDS: atherosclerosis; cardiovascular disease; cohort studies; coronary computed tomography; epidemiology; Hispanic/Latino; populations primary preventio