507 research outputs found

    Analyzing the Language of Food on Social Media

    We investigate the predictive power behind the language of food on social media. We collect a corpus of over three million food-related posts from Twitter and demonstrate that many latent population characteristics can be directly predicted from this data: overweight rate, diabetes rate, political leaning, and home geographical location of authors. For all tasks, our language-based models significantly outperform the majority-class baselines. Performance is further improved with more complex natural language processing, such as topic modeling. We analyze which textual features have most predictive power for these datasets, providing insight into the connections between the language of food, geographic locale, and community characteristics. Lastly, we design and implement an online system for real-time query and visualization of the dataset. Visualization tools, such as geo-referenced heatmaps, semantics-preserving wordclouds and temporal histograms, allow us to discover more complex, global patterns mirrored in the language of food. Comment: An extended abstract of this paper will appear in IEEE Big Data 201
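
    For readers unfamiliar with the majority-class baseline mentioned above, the following is a minimal sketch of how such a baseline is typically scored. The function name and the toy labels are illustrative assumptions and are not taken from the paper.

```python
from collections import Counter

def majority_baseline_accuracy(train_labels, test_labels):
    """Accuracy of always predicting the most frequent training label."""
    majority, _ = Counter(train_labels).most_common(1)[0]
    hits = sum(1 for y in test_labels if y == majority)
    return hits / len(test_labels)

# Toy labels (hypothetical): "high" is the majority class in training.
train = ["high", "high", "low"]
test = ["high", "low", "low", "high"]
baseline = majority_baseline_accuracy(train, test)
```

    Any learned model has to beat this number to show that the text features carry signal beyond the class distribution.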

    Online Social Network Friends and Spatio-temporal Proximity of Their Geotagged Photos – A Case Study of Flickr Data

    This empirical study aims to analyze relationships between online social network (OSN) friends and spatio-temporal proximity of their geotagged photos, using Flickr data as a case study. First, this study analyzes whether Flickr friends tend to post geotagged photos that are closer to each other in space and time than those of Flickr non-friends. Second, this study investigates whether the number of geotagged photos posted by users is related to the distance and time difference between their geotagged photos. Third, this study examines the spatial distributions of geotagged photos of Flickr friends within specific distance intervals to further understand the geographic meanings of Flickr users’ geotagging activities. Findings of this study can improve our understanding of the relationship between users’ virtual friendships and their physical activities. These understandings can support future research, including location-based services, location-based OSN searches, and location-based online marketing.
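
    One way to quantify the spatio-temporal proximity of two geotagged photos is sketched below. The abstract does not say how distance was computed; great-circle (haversine) distance, the tuple layout, and the function names are assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def proximity(photo_a, photo_b):
    """Return (distance in km, absolute time difference in hours).

    Each photo is a (lat, lon, unix_timestamp_seconds) tuple.
    """
    dist = haversine_km(photo_a[0], photo_a[1], photo_b[0], photo_b[1])
    hours = abs(photo_a[2] - photo_b[2]) / 3600
    return dist, hours

# Synthetic points on the equator, one degree of longitude and two hours apart.
dist_km, dt_hours = proximity((0.0, 0.0, 0), (0.0, 1.0, 7200))
```

    Computing these pairs separately for friend and non-friend pairs would allow the two distributions to be compared.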

    Introduction to the second international symposium of platial information science

    People ‘live’ and constitute places every day through recurrent practices and experience. Our everyday lives, however, are complex, and so are places. In contrast to abstract space, the way people experience places includes a range of aspects like physical setting, meaning, and emotional attachment. This inherent complexity requires researchers to investigate the concept of place from a variety of viewpoints. The formal representation of place – a major goal in GIScience related to place – is no exception and can only be successfully addressed if we consider geographical, psychological, anthropological, sociological, cognitive, and other perspectives. This year’s symposium brings together place-based researchers from different disciplines to discuss the current state of platial research. Therefore, this volume contains contributions from a range of fields including geography, psychology, cognitive science, linguistics, and cartography.

    Geolocated Social Media Posts are Happier: Understanding the Characteristics of Check-in Posts on Twitter

    The increasing prevalence of location-sharing features on social media has enabled researchers to ground computational social science research using geolocated data, affording opportunities to study human mobility, the impact of real-world events, and more. This paper analyzes what crucially separates posts with geotags from those without. We find that users who share location are not representative of the social media user population at large, jeopardizing the generalizability of research that uses only geolocated data. We consider three aspects: affect (sentiment and emotions), content (textual and non-textual), and audience engagement. By comparing a dataset of 1.3 million geotagged tweets with a random dataset of the same size, we show that geotagged posts on Twitter exhibit significantly more positivity, are often about joyous and special events such as weddings or graduations, convey more collectivism rather than individualism, and contain more additional features such as hashtags or objects in images, but at the same time generate substantially less engagement. These findings suggest there exist significant differences in the messages conveyed in geotagged posts. Our research carries important implications for future research utilizing geolocation social media data. Comment: 11 pages, 10 figures, 2 tables

    Twitter as an Indicator of Sports Activities in the Helsinki Metropolitan Area

    Being physically active is one of the key aspects of health. Thus, equal opportunities for exercising in different places are an important factor in environmental justice and segregation prevention. Currently, there are no openly available scientific studies about actual physical activity in different parts of the Helsinki Metropolitan Area other than sports barometers. In the absence of comprehensive official data sources, user-generated data, such as social media data, may be used as a proxy for measuring the levels and geographical distribution of sports activities. In this thesis, I aim to assess 1) how tweets could be used as an indicator of sports activities, 2) how sports tweets are distributed spatially, and 3) which socio-economic factors can predict the number of sports tweets. To recognize the tweets related to sports, out of 38.5 million tweets, I used Named Entity Matching with a list of sports-related keywords in Finnish, English and Estonian. Due to the spatial nature of my study, I needed tweets that contain a geotag, meaning that the tweet is attached to coordinates that indicate a location. However, only about 1% of tweets contain a geotag, and since 2019 Twitter has, with some exceptions, no longer supported precise geotagging. Therefore, I implemented geoparsing methods to search for location names in the text and transform them into coordinates if the mentioned place was within the study area. I then combined the geotagged and geoparsed tweets, aggregated them to postal code areas, and used statistical and spatial methods to measure spatial autocorrelation and correlation with different socio-economic variables, in order to examine the spatial patterns and socio-economic factors that affect tweeting about sports. My results show that the sports tweets are concentrated mainly in the center of Helsinki, where the population is also concentrated. Besides the largest cluster in the center of Helsinki, the distribution of the sports tweets exhibits local clusters in Tapiola, Leppävaara, Tikkurila and Pasila. Mapping the tweets by sport reveals that, for example, racket sport and skiing tweets are heavily concentrated around the corresponding facilities. Statistical analyses indicate that the number of tweets per inhabitant does not correlate with the education level or the average income in the postal code area. The factors that predict the number of tweets per inhabitant are the number of sports facilities per inhabitant, employment, and the percentage of children (0–14 years old) in the postal code area. Keys to a successful study when analyzing Twitter data are geoparsing, having enough data, and a good language model to process it. Despite the promising results of this study, Twitter as an indicator of physical activity should be studied further to better understand the kind of bias it inherently has before basing real-life decisions on Twitter research.
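
    The geoparsing and aggregation steps described in this abstract can be illustrated with a minimal sketch. It assumes a tiny hand-made gazetteer; the place names come from the abstract, but the area ids are placeholders rather than real postal codes, and the function names are hypothetical. Exact token matching is a simplification: it would miss inflected Finnish forms, which is one reason the thesis stresses the need for a good language model.

```python
import re
from collections import Counter

# Hypothetical gazetteer: place name -> postal code area id.
# The ids are placeholders, not real Finnish postal codes.
GAZETTEER = {
    "tapiola": "area_A",
    "leppävaara": "area_B",
    "tikkurila": "area_C",
    "pasila": "area_D",
}

def geoparse(text):
    """Return area ids for gazetteer place names mentioned in the text."""
    tokens = re.findall(r"\w+", text.lower())
    return [GAZETTEER[t] for t in tokens if t in GAZETTEER]

def tweets_per_area(tweets):
    """Count tweets per area; a tweet counts once per distinct area it mentions."""
    counts = Counter()
    for text in tweets:
        counts.update(set(geoparse(text)))
    return counts

tweets = [
    "Morning run in Tapiola",
    "Tennis at Pasila tonight, then back to Tapiola",
    "No place mentioned here",
]
counts = tweets_per_area(tweets)
```

    Dividing each count by the area's population would give the tweets-per-inhabitant figure that the statistical analysis refers to.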

    Feasibility of estimating travel demand using geolocations of social media data

    Travel demand estimation, as represented by an origin–destination (OD) matrix, is essential for urban planning and management. Compared to data typically used in travel demand estimation, the key strengths of social media data are that they are low-cost, abundant, available in real-time, and free of geographical partition. However, the data also have significant limitations: population and behavioural biases, and lack of important information such as trip purpose and social demographics. This study systematically explores the feasibility of using geolocations of Twitter data for travel demand estimation by examining the effects of data sparsity, spatial scale, sampling methods, and sample size. We show that Twitter data are suitable for modelling the overall travel demand for an average weekday but not for commuting travel demand, due to the low reliability of identifying home and workplace. Collecting more detailed, long-term individual data from user timelines for a small number of individuals produces more accurate results than short-term data for a much larger population within a region. We developed a novel approach using geotagged tweets as attraction generators as opposed to the commonly adopted trip generators. This significantly increases usable data, resulting in better representation of travel demand. This study demonstrates that Twitter can be a viable option for estimating travel demand, though careful consideration must be given to sampling method, estimation model, and sample size.
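
    To make the OD matrix concrete, the sketch below counts trips from geolocated posts. It illustrates the commonly adopted trip-generator idea (consecutive posts by one user in different zones form a trip), not the attraction-generator approach the paper proposes. The record layout, zone labels, and function name are assumptions.

```python
from collections import defaultdict

def od_matrix(records):
    """Build origin-destination counts from (user, timestamp, zone) records.

    A trip is counted whenever two consecutive posts by the same user
    fall in different zones.
    """
    by_user = defaultdict(list)
    for user, ts, zone in records:
        by_user[user].append((ts, zone))

    od = defaultdict(int)
    for posts in by_user.values():
        posts.sort()  # order each user's posts by timestamp
        for (_, origin), (_, dest) in zip(posts, posts[1:]):
            if origin != dest:
                od[(origin, dest)] += 1
    return dict(od)

# Synthetic records: two users moving between hypothetical zones "A" and "B".
records = [
    ("u1", 1, "A"), ("u1", 2, "B"), ("u1", 3, "B"), ("u1", 4, "A"),
    ("u2", 1, "A"), ("u2", 5, "B"),
]
matrix = od_matrix(records)
```

    Because only users with at least two posts in different zones contribute, sparse data yields few trips, which is the limitation the abstract's alternative approach is meant to ease.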

    Using social media for sub-event detection during disasters

    Social media platforms have become fundamental tools for sharing information during natural disasters or catastrophic events. This paper presents SEDOM-DD (Sub-Events Detection on sOcial Media During Disasters), a new method that analyzes user posts to discover sub-events that occurred after a disaster (e.g., collapsed buildings, broken gas pipes, floods). SEDOM-DD has been evaluated with datasets of different sizes that contain real posts from social media related to different natural disasters (e.g., earthquakes, floods and hurricanes). Starting from such data, we generated synthetic datasets with different features, such as different percentages of relevant posts and/or geotagged posts. Experiments performed on both real and synthetic datasets showed that SEDOM-DD is able to identify sub-events with high accuracy. For example, with a percentage of relevant posts of 80% and geotagged posts of 15%, our method detects the sub-events and their areas with an accuracy of 85%, revealing the high accuracy and effectiveness of the proposed approach.

    Event detection, tracking, and visualization in Twitter: a mention-anomaly-based approach

    The ever-growing number of people using Twitter makes it a valuable source of timely information. However, detecting events in Twitter is a difficult task, because tweets that report interesting events are overwhelmed by a large volume of tweets on unrelated topics. Existing methods focus on the textual content of tweets and ignore the social aspect of Twitter. In this paper we propose MABED (i.e. mention-anomaly-based event detection), a novel statistical method that relies solely on tweets and leverages the creation frequency of dynamic links (i.e. mentions) that users insert in tweets to detect significant events and estimate the magnitude of their impact over the crowd. MABED also differs from the literature in that it dynamically estimates the period of time during which each event is discussed, rather than assuming a predefined fixed duration for all events. The experiments we conducted on both English and French Twitter data show that the mention-anomaly-based approach leads to more accurate event detection and improved robustness in the presence of noisy Twitter content. Qualitatively speaking, we find that MABED helps with the interpretation of detected events by providing clear textual descriptions and precise temporal descriptions. We also show how MABED can help in understanding users' interests. Furthermore, we describe three visualizations designed to favor an efficient exploration of the detected events. Comment: 17 pages
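
    The idea of a mention anomaly can be illustrated with a simplified sketch. This is not MABED itself: the expectation model used here (a word's mention-bearing tweets spread across time slices in proportion to slice volume) and all names and numbers are assumptions for illustration only.

```python
def mention_anomaly(word_mention_counts, slice_totals):
    """Observed minus expected count per time slice.

    word_mention_counts[t]: tweets in slice t that contain the word and a mention.
    slice_totals[t]: all tweets in slice t.
    The expected count assumes the word's tweets are spread across slices
    in proportion to each slice's total volume.
    """
    total_word = sum(word_mention_counts)
    total_all = sum(slice_totals)
    return [
        observed - total_word * n_t / total_all
        for observed, n_t in zip(word_mention_counts, slice_totals)
    ]

# Synthetic counts: a burst of mention-bearing tweets in the third slice.
counts = [2, 1, 12, 1]
totals = [100, 100, 100, 100]
anomaly = mention_anomaly(counts, totals)
peak = max(range(len(anomaly)), key=anomaly.__getitem__)
```

    A sustained run of positive anomaly values around the peak would suggest the period during which an event is being discussed.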