    Diurnal patterns in Twitter sentiment in Italy and United Kingdom are correlated

    Diurnal variations in indicators of emotion have been reliably observed in Twitter content, but confirming their circadian nature has not been possible because of the many confounding factors present in the data. We report on correlations between those indicators in Twitter content obtained from 9 cities in Italy and 54 cities in the United Kingdom, sampled hourly at the time of the 2020 national lockdowns. This experimental setting aims to minimize synchronization effects related to television, eating habits, or other cultural factors. The observed correlation supports a circadian origin for these diurnal variations, although it does not exclude the possibility that similar zeitgebers exist in both countries, even during lockdowns.
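
The comparison described in this abstract can be illustrated with a Pearson correlation between two 24-point hourly profiles. The sketch below is a minimal illustration, not the authors' pipeline: the profiles are synthetic sine curves, the names `italy`, `uk`, and `shifted` are hypothetical, and aligning the two series by local hour is an assumption.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Synthetic hourly profiles (assumption: one value per local hour, 0..23).
hours = range(24)
italy = [math.sin(2 * math.pi * (h - 6) / 24) for h in hours]
# Same daily shape with a different amplitude and offset.
uk = [0.8 * math.sin(2 * math.pi * (h - 6) / 24) + 0.1 for h in hours]
# Same shape shifted by a quarter of a day (6 hours).
shifted = [math.sin(2 * math.pi * (h - 12) / 24) for h in hours]

r_same_phase = pearson(italy, uk)
r_quarter_shift = pearson(italy, shifted)
```

Profiles that share a daily shape correlate strongly regardless of amplitude or offset, while a quarter-day phase shift removes the correlation. The coefficient is therefore sensitive to the timing of the daily cycle rather than to its magnitude.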

    Towards Real-Time, Country-Level Location Classification of Worldwide Tweets

    In contrast to much previous work that has focused on location classification of tweets restricted to a specific country, here we undertake the task in a broader context by classifying global tweets at the country level, which is so far unexplored in a real-time scenario. We analyse the extent to which a tweet's country of origin can be determined by making use of eight tweet-inherent features for classification. Furthermore, we use two datasets, collected a year apart from each other, to analyse the extent to which a model trained on historical tweets can still be leveraged for classification of new tweets. With classification experiments on all 217 countries in our datasets, as well as on the top 25 countries, we offer some insights into the best use of tweet-inherent features for an accurate country-level classification of tweets. We find that the use of a single feature, such as tweet content alone (the most widely used feature in previous work), leaves much to be desired. Choosing an appropriate combination of both tweet content and metadata can actually lead to substantial improvements of between 20% and 50%. We observe that tweet content, the user's self-reported location and the user's real name, all of which are inherent in a tweet and available in a real-time scenario, are particularly useful to determine the country of origin. We also experiment on the applicability of a model trained on historical tweets to classify new tweets, finding that the choice of a particular combination of features whose utility does not fade over time can actually lead to comparable performance, avoiding the need to retrain. However, the difficulty of achieving accurate classification increases slightly for countries with multiple commonalities, especially for English- and Spanish-speaking countries. Comment: Accepted for publication in IEEE Transactions on Knowledge and Data Engineering (IEEE TKDE).

    Modeling Twitter Engagement in Real-World Events

    Twitter offers tremendous opportunities for people to engage with real-world events (e.g., political elections) through information sharing and communicating about these events. However, little is understood about the factors that affect people’s Twitter engagement (e.g., posting) in such real-world events. This paper examines multiple predictive factors associated with four different perspectives of users’ Twitter engagement, and quantifies their potential influence on predicting the (i) presence and (ii) degree of the user’s engagement with real-world events. We find that the measures of people’s prior Twitter activities, topical interests, geolocation, and social network structures are all variously correlated with their engagement with real-world events.

    Analysis of circadian rhythms from online communities of individuals with affective disorders

    The circadian system regulates 24-hour rhythms in living organisms and affects mood regulation. Disruptions of circadian rhythms cause destabilization in individuals with affective disorders, such as depression and bipolar disorder. Previous work has examined the role of the circadian system in the effects of light on mood-related systems, the effects of light manipulation on the brain, and the impact of chronic stress on rhythms. However, such studies have been conducted in small, preselected populations. The deluge of data is now changing the landscape of research practice. The unprecedented growth of social media data allows one to study individual behavior across large and diverse populations. In particular, individuals with affective disorders in online communities have not been examined rigorously. In this paper, we aim to use social media as a sensor to identify circadian patterns for individuals with affective disorders in online communities. We use a large-scale cohort of data collected from online affective disorder communities. We analyze changes in the hourly, daily, weekly and seasonal affect of these clinical groups in contrast with control groups drawn from general communities. Comparing the behaviors of the clinical groups and the control groups, we find that individuals with affective disorders differ significantly in the circadian rhythms of their online activity. The results shed light on the potential of using social media for identifying diurnal individual variation in affective state, providing key indicators and risk factors for noninvasive wellbeing monitoring and prediction.

    Uncovering population dynamics using mobile phone data : the case of Helsinki Metropolitan Area

    Understanding the whereabouts of people in time and space is necessary for unraveling how our societies function. Nevertheless, our understanding of human presence is predominantly based on static residential population data, which is often outdated and excludes certain population groups, such as commuters or tourists. In light of the development towards 24-hour societies and the need to promote sustainable and equitable urban planning, reliable data on population dynamics are needed. To this end, ubiquitous mobile phones provide an attractive source for estimating the spatiotemporal digital footprints of people. In this study, I set out to investigate 1) the feasibility of three different types of aggregated network-based mobile phone data (the number of voice calls, data transmissions and general network connection attempts) as a proxy for human presence, 2) how the population distribution varies in the Helsinki Metropolitan Area over the course of a regular weekday and 3) the role of temporally sensitive population data when analysing dynamic accessibility to grocery stores and transport hubs by public transport. To the best of my knowledge, this is the first time that mobile phone data have been used to reveal population dynamics for scientific purposes in Finland. Mobile phone data collected by the mobile network operator Elisa in 2017–2018 and ancillary data about land cover, buildings and a time use survey were used to estimate the 24-hour population distribution of the Helsinki Metropolitan Area. The mobile phone data were allocated to statistical 250 m x 250 m grid cells using an advanced dasymetric interpolation method and validated against population register data from Statistics Finland. The resulting 24-hour population was used to map the pulse of the city and to introduce the first fully dynamic accessibility model in the study area. The results show that data use is a good proxy for people and outperforms voice calls and overall network connection attempts.
During daytime, the static population overestimates the population in residential areas and underestimates the population in work and service areas. In general, the 24-hour population reveals the pulse of a city, which is highlighted especially in the inner city of Helsinki, where the relative share of the population of the study area increases by 50 % from the share at night-time to its peak at noon. The results of the case study suggest that integrating dynamic population data into location-based accessibility analysis provides more realistic results compared to static population data, but the significance of dynamic population data depends on the study context and research questions. In summary, aggregated network-driven mobile phone data is a feasible alternative for dynamic population modelling; however, different mobile phone data types vary in representativeness, which should be taken into account when using mobile phone data in research. To this end, critical evaluation of data and transparent data description are essential. Overall, understanding 24-hour societies and supporting sustainable urban planning necessitates dynamic population data, but advancements in data policy and availability are needed to realise these possibilities. The results of this study also provide new empirical insights into the population dynamics in the study area, which can be used to advance planning and decision making.
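
The abstract does not spell out the thesis's "advanced dasymetric interpolation method". As a rough illustration of the underlying idea only, the sketch below shows plain proportional dasymetric allocation: a count observed for one source zone is split across the grid cells it overlaps in proportion to an ancillary weight. The function name, the cell identifiers, and the weights (for example, floor area scaled by a time-use factor) are all hypothetical.

```python
def dasymetric_allocate(total, weights):
    """Split `total` across cells in proportion to non-negative weights.

    `weights` maps a cell identifier to an ancillary weight. How the weights
    are derived is an assumption here, not something taken from the thesis.
    """
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    weight_sum = sum(weights.values())
    if weight_sum <= 0:
        raise ValueError("at least one weight must be positive")
    return {cell: total * w / weight_sum for cell, w in weights.items()}


# Hypothetical example: 200 data-transmission events observed in one source
# zone that overlaps three 250 m x 250 m grid cells.
weights = {"cell_a": 600.0, "cell_b": 300.0, "cell_c": 100.0}
allocated = dasymetric_allocate(200.0, weights)
```

The allocation preserves the source-zone total, so summing the cell values recovers the original count. Applying it once per hour with hour-specific weights would give one population surface per hour, which is the kind of 24-hour distribution the abstract describes.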