104 research outputs found

    What makes the city pulse

    The topics of this thesis are event detection and social network analysis in social media. Our work centres on geo-tagged User Generated Content (UGC) on Twitter, such as Twitter data generated in the metropolitan area of Dublin, Ireland, over a one-month period. We address the problem of detecting small-scale unexpected events using UGC, both in real time and retrospectively. We propose a language-text joint modeling algorithm to cope with the large volume and unstructured nature of UGC. We also demonstrate interesting correlations between a Twitter user's social communities and their mobility patterns. Finally, we propose a set of features for classifying Twitter account types, for the purpose of filtering irrelevant content. The thesis includes several experimental evaluations on real user data, which show the performance of our algorithms in event detection and provide evidence for our discoveries.

    Sequential assimilation of crowdsourced social media data into a simplified flood inundation model

    Flooding is the most common natural hazard worldwide. Severe floods can cause significant damage and sometimes loss of life. During a flood event, hydraulic models play an important role in forecasting and identifying potentially inundated areas, where emergency responses should be deployed. Nevertheless, hydraulic models cannot capture all of the processes in flood propagation because flood behaviour is highly dynamic and complex. Thus, there are always uncertainties associated with model simulations. As a result, near-real-time observations need to be incorporated into hydraulic models to improve their forecasting skill. Crowdsourced (CS) social media data presents an opportunity for supporting urban flood management, as it can provide insightful information collected by individuals in near real time. In this thesis, approaches to maximise the impact of CS social media data (Twitter) in reducing uncertainty in flood inundation modelling (LISFLOOD-FP) through data assimilation were investigated. The developed methodologies were tested and evaluated using a real flooding case study of Phetchaburi city, Thailand. Firstly, two approaches (binary logistic regression and fuzzy logic) were developed based on Twitter metadata and spatiotemporal analysis to assess the quality of CS social media data. Both methods produced good results, but the binary logistic model was preferred as it involved less subjectivity. Next, the generalized likelihood uncertainty estimation methodology was applied to estimate model uncertainty and identify behavioural parameter ranges. Particle swarm optimisation was also carried out to calibrate an optimum model parameter set. Following this, an ensemble Kalman filter was applied to assimilate the flood depth information extracted from the CS data into the LISFLOOD-FP simulations using various updating strategies.
The findings show that the global state update suffers from inconsistency of predicted water levels due to overestimating the impact of the CS data, whereas a topography-based local state update provides encouraging results, as the uncertainty in model forecasts narrows, albeit for a short time period. To extend the improvement time span, a combination of state and boundary updating was further investigated to correct both water levels and model inputs, and was found to produce longer-lasting improvements in terms of uncertainty reduction. Overall, the results indicate the feasibility of applying CS social media data to reduce model uncertainty in flood forecasting.

    Mining microblogs for culture-awareness in web adaptation

    Prior studies in sociology and human-computer interaction indicate that persons from different countries and cultural origins tend to have their own preferences in real-life communication and in the usage of web and social media applications. With Twitter data and statistical and machine learning tools, this study advances our understanding of microblogging with respect to cultural differences and demonstrates possible solutions for inferring and exploiting cultural origins when building adaptive web applications. Our findings reveal statistically significant differences in Twitter feature usage with respect to the geographic locations of users. These differences in microblogger behaviour, together with the user language defined in user profiles, enabled us to infer user country origins with an accuracy of more than 90%. Other user origin predictive solutions we proposed do not require other data sources or human involvement for training the models, enabling high accuracy of user country inference when exploiting information extracted from a user's follower network, or with data derived from Twitter profiles. With origin predictive models, we analysed communication and privacy preferences and built a culture-aware recommender system. Our analysis of friend responses shows that Twitter users tend to communicate mostly within their cultural regions. Usage of privacy settings showed that privacy perceptions differ across cultures. Finally, we created and evaluated movie recommendation strategies considering user cultural groups, and addressed a cold-start scenario with a new user. We believe that the findings discussed give insights into sociological and web research, in particular on cultural differences in online communication.

    TRAFFIC SPEED MODELLING TO IMPROVE TRAVEL TIME ESTIMATION IN OPENROUTESERVICE

    Time-dependent traffic speed information at street level is important for routing services to estimate accurate travel times and to recommend routes that avoid traffic congestion. Still, most open-source routing engines that use OpenStreetMap (OSM) as their primary data source rely on static driving speeds derived from OSM tags, since comprehensive traffic speed data is not openly available. In this study, a method was developed to model traffic speed by hour of day at street level using open data from OpenStreetMap, Twitter and population data. The modelled traffic speed data was subsequently integrated into the open-source routing engine openrouteservice to improve travel time estimation in route planning. Machine learning models were trained for ten cities worldwide using traffic speed data from Uber Movement as reference data. Different indicators based on the geolocation and timestamp of Twitter data, as well as a geographically adapted betweenness centrality indicator, were evaluated for their potential to improve prediction accuracy. In all cities, the Twitter indicators improved the model, although this effect was only visible for certain road types. The centrality indicator improved the model as well, but to a lesser extent. The Google Routing API was used as a reference to evaluate the accuracy of travel time estimation. Deviations in travel times differed by region and were partly alleviated by including either the raw traffic data from Uber or the modelled traffic speed data in openrouteservice.

    Information credibility perception on Twitter

    Information on Twitter is vast and varied. Readers must make their own judgements to determine the credibility of the great wealth of information presented on Twitter. This research aims to identify the factors that influence readers' judgements of the credibility of information on Twitter, especially news-related information. Both internal (within the Twitter platform) and external factors are studied in this research. User studies are conducted to collect readers' perceptions of the credibility of news-related tweets, Twitter features, and the impact of reader characteristics, such as a reader's demographic attributes, personality and behaviour. Twitter readers are found to depend solely on surface tweet features in making these judgements, such as the author's Twitter ID, pictures, or the number of retweets and likes, rather than on the tweet's metadata as recommended in previous studies. In this study, surface features are related to cognitive heuristics. Cognitive heuristics are features that the mind uses as shortcuts for making quick evaluations, such as deciding the credibility of tweets. There are three main types of cognitive heuristic features found on Twitter that readers use to determine credibility: endorsement, reputation and confirmation. This study finds that readers do not use a single feature to make credibility judgements but rather a combination of features. External factors such as a reader's educational background and geolocation also have a significant positive correlation with their perceptions of a tweet's credibility. Readers with tertiary-level education, or living in a certain location or environment, such as a crisis or conflict area, are observed to be more careful in making credibility judgements. Readers who possess the personality traits of conscientiousness and openness to experience are also seen to be very cautious in their credibility judgements.
Another insight provided by this research is the categorisation of readers' behaviours according to credibility perceptions on Twitter. The behavioural categorisations are defined by readers' behavioural reliance on Twitter's surface features when judging the credibility of tweets. The findings can assist social media authors in designing the surface features of their social media content in order to enhance the content's credibility. Furthermore, findings from this research can help in developing effective credibility evaluation systems by considering readers' personal characteristics.

    Quantifying & characterizing information diets of social media users

    An increasing number of people are relying on online social media platforms like Twitter and Facebook to consume news and information about the world around them. This change has led to a paradigm shift in the way news and information are exchanged in our society, from traditional mass media to online social media. With the changing environment, it is essential to study the information consumption of social media users and to audit how automated algorithms (like search and recommendation systems) are modifying the information that social media users consume. In this thesis, we fulfill this high-level goal with a two-fold approach. First, we propose the concept of information diets as the composition of information produced or consumed. Next, we quantify the diversity and bias in the information diets that social media users consume via the three main consumption channels on social media platforms: (a) word of mouth channels that users curate for themselves by creating social links, (b) recommendations that platform providers give to the users, and (c) search systems that users use to find interesting information on these platforms. We measure the information diets of social media users along three different dimensions: topics, geographic sources, and political perspectives. Our work is aimed at making social media users aware of the potential biases in their consumed diets, and at encouraging the development of novel mechanisms for mitigating the effects of these biases.

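    Landscape metrics and cultural ecosystem services: a resource-based integrative approach to mapping landscape harmony (Maastikumeetrika ja ökosüsteemi kultuuriteenused – ressursipõhine integreeriv lähenemine maastikuharmoonia kaardistamisele)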

    A Thesis for applying for the degree of Doctor of Philosophy in Environmental Protection. The overall idea of this PhD thesis was to explain, with objective evidence and using mapping techniques, why and how people value particular visual landscapes. Mainstream mapping research usually refers to the uniqueness, diversity and naturalness of landscapes as the main factors for landscape values and preferences. These variables can be easily measured using satellite imagery and cartographic materials: for example, the diversity of landscape elements can be assessed with a function of Shannon information entropy, and naturalness as the share of relatively natural land cover within the region of interest. However, the psychological background suggests other important attributes of landscape experience: harmony, unity or coherence of the scene. These aspects are usually measured subjectively with questionnaires and surveys. Measuring landscape preferences is also quite a challenging task, requiring many people to assess photographs or even to take a nature trip (with obvious drawbacks in spatial coverage and replicability with other evaluators). Therefore, the PhD research was designed to make all assessments as objective as possible. Overall landscape coherence was measured, for the first time, as the extent to which the total diversity of a digital landscape model (composed of landforms and land cover) exceeds the added diversity of landforms and land cover alone. In this way, coherence was directly related to the system properties of the landscape, making it legible and understandable. Also for the first time, the colour harmony of land cover was evaluated with remotely sensed data (satellite imagery). The retrieved map-based indices were examined against geo-located photographs of landscapes and outdoor recreation uploaded to social media, such as Flickr, VK.com and the former Panoramio.
The study contributes to the operationalisation of landscape beauty and, therefore, to more advanced landscape management, nature protection and sustainability of land use practices. Publication of this dissertation has been supported by the Estonian University of Life Sciences.
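The abstract above mentions two map-based measures: landscape diversity as a function of Shannon information entropy, and naturalness as the share of relatively natural land cover in a region. As a rough illustration only, the sketch below computes both for a list of land-cover labels. The sample region, the class names and the choice of which classes count as natural are hypothetical, and the thesis may use a different formulation.

```python
import math
from collections import Counter


def shannon_diversity(cells):
    """Shannon entropy (in nats) of the land-cover classes in `cells`."""
    counts = Counter(cells)
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values())


def naturalness(cells, natural_classes):
    """Share of cells whose land-cover class is in `natural_classes`."""
    return sum(1 for c in cells if c in natural_classes) / len(cells)


# Hypothetical region of interest: one land-cover label per raster cell.
region = ["forest", "forest", "field", "water",
          "forest", "urban", "field", "forest"]
# Assumption for illustration: which classes are treated as natural.
NATURAL = {"forest", "water"}

diversity = shannon_diversity(region)
natural_share = naturalness(region, NATURAL)
```

A region with a single land-cover class has zero diversity, and the value grows as classes become more numerous and more evenly distributed.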

    Potential Indirect Relationships in Productive Networks

    Productive Networks, such as Social Network Services, organize evidence about human behavior. This evidence is independent of the network content type, and may support the discovery of new relationships between users and content, or with other users. These indirect relationships are important for recommendation systems, and for systems where potential relationships between users and content (e.g., locations) are relevant. One example is the emergency management domain, where the discovery of relationships between users and locations on productive networks may enable the identification of population density variations, increasing the accuracy of emergency alerts. This thesis presents a Productive Networks model, which enables the development of a methodology for indirect relationship discovery that uses the metadata on the network and avoids the computational cost of content analysis. We designed and conducted a set of experiments to evaluate our proposals. Our results are twofold: firstly, the productive network model is sufficiently robust to represent a wide range of networks; secondly, the indirect relationship discovery methodology successfully identifies relevant relationships between users and content. We also present applications of the model and methodology in several contexts.