3,324 research outputs found

    Mining Behavioral Patterns from Mobile Big Data

    Mobile devices connected to the Internet are a ubiquitous platform that can easily record a large amount of data describing human behavior. Specifically, the data collected from mobile devices, referred to as mobile big data, reveal important social and economic information. Therefore, analyzing mobile big data is valuable for several stakeholders, ranging from smartphone manufacturers to network operators and app developers. This thesis aims to discover and understand behavioral patterns from mobile big data based on large real-world datasets. Specifically, this thesis reveals patterns from three domains: people, time, and location. First, we explore mobile big data from the people domain and propose a framework to discover users' daily activity patterns from their mobile app usage. By applying the framework to a real-world dataset consisting of 653,092 users, we successfully extract five common patterns among millions of people, including commuting, pervasive socializing, nightly entertainment, afternoon reading, and nightly socializing. Second, still from the people domain, we derive group health conditions from smartphone usage data. In particular, we collect mobile usage records of 452 users in North America. We then demonstrate the potential for inferring group health conditions (i.e., COVID-19 outbreak stages) by leveraging less privacy-sensitive smartphone data, including CPU usage, memory usage, and network connections. Third, we mine behavior patterns from the time domain. We reveal the evolution of mobile app usage by conducting a longitudinal study on 1,465 users from 2012 to 2017. The results show that users' app usage changes significantly over time. However, the evolution of app-category usage and that of individual app usage differ in terms of popularity distribution, usage diversity, and correlations.
    Last, with respect to the location domain, we leverage city-scale spatiotemporal mobile app usage data to reveal urban land usage patterns. We show a strong correlation between mobile usage behavior and location features, which brings a new angle to urban analytics.

    Emerging technologies for learning report (volume 3)

    Developing a distributed electronic health-record store for India

    The DIGHT project is addressing the problem of building a scalable and highly available information store for the Electronic Health Records (EHRs) of the over one billion citizens of India.

    Development and Applications of Similarity Measures for Spatial-Temporal Event and Setting Sequences

    Similarity or distance measures between data objects are applied frequently in many fields or domains such as geography, environmental science, biology, economics, computer science, linguistics, logic, business analytics, and statistics, among others. One area where similarity measures are particularly important is in the analysis of spatiotemporal event sequences and associated environs or settings. This dissertation focuses on developing a framework of modeling, representation, and new similarity measure construction for sequences of spatiotemporal events and corresponding settings, which can be applied to different event data types and used in different areas of data science. The first core part of this dissertation presents a matrix-based spatiotemporal event sequence representation that unifies punctual and interval-based representation of events. This framework supports different event data types and provides support for data mining and sequence classification and clustering. The similarity measure is based on the modified Jaccard index with temporal order constraints and accommodates different event data types. This approach is demonstrated through simulated data examples and the performance of the similarity measures is evaluated with a k-nearest neighbor algorithm (k-NN) classification test on synthetic datasets. These similarity measures are incorporated into a clustering method and successfully demonstrate the usefulness in a case study analysis of event sequences extracted from space time series of a water quality monitoring system. This dissertation further proposes a new similarity measure for event setting sequences, which involve the space and time in which events occur. While similarity measures for spatiotemporal event sequences have been studied, the settings and setting sequences have not yet been considered. 
    While modeling event setting sequences, spatial and temporal scales are considered to define the bounds of the setting and incorporate dynamic variables along with static variables. Using a matrix-based representation and an extended Jaccard index, new similarity measures are developed to allow for the use of all variable data types. With these similarity measures coupled with other multivariate statistical analysis approaches, results from a case study involving setting sequences and pollution event sequences associated with the same monitoring stations support the hypothesis that more similar spatial-temporal settings or setting sequences may generate more similar events or event sequences. To test the scalability of the STES similarity measure on a larger dataset and its extended application in a different field, this dissertation compares and contrasts the prospective space-time scan statistic with the STES similarity approach for identifying COVID-19 hotspots. The COVID-19 pandemic has highlighted the importance of detecting hotspots or clusters of COVID-19 to provide decision makers at various levels with better information for managing the distribution of human and technical resources as the outbreak in the USA continues to grow. The prospective space-time scan statistic has been used to help identify emerging disease clusters, yet results from this approach can encounter limitations imposed by the spatial constraints of the scanning window. The STES-based approach adapted for this pandemic context computes the similarity of evolving normalized COVID-19 daily cases by county and clusters these to identify counties with similarly evolving COVID-19 case histories. This dissertation analyzes the spread of COVID-19 within the continental US through four periods beginning in late January 2020 using the COVID-19 datasets maintained by the Johns Hopkins University Center for Systems Science and Engineering (CSSE).
    Results of the two approaches can complement each other and, taken together, can aid in tracking the progression of the pandemic. Overall, the dissertation highlights the importance of developing similarity measures for analyzing spatiotemporal event sequences and associated settings, which can be applied to different event data types and used for data mining, sequence classification, and clustering.
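    The abstract above names a Jaccard index modified with temporal order constraints but does not give its definition. As a rough illustration only, the sketch below shows the standard Jaccard index and one hypothetical way to make it order-sensitive, by comparing sets of (earlier, later) event pairs. The function names and the pairwise-precedence idea are assumptions made for this example, not the dissertation's measure.

```python
def jaccard(a, b):
    """Standard Jaccard index |A ∩ B| / |A ∪ B| of two collections, treated as sets."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 1.0  # convention chosen here: two empty sets count as identical
    return len(set_a & set_b) / len(union)


def ordered_pairs(sequence):
    """All (earlier, later) event pairs in a time-ordered sequence (hypothetical helper)."""
    return {(x, y) for i, x in enumerate(sequence) for y in sequence[i + 1:]}


def order_aware_jaccard(seq_a, seq_b):
    """Jaccard index over precedence pairs, so the order of events affects the score.

    This is an assumed stand-in for a temporal order constraint,
    not the measure defined in the dissertation.
    """
    return jaccard(ordered_pairs(seq_a), ordered_pairs(seq_b))


seq_a = ["rain", "flood", "spill"]
seq_b = ["flood", "rain", "spill"]
print(jaccard(seq_a, seq_b))              # same event types, order ignored
print(order_aware_jaccard(seq_a, seq_b))  # lower, because rain/flood are swapped
```

    With the two toy sequences, the plain index treats them as identical, while the order-aware variant penalizes the swapped pair.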

    Leveraging Twitter data to analyze the virality of Covid-19 tweets: a text mining approach

    As the novel coronavirus spreads across the world, work, pleasure, entertainment, social interactions, and meetings have shifted online. Conversations on social media have spiked, and given the uncertainties and new policies, COVID-19 remains the trending topic on all such platforms, including Twitter. This research explores the factors that affect COVID-19 content-sharing by Twitter users. The analysis was conducted using more than 57,000 tweets that mentioned COVID-19 and related keywords. The tweets were subjected to Natural Language Processing (NLP) techniques such as Topic modelling, Named Entity-Relationship, Emotion & Sentiment analysis, and Linguistic feature extraction. These methods generated features that could help explain the retweet count of the tweets. The results indicate that tweets with named entities (person, organisation, and location), expression of negative emotions (anger, disgust, fear, and sadness), reference to mental health, optimistic content, and greater length have higher chances of being shared (retweeted). On the other hand, tweets with more hashtags and user mentions are less likely to be shared.
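    Three of the features named in the abstract above (hashtag count, user-mention count, and tweet length) can be read directly from the tweet text. The sketch below is a minimal illustration under simple assumptions: the regular expressions and the `surface_features` name are hypothetical, and the study's actual feature extraction pipeline is not described in this listing.

```python
import re

# Simplified patterns (assumptions): a hashtag or mention is the marker
# character followed by one or more word characters.
HASHTAG = re.compile(r"#\w+")
MENTION = re.compile(r"@\w+")


def surface_features(tweet):
    """Count hashtags and user mentions, and measure length in characters."""
    return {
        "hashtags": len(HASHTAG.findall(tweet)),
        "mentions": len(MENTION.findall(tweet)),
        "length": len(tweet),
    }


print(surface_features("Stay safe @who #COVID19 #StayHome"))
```

    Features like these could then be used as inputs to a model of retweet count, alongside the topic, entity, and emotion features the abstract mentions.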

    A Survey on the Web of Things

    The Web of Things (WoT) paradigm was first proposed in the late 2000s, with the idea of leveraging Web standards to interconnect all types of embedded devices. More than ten years later, the fragmentation of the IoT landscape has dramatically increased as a consequence of the exponential growth of connected devices, making interoperability one of the key issues for most IoT deployments. In this context, many studies have demonstrated the applicability of Web technologies to IoT scenarios, while joint efforts from academia and industry have led to proposals of standard specifications for developing WoT systems. Through a systematic review of the literature, we provide a detailed illustration of the WoT paradigm for both researchers and newcomers by reconstructing the temporal evolution of key concepts and historical trends, providing an in-depth taxonomy of software architectures and enabling technologies of WoT deployments and, finally, discussing the maturity of WoT vertical markets. Moreover, we identify some future research directions that may open the way to further innovation in WoT systems.

    Netizens’ criticism of the government’s policy of “Meme Lockdown” during the Covid-19 pandemic in Indonesia

    Indonesia was shocked by the arrival of the coronavirus in early 2020. Indonesians responded to policies for handling Covid-19 by closing access to their neighborhoods and making memes about the coronavirus. One interesting phenomenon during the Covid-19 pandemic was the number of banners or memes posted in the alleys of residential areas in Indonesia, as a form of free expression in response to the Indonesian government's policies for preventing the wider spread of Covid-19. This study uses a qualitative descriptive method. The data are language games in photo uploads in the form of memes on Instagram accounts, selected to fit the research needs and to be representative. The purpose of this study is to describe language games with sound and semantic substitution in lockdown policy memes about the Covid-19 pandemic in Indonesia on Instagram. The results show that, in the field of phonology, the memes tend to use substitution language games, while in the field of semantics the most widely used are homonym language games. The language games in memes during the Covid-19 pandemic have not yet become a force affecting the policies implemented by the Indonesian government. In other words, the anxiety and uncertainty hidden in the corona memes are meant only as puns or humor that can make the reader smile a little and feel optimistic. This paper has implications for developing criticism of government policies via the internet as a medium of communication and for managing the balance between stability and change due to the Covid-19 pandemic in Indonesia. This paper fulfils an identified need to study how the internet serves as a public sphere and a medium for communicating about government policies in the current era.

    The Healthgrid White Paper
