    Content Spending and Network Quality in Mobile Channels: A Hidden Markov Model of User Engagement and Content Consumption

    Individuals increasingly depend on mobile devices and apps in every aspect of their daily lives. Although the number of mobile users and the time they spend on mobile apps have grown tremendously, app providers still struggle with low user engagement and high attrition rates. This study examines how users’ prior content spending and network delay affect their mobile content consumption decisions, and how these effects vary across engagement states. We develop a Hidden Markov Model and calibrate it using a large tapstream data set of individual users’ reading activities from a mobile app provider. We identify three engagement states and heterogeneous impacts of the financial and operational factors. The findings have important implications for content pricing and user engagement in mobile channels.
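
The role of the hidden engagement states can be illustrated with a minimal sketch of the scaled forward algorithm for a three-state Hidden Markov Model. This is not the study's model: the abstract describes effects that depend on content spending and network delay, whereas the sketch below assumes fixed transition and emission probabilities. The state labels, the binary "read / did not read" observation, and every numeric value are hypothetical and chosen only for illustration.

```python
import math


def forward(initial, transition, emission, observations):
    """Scaled forward algorithm for a discrete-observation HMM.

    Returns (log_likelihood, filtered), where filtered[t][s] is
    P(state at time t = s | observations[0..t]).
    """
    n_states = len(initial)
    filtered = []
    log_likelihood = 0.0
    prev = None
    for t, obs in enumerate(observations):
        if t == 0:
            predicted = list(initial)
        else:
            # One-step prediction: propagate the previous belief through
            # the transition matrix.
            predicted = [
                sum(prev[i] * transition[i][j] for i in range(n_states))
                for j in range(n_states)
            ]
        # Weight each state by how likely it makes the current observation.
        joint = [predicted[s] * emission[s][obs] for s in range(n_states)]
        norm = sum(joint)
        log_likelihood += math.log(norm)
        prev = [x / norm for x in joint]
        filtered.append(prev)
    return log_likelihood, filtered


# Hypothetical parameters: states 0/1/2 = low/medium/high engagement.
initial = [0.6, 0.3, 0.1]
transition = [
    [0.80, 0.15, 0.05],
    [0.20, 0.60, 0.20],
    [0.05, 0.25, 0.70],
]
# Observation 0 = no reading in the period, 1 = some content read.
emission = [
    [0.9, 0.1],
    [0.5, 0.5],
    [0.2, 0.8],
]
observations = [0, 1, 1, 1]

log_likelihood, filtered = forward(initial, transition, emission, observations)
for t, row in enumerate(filtered):
    print(t, [round(p, 3) for p in row])
```

In a covariate-dependent variant, the entries of `transition` and `emission` would be functions of per-period covariates (for example through a logit link) rather than constants. That extension is an assumption about how such a model is typically built, not a detail given in the abstract.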

    A Stochastic Model of Plausibility in Live-Virtual-Constructive Environments

    Distributed live-virtual-constructive simulation promises a number of benefits for the test and evaluation community, including reduced costs, access to simulations of limited availability assets, the ability to conduct large-scale multi-service test events, and recapitalization of existing simulation investments. However, geographically distributed systems are subject to fundamental state consistency limitations that make assessing the data quality of live-virtual-constructive experiments difficult. This research presents a data quality model based on the notion of plausible interaction outcomes. This model explicitly accounts for the lack of absolute state consistency in distributed real-time systems and offers system designers a means of estimating data quality and fitness for purpose. Experiments with World of Warcraft player trace data validate the plausibility model and exceedance probability estimates. Additional experiments with synthetic data illustrate the model's use in ensuring fitness for purpose of live-virtual-constructive simulations and estimating the quality of data obtained from live-virtual-constructive experiments.

    Quality of Experience for Online Gaming and Different Traffic Scenarios

    The purpose of this graduate thesis is to analyze users' Quality of Experience in networked gaming while emulating traffic cases that may occur in a real environment. Quality of Experience is a relatively new way of measuring user satisfaction with a service; previously, such indicators were tied exclusively to Quality of Service (network parameters). Now, besides Quality of Service, parameters associated with the user are taken into account, such as user factors and context factors. Because these factors affect each user individually, subjective methods are used to test Quality of Experience. The results are used to improve the service and increase end-user satisfaction. To create network cases from a real environment, this thesis uses a program called Network Emulator Client, which emulates six different traffic cases for five users from the test group. For each case, the test group completes a survey of eight questions, and the survey results are presented using charts and tables. The results clearly show how each traffic case affects the perceived Quality of Experience of each individual user in the test group.

    Machine learning applications for censored data

    The amount of data being gathered has increased tremendously as many aspects of our lives become increasingly digital. Data alone is not useful; the ultimate goal is to use the data to obtain new insights and create new applications. The largest challenge has been on the algorithmic front: how can we create machines that help us do useful things with the data? To address this challenge, the field of data science has emerged as the systematic and interdisciplinary study of how knowledge can be extracted from both structured and unstructured data sets. Machine learning is a subfield of data science in which the task of building predictive models from data is automated by a general learning algorithm and high prediction accuracy is the primary goal. Many practical problems can be formulated as questions, and there is often data that describes the problem. The solution therefore seems simple: formulate a data set of inputs and outputs, and then apply machine learning to these examples in order to learn to predict the outputs. However, in many practical problems the correct outputs are not available because it takes years to collect them. For example, to predict the total amount of money spent by different customers, one in principle has to wait until all customers have stopped buying and then add up all of their purchases to get the answers. We say that the data is ’censored’: the correct answers are only partially available because we cannot wait, potentially for years, to collect a data set of historical inputs and outputs. This thesis presents new applications of machine learning to censored data sets, with the goal of answering the most relevant question in each application. These applications include digital marketing, peer-to-peer lending, unemployment, and game recommendation.
Our solution takes into account the censoring in the data set, whereas previous applications have obtained biased results or used older data sets in which censoring is not a problem. The solution is based on a three-stage process that combines a mathematical description of the problem with machine learning: 1) deconstruct the problem as pairwise data, 2) apply machine learning to predict the missing pairs, 3) reconstruct the correct answer from these pairs. The abstract solution is similar in all domains, but the specific machine learning model and the pairwise description of the problem depend on the application.

    A Data-driven Statistical Approach to Customer Behaviour Analysis and Modelling in Online Freemium Games

    The video games industry is one of the most attractive and lucrative segments of the entertainment and digital media sector, worth more than $150 billion worldwide. A popular approach in this industry is the online freemium model, in which the game can be downloaded free of cost, while advanced and bonus content carry optional charges. Monetisation is through micro payments by customers, and the focus is on maintaining average revenue per user and the lifetime value of players. The overall aim of this research is to develop suitable data-driven methods to gain insight into customer behaviour in online freemium games, with a view to providing recommendations for successful business in this industry. Three important aspects of user behaviour are modelled in this research: engagement, time until defection, and number of micro transactions made. Multiple logistic regression using a penalised likelihood approach is found to be most suitable for modelling engagement and demonstrates good fit and accuracy in assigning observations to engaged and non-engaged categories. Cox’s proportional hazards model is adopted to analyse time to defection, and a zero-inflated negative binomial model gives the best fit to the data on micro payments. Cluster analysis techniques are used to classify the wide variety of customers based on their gameplay styles, and social network models are developed to identify prominent ‘actors’ based on social interactions. Significant predictors of engagement and monetisation include the amount of premium in-game currency, success in missions, competency in virtual fights, and the quantity of virtual resources used in the game. This research offers extensive insight into what drives the reputation, virality and commercial viability of freemium games. In particular, it helps to fill a gap in understanding the behaviour of online game players by demonstrating the effectiveness of a data analytic approach.
It gives more insight into the determinants of player behaviour than observational studies or survey-based research. Additionally, it refines statistical models and demonstrates their implementation in R for new and complex data types representing online customer behaviours.
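
The zero-inflated negative binomial model named above for micro-payment counts can be sketched as a probability mass function. The research itself reports an implementation in R; the Python below is only an illustrative sketch, and the function names and parameter values (`pi_zero`, `r`, `p`) are hypothetical. The idea is that a count is zero with extra probability `pi_zero` (for example, players who never pay), and otherwise follows an ordinary negative binomial distribution.

```python
import math


def nb_log_pmf(k, r, p):
    """Log pmf of a negative binomial count k = 0, 1, 2, ...

    Parameterised by dispersion r > 0 (may be non-integer) and success
    probability 0 < p < 1, so the mean is r * (1 - p) / p.
    """
    return (
        math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
        + r * math.log(p) + k * math.log(1.0 - p)
    )


def zinb_pmf(k, pi_zero, r, p):
    """Pmf of a zero-inflated negative binomial.

    With probability pi_zero the count is a structural zero; otherwise it
    is drawn from the negative binomial with parameters (r, p).
    """
    nb = math.exp(nb_log_pmf(k, r, p))
    if k == 0:
        return pi_zero + (1.0 - pi_zero) * nb
    return (1.0 - pi_zero) * nb


# Hypothetical parameter values, for illustration only.
pi_zero, r, p = 0.4, 1.5, 0.3
for k in range(5):
    print(k, round(zinb_pmf(k, pi_zero, r, p), 4))
```

Compared with a plain negative binomial, the mixture places more mass on zero and scales the mean by `1 - pi_zero`, which is why it tends to suit purchase data in which a large share of users never make a transaction.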