181 research outputs found

    Comparison of Imputation Methods for Univariate Time Series

    Handling missing values in time series data is crucial for accurate prediction and forecasting, since complete and accurate historical data are essential. There are many research studies on multivariate time series imputation, but imputation in univariate time series data is rarely considered because no associated covariates are available. Missing values arise naturally, as almost all scientific disciplines that collect, store, and monitor data work with time series observations. Time series characteristics must therefore be taken into account to develop an effective and acceptable method for dealing with missing data. This work uses the statistical package R to assess and measure the effectiveness of imputation methods for univariate time series data. The imputation algorithms explored are evaluated using root mean square error, mean absolute error, and mean absolute percentage error. Four types of time series are considered. The experimental findings show that seasonal decomposition performs best on time series with a seasonal component, followed by linear interpolation, and Kalman smoothing provides values closest to the original time series data set, with lower error rates than the other imputation techniques.
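
    As a rough illustration of the evaluation protocol described above (the paper itself works in R), the sketch below builds a synthetic seasonal series, removes a random 10% of the points, imputes them with linear interpolation, and scores the result with RMSE, MAE, and MAPE. The series, the missing-data rate, and all variable names are illustrative assumptions, not the paper's data or code.

        # Minimal sketch (a Python stand-in for the paper's R workflow): build a
        # seasonal series, remove a random 10% of interior points, impute them with
        # linear interpolation, and score with the three metrics named above.
        import numpy as np
        import pandas as pd

        rng = np.random.default_rng(0)
        t = np.arange(365)
        truth = 10 + 5 * np.sin(2 * np.pi * t / 30) + rng.normal(0, 0.5, t.size)
        series = pd.Series(truth, index=pd.date_range("2020-01-01", periods=t.size))

        # Knock out 10% of the interior points to create artificial gaps.
        missing = rng.choice(np.arange(1, t.size - 1), size=int(0.1 * t.size), replace=False)
        observed = series.copy()
        observed.iloc[missing] = np.nan

        imputed = observed.interpolate(method="linear")  # one of the compared methods

        def rmse(y, yhat): return float(np.sqrt(np.mean((y - yhat) ** 2)))
        def mae(y, yhat): return float(np.mean(np.abs(y - yhat)))
        def mape(y, yhat): return float(np.mean(np.abs((y - yhat) / y)) * 100)

        y, yhat = series.iloc[missing], imputed.iloc[missing]  # score only the imputed points
        print(f"RMSE={rmse(y, yhat):.3f}  MAE={mae(y, yhat):.3f}  MAPE={mape(y, yhat):.2f}%")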

    Analysis of the Impact of Performance on Apps Retention

    The relentless expansion of mobile technologies has produced a swift increase in smartphones whose higher computational power and sophisticated sensing and communication capabilities provide the foundations for apps with PC-like functionality on the move. Indeed, apps are now almost everywhere, and their number has grown exponentially, with the Apple App Store, Google Play, and other mobile app marketplaces offering millions of apps to users. In this scenario, it is common to find several apps providing similar functionality, yet only a fraction of these applications survives in app stores over the long term. Retention is a metric widely used to quantify the lifespan of mobile apps: higher app retention corresponds to higher adoption and a higher level of engagement. While existing scientific studies have analysed mobile users' behaviour and support the existence of factors that influence app retention, a quantification of how these factors affect long-term usage is still missing. In this thesis, we contribute to these studies by quantifying and modelling one of the critical factors that affect app retention: performance. We deepen the analysis of performance through two key related variables: network connectivity and battery consumption. The analysis is performed by combining two large-scale crowdsensed datasets, the first containing measurements of network quality and the second of app usage and energy consumption. Our results show the benefits of data fusion in introducing richer contexts that cannot be discovered when analysing the data sources individually. We also demonstrate that high variations in these variables, together and individually, affect the likelihood of long-term app usage, but also that retention is regulated by what users consider reasonable standards of performance, meaning that improving latency and energy consumption does not guarantee higher retention. To provide further insights, we develop a model to predict retention using performance-related variables. Its accuracy allows the effect of performance on long-term usage to be generalised across categories, locations, and moderating variables.
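
    As a rough illustration of the data-fusion step described above, the sketch below joins a network-quality dataset with an app-usage and energy dataset on a shared device identifier and an hourly time key. The schemas, column names, and join keys are hypothetical assumptions for illustration, not the thesis's actual datasets or pipeline.

        # Illustrative fusion of two crowdsensed sources on device and hour.
        # All column names and values below are hypothetical.
        import pandas as pd

        network = pd.DataFrame({   # per-measurement network-quality samples
            "device_id": ["a", "a", "b"],
            "ts": pd.to_datetime(["2021-03-01 10:12", "2021-03-01 11:40", "2021-03-01 10:05"]),
            "latency_ms": [120.0, 310.0, 95.0],
        })
        usage = pd.DataFrame({     # per-session app usage and energy samples
            "device_id": ["a", "b"],
            "ts": pd.to_datetime(["2021-03-01 10:30", "2021-03-01 10:20"]),
            "app": ["maps", "chat"],
            "battery_drop_pct": [1.2, 0.4],
        })

        for df in (network, usage):            # align both sources to a common hourly key
            df["hour"] = df["ts"].dt.floor("H")

        # Each usage session gets the median latency seen by the same device in that hour.
        fused = usage.merge(
            network.groupby(["device_id", "hour"], as_index=False)["latency_ms"].median(),
            on=["device_id", "hour"], how="left",
        )
        print(fused)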

    Tortoise or Hare? Quantifying the Effects of Performance on Mobile App Retention

    We contribute by quantifying the effect of network latency and battery consumption on mobile app performance and retention, i.e., users' decisions to continue or stop using apps. We perform our analysis by fusing two large-scale crowdsensed datasets collected by piggybacking on information captured by mobile apps. We find that app performance has an impact on its retention rate. Our results demonstrate that high energy consumption and high latency decrease the likelihood of retaining an app. Conversely, we show that reducing latency or energy consumption does not guarantee a higher likelihood of retention as long as they are within reasonable standards of performance. However, we also demonstrate that what is considered reasonable depends on what users have become accustomed to, with device and network characteristics and app category playing a role. As our second contribution, we develop a model for predicting retention based on performance metrics. We demonstrate the benefits of our model through empirical benchmarks which show that it not only predicts retention accurately but also generalizes well across application categories, locations, and other factors moderating the effect of performance.
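
    As a rough sketch of what a retention model driven by performance metrics can look like, the example below fits a logistic-regression classifier on synthetic sessions described by median latency, energy drain, and app category. The features, the synthetic data, and the choice of logistic regression are illustrative assumptions, not the authors' actual model or dataset.

        # Hedged sketch of a retention classifier driven by performance features.
        # The features, synthetic data, and logistic regression are illustrative choices.
        import numpy as np
        import pandas as pd
        from sklearn.compose import ColumnTransformer
        from sklearn.linear_model import LogisticRegression
        from sklearn.pipeline import Pipeline
        from sklearn.preprocessing import OneHotEncoder, StandardScaler

        rng = np.random.default_rng(1)
        n = 500
        df = pd.DataFrame({
            "median_latency_ms": rng.normal(200, 60, n).clip(20, None),
            "energy_per_min_mw": rng.normal(90, 25, n).clip(5, None),
            "category": rng.choice(["games", "social", "maps"], n),
        })
        # Synthetic label: worse performance lowers the chance of 30-day retention.
        logit = 2.0 - 0.008 * df["median_latency_ms"] - 0.01 * df["energy_per_min_mw"]
        df["retained_30d"] = rng.random(n) < 1 / (1 + np.exp(-logit))

        pre = ColumnTransformer([
            ("num", StandardScaler(), ["median_latency_ms", "energy_per_min_mw"]),
            ("cat", OneHotEncoder(), ["category"]),
        ])
        model = Pipeline([("pre", pre), ("clf", LogisticRegression(max_iter=1000))])
        X, y = df.drop(columns="retained_30d"), df["retained_30d"]
        model.fit(X, y)
        print("training accuracy:", round(model.score(X, y), 3))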

    Addressing training data sparsity and interpretability challenges in AI based cellular networks

    To meet the diverse and stringent communication requirements of emerging network use cases, zero-touch artificial intelligence (AI) based deep automation in cellular networks is envisioned. However, the full potential of AI in cellular networks remains hindered by two key challenges: (i) training data is not as freely available in cellular networks as in other fields where AI has made a profound impact, and (ii) current AI models tend to exhibit black-box behaviour, making operators reluctant to entrust the operation of multi-billion-dollar mission-critical networks to a black-box AI engine that offers little insight into the relationships between configuration and optimization parameters and key performance indicators. This dissertation systematically addresses these two problems faced by emerging networks and proposes solutions to them.

    A framework for addressing the training data sparsity challenge in cellular networks is developed that can assist network operators and researchers in choosing the optimal data enrichment technique for different network scenarios, based on the available information. The framework spans classical interpolation techniques, such as inverse distance weighting and kriging; more advanced ML-based methods, such as transfer learning and generative adversarial networks; several new techniques, such as matrix completion theory and leveraging different types of network geometry; and simulators and testbeds, among others. The proposed framework leads to more accurate ML models that rely on a sufficient amount of representative training data.

    Moreover, solutions are proposed to address the data sparsity challenge specifically in Minimization of Drive Tests (MDT) based automation approaches. MDT allows coverage to be estimated at the base station by exploiting measurement reports gathered by user equipment, without the need for drive tests. MDT is thus a key enabling feature for data- and AI-driven autonomous operation and optimization in current and emerging cellular networks. To date, however, the utility of the MDT feature has been thwarted by issues such as the sparsity of user reports and user positioning inaccuracy. For the first time, this dissertation reveals the existence of an optimal bin width for coverage estimation in the presence of inaccurate user positioning, scarce user reports, and quantization error. The presented framework enables network operators to configure, for a given positioning accuracy and user density, the bin size that yields the most accurate MDT-based coverage estimation.

    The lack of interpretability in AI-enabled networks is addressed by proposing a first-of-its-kind neural network architecture that leverages analytical modeling, domain knowledge, big data, and machine learning to turn black-box machine learning models into more interpretable ones. The proposed approach combines analytical modeling and domain knowledge to custom-design machine learning models that not only require less training time but can also cope with issues such as training data sparsity and the determination of model hyperparameters. The approach is tested on both simulated and real data, and the results show that it outperforms existing mathematical models while remaining interpretable compared with black-box ML models. The proposed approach can therefore be used to derive better mathematical models of complex systems.

    The findings from this dissertation can help solve the challenges in emerging AI-based cellular networks and thus aid in their design, operation, and optimization.
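
    As a rough illustration of the MDT coverage-estimation setting described above, the sketch below bins simulated, noisily positioned user reports into a square grid of configurable bin width and marks a bin as covered when its median RSRP clears a threshold. Widening the bins pools more reports per bin but coarsens the map, which is the trade-off behind the optimal bin width studied in the dissertation. The propagation model, noise levels, threshold, and all names are assumptions made for illustration.

        # Illustrative MDT-style coverage estimation: grid noisily positioned user
        # reports with a configurable bin width and mark bins whose median RSRP
        # clears a threshold. All scenario values are assumptions.
        import numpy as np

        rng = np.random.default_rng(2)
        area_m, n_reports, pos_error_std_m = 1000.0, 400, 40.0
        x = rng.uniform(0, area_m, n_reports)
        y = rng.uniform(0, area_m, n_reports)
        # Toy path-loss-like field: strongest near the centre, plus shadowing noise.
        rsrp_dbm = -70 - 30 * np.hypot(x - 500, y - 500) / 500 + rng.normal(0, 4, n_reports)

        # Reported positions include GPS-like error, as in the MDT setting above.
        xr = x + rng.normal(0, pos_error_std_m, n_reports)
        yr = y + rng.normal(0, pos_error_std_m, n_reports)

        def coverage_map(bin_width_m, threshold_dbm=-100.0):
            n_bins = int(np.ceil(area_m / bin_width_m))
            ix = np.clip((xr // bin_width_m).astype(int), 0, n_bins - 1)
            iy = np.clip((yr // bin_width_m).astype(int), 0, n_bins - 1)
            covered = np.full((n_bins, n_bins), np.nan)
            for bx in range(n_bins):
                for by in range(n_bins):
                    vals = rsrp_dbm[(ix == bx) & (iy == by)]
                    if vals.size:               # empty bins stay NaN (report sparsity)
                        covered[bx, by] = float(np.median(vals) >= threshold_dbm)
            return covered

        for w in (50, 100, 200):                # wider bins pool more reports but coarsen the map
            print(f"bin width {w} m: {np.isnan(coverage_map(w)).mean():.0%} of bins have no reports")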

    Applications

    Volume 3 describes how resource-aware machine learning methods and techniques are used to successfully solve real-world problems. The book provides numerous specific application examples: in health and medicine for risk modelling, diagnosis, and treatment selection for diseases; in electronics, steel production, and milling for quality control during manufacturing processes; and in traffic and logistics for smart cities and for mobile communications.

    Fast transport simulations with higher-fidelity surrogate models for ITER

    A fast and accurate turbulence transport model based on quasilinear gyrokinetics is developed. The model consists of a set of neural networks trained on a bespoke quasilinear GENE dataset, with a saturation rule calibrated to dedicated nonlinear simulations. The resulting neural network is approximately eight orders of magnitude faster than the original GENE quasilinear calculations. ITER predictions with the new model project a fusion gain in line with ITER targets. While the dataset is currently limited to the ITER baseline regime, this approach illustrates a pathway to developing reduced-order turbulence models that are both faster and more accurate than the current state of the art.
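
    As a rough sketch of the surrogate-modelling idea described above, the example below trains a small multilayer-perceptron regressor to map a few dimensionless plasma parameters to a normalized heat flux. The input parameters, the analytic stand-in target, and the network architecture are illustrative assumptions; the paper's model is trained on a bespoke quasilinear GENE dataset rather than synthetic data.

        # Hedged sketch of a neural-network transport surrogate: a small MLP maps a few
        # dimensionless plasma parameters to a normalized heat flux. Inputs, the analytic
        # stand-in target, and the architecture are assumptions, not the GENE-trained model.
        import numpy as np
        from sklearn.model_selection import train_test_split
        from sklearn.neural_network import MLPRegressor
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler

        rng = np.random.default_rng(3)
        n = 5000
        X = np.column_stack([
            rng.uniform(1.0, 6.0, n),   # normalized temperature gradient (illustrative)
            rng.uniform(0.5, 3.0, n),   # normalized density gradient (illustrative)
            rng.uniform(1.0, 3.0, n),   # safety factor (illustrative)
        ])
        # Critical-gradient-like synthetic target standing in for quasilinear flux data.
        flux = np.maximum(X[:, 0] - 2.0, 0.0) ** 1.5 * (1 + 0.2 * X[:, 1]) / X[:, 2]
        flux += rng.normal(0, 0.02, n)

        X_tr, X_te, y_tr, y_te = train_test_split(X, flux, random_state=0)
        surrogate = make_pipeline(
            StandardScaler(),
            MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0),
        )
        surrogate.fit(X_tr, y_tr)
        # Once trained, evaluation is a cheap forward pass, which is where the speed-up
        # over direct gyrokinetic calculations comes from.
        print("held-out R^2:", round(surrogate.score(X_te, y_te), 3))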