
    Enhancing the selection of a model-based clustering with external qualitative variables

    In cluster analysis, it can be useful to interpret the partition built from the data in the light of external categorical variables which were not directly involved in clustering the data. An approach is proposed in the model-based clustering context to select a model and a number of clusters which both fit the data well and take advantage of the potential illustrative ability of the external variables. This approach makes use of the integrated joint likelihood of the data and the partitions at hand, namely the model-based partition and the partitions associated with the external variables. Notably, each mixture model is fitted to the data by maximum likelihood; the external variables are used only to select a relevant mixture model. Numerical experiments illustrate the promising behaviour of the derived criterion.
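    As a rough sketch of this selection idea (not the authors' exact criterion), one could fit Gaussian mixtures by maximum likelihood and then score each candidate by combining its BIC with the agreement between its partition and an external categorical variable. The agreement term, the weight alpha, and all names below are assumptions for illustration only.

    # Hypothetical sketch: pick the number of mixture components using both
    # model fit (BIC) and agreement with an external categorical variable.
    import numpy as np
    from sklearn.mixture import GaussianMixture
    from sklearn.metrics import adjusted_mutual_info_score

    def select_k(X, external_labels, k_range=range(2, 8), alpha=1.0):
        """Return the k whose mixture both fits X well and aligns with
        the external variable; alpha (assumed) trades off the two."""
        best_k, best_score = None, -np.inf
        for k in k_range:
            gm = GaussianMixture(n_components=k, random_state=0).fit(X)
            partition = gm.predict(X)
            # Lower BIC is better, so negate it; AMI rewards agreement
            # with the external partition, which never enters the fit.
            score = -gm.bic(X) + alpha * adjusted_mutual_info_score(
                external_labels, partition)
            if score > best_score:
                best_k, best_score = k, score
        return best_k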

    Forecasting Player Behavioral Data and Simulating in-Game Events

    Understanding player behavior is fundamental in game data science. Video games evolve as players interact with them, so being able to foresee player experience would help ensure successful game development. In particular, game developers need to evaluate the impact of in-game events beforehand. Simulating and optimizing these events is crucial to increase player engagement and maximize monetization. We present an experimental analysis of several methods to forecast game-related variables, with two main aims: to obtain accurate predictions of in-app purchases and playtime in an operational production environment, and to perform simulations of in-game events in order to maximize sales and playtime. Our ultimate purpose is to take a step towards the data-driven development of games. The results suggest that, even though traditional approaches such as ARIMA still perform better, the outcomes of state-of-the-art techniques like deep learning are promising. Deep learning emerges as a well-suited general model that could be used to forecast a variety of time series with different dynamic behaviors.
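    As a minimal sketch of the kind of ARIMA baseline the abstract mentions (the series name, ARIMA order, and horizon are assumptions, not taken from the paper), a daily in-app revenue forecast might look like this:

    # Illustrative ARIMA baseline for a daily revenue series; the order
    # (1, 1, 1) and the 14-day horizon are assumed for the example.
    import pandas as pd
    from statsmodels.tsa.arima.model import ARIMA

    def forecast_revenue(daily_revenue: pd.Series, horizon: int = 14):
        """Fit an ARIMA(1, 1, 1) model and forecast `horizon` days ahead."""
        fitted = ARIMA(daily_revenue, order=(1, 1, 1)).fit()
        return fitted.forecast(steps=horizon)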

    Convergence of economic growth in Russian megacities

    Purpose: The article presents the results of an empirical analysis of the economic growth of Russian cities with a population of over 1 million people (megacities). Design/Methodology/Approach: The analysed indicator is the city product, calculated according to the UN methodology for the period from 2010 to 2016. The paper analyses the process of β- and σ-convergence across Russian megacities using methods of spatial econometrics in addition to the traditional β-convergence techniques from the neoclassical theoretical framework. Findings: The dynamics of the coefficient of variation confirmed the presence of σ-convergence in city product. Positive spatial autocorrelation has been confirmed empirically. β-convergence for Russian megacities is found to be significant, and the spatial location of megacities significantly affects it. Control factors such as fixed capital investment per capita in 2010, average retail volume per capita in 2010, the average annual number of employees of enterprises and organizations in 2010, and a dummy variable introduced for the "federal cities" Moscow and St. Petersburg are all found to have a positive and statistically significant impact on economic growth. Practical Implications: Policymakers may take the results into account when planning economic strategies for megacities and regions in Russia in order to facilitate regional economic growth and the speed of convergence. Originality/Value: The main contribution of the study is its focus on the economic growth of megacities rather than regions, as has usually been the case in similar studies. An important finding is that megacities' economies do converge and that the influence of the control factors is pronounced.
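    For reference, the standard (non-spatial) formulations behind these tests are as follows; the notation is the textbook version, not necessarily the paper's exact specification. β-convergence is estimated from a cross-sectional growth regression over cities i, and σ-convergence tracks the dispersion of the indicator over time:

    % Textbook beta-convergence regression over cities i:
    \frac{1}{T}\ln\frac{y_{i,T}}{y_{i,0}} = \alpha + \beta \ln y_{i,0} + \varepsilon_i,
    \qquad \beta < 0 \;\Rightarrow\; \text{convergence}.
    % Sigma-convergence: the dispersion of the indicator shrinks over time,
    % e.g. the coefficient of variation is decreasing in t:
    \mathrm{CV}_t = \frac{\sigma_t}{\mu_t}.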

    Beyond subjective and objective in statistics

    We argue that the words "objectivity" and "subjectivity" in statistics discourse are used in a mostly unhelpful way, and we propose to replace each of them with broader collections of attributes, with objectivity replaced by transparency, consensus, impartiality, and correspondence to observable reality, and subjectivity replaced by awareness of multiple perspectives and context dependence. The advantage of these reformulations is that the replacement terms do not oppose each other. Instead of debating over whether a given statistical method is subjective or objective (or normatively debating the relative merits of subjectivity and objectivity in statistical practice), we can recognize desirable attributes such as transparency and acknowledgment of multiple perspectives as complementary goals. We demonstrate the implications of our proposal with recent applied examples from pharmacology, election polling, and socioeconomic stratification.

    Non-Gaussian statistics of pencil beam surveys

    We study the effect of the non-Gaussian clustering of galaxies on the statistics of pencil beam surveys. We find that the higher order moments of the galaxy distribution play an important role in the probability distribution for the power spectrum peaks. Taking into account the observed values of the kurtosis of the galaxy distribution, we derive the general probability distribution for the power spectrum modes in non-Gaussian models and show that the probability of obtaining the 128 h⁻¹ Mpc periodicity found in pencil beam surveys is raised by roughly one order of magnitude. The non-Gaussianity of the galaxy distribution is, however, still insufficient to explain the reported peak-to-noise ratio of the periodicity, so extra power on large scales seems to be required.
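    For context, a standard result (the Gaussian baseline, not the paper's non-Gaussian derivation): for a Gaussian density field, each Fourier mode has independent Gaussian real and imaginary parts, so the normalized power follows a chi-squared distribution with two degrees of freedom, i.e. an exponential law. Large peaks are then exponentially suppressed, which is why tail-fattening corrections such as a nonzero kurtosis can raise the peak probability substantially:

    % Gaussian baseline for the normalized power of a single mode,
    % p = |\delta_k|^2 / \langle |\delta_k|^2 \rangle:
    P(p)\,dp = e^{-p}\,dp,
    \qquad \Pr(p > p_*) = e^{-p_*}.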

    Analysis of the evolution of the Spanish labour market through unsupervised learning

    Unemployment in Spain is one of the biggest concerns of its inhabitants. The Spanish unemployment rate is the second highest in the European Union, standing at 15.2% in the second quarter of 2018, some 3.4 million unemployed. Construction is one of the activity sectors that have suffered the most from the economic crisis. In addition, the economic crisis affected the labour market in different ways in terms of occupation level and location. The aim of this paper is to discover how the labour market is organised, taking into account the jobs that workers obtained during two periods: 2011-2013, which corresponds to the economic crisis, and 2014-2016, a period of economic recovery. The data used are official records of the Spanish administration corresponding to 1.9 and 2.4 million job placements, respectively. The labour market was analysed by applying unsupervised machine learning techniques to obtain clear and structured information on the employment generation process and the underlying labour mobility. We applied two clustering methods with two different technologies, and the results indicate that there were movements in the Spanish labour market which changed the character of some of the jobs. The analysis reveals the changes in the labour market: the crisis forces greater geographical mobility and favours the subsequent emergence of new job sources. Nevertheless, some clusters remain stable despite the crisis. We may conclude that we have achieved a characterisation of some important groups of workers in Spain. The methodology used, being supported by Big Data techniques, could serve to analyse any other job market.
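    As a schematic of the clustering step (the feature names, the scaling, and the choice of k below are assumptions for the sketch, not the paper's actual pipeline), job-placement records could be clustered like so:

    # Illustrative clustering of job-placement records; column names and
    # the number of clusters are hypothetical.
    import pandas as pd
    from sklearn.preprocessing import StandardScaler
    from sklearn.cluster import KMeans

    def cluster_placements(df: pd.DataFrame, k: int = 10):
        """Cluster placements on numeric features such as wage, contract
        duration, and worker age (hypothetical column names)."""
        features = df[["wage", "contract_days", "worker_age"]]
        X = StandardScaler().fit_transform(features)
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        return df.assign(cluster=labels)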