
    Normalized entropy aggregation for inhomogeneous large-scale data

    The relationship between information theory, statistics, and maximum entropy was established as early as the 1950s, following the works of Kullback, Leibler, Lindley and Jaynes. However, the applications were restricted to very specific domains, and it was not until recently that the convergence between information processing, data analysis and inference demanded the foundation of a new scientific area, commonly referred to as Info-Metrics. As huge amounts of information and large-scale data have become available, the term "big data" has been used to refer to the many kinds of challenges presented in its analysis: many observations, many variables (or both), limited computational resources, different time regimes or multiple sources. In this work, we consider one particular aspect of big data analysis: the presence of inhomogeneities, which compromises the use of the classical framework in regression modelling. A new approach is proposed, based on the introduction of the concepts of info-metrics to the analysis of inhomogeneous large-scale data. The framework of information-theoretic estimation methods is presented, along with some information measures. In particular, the normalized entropy is tested in aggregation procedures and some simulation results are presented.
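    To make the central measure concrete: for a probability vector p over K > 1 outcomes, the normalized entropy is the Shannon entropy divided by its maximum value ln K, so it always lies in [0, 1] (equal to 1 for the uniform, least informative distribution). The Python sketch below shows one way such a measure could drive an aggregation step; the weighting scheme and all names are illustrative assumptions, not the paper's actual procedure.

        import numpy as np

        def normalized_entropy(p):
            """Shannon entropy of a probability vector p over K > 1 outcomes,
            scaled by its maximum value log(K) so the result lies in [0, 1]."""
            p = np.asarray(p, dtype=float)
            k = len(p)
            nz = p[p > 0]                 # 0 * log(0) = 0 by convention
            return -np.sum(nz * np.log(nz)) / np.log(k)

        # Hypothetical block-wise estimates from an inhomogeneous data set:
        # each block carries an estimated probability vector and a coefficient.
        blocks = {"block_1": (np.array([0.70, 0.20, 0.10]), 1.8),
                  "block_2": (np.array([0.34, 0.33, 0.33]), 2.4)}

        # Weight blocks by how informative they are: distributions far from
        # uniform (low normalized entropy) get larger weights.
        w = {b: 1.0 - normalized_entropy(p) for b, (p, _) in blocks.items()}
        total = sum(w.values())
        aggregate = sum(w[b] / total * est for b, (_, est) in blocks.items())
        print(f"aggregated estimate: {aggregate:.3f}")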

    From Non-Paying to Premium: Predicting User Conversion in Video Games with Ensemble Learning

    Retaining premium players is key to the success of free-to-play games, but most players do not start purchasing right after joining the game. By exploiting the exceptionally rich datasets recorded by modern video games--which provide information on the individual behavior of each and every player--survival analysis techniques can be used to predict which players are more likely to become paying (or even premium) users and when, in terms of both time and game level, the conversion will take place. Here we show that a traditional semi-parametric model (Cox regression), a random survival forest (RSF) technique and a method based on conditional inference survival ensembles all yield very promising results. However, the last approach has the advantage of being able to correct the inherent bias in RSF models by dividing the procedure into two steps: first selecting the best predictor to perform the splitting and then the best split point for that covariate. The proposed conditional inference survival ensembles method could be readily used in operational environments for early identification of premium players and the parts of the game that may prompt them to become paying users. Such knowledge would allow developers to induce their conversion and, more generally, to better understand the needs of their players and provide them with a personalized experience, thereby increasing their engagement and paving the way to higher monetization. Comment: social games, conversion prediction, ensemble methods, survival analysis, online games, user behavior
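    To make the survival-analysis framing concrete, here is a minimal sketch of the semi-parametric baseline (Cox regression) using the lifelines library on synthetic player data. The covariates, the data-generating process and the 30-day observation window are illustrative assumptions, not the paper's setup; the key point is that players who have not yet converted are treated as right-censored rather than discarded.

        import numpy as np
        import pandas as pd
        from lifelines import CoxPHFitter

        rng = np.random.default_rng(0)
        n = 200
        sessions = rng.exponential(2.0, n)          # play intensity (per day)
        level = rng.integers(1, 50, n)              # max level reached
        # Hypothetical process: heavier players tend to convert sooner.
        true_time = rng.exponential(30.0 / (1.0 + 0.5 * sessions))
        days = np.minimum(true_time, 30.0)          # 30-day observation window
        converted = (true_time <= 30.0).astype(int) # 0 = right-censored

        df = pd.DataFrame({"sessions_per_day": sessions, "max_level": level,
                           "days": days, "converted": converted})

        cph = CoxPHFitter()
        cph.fit(df, duration_col="days", event_col="converted")
        cph.print_summary()

        # Rank current non-payers by relative conversion risk.
        risk = cph.predict_partial_hazard(df)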

    What is the correct cost functional for variational data assimilation?

    Variational approaches to data assimilation, and weakly constrained four-dimensional variational assimilation (WC-4DVar) in particular, are important in the geosciences but also in other communities (often under different names). The cost functionals and the resulting optimal trajectories may have a probabilistic interpretation, for instance by linking data assimilation with maximum a posteriori (MAP) estimation. This is possible in particular if the unknown trajectory is modelled as the solution of a stochastic differential equation (SDE), as is increasingly the case in weather forecasting and climate modelling. In this situation, the MAP estimator (or “most probable path” of the SDE) is obtained by minimising the Onsager–Machlup functional. Although this fact is well known, there seems to be some confusion in the literature, with the energy (or “least squares”) functional sometimes being claimed to yield the most probable path. The first aim of this paper is to address this confusion and show that the energy functional does not, in general, provide the most probable path. The second aim is to discuss the implications in practice. Although the mentioned results pertain to stochastic models in continuous time, they do have consequences in practice, where SDEs are approximated by discrete-time schemes. It turns out that using an approximation to the SDE and calculating its most probable path does not necessarily yield a good approximation to the most probable path of the SDE proper. This suggests that even in discrete time, a version of the Onsager–Machlup functional should be used, rather than the energy functional, at least if the solution is to be interpreted as a MAP estimator.
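    For an SDE dX_t = f(X_t) dt + σ dW_t with constant noise amplitude σ, the two functionals in question are commonly written as follows (a sketch of the standard result in our notation, not necessarily the paper's exact statement):

        % Energy ("least squares") functional
        E[x] = \frac{1}{2} \int_0^T \frac{\lvert \dot{x}(t) - f(x(t)) \rvert^2}{\sigma^2} \, dt

        % Onsager--Machlup functional: note the extra divergence term
        \mathrm{OM}[x] = \frac{1}{2} \int_0^T \left( \frac{\lvert \dot{x}(t) - f(x(t)) \rvert^2}{\sigma^2} + \nabla \cdot f(x(t)) \right) dt

    The additional ∇·f term is precisely why the minimiser of OM, and not of E, is the most probable path; the two prescriptions agree only in special cases, for instance when ∇·f is constant (linear drift).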

    Owl Eyes: Spotting UI Display Issues via Visual Understanding

    The graphical user interface (GUI) provides a visual bridge between a software application and its end users, through which they can interact with each other. With the development of technology and aesthetics, the visual effects of GUIs have become more and more attractive. However, this GUI complexity poses a great challenge for GUI implementation. According to our pilot study of crowdtesting bug reports, display issues such as text overlap, blurred screens and missing images frequently occur during GUI rendering on different devices due to software or hardware compatibility problems. They negatively influence app usability, resulting in poor user experience. To detect these issues, we propose a novel approach, OwlEye, based on deep learning for modelling the visual information of a GUI screenshot. OwlEye can thus detect GUIs with display issues and also locate the detailed region of the issue within the given GUI, guiding developers to fix the bug. We manually construct a large-scale labelled dataset of 4,470 GUI screenshots with UI display issues and develop a heuristics-based data augmentation method to boost OwlEye's performance. The evaluation demonstrates that OwlEye achieves 85% precision and 84% recall in detecting UI display issues, and 90% accuracy in localizing these issues. We also evaluate OwlEye with popular Android apps on Google Play and F-droid, and successfully uncover 57 previously undetected UI display issues, with 26 of them confirmed or fixed so far. Comment: Accepted to the 35th IEEE/ACM International Conference on Automated Software Engineering (ASE 2020)
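    The core of such an approach is a convolutional network that consumes a raw screenshot and predicts whether it exhibits a display issue. The PyTorch sketch below is a deliberately minimal illustration of that idea, not the authors' architecture: layer sizes are arbitrary, and the localization step (in practice often done with class-activation maps over the convolutional features) is omitted.

        import torch
        import torch.nn as nn

        class ScreenshotClassifier(nn.Module):
            """Tiny CNN: GUI screenshot -> {clean, display issue}."""
            def __init__(self):
                super().__init__()
                self.features = nn.Sequential(
                    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
                    nn.AdaptiveAvgPool2d(1),      # global pooling -> 64 features
                )
                self.classifier = nn.Linear(64, 2)

            def forward(self, x):                 # x: (batch, 3, H, W)
                return self.classifier(self.features(x).flatten(1))

        model = ScreenshotClassifier()
        logits = model(torch.randn(1, 3, 224, 224))  # dummy 224x224 screenshot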

    Search for the lepton-flavor-violating decays B0s→e±Ό∓ and B0→e±Ό∓

    A search for the lepton-flavor-violating decays B0s→e±Ό∓ and B0→e±Ό∓ is performed with a data sample, corresponding to an integrated luminosity of 1.0 fb−1 of pp collisions at √s = 7 TeV, collected by the LHCb experiment. The observed numbers of B0s→e±Ό∓ and B0→e±Ό∓ candidates are consistent with background expectations. Upper limits on the branching fractions of both decays are determined; these translate into lower bounds on the leptoquark mass of MLQ(B0s→e±Ό∓) > 101 TeV/c2 and MLQ(B0→e±Ό∓) > 126 TeV/c2 at 95% C.L., a factor of 2 higher than the previous bounds.
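    Set in LaTeX for clarity, the recoverable limits read as follows (the numerical branching-fraction limits themselves were lost in extraction and are not reproduced here):

        M_{\mathrm{LQ}}(B_s^0 \to e^\pm \mu^\mp) > 101~\mathrm{TeV}/c^2, \qquad
        M_{\mathrm{LQ}}(B^0 \to e^\pm \mu^\mp) > 126~\mathrm{TeV}/c^2
        \quad \text{at 95\% C.L.}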

    Observation of the decay B+c→B0sπ+

    The result of a search for the decay B+c→B0sπ+ is presented, using the B0s→D−sπ+ and B0s→J/ψϕ channels. The analysis is based on a data sample of pp collisions collected with the LHCb detector, corresponding to an integrated luminosity of 1 fb−1 taken at a center-of-mass energy of 7 TeV and 2 fb−1 taken at 8 TeV. The decay B+c→B0sπ+ is observed with a significance in excess of 5 standard deviations independently in both decay channels. The measured product of the ratio of cross sections and the branching fraction is [σ(B+c)/σ(B0s)] × B(B+c→B0sπ+) = [2.37 ± 0.31 (stat) ± 0.11 (syst) +0.17/−0.13 (τB+c)] × 10−3 in the pseudorapidity range 2 < η(B) < 5, where the first uncertainty is statistical, the second is systematic, and the third is due to the uncertainty on the B+c lifetime. This is the first observation of a B meson decaying to another B meson via the weak interaction.
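    In LaTeX, with the asymmetric lifetime uncertainty written in the conventional superscript/subscript form, the headline measurement is:

        \frac{\sigma(B_c^+)}{\sigma(B_s^0)} \times \mathcal{B}(B_c^+ \to B_s^0 \pi^+)
        = \left[ 2.37 \pm 0.31\,(\mathrm{stat}) \pm 0.11\,(\mathrm{syst})
          \,{}^{+0.17}_{-0.13}\,(\tau_{B_c^+}) \right] \times 10^{-3}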

    Branching fraction and CP asymmetry of the decays B+→K0Sπ+ and B+→K0SK+

    An analysis of B+→K0Sπ+ and B+→K0SK+ decays is performed with the LHCb experiment. The pp collision data used correspond to integrated luminosities of 1 fb−1 and 2 fb−1 collected at centre-of-mass energies of √s = 7 TeV and √s = 8 TeV, respectively. The ratio of branching fractions and the direct CP asymmetries are measured to be B(B+→K0SK+)/B(B+→K0Sπ+) = 0.064 ± 0.009 (stat.) ± 0.004 (syst.), ACP(B+→K0Sπ+) = −0.022 ± 0.025 (stat.) ± 0.010 (syst.) and ACP(B+→K0SK+) = −0.21 ± 0.14 (stat.) ± 0.01 (syst.). The data sample taken at √s = 7 TeV is used to search for B+c→K0SK+ decays and results in the upper limit (fc · B(B+c→K0SK+))/(fu · B(B+→K0Sπ+)) < 5.8 × 10−2 at 90% confidence level, where fc and fu denote the hadronisation fractions of a b̄ quark into a B+c or a B+ meson, respectively.
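    For reference, the direct CP asymmetries quoted above follow the standard convention (a textbook definition, not restated in the abstract), shown here in LaTeX for the K0Sπ+ mode:

        A_{CP}(B^+ \to K^0_S \pi^+) =
        \frac{\Gamma(B^- \to K^0_S \pi^-) - \Gamma(B^+ \to K^0_S \pi^+)}
             {\Gamma(B^- \to K^0_S \pi^-) + \Gamma(B^+ \to K^0_S \pi^+)}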