196 research outputs found
Normalized entropy aggregation for inhomogeneous large-scale data
It was already in the fifties of the last century that the relationship between information theory, statistics, and maximum entropy was established, following the works of Kullback, Leibler, Lindley and Jaynes. However, applications were restricted to very specific domains, and it was not until recently that the convergence of information processing, data analysis and inference demanded the foundation of a new scientific area, commonly referred to as Info-Metrics. As huge amounts of information and large-scale data have become available, the term "big data" has been used to refer to the many kinds of challenges presented in its analysis: many observations, many variables, or both; limited computational resources; different time regimes; or multiple sources. In this work, we consider one particular aspect of big data analysis, namely the presence of inhomogeneities, which compromises the use of the classical framework in regression modelling. A new approach is proposed, based on introducing the concepts of info-metrics into the analysis of inhomogeneous large-scale data. The framework of information-theoretic estimation methods is presented, along with some information measures. In particular, the normalized entropy is tested in aggregation procedures and some simulation results are presented.
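The abstract does not spell out the measure it tests, but a common definition of normalized entropy in the info-metrics literature is the Shannon entropy of a probability vector divided by its maximum value, log K for K outcomes, so it lies in [0, 1]. A minimal sketch under that assumption:

```python
import numpy as np

def normalized_entropy(p):
    """Shannon entropy of a probability vector p over K outcomes,
    divided by its maximum log(K), so the result lies in [0, 1].
    (Common info-metrics convention; not taken from the paper.)"""
    p = np.asarray(p, dtype=float)
    K = p.size
    nz = p[p > 0]                      # treat 0 * log 0 as 0
    H = -np.sum(nz * np.log(nz))       # Shannon entropy in nats
    return H / np.log(K)

# Uniform distribution -> 1.0; degenerate distribution -> 0.0
print(normalized_entropy([0.25, 0.25, 0.25, 0.25]))
print(normalized_entropy([1.0, 0.0, 0.0, 0.0]))
```

Values near 1 indicate high uncertainty (little information), values near 0 a highly informative estimate, which is what makes the measure usable as a weight in aggregation procedures.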
From Non-Paying to Premium: Predicting User Conversion in Video Games with Ensemble Learning
Retaining premium players is key to the success of free-to-play games, but
most players do not start purchasing right after joining the game. By
exploiting the exceptionally rich datasets recorded by modern video
games--which provide information on the individual behavior of each and every
player--survival analysis techniques can be used to predict which players are
more likely to become paying (or even premium) users and when, both in terms of
time and game level, the conversion will take place. Here we show that a
traditional semi-parametric model (Cox regression), a random survival forest
(RSF) technique and a method based on conditional inference survival ensembles
all yield very promising results. However, the last approach has the advantage
of being able to correct the inherent bias in RSF models by dividing the
procedure into two steps: first selecting the best predictor to perform the
splitting and then the best split point for that covariate. The proposed
conditional inference survival ensembles method could be readily used in
operational environments for early identification of premium players and of the
parts of the game that may prompt them to become paying users. Such knowledge
would allow developers to induce their conversion and, more generally, to
better understand the needs of their players and provide them with a
personalized experience, thereby increasing their engagement and paving the way
to higher monetization.
Comment: social games, conversion prediction, ensemble methods, survival analysis, online games, user behavior
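The models compared in the abstract (Cox regression, RSF, conditional inference ensembles) all build on right-censored conversion data: each player either converts at some observed time or is still non-paying when observation ends. As a self-contained illustration of that data structure (not the paper's ensemble method), here is a Kaplan-Meier estimate of the probability of remaining a non-paying user:

```python
import numpy as np

def kaplan_meier(times, events):
    """Kaplan-Meier estimate of S(t) = P(no conversion by time t) from
    right-censored data. times: observation time per player; events: 1 if
    the player converted at that time, 0 if censored (still non-paying).
    Returns a list of (conversion time, survival probability) pairs."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    order = np.argsort(times)
    times, events = times[order], events[order]
    curve, S = [], 1.0
    for t in np.unique(times[events == 1]):     # distinct conversion times
        at_risk = np.sum(times >= t)            # players still observed at t
        d = np.sum((times == t) & (events == 1))  # conversions at t
        S *= 1.0 - d / at_risk
        curve.append((t, S))
    return curve

# Toy data: 5 players; events=0 marks players censored before converting
print(kaplan_meier([2, 3, 3, 5, 8], [1, 0, 1, 1, 0]))
```

The same (time, event) encoding, plus per-player covariates, is what Cox regression and the survival forests discussed above consume.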
What is the correct cost functional for variational data assimilation?
Variational approaches to data assimilation, and weakly constrained four-dimensional variational assimilation (WC-4DVar) in particular, are important in the geosciences but also in other communities (often under different names). The cost functions and the resulting optimal trajectories may have a probabilistic interpretation, for instance by linking data assimilation with maximum a posteriori (MAP) estimation. This is possible in particular if the unknown trajectory is modelled as the solution of a stochastic differential equation (SDE), as is increasingly the case in weather forecasting and climate modelling. In this situation, the MAP estimator (or "most probable path" of the SDE) is obtained by minimising the Onsager-Machlup functional. Although this fact is well known, there seems to be some confusion in the literature, with the energy (or "least squares") functional sometimes being claimed to yield the most probable path. The first aim of this paper is to address this confusion and show that the energy functional does not, in general, provide the most probable path. The second aim is to discuss the implications in practice. Although the mentioned results pertain to stochastic models in continuous time, they do have consequences in practice, where SDEs are approximated by discrete-time schemes. It turns out that using an approximation to the SDE and calculating its most probable path does not necessarily yield a good approximation to the most probable path of the SDE proper. This suggests that even in discrete time, a version of the Onsager-Machlup functional should be used, rather than the energy functional, at least if the solution is to be interpreted as a MAP estimator.
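The abstract does not display the two functionals it contrasts; for the standard case of an SDE with additive noise, dX_t = f(X_t) dt + sigma dW_t on [0, T], they take the following well-known forms (a sketch of the textbook definitions, not a quotation from the paper):

```latex
% Energy ("least squares") functional:
J_E[x] \;=\; \frac{1}{2\sigma^2}\int_0^T \bigl\|\dot{x}(t) - f\bigl(x(t)\bigr)\bigr\|^2 \, dt .

% Onsager--Machlup functional, carrying an extra divergence term:
J_{\mathrm{OM}}[x] \;=\; \int_0^T \left[ \frac{\bigl\|\dot{x}(t) - f\bigl(x(t)\bigr)\bigr\|^2}{2\sigma^2}
  \;+\; \frac{1}{2}\,\nabla\!\cdot f\bigl(x(t)\bigr) \right] dt .
```

The most probable path minimises J_OM, not J_E; the two minimisers coincide only when the divergence term is constant along candidate trajectories (e.g. for linear drift with constant trace), which is the sense in which the energy functional fails "in general".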
Owl Eyes: Spotting UI Display Issues via Visual Understanding
A graphical user interface (GUI) provides a visual bridge between a software
application and its end users, through which they interact with each other.
With the development of technology and aesthetics, the visual effects of GUIs
have become more and more attractive. However, this GUI complexity poses a
great challenge to GUI implementation. According to our pilot study of
crowdtesting bug reports, display issues such as text overlap, blurred screens
and missing images often occur during GUI rendering on different devices due to
software or hardware compatibility problems. They negatively influence app
usability, resulting in poor user experience. To detect these issues, we
propose a novel approach, OwlEye, based on deep learning for modelling the
visual information of GUI screenshots. OwlEye can thus detect GUIs with
display issues and also locate the detailed region of the issue in a given
GUI, guiding developers to fix the bug. We manually construct a large-scale
labelled dataset of 4,470 GUI screenshots with UI display issues and develop
a heuristics-based data augmentation method to boost OwlEye's performance.
The evaluation demonstrates that OwlEye achieves 85% precision and 84% recall
in detecting UI display issues, and 90% accuracy in localizing them. We also
evaluate OwlEye with popular Android apps on Google Play and F-Droid, and
successfully uncover 57 previously undetected UI display issues, 26 of which
have been confirmed or fixed so far.
Comment: Accepted to the 35th IEEE/ACM International Conference on Automated Software Engineering (ASE 2020)
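The abstract mentions a heuristics-based data augmentation method but does not describe it; one plausible heuristic in that spirit is to synthesise a "text overlap" display issue by pasting one region of a clean screenshot over another, which yields both a corrupted image and a localization label for free. A hypothetical sketch (the function name and patch logic are assumptions, not the paper's method):

```python
import numpy as np

def overlap_augment(img, rng, patch=16):
    """Hypothetical augmentation heuristic: simulate a text-overlap
    display issue by copying a random square patch of a screenshot onto
    another random location. Returns the augmented image and the
    corrupted region (x, y, w, h) usable as a localization label."""
    h, w = img.shape[:2]
    src_y = rng.integers(0, h - patch)      # source patch corner
    src_x = rng.integers(0, w - patch)
    dst_y = rng.integers(0, h - patch)      # destination corner
    dst_x = rng.integers(0, w - patch)
    out = img.copy()
    out[dst_y:dst_y + patch, dst_x:dst_x + patch] = \
        img[src_y:src_y + patch, src_x:src_x + patch]
    return out, (dst_x, dst_y, patch, patch)

rng = np.random.default_rng(0)
screenshot = np.arange(64 * 64).reshape(64, 64)   # stand-in for a GUI image
augmented, box = overlap_augment(screenshot, rng)
```

Generating labelled defects this way sidesteps the scarcity of real buggy screenshots, which is presumably why augmentation boosts the detector's performance.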
Search for the lepton-flavor-violating decays Bs0→e±μ∓ and B0→e±μ∓
A search for the lepton-flavor-violating decays Bs0→e±μ∓ and B0→e±μ∓ is performed with a data sample, corresponding to an integrated luminosity of 1.0 fb-1 of pp collisions at √s = 7 TeV, collected by the LHCb experiment. The observed numbers of Bs0→e±μ∓ and B0→e±μ∓ candidates are consistent with background expectations. Upper limits on the branching fractions of both decays are determined to be B(Bs0→e±μ∓) < 1.1 × 10-8 and B(B0→e±μ∓) < 2.8 × 10-9 at 90% C.L. These limits translate into lower bounds on the masses of Pati-Salam leptoquarks, MLQ(Bs0→e±μ∓) > 101 TeV/c2 and MLQ(B0→e±μ∓) > 126 TeV/c2 at 95% C.L., and are a factor of 2 higher than the previous bounds.
Observation of the decay Bc+→Bs0π+
The result of a search for the decay Bc+→Bs0π+ is presented, using the Bs0→Ds-π+ and Bs0→J/ψφ channels. The analysis is based on a data sample of pp collisions collected with the LHCb detector, corresponding to an integrated luminosity of 1 fb-1 taken at a center-of-mass energy of 7 TeV, and 2 fb-1 taken at 8 TeV. The decay Bc+→Bs0π+ is observed with significance in excess of 5 standard deviations independently in both decay channels. The measured product of the ratio of cross sections and branching fraction is [σ(Bc+)/σ(Bs0)] × B(Bc+→Bs0π+) = [2.37 ± 0.31 (stat) ± 0.11 (syst) +0.17/-0.13 (τBc+)] × 10-3, in the pseudorapidity range 2 < η(B) < 5, where the first uncertainty is statistical, the second is systematic, and the third is due to the uncertainty on the Bc+ lifetime. This is the first observation of a B meson decaying to another B meson via the weak interaction.
Branching fraction and CP asymmetry of the decays B+→K0Sπ+ and B+→K0SK+
An analysis of B+→K0Sπ+ and B+→K0SK+ decays is performed with the LHCb experiment. The pp collision data used correspond to integrated luminosities of 1 fb-1 and 2 fb-1 collected at centre-of-mass energies of √s = 7 TeV and √s = 8 TeV, respectively. The ratio of branching fractions and the direct CP asymmetries are measured to be B(B+→K0SK+)/B(B+→K0Sπ+) = 0.064 ± 0.009 (stat.) ± 0.004 (syst.), ACP(B+→K0Sπ+) = -0.022 ± 0.025 (stat.) ± 0.010 (syst.) and ACP(B+→K0SK+) = -0.21 ± 0.14 (stat.) ± 0.01 (syst.). The data sample taken at √s = 7 TeV is used to search for Bc+→K0SK+ decays and results in the upper limit (fc · B(Bc+→K0SK+))/(fu · B(B+→K0Sπ+)) < 5.8 × 10-2 at 90% confidence level, where fc and fu denote the hadronisation fractions of a b̄ quark into a Bc+ or a B+ meson, respectively
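Measurements like the CP asymmetries above quote statistical and systematic uncertainties separately; a standard practice (not prescribed by the abstract) is to combine independent components in quadrature when judging overall significance. A minimal sketch using the ACP(B+→K0SK+) numbers:

```python
import math

def total_uncertainty(stat, syst):
    """Combine independent statistical and systematic uncertainties in
    quadrature (standard practice; the components above are treated as
    independent, which is an assumption)."""
    return math.hypot(stat, syst)

# ACP(B+ -> K0S K+) = -0.21 +/- 0.14 (stat.) +/- 0.01 (syst.)
sigma_total = total_uncertainty(0.14, 0.01)
significance = abs(-0.21) / sigma_total
print(sigma_total, significance)
```

The statistical term dominates here, so the total uncertainty is essentially 0.14 and the central value sits about 1.5 standard deviations from zero, i.e. compatible with no CP violation at this precision.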
- âŠ