4,560 research outputs found

    Statistics in the Big Data era

    It is estimated that about 90% of the currently available data have been produced over the last two years. Of these, only 0.5% are effectively analysed and used. However, these data can be a great source of wealth, the oil of the 21st century, when analysed with the right approach. In this article, we illustrate some specificities of these data and the great interest that they can represent in many fields. Then we consider some challenges to statistical analysis that emerge from their analysis, suggesting some strategies

    Predicting B2B Customer Churn for Software Maintenance Contracts

    Customer churn prediction is a well-known application of machine learning and data mining in Customer Relationship Management, which allows a company to predict the probability of its customers churning. In this study, we extended the application of customer churn prediction to the context of software maintenance contracts. In addition, we examined the predictive power of economic factors. Random forest, gradient boosting machine, stacking of random forest and gradient boosting machine, XGBoost, and long short-term memory networks were applied. While an ensemble model and XGBoost performed best, macroeconomic variables did not yield statistically significant improvement in any prediction

    Maximizing Welfare in Social Networks under a Utility Driven Influence Diffusion Model

    Motivated by applications such as viral marketing, the problem of influence maximization (IM) has been extensively studied in the literature. The goal is to select a small number of users to adopt an item such that it results in a large cascade of adoptions by others. Existing works have three key limitations. (1) They do not account for economic considerations of a user in buying/adopting items. (2) Most studies on multiple items focus on competition, with complementary items receiving limited attention. (3) For the network owner, maximizing social welfare is important to ensure customer loyalty, which is not addressed in prior work in the IM literature. In this paper, we address all three limitations and propose a novel model called UIC that combines utility-driven item adoption with influence propagation over networks. Focusing on the mutually complementary setting, we formulate the problem of social welfare maximization in this novel setting. We show that while the objective function is neither submodular nor supermodular, surprisingly a simple greedy allocation algorithm achieves a factor of (1 − 1/e − ϵ) of the optimum expected social welfare. We develop bundleGRD, a scalable version of this approximation algorithm, and demonstrate, with comprehensive experiments on real and synthetic datasets, that it significantly outperforms all baselines.

    Web Query Reformulation via Joint Modeling of Latent Topic Dependency and Term Context

    An important way to improve users’ satisfaction in Web search is to assist them in issuing more effective queries. One such approach is query reformulation, which generates new queries according to the current query issued by users. A common procedure for conducting reformulation is to generate some candidate queries first and then employ a scoring method to assess these candidates. Currently, most of the existing methods are context based. They rely heavily on the context relation of terms in historical queries and cannot detect and maintain the semantic consistency of queries. In this article, we propose a graphical model to score queries. The proposed model exploits a latent topic space, which is automatically derived from the query log, to detect semantic dependency of terms in a query and dependency among topics. Meanwhile, the graphical model also captures the term context in historical queries by skip-bigram and n-gram language models. In addition, our model can be easily extended to consider users’ historical search interests when we conduct query reformulation for different users. In the task of candidate query generation, we investigate a social tagging data resource—Delicious bookmark—to generate addition and substitution patterns that are employed as supplements to the patterns generated from query log data