
    Distributed Improved Deep Prediction for Recommender System using an Ensemble Learning

    While online businesses have a strong interest in suggesting their items by scoring them, digital advertising earns its profits from promotion and marketing activities. Web users cannot be certain that the products surfaced by big-data recommendation are relevant or interesting to their needs. In recent decades, recommender system models have been widely used to analyze large quantities of information. Among them, the Distributed Improved Prediction with Matrix Factorization (MF) and Random Forest (RF) model, called DIPMF, exploits an individual's desires, choices and social context together to predict the ratings of a particular item. However, the RF scheme demands high computation power and time for its learning process, and its outcome is sensitive to the training parameters. Hence, this article proposes a Distributed Improved Deep Prediction with MF and ensemble learning (DIDPMF) model to decrease the computational difficulty of RF learning and increase the efficiency of rating prediction. In DIDPMF, a forest attribute extractor is ensembled with a Deep Neural Network (fDNN) to extract sparse attribute correlations from an extremely large attribute space; incorporating the RF into the DNN allows prediction outcomes to be drawn from all of the base learners rather than from a single estimated probability. The fDNN comprises a forest module and a DNN module. The forest module serves as an attribute extractor that derives sparse representations from the raw input data under the supervision of the learning outcomes: independent decision trees are constructed first and then ensembled to form the forest. This forest is then fed to the DNN module, which acts as a learner that predicts an individual's ratings from the novel attribute representations. Finally, the experimental results reveal that DIDPMF outperforms other conventional recommender systems.
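    As a rough illustration of the forest-then-DNN idea in this abstract, the sketch below uses scikit-learn as a stand-in: a random forest re-encodes each sample by the leaves it reaches, and that sparse representation feeds a small neural network that predicts ratings. The data, sizes and library choices are illustrative assumptions, not the paper's actual DIDPMF implementation.

    # Hypothetical sketch of a forest-as-attribute-extractor feeding a neural
    # learner; synthetic data stands in for user/item rating features.
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.neural_network import MLPRegressor
    from sklearn.preprocessing import OneHotEncoder

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 20))            # synthetic input attributes
    y = 2 * X[:, 0] + rng.normal(size=500)    # synthetic ratings

    # Forest module: build independent trees, ensemble them, then re-encode
    # each sample by the leaf it lands in within every tree.
    forest = RandomForestRegressor(n_estimators=50, max_depth=5, random_state=0)
    forest.fit(X, y)
    leaves = forest.apply(X)                  # (n_samples, n_trees) leaf indices
    sparse_repr = OneHotEncoder(handle_unknown="ignore").fit_transform(leaves)

    # DNN module: learn to predict ratings from the forest's sparse encoding.
    dnn = MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0)
    dnn.fit(sparse_repr.toarray(), y)
    print(dnn.predict(sparse_repr[:5].toarray()))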

    CASPR: Customer Activity Sequence-based Prediction and Representation

    Tasks critical to enterprise profitability, such as customer churn prediction, fraudulent account detection or customer lifetime value estimation, are often tackled by models trained on features engineered from customer data in tabular format. Application-specific feature engineering adds development, operationalization and maintenance costs over time. Recent advances in representation learning present an opportunity to simplify and generalize feature engineering across applications. When applying these advances to tabular data, researchers must deal with data heterogeneity, variations in customer engagement history and the sheer volume of enterprise datasets. In this paper, we propose a novel approach to encode tabular data containing customer transactions, purchase history and other interactions into a generic representation of a customer's association with the business. We then evaluate these embeddings as features to train multiple models spanning a variety of applications. CASPR, Customer Activity Sequence-based Prediction and Representation, applies a Transformer architecture to encode activity sequences, improving model performance and avoiding bespoke feature engineering across applications. Our experiments at scale validate CASPR for both small and large enterprise applications. Comment: Presented at the Table Representation Learning Workshop, NeurIPS 2022, New Orleans; authors listed in random order.
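    A minimal sketch of the encoding step described above, assuming a PyTorch Transformer encoder over integer-coded activity events with mean pooling into one embedding per customer; the vocabulary size, model dimensions and pooling choice are illustrative, not CASPR's published configuration.

    import torch
    import torch.nn as nn

    class ActivityEncoder(nn.Module):
        """Encodes a customer's activity sequence into a fixed-size embedding."""
        def __init__(self, n_event_types=1000, d_model=64, n_heads=4, n_layers=2):
            super().__init__()
            self.embed = nn.Embedding(n_event_types, d_model)
            layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, n_layers)

        def forward(self, events, pad_mask):
            # events: (batch, seq_len) integer ids of transactions/interactions
            h = self.encoder(self.embed(events), src_key_padding_mask=pad_mask)
            h = h.masked_fill(pad_mask.unsqueeze(-1), 0.0)  # zero out padding
            lengths = (~pad_mask).sum(dim=1, keepdim=True).clamp(min=1)
            return h.sum(dim=1) / lengths   # (batch, d_model) customer embedding

    events = torch.randint(0, 1000, (8, 20))          # 8 customers, 20 events each
    pad_mask = torch.zeros(8, 20, dtype=torch.bool)   # no padding in this toy batch
    embeddings = ActivityEncoder()(events, pad_mask)  # features for downstream models
    print(embeddings.shape)                           # torch.Size([8, 64])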

    Leveraging Deep-learning and Field Experiment Response Heterogeneity to Enhance Customer Targeting Effectiveness

    Firms seek to better understand heterogeneity in customer response to marketing campaigns, which can boost customer targeting effectiveness. Motivated by the success of modern machine learning techniques, this paper presents a framework that leverages deep-learning algorithms and field experiment response heterogeneity to enhance customer targeting effectiveness. We recommend that firms run a pilot randomized experiment and use the data to train various deep-learning models. By incorporating recurrent neural nets and deep perceptron nets, our optimal deep-learning model can capture both temporal and network effects in the purchase history, while addressing issues common to most predictive models such as imbalanced training data, data sparsity, temporality, and scalability. We then apply the learned optimal model to identify, from the large pool of remaining customers, the targets with the highest predicted purchase probabilities. Our application with a large department store, covering a total of 2.8 million customers, shows that optimal deep-learning models can identify higher-value customer targets and lead to better sales performance of marketing campaigns, compared to the industry's common practice of targeting by past purchase frequency or spending amount. We demonstrate that companies may achieve sub-optimal customer targeting not because they offer inferior campaign incentives, but because they apply weaker targeting rules and select low-value customer targets. The results inform managers that beyond gauging the causal impact of marketing interventions, data from field experiments can also be leveraged to identify high-value customer targets. Overall, deep-learning algorithms can be integrated with field experiment response heterogeneity to improve the effectiveness of targeted campaigns.
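    Below is a minimal sketch of the scoring-and-selection step described above, assuming a PyTorch LSTM over per-period purchase features; in practice the model would first be trained on the pilot-experiment data, and all names and sizes here are synthetic placeholders rather than the study's model.

    import torch
    import torch.nn as nn

    class PurchaseScorer(nn.Module):
        """Scores each customer's purchase probability from purchase history."""
        def __init__(self, n_features=8, hidden=32):
            super().__init__()
            self.rnn = nn.LSTM(n_features, hidden, batch_first=True)
            self.head = nn.Linear(hidden, 1)

        def forward(self, history):
            # history: (batch, time_steps, n_features) per-period features
            _, (h_n, _) = self.rnn(history)
            return torch.sigmoid(self.head(h_n[-1])).squeeze(-1)

    history = torch.randn(1000, 12, 8)   # 1,000 customers, 12 months of features
    scores = PurchaseScorer()(history)   # predicted purchase probabilities
    targets = torch.topk(scores, k=100).indices  # highest-value campaign targets
    print(targets[:10])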

    Appropriate Machine Learning Algorithm for Big Data Processing

    MLlib is Spark's library of machine learning functions, developed to operate in parallel on clusters. MLlib comprises different types of learning algorithms and is accessible from all of Spark's programming languages. It is relevant to data scientists with a machine learning background who are considering Spark, as well as to engineers working with machine learning professionals. Many algorithms in MLlib achieve better prediction accuracy with regularization when that option is available. Likewise, many of the SGD-based algorithms require around 100 iterations to obtain good results. This paper surveys these algorithms as applied to distributed data sets, representing all data as RDDs, and assesses their strengths and weaknesses to recommend the one most appropriate and effective for big-data processing. That algorithm is HashingTF: it takes the hash code of each word modulo a desired vector size, S, and thus maps each word to a number between 0 and S–1. This always provides an S-dimensional vector, and in practice it is quite robust even if multiple words map to the same hash code. The MLlib developers recommend setting S between 2^18 and 2^20. HashingTF can run either on one document at a time or on a whole RDD. It requires each "document" to be represented as an iterable sequence of objects, for example a list in Python or a Collection in Java.
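    A minimal usage sketch of the RDD-based HashingTF API described above; the application name, vector size and input data are placeholders.

    from pyspark import SparkContext
    from pyspark.mllib.feature import HashingTF

    sc = SparkContext("local", "HashingTFExample")
    tf = HashingTF(numFeatures=10000)    # S = 10,000 hash buckets

    # On a single document: pass an iterable of terms (e.g., a Python list).
    words = "hello spark hello mllib".split()
    print(tf.transform(words))           # SparseVector of term frequencies

    # On a whole RDD: one list of words per document.
    docs = sc.parallelize([["hello", "spark"], ["machine", "learning", "spark"]])
    print(tf.transform(docs).collect())  # RDD of SparseVectors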