
    Personalized marketing campaign for upselling using predictive modeling in the health insurance sector

    Internship Report presented as partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics.
    Nowadays, with the oversupply of solutions in the private health insurance sector and the constantly increasing demand for value-for-money services from the client's perspective, it is clear that insurance companies should not only strive for excellence but also engage their client base by offering solutions better suited to their needs. Using the power of predictive models, this project aims to identify the existing health insurance clients who are willing to move to a higher-tier product; the case described above falls under the term upselling. The final model will be used for a personalized marketing campaign in one of the most prominent bancassurance operations in Portugal. At the moment, the ongoing upselling campaign uses only a few eligibility criteria. The goal of the model is to assign a probability to each client who is eligible to be contacted for this campaign. The data retrieved to train the model had a buffer period of one week from when the 'event' took place; this is crucial for the business, because the time-to-market parameter must always be taken into consideration in the real world. The tools used for this data mining project were mostly SAS Enterprise Guide and SAS Enterprise Miner. All the data marts needed for the project were built and loaded in SAS, so there were no obstacles or connectivity issues. For data visualization and reporting, Microsoft Power BI was used. Some of the tables in the data marts are updated on a daily basis and others on a monthly basis; all historical information is stored in separate tables, so there is no information loss or discrepancy. Finally, the methodology followed for the implementation of the data mining project was a hybrid framework between the SEMMA approach, which is the one proposed by the SAS Institute to carry out the core tasks of model development, and CRISP-DM.
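The scoring step described above — assigning an upsell probability to each eligible client — can be sketched as follows. This is a hypothetical illustration using scikit-learn in place of SAS Enterprise Miner; the column names and synthetic data are invented, not taken from the project.

```python
# Hypothetical sketch of upsell propensity scoring: train a classifier on
# client features, then rank eligible clients by predicted probability.
# Feature names and the synthetic data are illustrative only.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 5_000
clients = pd.DataFrame({
    "tenure_years": rng.integers(0, 30, n),
    "monthly_premium": rng.normal(60, 15, n),
    "claims_last_year": rng.poisson(0.5, n),
})
# Target: did the client move to a higher-tier product (in the project this
# label was observed with a one-week buffer after the event).
logit = 0.05 * clients["tenure_years"] - 0.3 * clients["claims_last_year"]
y = (rng.random(n) < 1 / (1 + np.exp(-(logit - 1.5)))).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    clients, y, test_size=0.3, random_state=0, stratify=y)
model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

# Assign an upsell probability to each eligible client; the campaign
# contact list can then be ranked by this score.
scores = model.predict_proba(X_test)[:, 1]
print(f"AUC: {roc_auc_score(y_test, scores):.2f}")
```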

    Advances in SCA and RF-DNA Fingerprinting Through Enhanced Linear Regression Attacks and Application of Random Forest Classifiers

    Radio Frequency (RF) emissions from electronic devices expose security vulnerabilities that can be used by an attacker to extract otherwise unobtainable information. Two realms of study were investigated here: the exploitation of 1) unintentional RF emissions in the field of Side Channel Analysis (SCA), and 2) intentional RF emissions from physical devices in the field of RF-Distinct Native Attribute (RF-DNA) fingerprinting. Statistical analysis of the linear model fit to measured SCA data in Linear Regression Attacks (LRA) improved performance, achieving a 98% success rate for AES key-byte identification from unintentional emissions. However, the presence of non-Gaussian noise required the use of a non-parametric classifier to further improve key-guessing attacks. Random Forest (RndF) based profiling attacks were successful on very high-dimensional data sets, correctly guessing all 16 bytes of the AES key with a 50,000-variable dataset. With variable reduction, Random Forest still outperformed the Template Attack for this data set, requiring fewer traces and achieving higher success rates with a lower misclassification rate. Finally, the use of a RndF classifier is examined for intentional RF emissions from ZigBee devices to enhance security using RF-DNA fingerprinting. RndF outperformed the parametric MDA/ML and non-parametric GRLVQI classifiers, providing up to GS = 18.0 dB improvement (reduction in required SNR). Network penetration testing, measured using rogue ZigBee devices, showed that the RndF method improved rogue rejection in noisier environments; gains of up to GS = 18.0 dB are realized over previous methods.
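The variable-reduction step paired with a Random Forest profiling classifier can be illustrated in outline. This is a synthetic sketch (scikit-learn, toy data with a few deliberately leaky variables), not the study's actual SCA pipeline or trace data.

```python
# Illustrative sketch of Random Forest (RndF) profiling with variable
# reduction via impurity-based feature importances. Synthetic data stands
# in for measured RF emission traces; a few columns leak the class label.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n_traces, n_vars = 2_000, 500            # high-dimensional trace set
X = rng.normal(size=(n_traces, n_vars))
key_byte = rng.integers(0, 4, n_traces)  # toy 4-class "key byte" label
X[:, 10] += key_byte                     # leak the label into two variables
X[:, 42] += 0.5 * key_byte

X_tr, X_te, y_tr, y_te = train_test_split(X, key_byte, random_state=0)
full = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# Keep only the most informative variables and retrain on the reduced set.
top = np.argsort(full.feature_importances_)[::-1][:20]
reduced = RandomForestClassifier(n_estimators=100, random_state=0)
reduced.fit(X_tr[:, top], y_tr)
print(f"reduced-model accuracy: {reduced.score(X_te[:, top], y_te):.2f}")
```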

    Winsorize tree algorithm for handling outliers in classification problem

    Classification and Regression Tree (CART) is designed to predict or classify objects into predetermined classes from a set of predictors. However, outliers can affect the structure of CART, node purity, and predictive accuracy in classification. Some researchers opt to perform pre-pruning or post-pruning of the CART when handling outliers. This study proposes a modified classification tree algorithm, called the Winsorize tree, based on the distribution of classes in the training dataset. The Winsorize tree investigates all possible outliers from node to node before checking the potential splitting point, in order to obtain the node with the highest purity. The upper and lower fences of a boxplot are used to detect potential outliers, i.e. values beyond Q1 − (1.5 × interquartile range) and Q3 + (1.5 × interquartile range). The identified outliers are neutralized using the Winsorize method, while the Winsorize Gini index is then used to compute the divergences among the probability distributions of the target variable's values until the stopping criteria are met. This study uses three stopping rules: a node reaching the minimum of 10% of the total training set
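The fence-based neutralization step described above can be sketched with NumPy. This is a simplified illustration of boxplot-fence winsorizing and the standard Gini impurity, not the authors' implementation of the full Winsorize tree.

```python
# Sketch of the boxplot-fence outlier handling used by the Winsorize tree:
# values outside Q1 - 1.5*IQR or Q3 + 1.5*IQR are clipped to the fences
# before node purity is evaluated at candidate splits.
import numpy as np

def winsorize_by_fences(x):
    """Clip values to the boxplot fences Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return np.clip(x, lower, upper)

def gini_index(labels):
    """Gini impurity of a node: 1 minus the sum of squared class proportions."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

x = np.array([1.0, 2.0, 2.5, 3.0, 3.5, 4.0, 50.0])  # 50.0 is an outlier
print(winsorize_by_fences(x))  # the outlier is pulled back to the upper fence
```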

    Impact of evaluation methods on decision tree accuracy

    Decision trees are one of the most powerful and commonly used supervised learning algorithms in the field of data mining. It is important that a decision tree performs accurately when employed on unseen data; therefore, evaluation methods are used to measure the predictive performance of a decision tree classifier. However, the predictive accuracy of a decision tree is also dependent on the evaluation method chosen, since the training and testing sets of decision tree models are selected according to the evaluation method. The aim of this thesis was to study and understand how different evaluation methods affect decision tree accuracies when applied to different decision tree algorithms. Consequently, comprehensive research was conducted on decision trees and evaluation methods. Additionally, an experiment was run using ten different datasets, five decision tree algorithms and five different evaluation methods in order to study the relationship between evaluation methods and decision tree accuracies. The decision tree inducers were tested with the Leave-one-out, 5-Fold Cross Validation, 10-Fold Cross Validation, Holdout 50 split and Holdout 66 split evaluation methods. According to the results, cross-validation methods were superior to holdout methods overall. Moreover, the Holdout 50 split performed the poorest on most of the datasets. The possible reasons behind these results are also discussed in the thesis.
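The five evaluation methods the thesis compares can be reproduced in outline with scikit-learn. This is a hypothetical small benchmark on one bundled toy dataset and one tree inducer, not the thesis code (which used ten datasets and five algorithms).

```python
# Sketch comparing the five evaluation methods from the thesis on a single
# decision tree inducer: Leave-one-out, 5-fold CV, 10-fold CV, and the
# Holdout 50 / Holdout 66 train-test splits.
from sklearn.datasets import load_iris
from sklearn.model_selection import (
    LeaveOneOut, KFold, cross_val_score, train_test_split)
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(random_state=0)

cv_methods = {
    "Leave-one-out": LeaveOneOut(),
    "5-Fold CV": KFold(n_splits=5, shuffle=True, random_state=0),
    "10-Fold CV": KFold(n_splits=10, shuffle=True, random_state=0),
}
results = {}
for name, cv in cv_methods.items():
    results[name] = cross_val_score(tree, X, y, cv=cv).mean()

# Holdout 50 and Holdout 66 splits (train fraction 0.50 and 0.66).
for frac in (0.50, 0.66):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, train_size=frac, random_state=0, stratify=y)
    results[f"Holdout {int(frac * 100)}"] = tree.fit(X_tr, y_tr).score(X_te, y_te)

for name, acc in results.items():
    print(f"{name:>14}: {acc:.3f}")
```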

    Plan Projection, Execution, and Learning for Mobile Robot Control

    Most state-of-the-art hybrid control systems for mobile robots are decomposed into different layers. While the deliberation layer reasons about the actions required for the robot in order to achieve a given goal, the behavioral layer is designed to enable the robot to quickly react to unforeseen events. This decomposition guarantees safe operation even in the presence of unforeseen and dynamic obstacles and enables the robot to cope with situations it was not explicitly programmed for. The layered design, however, also leaves us with the problem of plan execution: the problem of arbitrating between the deliberation and behavioral layers. Abstract symbolic actions have to be translated into streams of local control commands, and, simultaneously, execution failures have to be handled at an appropriate level of abstraction. It is now widely accepted that plan execution should form a third layer of a hybrid robot control system. The resulting layered architectures are called three-tiered architectures, or 3T architectures for short. Although many high-level programming frameworks have been proposed to support the implementation of the intermediate layer, there is no generally accepted algorithmic basis for plan execution in three-tiered architectures. In this thesis, we propose to base plan execution on plan projection and learning, and we present a general framework for the self-supervised improvement of plan execution. This framework has been implemented in APPEAL, an Architecture for Plan Projection, Execution And Learning, which extends the well-known RHINO control system by introducing an execution layer. This thesis contributes to the field of plan-based mobile robot control, which investigates the interrelation between planning, reasoning, and learning techniques based on an explicit representation of the robot's intended course of action: a plan.
    In McDermott's terminology, a plan is that part of a robot control program which the robot can not only execute but also reason about and manipulate. In that broad view, a plan may serve many purposes in a robot control system, such as reasoning about future behavior, revising intended activities, or learning. In this thesis, plan-based control is applied to the self-supervised improvement of mobile robot plan execution.

    Tools and Techniques for Decision Tree Learning

    Decision tree learning is an important field of machine learning. In this study we examine both formal and practical aspects of decision tree learning. We aim to answer two important needs: the need for better-motivated decision tree learners and for an environment facilitating experimentation with inductive learning algorithms. As a result, we obtain new practical tools and useful techniques for decision tree learning. First, we derive the practical decision tree learner Rank, based on the Findmin protocol of Ehrenfeucht and Haussler. The motivation for the changes introduced to the method comes from empirical experience, but we prove the correctness of the modifications in the probably approximately correct learning framework. The algorithm is enhanced by extending it to operate in multiclass situations, making it capable of working within the incremental setting, and providing it with noise tolerance. Together these modifications entail practicability through a formal development.

    Machine learning methods for calibrating radio interferometric data

    The applications of machine learning have created an opportunity to deal with complex problems currently encountered in radio astronomy data processing. Calibration is one of the most important data processing steps required to produce high dynamic range images. This process involves the determination of calibration parameters, both instrumental and astronomical, to correct the collected data. Typically, astronomers use a package such as Common Astronomy Software Applications (CASA) to compute the gain solutions based on regular observations of a known calibrator source. In this work we present applications of machine learning to first generation calibration (1GC), using the KAT-7 telescope environmental and pointing sensor data recorded during observations. Applying machine learning to 1GC, as opposed to calculating the gain solutions in CASA, has shown evidence of reducing computation as well as accurately predicting the 1GC gain solutions that represent the behaviour of the antenna during an observation. These methods are computationally less expensive; however, they have not yet fully learned to generalise in predicting accurate 1GC solutions from environmental and pointing sensors alone. We call this multi-output regression model ZCal; it is based on the random forest, decision tree, extremely randomized trees and k-nearest neighbour algorithms. The prediction error obtained when testing our model on held-out data is ≈ 0.01 < rmse < 0.09 for gain amplitude per antenna, and 0.2 rad < rmse < 0.5 rad for gain phase. This shows that the instrumental parameters used to train our model correlate more strongly with gain amplitude than with gain phase.
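A multi-output regression of the ZCal kind — sensor readings in, per-antenna gain amplitude and phase out — can be sketched as follows. The sensor columns, toy dependencies and data here are invented stand-ins for KAT-7 sensor logs and CASA gain solutions.

```python
# Sketch of a ZCal-style multi-output regressor: environmental and pointing
# sensor readings predict gain amplitude and gain phase simultaneously.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(7)
n = 3_000
sensors = np.column_stack([
    rng.normal(20, 5, n),    # ambient temperature (deg C)
    rng.normal(50, 10, n),   # humidity (%)
    rng.uniform(0, 90, n),   # elevation pointing (deg)
])
# Toy dependence: amplitude tracks temperature, phase tracks elevation.
gain_amp = 1.0 + 0.01 * (sensors[:, 0] - 20) + rng.normal(0, 0.02, n)
gain_phase = 0.005 * sensors[:, 2] + rng.normal(0, 0.05, n)
targets = np.column_stack([gain_amp, gain_phase])

X_tr, X_te, y_tr, y_te = train_test_split(sensors, targets, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_tr, y_tr)

pred = model.predict(X_te)
rmse_amp = mean_squared_error(y_te[:, 0], pred[:, 0]) ** 0.5
rmse_phase = mean_squared_error(y_te[:, 1], pred[:, 1]) ** 0.5
print(f"amplitude rmse: {rmse_amp:.3f}, phase rmse: {rmse_phase:.3f} rad")
```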