1,576 research outputs found

    Towards a framework for designing full model selection and optimization systems

    People from a variety of industrial domains are beginning to realise that appropriate use of machine learning techniques in their data mining projects could bring great benefits. End-users now face the new problem of how to choose a combination of data processing tools and algorithms for a given dataset. This problem is usually termed the Full Model Selection (FMS) problem. Extending our previous work [10], this paper introduces a framework for designing FMS algorithms. Under this framework, we propose a novel algorithm, named GPS (which stands for GA-PSO-FMS), that combines genetic algorithms (GA) and particle swarm optimization (PSO): a GA searches for the optimal structure of a data mining solution, and PSO searches for the optimal parameters of a particular structure instance. Given a classification dataset, GPS outputs an FMS solution as a directed acyclic graph consisting of diverse data mining operators that are available to the problem. Experimental results demonstrate the benefit of the algorithm. We also present, with detailed analysis, two model-tree-based variants for speeding up the GPS algorithm.
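    The parameter-search half of the approach described above can be illustrated with a minimal particle swarm optimization sketch. This is not the GPS algorithm from the paper: the objective `toy_error`, the bounds, and the swarm settings (`w`, `c1`, `c2`) are all assumptions chosen for illustration, standing in for the validation error of one fixed pipeline structure.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iters=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` over a box using a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the bounds, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Each particle remembers its personal best; the swarm shares a global best.
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Move, then clamp the position to the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def toy_error(params):
    # Hypothetical stand-in for a validation error, minimized at (0.3, 2.0).
    x, y = params
    return (x - 0.3) ** 2 + (y - 2.0) ** 2


best_params, best_error = pso_minimize(toy_error, bounds=[(0.0, 1.0), (0.0, 5.0)])
print(best_params, best_error)
```

    In a full model selection setting, one would presumably replace `toy_error` with a cross-validated error for a given operator graph, while an outer search (a GA in the paper) proposes the graph itself.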

    An Application of Decision Tree Models to Examine Motor Vehicle Crash Severity Outcomes

    Classification and Regression Tree (CART) and chi-square automatic interaction detection (CHAID) decision tree models are estimated and compared to examine the effect of driver characteristics and behaviors, temporal factors, weather conditions, and road characteristics on motor vehicle crash severity levels, using Missouri crash data from 2002 to 2012. The CHAID model is found to discriminate significantly better among severity outcomes, and results suggest that the presence of alcohol, speeding, and failing to yield lead to many fatalities each year and likely have interactive effects. Decision rules are used to identify changes in driving policies expected to reduce severity outcomes.

    Identifying and Quantifying Factors Affecting Injury Severity of Young Drivers Involved in Single Vehicle Crashes Occurring within Curves on Rural Two-Lane Roads in Louisiana

    This study investigates factors affecting young driver injury levels for single vehicle crashes occurring within curves on rural two-lane roads in Louisiana. Although the number of fatal and serious injury crashes involving young drivers is declining, young drivers are still overrepresented in crashes, and crashes are still the leading cause of death for young drivers. Driver injury prediction models are formulated using binary logistic regression and Bayesian Network (BN) modeling. Binary logistic regression models have commonly been used in safety studies to analyze injury levels of occupants involved in crashes over the past few decades. More recently, a few safety studies have begun to use BN models to evaluate injury levels. This study identifies eight significant factors affecting young driver injury levels: air bag, distracted, ejected, gender, protection system, substance suspected, violation, and most harmful event. Of these factors, distracted, protection system, substance suspected, and violation are human factors that can be modified through educational programs. While both models are able to identify statistically significant variables, more insight is gained from the BN model. For instance, both models found gender to be statistically significant. While the logistic regression model finds males are 0.751 times less likely to be injured than females, the BN finds gender has only a 0.02% direct effect on injury. The BN shows that it is not gender itself that affects driver injury level, but the different behavior characteristics of males versus females. Males are less likely to wear seatbelts and more likely to be suspected of alcohol use in crashes. It is these driver behaviors, not the gender of the driver, that affect injuries. This study also has a number of theoretical and practical implications. As the first study to utilize BN modeling in evaluating driver injury levels in Louisiana, it expands the literature on BN models used for analyzing injury levels in car crashes. The findings are also important to driver education and safety professionals. By identifying factors affecting young driver injury levels, educational and training programs can be enhanced to target specific human behaviors and save more lives.

    Improving Accuracy and Performance of Customer Churn Prediction Using Feature Reduction Algorithms

    Prediction of customer churn is one of the most essential activities in Customer Relationship Management (CRM). However, state-of-the-art customer churn prediction approaches focus only on classifier selection to improve the accuracy and performance of churn prediction, and rarely consider feature reduction algorithms. Furthermore, numerous attributes contribute to customer churn, and it is crucial to determine the most substantial features in order to achieve the highest prediction accuracy and to improve prediction performance. Feature reduction decreases the dimensionality of the data and may allow learning algorithms to run faster and more effectively and to produce more accurate predictive models. In this research, we investigated two (2) feature reduction algorithms, Correlation based Feature Selection (CFS) and Information Gain (IG), and built classification models based on three (3) different classifiers, namely Bayes Net, Simple Logistic and Decision Table. Experimental results demonstrate that the performance of the classifiers improves when feature reduction is applied to the customer churn data set. The CFS feature reduction algorithm with the Decision Table classifier yields the highest accuracy, 92.08%, and the lowest RMSE, 0.2554. This study recommends the use of feature reduction algorithms in the context of CRM to improve the accuracy and performance of customer churn prediction.
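    As a rough illustration of the Information Gain criterion named above, the sketch below ranks categorical features by how much they reduce the entropy of the churn label. The toy records and the feature names (`contract`, `gender`) are invented for this example and are not taken from the study's data set; the study's own tooling and exact IG implementation are not stated in the abstract.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus their entropy conditioned on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Hypothetical toy data: four customers, two candidate features.
churn = ["yes", "yes", "no", "no"]
features = {
    "contract": ["monthly", "monthly", "annual", "annual"],  # separates churners perfectly
    "gender": ["m", "f", "m", "f"],                          # carries no information here
}

gains = {name: information_gain(values, churn) for name, values in features.items()}
ranking = sorted(gains, key=gains.get, reverse=True)
print(gains, ranking)
```

    Feature reduction would then keep only the top-ranked features before training a classifier; where to cut the ranking is a design choice the abstract does not specify.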

    An Overview of the Use of Neural Networks for Data Mining Tasks

    In recent years, the area of data mining has experienced considerable demand for technologies that extract knowledge from large and complex data sources. There is substantial commercial interest, as well as research activity, aimed at developing new and improved approaches for extracting information, relationships, and patterns from datasets. Artificial Neural Networks (NN) are popular, biologically inspired intelligent methodologies whose classification, prediction and pattern recognition capabilities have been utilised successfully in many areas, including science, engineering, medicine, business, banking, telecommunication, and many other fields. This paper highlights, from a data mining perspective, the implementation of NN using supervised and unsupervised learning for pattern recognition, classification, prediction and cluster analysis, and focuses on their usage in bioinformatics and financial data analysis tasks.