    A Hybrid Random Forest based Support Vector Machine Classification Supplemented by Boosting

    This paper presents an approach to classifying remotely sensed data using a hybrid classifier built from random forests, support vector machines and boosting. The central idea is to subdivide the input data set into smaller subsets and classify each subset individually with a support vector machine classifier. Boosting is used at each subset to evaluate the learning by assigning a weight factor to every data item in the data set; the weight factor is updated based on classification accuracy. The final outcome for the complete data set is then computed by applying a majority voting mechanism to the individual subset classification outcomes.
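The final majority-voting step can be illustrated with a minimal sketch. It assumes that each subset-level classifier emits one label per data item and that ties go to the label seen first; the abstract specifies neither detail, and the function name `majority_vote` is hypothetical.

```python
from collections import Counter


def majority_vote(subset_predictions):
    """Combine per-subset predictions into one label per data item.

    subset_predictions[k][i] is the label that the classifier trained on
    subset k assigns to item i (an assumed layout, not taken from the paper).
    Ties are resolved in favour of the label encountered first.
    """
    n_items = len(subset_predictions[0])
    final = []
    for i in range(n_items):
        # Count the votes cast for item i across all subset classifiers.
        votes = Counter(preds[i] for preds in subset_predictions)
        final.append(votes.most_common(1)[0][0])
    return final


# Three subset classifiers voting on three items.
predictions = [
    ["water", "urban", "forest"],
    ["water", "forest", "forest"],
    ["urban", "forest", "forest"],
]
combined = majority_vote(predictions)
```

The boosting weight update is left out on purpose, because the abstract does not state the update rule.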

    A knowledge-intensive methodology for explainable sales prediction

    Sales prediction in the food market is a complex issue that has been addressed in the recent past with machine learning techniques. Despite some promising results, the experimental work that we describe in this paper reveals some drawbacks of this data-driven approach and enables the definition of a novel methodology that strongly involves a priori knowledge

    A genetic algorithm approach to optimising random forests applied to class engineered data

    In numerous applications, and especially in the life science domain, examples are labelled at a higher level of granularity. For example, binary classification is dominant in many of these data sets, with the positive class denoting the existence of a particular disease in medical diagnosis applications. Such labelling does not depict the reality of having different categories of the same disease, a fact evidenced by the continuing research into root causes and variations of symptoms in a number of diseases. In a quest to enhance such diagnosis, data sets were decomposed by clustering each class to reveal hidden categories. We then apply the widely adopted ensemble classification technique Random Forests. Such class decomposition has two advantages: (1) diversification of the input, which enhances the ensemble classification; and (2) improved class separability, which eases the follow-up classification process. However, to apply Random Forests to such class-decomposed data, three main parameters need to be set: the number of trees forming the ensemble, the number of features to split on at each node, and a vector representing the number of clusters in each class. The large search space for tuning these parameters has motivated the use of a Genetic Algorithm to optimise the solution. A thorough experimental study on 22 real data sets was conducted, predominantly in a variety of life science applications. To demonstrate the applicability of the method to other areas of application, the proposed method was also tested on a number of data sets from other domains. Three variations of Random Forests, including the proposed method, as well as a boosting ensemble classifier were used in the experimental study. The results show that the proposed method achieves superior accuracy.
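The three-parameter search space described above can be sketched as a small genetic algorithm. This is only an illustration under stated assumptions: the bounds, the operators (uniform crossover, redraw mutation, keep-the-best-half selection) and every name are hypothetical. `toy_fitness` is a placeholder; in the paper's setting the fitness would be the accuracy of a Random Forest trained on the class-decomposed data.

```python
import random

# Assumed search bounds, chosen only for illustration.
MAX_TREES, MAX_FEATURES, MAX_CLUSTERS = 500, 20, 5


def random_individual(rng, n_classes):
    """One candidate: ensemble size, features per split, clusters per class."""
    return {
        "n_trees": rng.randint(1, MAX_TREES),
        "n_features": rng.randint(1, MAX_FEATURES),
        "clusters": [rng.randint(1, MAX_CLUSTERS) for _ in range(n_classes)],
    }


def crossover(rng, a, b):
    """Uniform crossover: every gene is copied from one of the two parents."""
    return {
        "n_trees": rng.choice((a["n_trees"], b["n_trees"])),
        "n_features": rng.choice((a["n_features"], b["n_features"])),
        "clusters": [rng.choice(p) for p in zip(a["clusters"], b["clusters"])],
    }


def mutate(rng, ind, rate=0.2):
    """Redraw each gene within its bounds with probability `rate`."""
    fresh = random_individual(rng, len(ind["clusters"]))
    return {
        "n_trees": fresh["n_trees"] if rng.random() < rate else ind["n_trees"],
        "n_features": (fresh["n_features"] if rng.random() < rate
                       else ind["n_features"]),
        "clusters": [f if rng.random() < rate else c
                     for c, f in zip(ind["clusters"], fresh["clusters"])],
    }


def evolve(fitness, n_classes, pop_size=20, generations=15, seed=0):
    """Keep the best half each generation and refill with mutated children."""
    rng = random.Random(seed)
    population = [random_individual(rng, n_classes) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = [
            mutate(rng, crossover(rng, *rng.sample(parents, 2)))
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return max(population, key=fitness)


def toy_fitness(ind):
    """Placeholder objective (hypothetical); stands in for forest accuracy."""
    return (-abs(ind["n_trees"] - 100) - abs(ind["n_features"] - 4)
            - sum(abs(c - 2) for c in ind["clusters"]))


best = evolve(toy_fitness, n_classes=2)
```

Because the best half of each generation is carried over unchanged, the best fitness found never decreases from one generation to the next.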

    An Investigation of Methods for CT Synthesis in MR-only Radiotherapy

    Comparison of Machine Learning Algorithms and Their Ensembles for Botnet Detection

    A Botnet is a network of compromised devices controlled by a botmaster, often for nefarious purposes. Analyzing network traffic to detect Botnet traffic has historically been an effective approach for systems that monitor for network intrusion. Although such systems have applied various machine learning techniques, little investigation comparing machine learning algorithms and their ensembles has been undertaken. In this study, three popular machine learning classification algorithms (Naive Bayes, Decision tree, and Neural network), as well as the ensemble methods known to strengthen these classifiers, are evaluated for enhanced results in Botnet detection. The evaluation is conducted on the CTU-13 public dataset, measuring the training time and accuracy score of each classifier.

    Random Forest Prediction of IPO Underpricing

    The prediction of initial returns on initial public offerings (IPOs) is a complex matter. The independent variables identified in the literature mix strong and weak predictors, their explanatory power is limited, and samples include a sizable number of outliers. In this context, we suggest that random forests are a potentially powerful tool. In this paper, we benchmark this algorithm against a set of eight classic machine learning algorithms. The results of this comparison show that random forests outperform the alternatives in terms of mean and median predictive accuracy. The technique also provided the second smallest error variance among the stochastic algorithms. The experimental work also supports the potential of random forests for two practical applications: IPO pricing and IPO trading. The authors acknowledge financial support granted by the Spanish Ministry of Science under grant ENE2014-56126-C2-2-R.

    Data envelopment analysis and data mining to efficiency estimation and evaluation

    Purpose: This paper aims to assess the application of seven statistical and data mining techniques to second-stage data envelopment analysis (DEA) for bank performance. Design/methodology/approach: Different statistical and data mining techniques are applied to second-stage DEA for bank performance as part of an attempt to produce a powerful model for bank performance with effective predictive ability. The proposed data mining tools are classification and regression trees (CART), conditional inference trees (CIT), random forests based on CART and CIT, bagging, artificial neural networks and their statistical counterpart, logistic regression. Findings: The results showed that random forests and bagging outperform the other methods in terms of predictive power. Originality/value: This is the first study to assess the impact of environmental factors on banking performance in Middle East and North Africa countries.