
    Comparative study of state-of-the-art machine learning models for analytics-driven embedded systems

    Analytics-driven embedded systems are gaining a foothold faster than ever in the current digital era. The advent of the Internet of Things (IoT) has generated an entire ecosystem of devices communicating and exchanging data automatically in an interconnected global network. The ability to efficiently process and utilize the enormous amount of data generated by an ensemble of embedded devices, such as RFID tags and sensors, enables engineers to build smart real-world systems. An analytics-driven embedded system explores and processes the data in situ or remotely to identify patterns in the system's behavior, which in turn can be used to automate actions and endow a device with decision-making capability. Designing an intelligent data processing model is paramount for reaping the benefits of data analytics, because a poorly designed analytics infrastructure would degrade the system's performance and effectiveness. Several aspects of this data make analyzing it complex and challenging, and hence a suitable candidate for big data techniques. Big data is mainly characterized by its high volume, widely varied data types and high velocity of arrival; all these properties dictate the choice of data mining techniques used to design the analytics model. Image datasets, such as those used for face recognition or satellite imagery, perform better with deep learning algorithms, while time-series datasets, such as sensor data from wearable devices, give better results with clustering and supervised learning models. A regression model suits best for multivariate datasets such as appliances energy prediction or forest fire data. Each machine learning task has a varied range of algorithms which can be used in combination to create an intelligent data analysis model. 
In this study, a comprehensive comparative analysis was conducted using different datasets freely available in online machine learning repositories to analyze the performance of state-of-the-art machine learning algorithms. The WEKA data mining toolkit was used to evaluate C4.5, Naïve Bayes, Random Forest, kNN, SVM and Multilayer Perceptron for classification models. Linear regression, Gradient Boosting Machine (GBM), Multilayer Perceptron, kNN, Random Forest and Support Vector Machines (SVM) were applied to datasets suited to regression. Datasets were trained and analyzed in different experimental setups, and a qualitative comparative analysis was performed with k-fold Cross-Validation (CV) and paired t-tests in the WEKA Experimenter.
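The per-fold comparison described above can be sketched in plain Python. This is an illustrative re-implementation of the corrected-resampling idea behind the paired t-test the WEKA Experimenter applies to fold-wise accuracies, not WEKA's own code; the accuracy figures are hypothetical.

```python
import math
from statistics import mean, stdev

def paired_t_statistic(scores_a, scores_b):
    # Paired t-statistic over per-fold accuracy differences:
    # t = mean(d) / (stdev(d) / sqrt(k)) for k folds
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

# Hypothetical 10-fold CV accuracies for two classifiers
rf = [0.91, 0.89, 0.93, 0.90, 0.92, 0.88, 0.91, 0.90, 0.92, 0.89]
nb = [0.85, 0.84, 0.88, 0.86, 0.87, 0.83, 0.85, 0.86, 0.87, 0.84]

t = paired_t_statistic(rf, nb)
print(t)  # well above the two-tailed 5% critical value (about 2.26 for 9 d.f.)
```

A t-value beyond the critical value indicates that the accuracy difference between the two classifiers is consistent across folds rather than an artifact of one lucky split.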

    Modelling atmospheric ozone concentration using machine learning algorithms

    Air quality monitoring is one of several important tasks carried out in the area of environmental science and engineering. Accordingly, the development of air quality predictive models can be very useful, as such models can provide early warnings of pollution levels rising to unsatisfactory levels. The literature review conducted within the research context of this thesis revealed that only a limited number of widely used machine learning algorithms have been employed for modelling the concentrations of atmospheric gases such as ozone and nitrogen oxides. Despite this observation, the research and technology area of machine learning has recently advanced significantly with the introduction of ensemble learning techniques, convolutional and deep neural networks, etc. Given these observations, the research presented in this thesis aims to investigate the effective use of ensemble learning algorithms, with optimised algorithmic settings and the appropriate choice of base-layer algorithms, to create effective and efficient models for the prediction and forecasting of, specifically, ground-level ozone (O3). Three main research contributions have been made by this thesis in the application area of modelling O3 concentrations. As the first contribution, the performance of several ensemble learning (homogeneous and heterogeneous) algorithms was investigated and compared with that of all popular and widely used single base learning algorithms. The results showed the impressive improvement in prediction performance obtainable by using meta learning (Bagging, Stacking and Voting) algorithms. The performances of the three investigated meta learning algorithms were similar in nature, giving an average correlation coefficient of 0.91 in prediction accuracy. 
Thus, as a second contribution, feature selection and parameter-based optimisation were carried out in conjunction with the application of Multilayer Perceptron, Support Vector Machines, Random Forest and Bagging based learning techniques, providing significant improvements in prediction accuracy. The third contribution of the research presented in this thesis is the univariate and multivariate forecasting of ozone concentrations based on optimised ensemble learning algorithms. The reported results surpass the accuracy levels previously reported for forecasting ozone concentration variations with widely used single base learning algorithms. In summary, the research conducted within this thesis bridges an existing research gap in big data analytics related to environmental pollution modelling, prediction and forecasting, where present research is largely limited to standard learning algorithms such as Artificial Neural Networks and Support Vector Machines, often available within popular commercial software packages.
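Bagging, one of the meta learning techniques investigated above, can be illustrated with a minimal regression sketch: fit a base model on bootstrap resamples of the training data and average the predictions. The base learner (a one-feature least-squares line) and the data values are illustrative assumptions, not the thesis's ozone dataset or WEKA's implementation.

```python
import random

def fit_line(xs, ys):
    # Ordinary least squares for y = a*x + b on a single feature
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

def bagged_predict(xs, ys, x_new, n_models=25, seed=0):
    # Bagging: average the predictions of base models fit on bootstrap resamples
    rng = random.Random(seed)
    preds = []
    for _ in range(n_models):
        idx = [rng.randrange(len(xs)) for _ in range(len(xs))]
        a, b = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        preds.append(a * x_new + b)
    return sum(preds) / len(preds)

# Noisy observations roughly following y = 2x + 1 (hypothetical values)
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [3.1, 4.9, 7.2, 9.0, 10.8, 13.1, 15.0, 17.2]
print(bagged_predict(xs, ys, x_new=9))
```

Averaging over resamples reduces the variance of the base learner, which is the mechanism behind the prediction gains the abstract attributes to Bagging.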

    A probabilistic classifier ensemble weighting scheme based on cross-validated accuracy estimates

    Our hypothesis is that building ensembles of small sets of strong classifiers constructed with different learning algorithms is, on average, the best approach to classification for real world problems. We propose a simple mechanism for building small heterogeneous ensembles based on exponentially weighting the probability estimates of the base classifiers with an estimate of the accuracy formed through cross-validation on the training data. We demonstrate through extensive experimentation that, given the same small set of base classifiers, this method has measurable benefits over commonly used alternative weighting, selection or meta classifier approaches to heterogeneous ensembles. We also show how an ensemble of five well known, fast classifiers can produce an ensemble that is not significantly worse than large homogeneous ensembles and tuned individual classifiers on datasets from the UCI archive. We provide evidence that the performance of the Cross-validation Accuracy Weighted Probabilistic Ensemble (CAWPE) generalises to a completely separate set of datasets, the UCR time series classification archive, and we also demonstrate that our ensemble technique can significantly improve the state-of-the-art classifier for this problem domain. We investigate the performance in more detail, and find that the improvement is most marked in problems with smaller training sets. We perform a sensitivity analysis and an ablation study to demonstrate the robustness of the ensemble and the significant contribution of each design element of the classifier. We conclude that it is, on average, better to ensemble strong classifiers with a weighting scheme than to perform extensive tuning, and that CAWPE is a sensible starting point for combining classifiers.
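The weighting scheme described can be sketched directly: each base classifier's class-probability estimates are weighted by its cross-validated accuracy raised to an exponent, and the weighted sums are normalised into a final distribution. The exponent value, accuracies and probability vectors below are illustrative assumptions for demonstration.

```python
def cawpe_combine(probas, accs, alpha=4.0):
    # Exponentially weight each classifier's probability vector by acc**alpha,
    # sum across classifiers, then normalise into a probability distribution.
    weights = [acc ** alpha for acc in accs]
    n_classes = len(probas[0])
    combined = [sum(w * p[c] for w, p in zip(weights, probas))
                for c in range(n_classes)]
    total = sum(combined)
    return [v / total for v in combined]

# Two base classifiers (CV accuracies 0.9 and 0.7) disagreeing on one case:
# the first mildly prefers class 0, the second more strongly prefers class 1.
print(cawpe_combine([[0.6, 0.4], [0.3, 0.7]], [0.9, 0.7]))
```

Raising the accuracy to a power amplifies small differences between base classifiers, so here the more accurate classifier's preferred class wins despite the other classifier's more confident vote.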

    Ensemble deep learning: A review

    Ensemble learning combines several individual models to obtain better generalization performance. Currently, deep learning models with multilayer processing architectures show better performance than shallow or traditional classification models. Deep ensemble learning models combine the advantages of both deep learning and ensemble learning, such that the final model has better generalization performance. This paper reviews state-of-the-art deep ensemble models and hence serves as an extensive summary for researchers. Deep ensemble models are broadly categorised into bagging, boosting and stacking ensembles; negative-correlation-based deep ensembles; explicit/implicit ensembles; homogeneous/heterogeneous ensembles; decision fusion strategies; unsupervised, semi-supervised, reinforcement learning and online/incremental ensembles; and multilabel-based deep ensemble models. The application of deep ensemble models in different domains is also briefly discussed. Finally, we conclude this paper with some future recommendations and research directions.
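Among the categories listed, decision fusion strategies are the easiest to illustrate: hard (majority) voting fuses predicted labels, while soft voting averages the class-probability outputs, and the two can disagree. The three probability vectors below stand in for hypothetical base networks, not any model from the review.

```python
def majority_vote(pred_labels):
    # Hard fusion: the most frequent predicted label wins
    return max(set(pred_labels), key=pred_labels.count)

def average_fusion(probas):
    # Soft fusion: average the class-probability vectors, pick the argmax
    n_classes = len(probas[0])
    avg = [sum(p[c] for p in probas) / len(probas) for c in range(n_classes)]
    return avg.index(max(avg))

# Three hypothetical base networks scoring two classes
probas = [[0.55, 0.45], [0.52, 0.48], [0.10, 0.90]]
labels = [p.index(max(p)) for p in probas]   # per-network argmax: [0, 0, 1]

print(majority_vote(labels))    # hard vote -> class 0
print(average_fusion(probas))   # soft vote -> class 1
```

Two lukewarm votes for class 0 win the hard vote, but one confident vote for class 1 dominates the averaged probabilities, which is why the choice of fusion strategy matters.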