4 research outputs found

    On pruning and feature engineering in Random Forests.

    Random Forest (RF) is an ensemble classification technique that was developed by Leo Breiman over a decade ago. Compared with other ensemble techniques, it has proved its accuracy and superiority. Many researchers, however, believe that there is still room to optimize RF further by improving its accuracy. This explains why there have been many extensions of RF, each employing a variety of techniques and strategies to improve certain aspects of RF. The main focus of this dissertation is to develop new extensions of RF using optimization techniques that, to the best of our knowledge, have never been used before to optimize RF. These techniques are clustering, the local outlier factor, diversified weighted subspaces, and replicator dynamics. Applying these techniques to RF produced four extensions, which we have termed CLUB-DRF, LOFB-DRF, DSB-RF, and RDB-DR respectively. Experimental studies on 15 real datasets showed favorable results, demonstrating the potential of the proposed methods. Performance-wise, CLUB-DRF is ranked first in terms of accuracy and classification speed, making it well suited to real-time applications and to machines and devices with limited memory and processing power.

    Evaluation of Random Forest-Genetic Algorithm Hybrid Model in Estimating Daily Solar Radiation

    Solar energy is the most important source of renewable energy and, more broadly, the main source of energy on Earth. Estimating solar radiation with high accuracy is therefore very important. In the present study, daily meteorological data from 3 meteorological stations in Ardabil province (Meshginshahr, Germi, and Nir) were used for a period of 2 years (2017-2018). The intensity of daily solar radiation at each station was then estimated using the random forest method and a hybrid random forest-genetic algorithm method. The meteorological variables used were minimum, maximum, and average temperature, relative humidity, and wind speed, which were considered as model inputs in eight different combinations. The results were compared with each other using statistical parameters, and the best models were selected. By this comparison, the models of the Nir, Meshginshahr, and Germi stations were ranked from highest to lowest modeling accuracy, respectively. The GA-RF-V model at Nir station, with a root mean square error of 0.346 MJ/(m²·d) and a Kling-Gupta efficiency of 0.687, had the lowest error and was selected as the best model in this study. The results also showed that the genetic algorithm helped to increase the accuracy of all the models used.
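The two statistics quoted in this abstract, the root mean square error and the Kling-Gupta efficiency, can be sketched as follows. This is a minimal illustration and not the authors' code: it assumes the commonly used 2009 form of the Kling-Gupta efficiency (correlation, variability ratio, and bias ratio combined as a Euclidean distance from the ideal point), and the function names and sample values are hypothetical.

```python
import math


def rmse(observed, simulated):
    """Root mean square error between paired observations and estimates."""
    n = len(observed)
    return math.sqrt(sum((s - o) ** 2 for o, s in zip(observed, simulated)) / n)


def kling_gupta_efficiency(observed, simulated):
    """Kling-Gupta efficiency; 1.0 indicates a perfect match.

    Assumes the 2009 formulation: r is the Pearson correlation,
    alpha the ratio of standard deviations, beta the ratio of means.
    """
    n = len(observed)
    mean_o = sum(observed) / n
    mean_s = sum(simulated) / n
    std_o = math.sqrt(sum((o - mean_o) ** 2 for o in observed) / n)
    std_s = math.sqrt(sum((s - mean_s) ** 2 for s in simulated) / n)
    cov = sum((o - mean_o) * (s - mean_s)
              for o, s in zip(observed, simulated)) / n
    r = cov / (std_o * std_s)   # linear correlation
    alpha = std_s / std_o       # variability ratio
    beta = mean_s / mean_o      # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


# Hypothetical daily radiation values (MJ per square metre per day).
observed = [10.0, 20.0, 30.0]
simulated = [12.0, 22.0, 32.0]
error = rmse(observed, simulated)
kge = kling_gupta_efficiency(observed, simulated)
```

With a constant offset, as in the sample values, the correlation and variability terms are ideal and only the bias term lowers the efficiency, which shows why the two statistics are reported together.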

    Cardiovascular data analytics for real time patient monitoring

    Improvements in wearable sensor devices make it possible to constantly monitor physiological parameters such as electrocardiograph (ECG) signals for long periods. Remote patient monitoring with wearable sensors has an important role to play in health care, particularly given the prevalence of chronic conditions such as cardiovascular disease (CVD)—one of the prominent causes of morbidity and mortality worldwide. Approximately 4.2 million Australians suffer from long-term CVD with approximately one death every 12 minutes. The assessment of ECG features, especially heart rate variability (HRV), represents a non-invasive technique which provides an indication of the autonomic nervous system (ANS) function. Conditions such as sudden cardiac death, hypertension, heart failure, myocardial infarction, ischaemia, and coronary heart disease can be detected from HRV analysis. In addition, the analysis of ECG features can also be used to diagnose many types of life-threatening arrhythmias, including ventricular fibrillation and ventricular tachycardia. Non-cardiac conditions, such as diabetes, obesity, metabolic syndrome, insulin resistance, irritable bowel syndrome, dyspepsia, anorexia nervosa, anxiety, and major depressive disorder have also been shown to be associated with HRV. The analysis of ECG features from real time ECG signals generated from wearable sensors provides distinctive challenges. The sensors that receive and process the signals have limited power, storage and processing capacity. Consequently, algorithms that process ECG signals need to be lightweight, use minimal storage resources and accurately detect abnormalities so that alarms can be raised. The existing literature details only a few algorithms which operate within the constraints of wearable sensor networks. 
This research presents four novel techniques that enable ECG signals to be processed within the resource constraints of the devices in order to detect some key abnormalities in heart function.
- The first technique is a novel real-time ECG data reduction algorithm, which detects and transmits only those key points that are critical for the generation of ECG features for diagnoses.
- The second technique accurately predicts the five-minute HRV measure using only three minutes of data with an algorithm that executes in real time using minimal computational resources.
- The third technique introduces a real-time ECG feature recognition system that can be applied to diagnose life-threatening conditions such as premature ventricular contractions (PVCs).
- The fourth technique advances a classification algorithm to enhance the performance of automated ECG classification to determine arrhythmic heart beats based on noisy ECG signals.
The four novel techniques are evaluated in comparison with benchmark algorithms for each task on the standard MIT-BIH Arrhythmia Database and with data generated from patients in a major hospital using Shimmer3 wearable ECG sensors. The four techniques are integrated to demonstrate that remote patient monitoring of ECG using HRV and ECG features is feasible in real time using minimal computational resources. The evaluation shows that the ECG reduction algorithm is significantly better than existing algorithms that can be applied within sensor nodes, such as time-domain methods, transformation methods and compressed sensing methods. Furthermore, the proposed ECG reduction is found to be computationally less complex for resource-constrained sensors and achieves higher compression ratios than existing algorithms.
The prediction of a common HRV measure, the five-minute standard deviation of inter-beat variations (SDNN), and the accurate detection of PVC beats were achieved using a Count Data Model, combined with a Poisson-generated function from three-minute ECG recordings. This was achieved with minimal computational resources and was well suited to remote patient monitoring with wearable sensors. The PVC beat detection was implemented using the same count data model together with knowledge-based rules derived from clinical knowledge. A real-time cardiac patient monitoring system was implemented using an ECG sensor and smartphone to detect PVC beats within a few seconds using artificial neural networks (ANN), and it was shown to provide highly accurate results. The automated detection and classification were implemented using a new wrapper-based hybrid approach that utilized t-distributed stochastic neighbour embedding (t-SNE) in combination with self-organizing maps (SOM) to improve classification performance. The t-SNE-SOM hybrid resulted in improved sensitivity, specificity and accuracy compared to most common hybrid methods in the presence of noise. It also provided a better, more accurate identification of the presence of many types of arrhythmias from the ECG recordings, leading to a more timely diagnosis and treatment outcome.
Doctor of Philosoph
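For readers unfamiliar with SDNN, the quantity that the thesis predicts, the following sketch shows how it is conventionally computed from detected R-peak times. It illustrates only the definition of the measure and is not the Count Data Model described in the abstract. The function names are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since some tools use the population form instead.

```python
import statistics


def nn_intervals_ms(r_peak_times_s):
    """Convert R-peak timestamps (seconds) to inter-beat intervals (ms)."""
    return [(later - earlier) * 1000.0
            for earlier, later in zip(r_peak_times_s, r_peak_times_s[1:])]


def sdnn_ms(r_peak_times_s):
    """Standard deviation of inter-beat intervals, in milliseconds.

    Assumption: sample standard deviation; swap in statistics.pstdev
    for the population form.
    """
    intervals = nn_intervals_ms(r_peak_times_s)
    if len(intervals) < 2:
        raise ValueError("at least three R peaks are needed to compute SDNN")
    return statistics.stdev(intervals)


# Hypothetical R-peak times in seconds; intervals are 1000, 2000, 1000 ms.
peaks = [0.0, 1.0, 3.0, 4.0]
example_sdnn = sdnn_ms(peaks)
```

In practice the input would be the R peaks detected over a full recording window (five minutes for the standard short-term measure), with ectopic beats removed before the intervals are computed.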