
    Classifiers accuracy improvement based on missing data imputation

    In this paper we extend our previous work on radar signal identification and classification, based on a dataset that comprises continuous, discrete and categorical data representing radar pulse train characteristics such as signal frequencies, pulse repetition, type of modulation, intervals, scan period, scanning type, etc. Like most real-world datasets, it also contains a high percentage of missing values, and to deal with this problem we investigate three imputation techniques: Multiple Imputation (MI), K-Nearest Neighbour Imputation (KNNI) and Bagged Tree Imputation (BTI). We apply these methods to data samples with up to 60% missingness, thereby doubling the number of instances with complete values in the resulting dataset. The performance of the imputation models is assessed with Wilcoxon’s test for statistical significance and Cohen’s effect size metrics. To solve the classification task, we employ three intelligent approaches: Neural Networks (NN), Support Vector Machines (SVM) and Random Forests (RF). Subsequently, we critically analyse which imputation method most influences the classifiers’ performance, using a multiclass classification accuracy metric based on the area under the ROC curves. We consider two superclasses (‘military’ and ‘civil’), each containing several ‘subclasses’, and propose two new metrics, inner class accuracy (IA) and outer class accuracy (OA), in addition to the overall classification accuracy (OCA) metric. We conclude that they can be used as complements to OCA when choosing the best classifier for the problem at hand.
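
    A minimal sketch of one such imputation-plus-classification pipeline, assuming scikit-learn and synthetic placeholder data rather than the authors' radar dataset: KNN imputation, a Random Forest classifier, and a one-vs-rest multiclass ROC AUC as the overall accuracy measure.

    import numpy as np
    from sklearn.impute import KNNImputer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 8))          # stand-in for pulse-train features
    y = rng.integers(0, 3, size=500)       # stand-in for radar subclasses

    # Inject ~40% missing values at random (the paper works with up to 60% missingness).
    mask = rng.random(X.shape) < 0.4
    X[mask] = np.nan

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Fit the imputer on the training data only, then apply it to the test set.
    imputer = KNNImputer(n_neighbors=5)
    X_train_imp = imputer.fit_transform(X_train)
    X_test_imp = imputer.transform(X_test)

    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    clf.fit(X_train_imp, y_train)

    # Overall classification accuracy analogue: one-vs-rest multiclass ROC AUC.
    proba = clf.predict_proba(X_test_imp)
    print("multiclass ROC AUC:", roc_auc_score(y_test, proba, multi_class="ovr"))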

    Cape Town road traffic accident analysis: Utilising supervised learning techniques and discussing their effectiveness

    Road traffic accidents (RTA) are a major cause of death and injury around the world. The use of supervised learning (SL) methods to understand the frequency and injury-severity of RTAs is of utmost importance in designing appropriate interventions. Data on RTAs that occurred in the city of Cape Town during 2015-2017 are used for this study. The data contain the injury-severity (no injury, slight, serious and fatal injury) of the RTAs as well as several accident-related variables. Additional locational and situational variables were added to the dataset. Four training datasets were analysed: the original imbalanced data, data with the minority class over-sampled, data with the majority class under-sampled and data with synthetically created observations. The performance of different SL methods was compared using accuracy, recall, precision and F1 score evaluation metrics, and based on the average recall the artificial neural network (ANN) was selected as the best-performing model on the validation data.
    Du Toit, C.; Salau, S.; Er, S. (2022). Cape Town road traffic accident analysis: Utilising supervised learning techniques and discussing their effectiveness. In: 4th International Conference on Advanced Research Methods and Analytics (CARMA 2022). Editorial Universitat Politècnica de València. 57-64. https://doi.org/10.4995/CARMA2022.2022.15041576
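
    A minimal sketch of the four training-set variants described above (original, minority over-sampled, majority under-sampled, synthetic observations), assuming the imbalanced-learn library and placeholder data standing in for the Cape Town injury-severity classes:

    from collections import Counter
    from sklearn.datasets import make_classification
    from imblearn.over_sampling import RandomOverSampler, SMOTE
    from imblearn.under_sampling import RandomUnderSampler

    # Placeholder imbalanced data standing in for the four injury-severity classes.
    X, y = make_classification(n_samples=2000, n_classes=4, n_informative=6,
                               weights=[0.7, 0.2, 0.07, 0.03], random_state=0)
    print("original:", Counter(y))

    variants = {
        "over-sampled":  RandomOverSampler(random_state=0),
        "under-sampled": RandomUnderSampler(random_state=0),
        "synthetic":     SMOTE(random_state=0),
    }
    for name, sampler in variants.items():
        X_res, y_res = sampler.fit_resample(X, y)
        print(name, Counter(y_res))

    Each resampled variant would then be used to train the SL methods, which the study compares on metrics including average recall.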

    Non-Intrusive Load Monitoring Using Current Harmonic Vectors and Adaptive Feature Selection

    The non-intrusive load monitoring method presented in this paper uses changes in current harmonic vectors to identify the operational state of appliances. The algorithm based on this feature has low complexity, but it may suffer from information loss caused by random fluctuation of the current harmonic vectors. To deal with this problem, we propose an algorithm that includes a stage to identify and select a subset of relevant features from the set of available appliance features. The proposed load disaggregation algorithm is demonstrated through experiments on a representative set of household appliances.
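
    A minimal sketch of the kind of feature this method relies on, assuming a generic FFT-based extraction of complex current harmonic vectors and a nearest-signature match on the change between windows; the sampling rate, harmonic orders and signature table are illustrative assumptions, not the paper's configuration:

    import numpy as np

    FS = 3200          # assumed sampling rate in Hz
    F0 = 50            # assumed mains fundamental in Hz
    HARMONICS = (1, 3, 5, 7)

    def harmonic_vector(window: np.ndarray) -> np.ndarray:
        """Complex amplitudes of selected harmonics for one window of current samples.

        The window is assumed to cover an integer number of mains cycles."""
        spectrum = np.fft.rfft(window) / len(window)
        bins_per_hz = len(window) / FS
        return np.array([spectrum[round(h * F0 * bins_per_hz)] for h in HARMONICS])

    def identify_event(before: np.ndarray, after: np.ndarray, signatures: dict) -> str:
        """Match the change in harmonic vectors to the nearest stored appliance signature."""
        delta = harmonic_vector(after) - harmonic_vector(before)
        return min(signatures, key=lambda name: np.linalg.norm(delta - signatures[name]))

    Restricting HARMONICS to a relevant subset per appliance plays the role of the feature-selection stage: harmonics whose vectors fluctuate randomly for a given appliance would simply be left out of its signature comparison.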

    Data Preparation in the Big Data Era

    Preparing and cleaning data is notoriously expensive, prone to error, and time consuming: the process accounts for roughly 80% of the total time spent on analysis. As this O’Reilly report points out, enterprises have already invested billions of dollars in big data analytics, so there’s great incentive to modernize methods for cleaning, combining, and transforming data. Author Federico Castanedo, Chief Data Scientist at WiseAthena.com, details best practices for reducing the time it takes to convert raw data into actionable insights. With these tools and techniques in mind, your organization will be well positioned to translate big data into big decisions.
    • Explore the problems organizations face today with traditional prep and integration
    • Define the business questions you want to address before selecting, prepping, and analyzing data
    • Learn new methods for preparing raw data, including date-time and string data
    • Understand how some cleaning actions (like replacing missing values) affect your analysis
    • Examine data curation products: modern approaches that scale
    • Consider your business audience when choosing ways to deliver your analysis
    Federico Castanedo is the Chief Data Scientist at WiseAthena.com. Involved in projects related to data analysis in academia and industry for more than a decade, he has published several scientific papers about data fusion techniques, visual sensor networks, and machine learning.
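
    A minimal sketch of two of the preparation steps listed above (date-time and string preparation, and the effect of replacing missing values), using pandas on made-up data; the column names and imputation rule are illustrative assumptions:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({
        "signup": ["2021-01-05", "2021-02-17", "not a date"],
        "city":   ["  Cape Town", "CAPE TOWN ", None],
        "spend":  [120.0, np.nan, 80.0],
    })

    # Date-time and string preparation: unparseable dates become NaT, strings are normalized.
    df["signup"] = pd.to_datetime(df["signup"], errors="coerce")
    df["city"] = df["city"].str.strip().str.title()

    # Replacing missing values is not neutral: mean imputation preserves the mean
    # but shrinks the variance, which affects downstream analysis.
    print("before:", df["spend"].mean(), df["spend"].std())
    df["spend"] = df["spend"].fillna(df["spend"].mean())
    print("after: ", df["spend"].mean(), df["spend"].std())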

    Service-Oriented Cognitive Analytics for Smart Service Systems: A Research Agenda

    The development of analytical solutions for smart service systems relies on data. Typically, this data is distributed across the various entities of the system. Cognitive learning makes it possible to find patterns and make predictions across these distributed data sources, yet its potential is not fully explored. The challenges that impede cross-entity data analysis are organizational (e.g., confidentiality), algorithmic (e.g., robustness) and technical (e.g., data processing). So far, there is no comprehensive approach to building cognitive analytics solutions when data is distributed across different entities of a smart service system. This work proposes a research agenda for the development of a service-oriented cognitive analytics framework. The analytics framework uses a centralized cognitive aggregation model to combine the predictions made by each entity of the service system. Based on this research agenda, we plan to develop and evaluate the cognitive analytics framework in future research.
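
    A minimal sketch of the centralized aggregation idea, assuming each entity trains a local scikit-learn model on its own feature slice and a central aggregator simply averages the predicted probabilities; the entities, features and averaging rule are illustrative assumptions, not the proposed framework itself:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=900, n_features=9, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Each entity only sees its own slice of the features (data never leaves the entity).
    entity_features = [slice(0, 3), slice(3, 6), slice(6, 9)]
    local_models = []
    for cols in entity_features:
        model = LogisticRegression().fit(X_train[:, cols], y_train)
        local_models.append((cols, model))

    # Central aggregator: combine the local probability estimates by averaging.
    probs = np.mean(
        [model.predict_proba(X_test[:, cols]) for cols, model in local_models], axis=0
    )
    y_pred = probs.argmax(axis=1)
    print("aggregated accuracy:", (y_pred == y_test).mean())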