736 research outputs found

    A Hybrid Tabu Search and Genetic Algorithm Imputation Approach for Incomplete Data

    A common problem in data collection is the occurrence of missing values during the collection and processing of data, which degrades the quality of subsequent analysis and testing. Genetic Algorithm Imputation (GAI) is a computational technique for dealing with missing values: it estimates a dataset's missing values by evolving candidate solutions, using information gain as the fitness function to measure the performance of individual solutions. GAI searches continuously until the best-fitness estimates of the missing values are found, and so it can become temporarily trapped in a local optimum. The improvement of GAI with tabu search, TS-GAI, combines the two metaheuristics by modifying the mutation stage to steer the search away from local optima. For imputing missing values, this technique works better when many possible values are available than when mixed attributes contain missing values, because the new generation of chromosome values creates many opportunities to fill in the missing entries. The experimental results show that TS-GAI performs best at 30% missing values, with a fitness value of 0.212 and convergence at 159 iterations. In general, TS-GAI converges in fewer iterations than simple GAI and has a lower RMSE than other imputation techniques.
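    The authors' TS-GAI implementation is not published with this abstract; the following is only a minimal sketch, assuming a numeric matrix with NaN gaps, of the general idea it describes: a genetic algorithm evolves candidate fill-in values while a tabu list applied at the mutation step discourages revisiting recent mutations. The fitness function, parameter values, and all names below are illustrative assumptions, not the paper's method.

    # Minimal sketch of GA-based imputation with a tabu-guarded mutation step.
    # The fitness here (preserving observed column means) is a stand-in for the
    # information-gain fitness described in the abstract.
    import numpy as np

    def fitness(candidate, X, missing_mask):
        # Lower is better: penalise deviation of filled column means from observed means.
        X_filled = X.copy()
        X_filled[missing_mask] = candidate
        obs_means = np.nanmean(X, axis=0)
        return float(np.sum((X_filled.mean(axis=0) - obs_means) ** 2))

    def ts_ga_impute(X, pop_size=30, generations=200, tabu_len=20, seed=0):
        rng = np.random.default_rng(seed)
        missing_mask = np.isnan(X)
        n_missing = int(missing_mask.sum())
        lo, hi = np.nanmin(X), np.nanmax(X)
        # each individual is one candidate vector of values for all missing cells
        pop = rng.uniform(lo, hi, size=(pop_size, n_missing))
        tabu = []  # recently tried (gene, value) mutations, to escape local optima
        for _ in range(generations):
            scores = np.array([fitness(ind, X, missing_mask) for ind in pop])
            parents = pop[np.argsort(scores)[: pop_size // 2]]
            children = []
            while len(parents) + len(children) < pop_size:
                a, b = parents[rng.integers(len(parents), size=2)]
                cut = rng.integers(1, n_missing) if n_missing > 1 else 0
                child = np.concatenate([a[:cut], b[cut:]])
                g = int(rng.integers(n_missing))  # tabu-guarded mutation of one gene
                for _ in range(5):
                    val = float(rng.uniform(lo, hi))
                    if (g, round(val, 2)) not in tabu:
                        child[g] = val
                        tabu = (tabu + [(g, round(val, 2))])[-tabu_len:]
                        break
                children.append(child)
            pop = np.vstack([parents, children])
        best = pop[np.argmin([fitness(ind, X, missing_mask) for ind in pop])]
        X_imputed = X.copy()
        X_imputed[missing_mask] = best
        return X_imputed

    Calling ts_ga_impute on a float array containing np.nan gaps returns the array with those gaps filled by the fittest chromosome found; the hypothetical tabu key (gene index, rounded value) is one simple way to express the "distract the search from local optima" idea.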

    Proceedings of Abstracts Engineering and Computer Science Research Conference 2019

    © 2019 The Author(s). This is an open-access work distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. For further details please see https://creativecommons.org/licenses/by/4.0/. Note: the keynote "Fluorescence visualisation to evaluate effectiveness of personal protective equipment for infection control" is © 2019 Crown copyright and is therefore licensed under the Open Government Licence v3.0. Under this licence users are permitted to copy, publish, distribute and transmit the Information; adapt the Information; and exploit the Information commercially and non-commercially, for example by combining it with other Information, or by including it in their own product or application. Where you do any of the above you must acknowledge the source of the Information in your product or application by including or linking to any attribution statement specified by the Information Provider(s) and, where possible, provide a link to this licence: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/.
    This book is the record of abstracts submitted and accepted for presentation at the Inaugural Engineering and Computer Science Research Conference held 17th April 2019 at the University of Hertfordshire, Hatfield, UK. The conference is a local event aiming to bring together research students, staff and eminent external guests to celebrate Engineering and Computer Science research at the University of Hertfordshire, and to showcase the broad landscape of research taking place in the School of Engineering and Computer Science. The 2019 conference was structured around three topical, cross-disciplinary themes: Make and Preserve the Future; Connect the People and Cities; and Protect and Care.

    A Review of Missing Data Handling Techniques for Machine Learning

    Real-world data are commonly known to contain missing values, which adversely affect the performance of most machine learning algorithms employed on such datasets. Missing values are among the main challenges occurring in real-world data. Since the accuracy and efficiency of machine learning models depend on the quality of the data used, data analysts and researchers working with data need to seek out relevant techniques for handling these inescapable missing values. This paper reviews state-of-the-art practices from the literature for handling missing data problems in machine learning, and it lists the evaluation metrics used to measure the performance of these techniques. The study presents these techniques and evaluation metrics in clear terms, supported by the relevant mathematical equations. Furthermore, some recommendations to consider when choosing among missing data handling techniques are provided.
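    As a concrete illustration of the evaluation side of such a review (not taken from the paper itself), the sketch below follows a common protocol: mask known entries, impute them (here with simple column means as a baseline), and report RMSE over the masked cells only. The data and the mean-imputation baseline are illustrative assumptions.

    import numpy as np

    def imputation_rmse(X_true, X_imputed, masked):
        # RMSE computed only over the artificially masked cells
        diff = X_true[masked] - X_imputed[masked]
        return float(np.sqrt(np.mean(diff ** 2)))

    rng = np.random.default_rng(0)
    X_true = rng.normal(size=(100, 5))
    masked = rng.random(X_true.shape) < 0.2          # hide roughly 20% of the entries
    X_missing = np.where(masked, np.nan, X_true)
    col_means = np.nanmean(X_missing, axis=0)        # mean imputation as a simple baseline
    X_imputed = np.where(np.isnan(X_missing), col_means, X_missing)
    print(imputation_rmse(X_true, X_imputed, masked))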

    Comparative study of imputation algorithms applied to the prediction of student performance

    Student performance and its evaluation remain a serious challenge for education systems. Frequently, the recording and processing of students' scores in a specific curriculum have several flaws, for various reasons. In this context, the absence of some of the student scores undermines the efficiency of any future analysis carried out in order to reach conclusions. When this is the case, missing data imputation algorithms are needed. These algorithms are capable of substituting predicted values for the missing data with a high level of accuracy. This research presents the hybridization of an algorithm previously proposed by the authors, called the adaptive assignation algorithm (AAA), with the well-known technique of multivariate imputation by chained equations (MICE). The results show how the suggested methodology outperforms both algorithms.
    Funding: Ministerio de Economía y Competitividad, AYA2014-57648-P; Asturias, Consejería de Economía y Empleo, FC-15-GRUPIN14-01.
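    The AAA+MICE hybrid proposed by the authors is not reproduced here; as a hedged illustration of the MICE component alone, the sketch below uses scikit-learn's IterativeImputer, which implements the same chained-equations idea of modelling each incomplete feature from the others in a round-robin fashion. The toy score matrix is invented.

    import numpy as np
    from sklearn.experimental import enable_iterative_imputer  # noqa: F401
    from sklearn.impute import IterativeImputer

    # toy student-score matrix with gaps (rows: students, columns: assessments)
    scores = np.array([
        [7.5, np.nan, 6.0],
        [5.0, 4.5, np.nan],
        [np.nan, 8.0, 7.5],
        [6.5, 6.0, 5.5],
    ])

    # chained-equations imputation: each incomplete column is regressed on the others
    mice_like = IterativeImputer(max_iter=20, sample_posterior=True, random_state=0)
    filled = mice_like.fit_transform(scores)
    print(filled.round(2))

    In a study like the one above, the imputed matrix would then feed whatever downstream performance analysis is being carried out; the hybridization with AAA is specific to the paper and not shown.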

    The impact of missing data imputation on HIV classification

    Missing data are a part of research and data analysis that often cannot be ignored. Although a number of methods have been developed for handling and imputing missing data, the problem is for the most part still unsolved, and many researchers still struggle with it. Due to the availability of software and the advancement of computational power, maximum likelihood and multiple imputation, as well as the combination of auto-associative neural networks and genetic algorithms (AANN-GA), have been introduced as solutions to the missing data problem. Although these methods have given considerable results in this domain, the impact that missing data and missing data imputation have on decision making had, until recently, not been assessed. This dissertation contributes to this knowledge by first introducing a new computational intelligence model that integrates Neuro-Fuzzy (N-F) modeling, Principal Component Analysis and genetic algorithms to impute missing data. The performance of this model is then compared to that of the AANN-GA as well as to the independent use of the N-F architecture. To determine whether the data are predictable, and to assist in processing the data for training, an analysis of the HIV sero-prevalence data is performed. Two classification decision-making frameworks are then presented in order to assess the effect of missing data. These decision frameworks are trained to classify between two conditions when presented with a set of data variables. The first is a Bayesian neural network, which is statistical in nature, and the second is based on the fuzzy ARTMAP (FAM) classifier, which has incremental learning abilities. The two methods are used and compared in order to assess the degree to which missing data, and the imputation thereof, affect decision making. The effect of missing data differs for the two frameworks: while the Bayesian neural network fails in the presence of missing data, the FAM classifier attempts to classify with decreased accuracy. This work shows that although missing data and their imputation affect decision making, the degree of that effect depends on the decision-making framework and on the model used for data imputation.
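    None of the dissertation's models (the N-F/PCA/GA imputer, the Bayesian neural network, or the fuzzy ARTMAP classifier) are reproduced here; the sketch below only illustrates, with stock scikit-learn components and synthetic data, the general shape of the comparison it describes: scoring a trained classifier on complete inputs versus inputs whose missing entries have been imputed.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.impute import SimpleImputer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=500, n_features=10, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    # baseline: train and test on complete data
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    acc_complete = clf.score(X_te, y_te)

    # knock out ~30% of test-set entries, impute with training-set column means, re-score
    rng = np.random.default_rng(0)
    X_te_miss = X_te.copy()
    X_te_miss[rng.random(X_te.shape) < 0.3] = np.nan
    X_te_imp = SimpleImputer(strategy="mean").fit(X_tr).transform(X_te_miss)
    acc_imputed = clf.score(X_te_imp, y_te)

    print(f"complete: {acc_complete:.3f}, imputed: {acc_imputed:.3f}")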

    Genetic Algorithms for Cross-Calibration of Categorical Data

    The probabilistic problem of cross-calibration of two categorical variables is addressed. A probabilistic forecast of the categorical variables is obtained based on a sample of observed data. This forecast is the output of a genetic-algorithm-based approach, which makes no assumption about the type of relationship between the two variables and applies a scoring rule to assess the fitness of the chromosomes. It converges to a good-quality point probability forecast of the joint distribution of the two variables. The proposed approach is applied both at stationary points in time and across time. Its performance improves when additional sampled data are included, and the approach can be designed with different scoring rules or adapted to account for missing data.
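    The genetic algorithm itself is not shown here; the sketch below only illustrates, under assumed toy values, how a chromosome decoded as a joint probability table for two categorical variables could be assessed with a scoring rule (here the logarithmic score) against sampled category pairs.

    import numpy as np

    def log_score(joint_probs, observed_pairs):
        # Mean log probability the candidate table assigns to the observed (x, y) pairs;
        # higher (less negative) means a fitter chromosome.
        probs = joint_probs[observed_pairs[:, 0], observed_pairs[:, 1]]
        return float(np.mean(np.log(np.clip(probs, 1e-12, 1.0))))

    # candidate chromosome decoded as a 3x2 joint probability table (rows: X, cols: Y)
    candidate = np.array([[0.30, 0.10],
                          [0.20, 0.15],
                          [0.05, 0.20]])
    observed = np.array([[0, 0], [1, 0], [2, 1], [0, 0], [1, 1]])  # sampled category pairs
    print(log_score(candidate, observed))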

    Biopsychosocial Data Analytics and Modeling

    Sustained customisation of digital health intervention (DHI) programs, in the context of community health engagement, requires strong integration of multi-sourced, interdisciplinary biopsychosocial health data. The biopsychosocial model is built on the idea that biological, psychological and social processes are integrally and interactively involved in physical health and illness. One of the long-standing challenges of dealing with healthcare data is the wide variety of data generated from different sources and the increasing need to derive actionable insights that drive performance improvement. The growth of information and communication technology has led to the increased use of DHI programs. These programs use an observational methodology that lets researchers study the everyday behaviour of participants during the course of the program by analysing data generated from digital tools such as wearables, online surveys and ecological momentary assessment (EMA). Combined with data reported from biological and psychological tests, this provides rich and unique biopsychosocial data, and there is a strong need to review and apply novel approaches to combining such data from a methodological perspective. Although some studies have applied data analytics to clinical trial data generated from digital interventions, data analytics on biopsychosocial data generated from DHI programs is limited. The study in this thesis develops and implements innovative approaches for analysing the unique and rich biopsychosocial data generated from the wellness study, a DHI program conducted by the School of Science, Psychology and Sport at Federation University. The characteristics of variety, value and veracity that usually describe big data are also relevant to the biopsychosocial data handled in this thesis; these historical, retrospective, real-life data provide fertile ground for using data analytics to discover patterns hidden in the data and to obtain new knowledge.
    The thesis addresses three aspects of biopsychosocial research, one per chapter. The first chapter presents the salient traits of the three components of biopsychosocial research: biological, psychological and social. The second chapter investigates the challenges of pre-processing biopsychosocial data, placing special emphasis on the time-series data generated from wearable sensor devices. The third chapter applies statistical and machine learning (ML) tools to integrate variables from the biopsychosocial disciplines into a predictive model. Among its other analyses and results, the key contributions of the research described in this thesis include:
    1. using a gamma distribution to model neurocognitive reaction-time data, which shows marked skewness and kurtosis (see the sketch after this list);
    2. using a novel 'peak heart-rate' count metric to quantify 'biological' stress;
    3. using an ML approach to evaluate DHIs;
    4. using a recurrent neural network (RNN) and long short-term memory (LSTM) prediction model to predict the Difficulties in Emotion Regulation Scale (DERS) and primary emotion (PE) from wearable sensor data.
    Doctor of Philosophy
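    As a minimal, illustrative sketch of the first contribution only (not the thesis code or data), the snippet below fits a gamma distribution to synthetic reaction-time data with SciPy and reports the skewness and kurtosis that motivate the choice of distribution; the data generation parameters are invented.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    reaction_times_ms = rng.gamma(shape=4.0, scale=90.0, size=1000)  # stand-in data

    # maximum-likelihood fit of shape and scale (location fixed at 0),
    # then the skewness and excess kurtosis of the fitted distribution
    shape, loc, scale = stats.gamma.fit(reaction_times_ms, floc=0)
    skew, kurt = stats.gamma.stats(shape, loc=loc, scale=scale, moments="sk")
    print(f"shape={shape:.2f} scale={scale:.1f} skew={float(skew):.2f} excess kurtosis={float(kurt):.2f}")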

    On Knowledge Discovery Experimented with Otoneurological Data

    Diagnosis of otoneurological diseases can be challenging because of similar and overlapping symptoms that can also vary over time, so systems that support and aid the diagnosis of vertiginous patients are considered beneficial. This study continues the refinement of an otoneurological decision support system, ONE, and its knowledge base. The aim of the study is to improve the classification accuracy of nine otoneurological diseases in real-world situations by applying machine learning methods to knowledge discovery in the otoneurological domain. The dissertation is divided into three phases: fitness value formation for attribute values, attribute weighting, and classification task redefinition.
    The first phase concentrates on updating the knowledge base of ONE with the domain experts and on a knowledge discovery method that forms fitness values for the values of the attributes. The knowledge base of ONE needed updating due to changes made to the data collection questionnaire. The effect of machine-learnt fitness values on classification is examined, and the classification results are compared to those obtained with the knowledge set by the experts and with their combinations. The classification performance of ONE's nearest pattern method is compared to the k-nearest neighbour method (k-NN) and Naïve Bayes (NB).
    The second phase concentrates on attribute weighting. The Scatter method and the instance-based learning algorithms IB4 and IB1w are applied to attribute weighting. These machine-learnt attribute weights, in addition to weights defined by the domain experts and equal weighting, are tested with the classification method of ONE and with attribute-weighted k-NN using One-vs-All classifiers (wk-NN OVA). A genetic algorithm (GA) approach is also examined for attribute weighting, with the machine-learnt weight sets used as starting points for the GA. Populations (the weight sets) are evaluated with the classification method of ONE, the wk-NN OVA, and attribute-weighted k-NN using the neighbour's class-based attribute weighting (cwk-NN).
    In the third phase, the effect of redefining the classification task is examined. The multi-class classification task is separated into several binary classification tasks. The binary classification is studied without attribute weighting, using k-NN and support vector machines (SVM).
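    As a hedged illustration of the attribute-weighted k-NN idea that recurs above (not the ONE system's own distance measure, weights or data), the following sketch classifies a query case by majority vote among its k nearest neighbours under a per-attribute weighted Euclidean distance; the weights and toy data are placeholders for weights that could come from the Scatter method, IB4/IB1w or a GA.

    import numpy as np

    def weighted_knn_predict(X_train, y_train, x_query, weights, k=5):
        # per-attribute weighted Euclidean distance, then majority vote over the k nearest
        d = np.sqrt(((X_train - x_query) ** 2 * weights).sum(axis=1))
        nearest = np.argsort(d)[:k]
        labels, counts = np.unique(y_train[nearest], return_counts=True)
        return labels[np.argmax(counts)]

    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(60, 4))
    y_train = rng.integers(0, 3, size=60)       # toy labels standing in for disease classes
    weights = np.array([0.5, 1.0, 2.0, 0.1])    # placeholder machine-learnt attribute weights
    print(weighted_knn_predict(X_train, y_train, rng.normal(size=4), weights))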