
    Imputation techniques for the reconstruction of missing interconnected data from higher Educational Institutions

    Educational Institutions data constitute the basis for several important analyses of educational systems; however, for several reasons, they often contain non-negligible shares of missing values. In this work we consider the relevant case of the European Tertiary Education Register (ETER), which describes the Educational Institutions of Europe. The presence of missing values prevents the full exploitation of this database, since several types of analyses that could be performed are currently impracticable. Imputing artificial data, reconstructed with the aim of being statistically equivalent to the (unknown) missing data, would overcome these problems. A main complication in the imputation of this type of data is given by the correlations that exist among all the variables. We propose several imputation techniques designed to deal with the different types of missing values appearing in these interconnected data, and we use these techniques to impute the database. Moreover, we evaluate the accuracy of the proposed approach by artificially introducing missing data, imputing them, and comparing the imputed and original values. Results show that the information reconstruction does not introduce statistically significant changes in the data and that the imputed values are close enough to the original values.
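    The abstract does not name the specific estimators used; as a minimal sketch of correlation-aware imputation of the kind described, the following uses scikit-learn's IterativeImputer, which regresses each incomplete variable on the others. The column names and toy values are invented for illustration and are not ETER data.

    ```python
    # Minimal sketch of correlation-aware imputation, in the spirit of the
    # approach described above. Columns and values are illustrative, not ETER data.
    import numpy as np
    import pandas as pd
    from sklearn.experimental import enable_iterative_imputer  # noqa: F401
    from sklearn.impute import IterativeImputer

    # Toy institution-level indicators with interdependent columns and gaps.
    data = pd.DataFrame({
        "students":    [12000, 8500, np.nan, 21000, 4300],
        "staff":       [900, np.nan, 450, 1600, np.nan],
        "phd_degrees": [210, 95, 60, np.nan, 30],
    })

    # Each incomplete variable is regressed on the others, iterating until the
    # imputed values stabilise, so cross-variable correlations are exploited.
    imputer = IterativeImputer(max_iter=10, random_state=0)
    imputed = pd.DataFrame(imputer.fit_transform(data), columns=data.columns)
    print(imputed.round(1))
    ```

    The masking-based evaluation the authors describe could be mimicked on top of this by blanking known cells, imputing them, and comparing the imputed values against the originals.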

    Data Imputation Using Differential Dependency and Fuzzy Multi-Objective Linear Programming

    Missing or incomplete data is a serious problem when collecting and analyzing data for forecasting, estimation, and decision making. Since data quality is so important to machine learning and its results, in most cases imputing missing data is much more appropriate than ignoring it. Missing data imputation is often based on the equality, similarity, or distance of neighbors, and researchers take different approaches to neighbors' equality or similarity, each with its own advantages and limitations. Instead of equality, some researchers use inequalities together with a few relationships or similarity rules. In this thesis, after recalling some basic imputation methods, we discuss data imputation based on differential dependencies (DDs). DDs are conditional rules in which the closeness of the values of a pair of tuples in some attribute indicates the closeness of the values of those tuples in another attribute. Considering these rules, a few candidate rows are created for each incomplete row and placed in the candidate set for that row. Then one row is selected from each set such that the selections are not incompatible with each other; these selections are made by an integer linear programming (ILP) model. In this thesis we first propose an algorithm to generate DDs. Then, to raise the imputation percentage of the previous approaches, we suggest a fuzzy relaxation that allows small violations of the DDs. Finally, we propose a multi-objective fuzzy linear programming model that increases the imputation percentage while decreasing the sum of violations. A variety of datasets from Kaggle is used to support our approach.
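    As a hedged sketch of the selection step described above (exactly one candidate row chosen per incomplete row by ILP, with relaxed DD violations penalised in the objective), the following uses the PuLP library; the candidate sets, violation costs, and conflict pairs are invented for illustration and are not the thesis's actual model.

    ```python
    # Hypothetical sketch: pick one candidate completion per incomplete row via
    # 0-1 ILP, forbidding incompatible pairs and minimising total DD violation.
    from pulp import LpProblem, LpMinimize, LpVariable, lpSum, LpBinary

    candidates = {  # row id -> list of (candidate label, DD-violation cost)
        "r1": [("a", 0.0), ("b", 0.2)],
        "r2": [("c", 0.1), ("d", 0.0)],
    }
    conflicts = [(("r1", "b"), ("r2", "d"))]  # pairs that cannot both be chosen

    prob = LpProblem("dd_imputation", LpMinimize)
    x = {(r, c): LpVariable(f"x_{r}_{c}", cat=LpBinary)
         for r, cands in candidates.items() for c, _ in cands}

    # Objective: minimise the summed (fuzzy-relaxed) DD violations.
    prob += lpSum(cost * x[(r, c)]
                  for r, cands in candidates.items() for c, cost in cands)

    # Exactly one candidate is chosen per incomplete row.
    for r, cands in candidates.items():
        prob += lpSum(x[(r, c)] for c, _ in cands) == 1

    # Incompatible selections cannot be chosen together.
    for (r1, c1), (r2, c2) in conflicts:
        prob += x[(r1, c1)] + x[(r2, c2)] <= 1

    prob.solve()
    chosen = {r: c for (r, c), var in x.items() if var.value() == 1}
    print(chosen)  # e.g. {'r1': 'a', 'r2': 'd'}
    ```

    The fuzzy relaxation described in the abstract would enter this model through the violation costs: instead of hard DD constraints, small violations carry a graded penalty in the objective.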

    SemImput: Bridging Semantic Imputation with Deep Learning for Complex Human Activity Recognition

    The recognition of activities of daily living (ADL) in smart environments is a well-known and important research area, which captures the real-time state of humans in pervasive computing. The process of recognizing human activities generally involves deploying a set of obtrusive and unobtrusive sensors, pre-processing the raw data, and building classification models using machine learning (ML) algorithms. Integrating data from multiple sensors is a challenging task due to the dynamic nature of the data sources, and is further complicated by semantic and syntactic differences among them. These differences become even more problematic if the generated data is imperfect, which ultimately has a direct impact on its usefulness in yielding an accurate classifier. In this study, we propose a semantic imputation framework to improve the quality of sensor data using ontology-based semantic similarity learning. This is achieved by identifying semantic correlations among sensor events through SPARQL queries and by performing a time-series longitudinal imputation. Furthermore, we applied a deep learning (DL) based artificial neural network (ANN) to public datasets to demonstrate the applicability and validity of the proposed approach. The results showed higher accuracy with the semantically imputed datasets using the ANN. We also present a detailed comparative analysis against the state of the art from the literature, and find that our semantically imputed datasets improve classification accuracy, reaching 95.78% in the best case, thus demonstrating the effectiveness and robustness of the learned models.
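    A minimal sketch of the SPARQL-based correlation step, using rdflib: events co-located with a gap event are retrieved as semantically correlated candidates for filling the missing reading. The ontology, namespace, and event triples below are invented for illustration and do not reflect the actual SemImput ontology or queries.

    ```python
    # Hypothetical sketch: find sensor events sharing a location with a gap
    # event, as stand-in "semantically correlated" events to drive imputation.
    from rdflib import Graph, Namespace, Literal, RDF

    EX = Namespace("http://example.org/adl#")
    g = Graph()

    # Toy event triples: (event, type, sensor, location).
    for ev, sensor, loc in [("e1", "motion", "kitchen"),
                            ("e2", "door", "kitchen"),
                            ("e3", "motion", "bedroom")]:
        node = EX[ev]
        g.add((node, RDF.type, EX.SensorEvent))
        g.add((node, EX.sensor, Literal(sensor)))
        g.add((node, EX.location, Literal(loc)))

    # Events co-located with the gap event e1 become candidates for the
    # time-series longitudinal imputation described in the abstract.
    q = """
    PREFIX ex: <http://example.org/adl#>
    SELECT ?other WHERE {
        ex:e1 ex:location ?loc .
        ?other ex:location ?loc .
        FILTER (?other != ex:e1)
    }
    """
    for row in g.query(q):
        print(row.other)  # prints http://example.org/adl#e2
    ```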