Missing Value Imputation With Unsupervised Backpropagation
Many data mining and data analysis techniques operate on dense matrices or
complete tables of data. Real-world data sets, however, often contain unknown
values. Even many classification algorithms that are designed to operate with
missing values still exhibit deteriorated accuracy. One approach to handling
missing values is to fill in (impute) the missing values. In this paper, we
present a technique for unsupervised learning called Unsupervised
Backpropagation (UBP), which trains a multi-layer perceptron to fit to the
manifold sampled by a set of observed point-vectors. We evaluate UBP with the
task of imputing missing values in datasets, and show that UBP is able to
predict missing values with significantly lower sum-squared error than other
collaborative filtering and imputation techniques. We also demonstrate with 24
datasets and 9 supervised learning algorithms that classification accuracy is
usually higher when randomly withheld values are imputed using UBP rather than
with other methods.
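The imputation idea above can be illustrated with a deliberately simplified, single-layer (linear) variant: learn a latent vector per row and a weight matrix by gradient descent on the observed entries only, then read the missing cells off the reconstruction. This is essentially matrix factorization rather than the multi-layer perceptron the paper trains, and the function name and hyperparameters are illustrative:

```python
import numpy as np

def ubp_impute(X, rank=2, lr=0.01, epochs=500, seed=0):
    """Fill NaN entries of X by fitting latent row-vectors V and weights W
    so that V @ W matches the observed entries.  A single-layer (linear)
    sketch of the UBP idea; the paper trains a multi-layer perceptron,
    and all hyperparameters here are illustrative."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    mask = ~np.isnan(X)                       # True where a value was observed
    V = rng.normal(scale=0.1, size=(n, rank))
    W = rng.normal(scale=0.1, size=(rank, d))
    for _ in range(epochs):
        err = np.where(mask, V @ W - X, 0.0)  # error on observed cells only
        V, W = V - lr * (err @ W.T), W - lr * (V.T @ err)
    return np.where(mask, X, V @ W)           # keep observed values, fill the rest
```

For example, on a rank-1 matrix with one withheld cell, a rank-1 fit recovers the missing value from the structure of the observed entries.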
Ship machinery condition monitoring using vibration data through supervised learning
This paper aims to present an integrated methodology for the monitoring of marine machinery using vibration data. Monitoring of machinery is a crucial aspect of maintenance optimisation, required for vessel operation to remain sustainable and profitable. The proposed methodology trains models on pre-classified (healthy/faulty) data and then classifies new data points using the models developed. For this, vibration points are first acquired, appropriately processed and stored in a database. Specific features are then extracted from the data and stored. These data are then used to train supervised models pertinent to specific machinery components. Finally, new data are compared against the models developed in order to evaluate the machinery's condition. Together, these steps provide a flexible yet robust framework for the early detection of emerging machinery faults, leading to minimised ship downtime and an increase in the ship's operability and income through operational enhancement.
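The pipeline described above (acquire vibration signals, extract features, train a supervised model on healthy/faulty examples, classify new data) can be sketched as follows. The specific features (RMS, kurtosis, crest factor) and the nearest-centroid classifier are common illustrative choices, not the paper's actual models:

```python
import numpy as np

def extract_features(signal):
    """Time-domain features often used in vibration monitoring:
    RMS, kurtosis, and crest factor (the feature set is illustrative)."""
    x = np.asarray(signal, dtype=float)
    rms = np.sqrt(np.mean(x ** 2))
    kurt = np.mean((x - x.mean()) ** 4) / (x.var() ** 2 + 1e-12)
    crest = np.max(np.abs(x)) / (rms + 1e-12)
    return np.array([rms, kurt, crest])

class NearestCentroid:
    """Minimal supervised classifier: one centroid per condition label."""
    def fit(self, X, y):
        self.labels_ = sorted(set(y))
        self.centroids_ = {c: np.mean([x for x, l in zip(X, y) if l == c], axis=0)
                           for c in self.labels_}
        return self

    def predict(self, X):
        return [min(self.labels_,
                    key=lambda c: np.linalg.norm(x - self.centroids_[c]))
                for x in X]
```

Impulsive faults raise kurtosis and crest factor sharply, so even this simple classifier separates a clean rotating-machinery signal from one with periodic impacts.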
K-Clustering Methods for Investigating Social-Environmental and Natural-Environmental Features Based on Air Quality Index
Air pollution has caused environmental and health hazards across the globe, particularly in emerging countries such as China. In this article, we propose the use of the air quality index and the development of advanced data processing, analysis, and visualization techniques based on an AI-based k-clustering method. We analyze air quality data based on seven key attributes and discuss the implications. Our results provide meaningful values and contributions to current research. Future work will include the use of advanced AI algorithms and big data techniques to ensure better performance, accuracy, and real-time checks.
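The article does not spell out its k-clustering algorithm; a standard Lloyd's k-means over air-quality attribute vectors is a reasonable minimal sketch of the idea (the data values in the usage below are invented):

```python
import numpy as np

def kmeans(X, k, iters=50):
    """Plain Lloyd's k-means: assign each point to its nearest centroid,
    then move each centroid to the mean of the points assigned to it."""
    X = np.asarray(X, dtype=float)
    centroids = X[:k].copy()          # naive init: first k points (k-means++ is better)
    for _ in range(iters):
        dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):   # leave a centroid in place if its cluster empties
                centroids[j] = X[labels == j].mean(axis=0)
    dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dist.argmin(axis=1), centroids
```

The same call applies unchanged to seven-attribute air-quality records; in practice the attributes would be standardized first and k chosen by, e.g., the elbow method.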
Differentially Private Data Generation with Missing Data
Despite several works that succeed in generating synthetic data with
differential privacy (DP) guarantees, they are inadequate for generating
high-quality synthetic data when the input data has missing values. In this
work, we formalize the problems of DP synthetic data with missing values and
propose three effective adaptive strategies that significantly improve the
utility of the synthetic data on four real-world datasets with different types
and levels of missing data and privacy requirements. We also identify the
relationship between privacy impact for the complete ground truth data and
incomplete data for these DP synthetic data generation algorithms. We model the
missing mechanisms as a sampling process to obtain tighter upper bounds for the
privacy guarantees to the ground truth data. Overall, this study contributes to
a better understanding of the challenges and opportunities for using private
synthetic data generation algorithms in the presence of missing data.
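The paper's adaptive strategies are more involved, but the basic tension it studies, differential privacy when some entries are missing, can be illustrated with a textbook Laplace-mechanism mean that must privatize both the sum and the count of observed entries. The function below is an illustrative sketch, not one of the paper's algorithms:

```python
import numpy as np

def dp_mean_with_missing(x, eps, lo=0.0, hi=1.0, seed=0):
    """Differentially private mean of a vector that may contain NaNs,
    via the Laplace mechanism.  Values are clipped to [lo, hi]; the
    privacy budget eps is split between a noisy sum and a noisy count
    of observed entries, since the number observed is itself private."""
    rng = np.random.default_rng(seed)
    obs = np.asarray(x, dtype=float)
    obs = np.clip(obs[~np.isnan(obs)], lo, hi)
    # One record changes the clipped sum by at most (hi - lo) and the count by 1.
    noisy_sum = obs.sum() + rng.laplace(scale=(hi - lo) / (eps / 2))
    noisy_cnt = len(obs) + rng.laplace(scale=1.0 / (eps / 2))
    return noisy_sum / max(noisy_cnt, 1.0)
```

Note how missingness degrades utility for free: the noise scales are fixed by eps, while the signal shrinks with the number of observed entries.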
Physics-inspired Replica Approaches to Computer Science Problems
We study machine learning classification problems and combinatorial optimization problems using physics-inspired replica approaches. In the current work, we focus on the traveling salesman problem, one of the most famous problems in the entire field of combinatorial optimization. Our approach is specifically motivated by the desire to avoid trapping in metastable local minima, a common occurrence in hard problems with multiple extrema. Our method involves (i) coupling otherwise independent simulations of a system (“replicas”) via geometrical distances as well as (ii) probabilistic inference applied to the solutions found by individual replicas. In particular, we apply our method to the well-known “k-opt” algorithm and examine two particular cases, k = 2 and k = 3. With the aid of geometrical coupling alone, we are able to determine the optimum tour length on systems of up to 280 cities (an order of magnitude larger than the largest systems typically solved by the bare k = 3 opt). The probabilistic replica-based inference approach improves k-opt even further and determines the optimal solution of a problem with 318 cities. In this work, we also formulate a supervised machine learning algorithm for classification problems called the “Stochastic Replica Voting Machine” (SRVM). The method is based on representations of known data via multiple linear expansions in terms of various stochastic functions. The algorithm is developed, implemented, and applied to a binary and a 3-class classification problem in materials science. Here, we employ SRVM to predict candidate compounds capable of forming the cubic Perovskite structure and to further classify binary (AB) solids. We demonstrate that our SRVM method exceeds the well-known Support Vector Machine (SVM) in accuracy when predicting the cubic Perovskite structure.
The algorithm has also been tested on 8 diverse training data sets of various types and feature-space dimensions from the UCI Machine Learning Repository, and has been shown to consistently match or exceed the accuracy of existing algorithms while avoiding many of their pitfalls.
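The “bare” k-opt baseline that the replica method improves on can be sketched for k = 2: repeatedly reverse a segment of the tour whenever the reversal shortens it, until no improving reversal exists. A single uncoupled run (no replicas, no inference step) looks like this:

```python
import math
import random

def tour_length(tour, pts):
    """Total length of a closed tour over the given city coordinates."""
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def two_opt(pts, seed=0):
    """Bare 2-opt local search from a random starting tour.  The paper
    couples many such runs ('replicas'); this is one independent run."""
    n = len(pts)
    rng = random.Random(seed)
    tour = list(range(n))
    rng.shuffle(tour)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                a, b = tour[i], tour[i + 1]
                c, d = tour[j], tour[(j + 1) % n]
                if a == d:                       # same edge pair via wrap-around
                    continue
                # Change in length from replacing edges (a,b),(c,d) by (a,c),(b,d).
                delta = (math.dist(pts[a], pts[c]) + math.dist(pts[b], pts[d])
                         - math.dist(pts[a], pts[b]) - math.dist(pts[c], pts[d]))
                if delta < -1e-9:
                    tour[i + 1:j + 1] = reversed(tour[i + 1:j + 1])
                    improved = True
    return tour
```

Such a run only reaches a local minimum of the tour length; the geometrical coupling and replica voting described in the abstract are precisely what push the search past those metastable states.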