
    Missing Value Imputation With Unsupervised Backpropagation

    Many data mining and data analysis techniques operate on dense matrices or complete tables of data. Real-world data sets, however, often contain unknown values. Even classification algorithms designed to operate with missing values still exhibit deteriorated accuracy. One approach to handling missing values is to fill in (impute) them. In this paper, we present an unsupervised learning technique called Unsupervised Backpropagation (UBP), which trains a multi-layer perceptron to fit the manifold sampled by a set of observed point-vectors. We evaluate UBP on the task of imputing missing values in datasets and show that UBP predicts missing values with significantly lower sum-squared error than other collaborative filtering and imputation techniques. We also demonstrate, with 24 datasets and 9 supervised learning algorithms, that classification accuracy is usually higher when randomly withheld values are imputed using UBP rather than with other methods.
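
    To make the idea concrete, here is a minimal sketch of the UBP training loop, not the authors' implementation: per-row latent vectors and the MLP weights are optimized jointly by backpropagation so the network reproduces the observed entries, after which missing cells are read off the trained network. The network shape, latent_dim, and all hyperparameters below are illustrative assumptions.

    import torch

    def ubp_impute(X, mask, latent_dim=3, epochs=2000, lr=0.01):
        # X: (n, d) float tensor; mask: (n, d) tensor, 1 = observed, 0 = missing.
        n, d = X.shape
        Z = torch.randn(n, latent_dim, requires_grad=True)   # one latent vector per row
        net = torch.nn.Sequential(                           # small MLP decoder
            torch.nn.Linear(latent_dim, 16), torch.nn.Tanh(),
            torch.nn.Linear(16, d),
        )
        opt = torch.optim.Adam([Z, *net.parameters()], lr=lr)
        for _ in range(epochs):
            opt.zero_grad()
            # sum-squared error over observed entries only
            loss = ((net(Z) - X) ** 2 * mask).sum()
            loss.backward()
            opt.step()
        with torch.no_grad():
            pred = net(Z)
        return torch.where(mask.bool(), X, pred)             # fill only the missing cells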

    Solving Incomplete Datasets in Soft Set Using Supported Sets and Aggregate Values

    The theory of soft sets, proposed by Molodtsov in 1999 [1], is a new method for handling uncertain data and can be defined as a Boolean-valued information system. It has been applied to data analysis and decision support systems based on large datasets. In this paper, it is shown that a calculated support value can be used to determine the missing attribute value of an object. When more than one value is missing, however, aggregate values are used together with the calculated support values to determine the missing values. By successfully recovering missing attribute values, the integrity of a dataset can still be maintained.
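
    The abstract does not give the exact support and aggregate formulas, so the following is only a plausible sketch of the general idea, with all names illustrative: in a Boolean-valued table, a missing entry is taken from the object with the highest support (agreement on the attributes both objects have observed), falling back to the column's aggregate (majority) value when no supporting object exists.

    def impute_boolean(table):
        # table: list of rows over {0, 1, None}; None marks a missing entry.
        for i, row in enumerate(table):
            for j, v in enumerate(row):
                if v is not None:
                    continue
                # support of each other object = count of agreeing observed entries
                best, best_support = None, -1
                for k, other in enumerate(table):
                    if k == i or other[j] is None:
                        continue
                    support = sum(1 for a, b in zip(row, other)
                                  if a is not None and b is not None and a == b)
                    if support > best_support:
                        best, best_support = other[j], support
                if best is not None:
                    row[j] = best
                else:
                    # aggregate value: majority of the column's observed entries
                    col = [r[j] for r in table if r[j] is not None]
                    row[j] = int(sum(col) >= len(col) / 2)
        return table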

    Bayesian network classification of gastrointestinal bleeding

    The source of gastrointestinal bleeding (GIB) remains uncertain in patients presenting without hematemesis. This paper studies the accuracy, specificity and sensitivity of the Naive Bayesian Classifier (NBC) in identifying the source of GIB in the absence of hematemesis. Data from 325 patients admitted via the emergency department (ED) for GIB without hematemesis and who underwent confirmatory testing were analysed. Six attributes related to demography and presenting signs were chosen. NBC was used to calculate the conditional probability of an individual being assigned to upper gastrointestinal bleeding (UGIB) or lower gastrointestinal bleeding (LGIB). High classification accuracy (87.3 %), specificity (0.85) and sensitivity (0.88) were achieved. NBC is a useful tool to support the identification of the source of gastrointestinal bleeding in patients without hematemesis.
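
    As an illustration of the classifier used here (not the study's code), a minimal Naive Bayes over categorical attributes computes P(class) times the product of P(value | class) with rough add-one smoothing and assigns the class with the larger posterior. The attributes and values below are placeholders for the study's six demographic and presenting-sign attributes.

    from collections import Counter, defaultdict

    def train_nbc(rows, labels):
        classes = Counter(labels)
        counts = defaultdict(Counter)            # (class, attr index) -> value counts
        for row, y in zip(rows, labels):
            for j, v in enumerate(row):
                counts[(y, j)][v] += 1

        def classify(row):
            best, best_p = None, 0.0
            for y, ny in classes.items():
                p = ny / len(labels)             # prior P(class)
                for j, v in enumerate(row):      # likelihoods, roughly Laplace-smoothed
                    c = counts[(y, j)]
                    p *= (c[v] + 1) / (ny + len(c) + 1)
                if p > best_p:
                    best, best_p = y, p
            return best
        return classify

    # placeholder data: (sex, presenting sign) -> bleeding source
    classify = train_nbc([("M", "melena"), ("F", "hematochezia"), ("M", "melena")],
                         ["UGIB", "LGIB", "UGIB"])
    print(classify(("M", "melena")))             # -> UGIB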

    Improved Heterogeneous Distance Functions

    Instance-based learning techniques typically handle continuous and linear input values well, but often do not handle nominal input attributes appropriately. The Value Difference Metric (VDM) was designed to find reasonable distance values between nominal attribute values, but it largely ignores continuous attributes, requiring discretization to map continuous values into nominal ones. This paper proposes three new heterogeneous distance functions, called the Heterogeneous Value Difference Metric (HVDM), the Interpolated Value Difference Metric (IVDM), and the Windowed Value Difference Metric (WVDM). These new distance functions are designed to handle applications with nominal attributes, continuous attributes, or both. In experiments on 48 applications, the new distance metrics achieve higher average classification accuracy than three previous distance functions on those datasets that have both nominal and continuous attributes. Comment: See http://www.jair.org/ for an online appendix and other files accompanying this article.
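
    HVDM has a published closed form, which the sketch below follows: the per-attribute distance is a class-conditional value difference for nominal attributes and |x_a - y_a| / (4 * sigma_a) for continuous ones, combined as a Euclidean sum. This is a straightforward reading of that definition, not the authors' code.

    import math
    from collections import Counter

    def hvdm(train, labels, nominal, x, y):
        # train: list of instances; labels: their classes;
        # nominal: set of attribute indices to treat as nominal.
        classes = set(labels)
        total = 0.0
        for a in range(len(x)):
            if a in nominal:
                nv = Counter(row[a] for row in train)                        # N_{a,v}
                nvc = Counter((row[a], c) for row, c in zip(train, labels))  # N_{a,v,c}
                d = math.sqrt(sum(
                    (nvc[(x[a], c)] / max(nv[x[a]], 1)
                     - nvc[(y[a], c)] / max(nv[y[a]], 1)) ** 2
                    for c in classes))
            else:
                vals = [row[a] for row in train]
                mu = sum(vals) / len(vals)
                sigma = math.sqrt(sum((v - mu) ** 2 for v in vals) / len(vals))
                d = abs(x[a] - y[a]) / (4 * sigma) if sigma else 0.0
            total += d * d
        return math.sqrt(total)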

    Learning differential diagnosis of erythemato-squamous diseases using voting feature intervals

    A new classification algorithm, called VFI5 (for Voting Feature Intervals), is developed and applied to the problem of differential diagnosis of erythemato-squamous diseases. The domain contains records of patients with known diagnoses. Given a training set of such records, the VFI5 classifier learns how to differentiate a new case in the domain. VFI5 represents a concept in the form of feature intervals on each feature dimension separately. Classification in the VFI5 algorithm is based on real-valued voting: each feature participates equally in the voting process, and the class that receives the most votes is declared the predicted class. The performance of the VFI5 classifier is evaluated empirically in terms of classification accuracy and running time.
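
    A simplified sketch of feature-interval voting follows; VFI5's actual interval construction (range and point intervals per feature) is more refined than this. Here each feature casts one unit vote split across classes in proportion to class-normalized frequencies of the query's value, and the class with the largest total vote wins. Treating every feature as categorical is a simplifying assumption.

    from collections import Counter, defaultdict

    def train_vfi(rows, labels):
        classes = Counter(labels)
        tally = defaultdict(Counter)             # (attr index, value) -> class counts
        for row, y in zip(rows, labels):
            for j, v in enumerate(row):
                tally[(j, v)][y] += 1

        def classify(row):
            votes = Counter()
            for j, v in enumerate(row):
                counts = tally[(j, v)]
                # normalize by class size so frequent classes don't dominate
                weights = {c: counts[c] / classes[c] for c in classes}
                s = sum(weights.values())
                if s:                            # this feature casts one unit vote
                    for c, w in weights.items():
                        votes[c] += w / s
            return votes.most_common(1)[0][0] if votes else None
        return classify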

    Iterative missing value imputation based on feature importance

    Many datasets suffer from missing values for various reasons, which not only increases the processing difficulty of related tasks but also reduces classification accuracy. The mainstream approach to this problem is missing value imputation to complete the dataset. Existing imputation methods estimate the missing parts from the observed values in the original feature space and treat all features as equally important during completion, while in fact different features have different importance. We therefore design an imputation method that takes feature importance into account. The algorithm iteratively performs matrix completion and feature importance learning; specifically, matrix completion is based on a filling loss that incorporates feature importance. Our experimental analysis involves three types of datasets: synthetic datasets with different noisy features and missing values, real-world datasets with artificially generated missing values, and real-world datasets that originally contain missing values. The results on these datasets consistently show that the proposed method outperforms five existing imputation algorithms. To the best of our knowledge, this is the first work to consider feature importance in the imputation model.
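
    The abstract's alternate-and-reweight loop can be sketched as follows, with the caveat that the paper's exact filling loss and feature-importance learner are not reproduced here: a low-rank completion whose squared error is weighted per feature, with weights re-estimated each round from how well each feature is reconstructed (an illustrative proxy, not the paper's measure).

    import numpy as np

    def iterative_impute(X, mask, rank=2, rounds=5, steps=200, lr=0.01):
        # X: (n, d) array; mask: 1 where observed. Returns a completed copy.
        n, d = X.shape
        rng = np.random.default_rng(0)
        U, V = rng.normal(size=(n, rank)), rng.normal(size=(rank, d))
        w = np.ones(d)                                  # feature importance weights
        for _ in range(rounds):
            for _ in range(steps):                      # matrix completion step
                R = (U @ V - X) * mask * w              # importance-weighted residual
                U -= lr * R @ V.T
                V -= lr * U.T @ R
            # importance step: features reconstructed well on observed cells get
            # more weight (an illustrative proxy, not the paper's measure)
            err = ((U @ V - X) ** 2 * mask).sum(axis=0) / np.maximum(mask.sum(axis=0), 1)
            w = 1.0 / (1.0 + err)
            w *= d / w.sum()                            # keep weights on a common scale
        return np.where(mask.astype(bool), X, U @ V)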