2 research outputs found

    A Review on Dimension Reduction Techniques in Data Mining

    Real-world data such as images and speech signals is high-dimensional, with many dimensions needed to represent each instance. In higher-dimensional data it is more difficult to detect and exploit the relationships among attributes. Dimensionality reduction is a technique for reducing the complexity of analysing high-dimensional data. Many methodologies exist for finding the critical dimensions of a dataset, significantly reducing the number of dimensions retained from the original input data. Dimensionality reduction methods fall into two types: feature extraction and feature selection. Feature extraction is a distinct form of dimensionality reduction that derives new, informative features from the input dataset. Two different approaches are available for dimensionality reduction: supervised and unsupervised. One purpose of this survey is to provide an adequate understanding of the dimensionality reduction techniques that currently exist, and to indicate which of the described methods is applicable for a given set of parameters and varying conditions. This paper surveys the schemes most commonly used for dimensionality reduction on high-dimensional datasets. A comparative analysis of the surveyed methodologies is also given, on the basis of which the best methodology for a certain type of dataset can be chosen. Keywords: data mining; dimensionality reduction; clustering; feature selection; curse of dimensionality; critical dimension
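    The distinction the abstract draws between feature extraction and feature selection can be made concrete with a small sketch. The example below is illustrative only and is not taken from the paper; it assumes scikit-learn is available, uses a synthetic dataset, and contrasts PCA (extraction of new combined features) with SelectKBest (selection among the original features).

    # Minimal sketch contrasting the two families of dimensionality reduction
    # discussed above. The synthetic dataset and the choice of 10 output
    # dimensions are illustrative assumptions, not taken from the paper.
    from sklearn.datasets import make_classification
    from sklearn.decomposition import PCA
    from sklearn.feature_selection import SelectKBest, f_classif

    # High-dimensional toy data: 500 samples, 100 features, few informative ones.
    X, y = make_classification(n_samples=500, n_features=100,
                               n_informative=8, random_state=0)

    # Feature extraction: PCA builds 10 new features as linear combinations
    # of all original dimensions (unsupervised).
    X_pca = PCA(n_components=10).fit_transform(X)

    # Feature selection: keep the 10 original features that score highest
    # against the class label (supervised).
    X_sel = SelectKBest(score_func=f_classif, k=10).fit_transform(X, y)

    print(X.shape, X_pca.shape, X_sel.shape)   # (500, 100) (500, 10) (500, 10)

    Both routes reduce 100 columns to 10, but only the selection route keeps features that remain interpretable in terms of the original measurements, which is one of the trade-offs such a survey weighs.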

    Autoencoder for clinical data analysis and classification: data imputation, dimensional reduction, and pattern recognition

    Over the last decade, research has focused on machine learning and data mining to develop frameworks that improve data analysis and output performance, and to build accurate decision support systems that benefit from real-life datasets. This has led to the field of clinical data analysis, which has attracted significant interest in the computing, information systems, and medical fields. To create and develop models with machine learning algorithms, the existing algorithms require data of a particular form in order to build an efficient model. Clinical datasets pose several issues that can affect classification: missing values, high dimensionality, and class imbalance. To build a framework for mining the data, it is necessary first to preprocess it by eliminating patient records with too many missing values, imputing the remaining missing values, addressing high dimensionality, and classifying the data for decision support. This thesis investigates a real clinical dataset to address these challenges. An autoencoder is employed as a tool that compresses the data mining methodology by extracting features and classifying data in a single model. The first step of the methodology is to impute missing values, so several imputation methods are analysed and employed. High dimensionality is then addressed by discarding irrelevant and redundant features, in order to improve prediction accuracy and reduce computational complexity. Class imbalance is manipulated to investigate its effect on feature selection algorithms and classification algorithms. The first stage of the analysis investigates the role of missing values; results show that imputation techniques based on class separation outperform other techniques in predictive ability. The next stage investigates high dimensionality and class imbalance; it was found that a small set of features can improve classification performance, while balancing the classes does not affect performance as much as the class imbalance itself.
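    The central idea of the abstract, an autoencoder whose compressed code serves both reconstruction and classification, can be sketched as follows. This is a minimal illustrative example, not the thesis's actual model: it assumes PyTorch, uses random data standing in for a clinical table, zero-filling as a placeholder for the imputation step, and arbitrary layer sizes.

    # Sketch of an autoencoder whose bottleneck feeds both a decoder and a
    # small classification head, i.e. feature extraction and classification
    # in one model. All sizes and the loss weighting are assumptions.
    import torch
    import torch.nn as nn

    class AutoencoderClassifier(nn.Module):
        def __init__(self, n_features, code_dim=16, n_classes=2):
            super().__init__()
            self.encoder = nn.Sequential(nn.Linear(n_features, 64), nn.ReLU(),
                                         nn.Linear(64, code_dim))
            self.decoder = nn.Sequential(nn.Linear(code_dim, 64), nn.ReLU(),
                                         nn.Linear(64, n_features))
            self.classifier = nn.Linear(code_dim, n_classes)

        def forward(self, x):
            code = self.encoder(x)                       # compressed representation
            return self.decoder(code), self.classifier(code)

    # Toy training loop on random data standing in for a clinical table.
    X = torch.randn(256, 40)
    X[torch.rand_like(X) < 0.1] = float("nan")           # simulate missing entries
    X = torch.nan_to_num(X, nan=0.0)                     # zero-fill: placeholder for the imputation step
    y = torch.randint(0, 2, (256,))

    model = AutoencoderClassifier(n_features=40)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(50):
        recon, logits = model(X)
        # Joint objective: reconstruct the input and predict the class label.
        loss = nn.functional.mse_loss(recon, X) + nn.functional.cross_entropy(logits, y)
        opt.zero_grad()
        loss.backward()
        opt.step()

    Training jointly on reconstruction and classification losses is one way to realise the "extract features and classify in one model" idea described above; the thesis may weight or structure these objectives differently.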