A Survey on Soft Subspace Clustering
Subspace clustering (SC) is a promising clustering technique that identifies
clusters based on their associations with subspaces in high-dimensional spaces.
SC can be classified into hard subspace clustering (HSC) and soft subspace
clustering (SSC). While HSC algorithms have been extensively studied and are well
accepted by the scientific community, SSC algorithms are relatively new but have
been gaining attention in recent years owing to their better adaptability. This
paper presents a comprehensive survey of existing SSC algorithms and their recent
developments. The SSC algorithms are systematically classified into three main
categories, namely, conventional SSC (CSSC), independent SSC (ISSC) and extended
SSC (XSSC). The characteristics of these algorithms are highlighted, and potential
future developments of SSC are also discussed.
Comment: This paper has been published in Information Sciences Journal in 201
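To make the "soft" in soft subspace clustering concrete: instead of keeping or discarding whole dimensions, each cluster keeps a weight for every feature. The sketch below follows the entropy-weighted k-means (EWKM) style of conventional SSC purely as an illustration; it is not taken from the survey, and the function name, `gamma`, and the optional `init` parameter are assumptions. Features with small within-cluster dispersion D receive large weights, w ∝ exp(-D / gamma), normalized per cluster.

```python
import math
import random


def soft_subspace_kmeans(points, k, gamma=1.0, iters=20, seed=0, init=None):
    """Entropy-weighted k-means sketch (illustrative, hypothetical helper).

    points: list of equal-length float lists; k: number of clusters;
    gamma: entropy regularization (larger -> more uniform feature weights);
    init: optional list of k starting centers (assumption, for reproducibility).
    Returns (labels, centers, weights), with one weight vector per cluster.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    start = init if init is not None else rng.sample(points, k)
    centers = [list(p) for p in start]
    weights = [[1.0 / dim] * dim for _ in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # 1) Assign each point to the cluster with the smallest *weighted* distance.
        for j, p in enumerate(points):
            labels[j] = min(
                range(k),
                key=lambda c: sum(weights[c][i] * (p[i] - centers[c][i]) ** 2 for i in range(dim)),
            )
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # 2) Recompute the center as the mean of its members (keep it if empty).
            if members:
                centers[c] = [sum(m[i] for m in members) / len(members) for i in range(dim)]
            # 3) Per-cluster feature weights: low dispersion along a feature -> high weight.
            disp = [sum((m[i] - centers[c][i]) ** 2 for m in members) for i in range(dim)]
            base = min(disp)  # shift before exponentiating for numerical stability
            expo = [math.exp(-(d - base) / gamma) for d in disp]
            total = sum(expo)
            weights[c] = [e / total for e in expo]
    return labels, centers, weights
```

With two groups separated only along the first feature, both clusters end up weighting that feature almost exclusively; increasing `gamma` spreads the weights more evenly across features.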
A General Spatio-Temporal Clustering-Based Non-local Formulation for Multiscale Modeling of Compartmentalized Reservoirs
Representing the reservoir as a network of discrete compartments with
neighbor and non-neighbor connections is a fast yet accurate method for
analyzing oil and gas reservoirs. Automatic and rapid detection of coarse-scale
compartments with distinct static and dynamic properties is an integral part of
such high-level reservoir analysis. In this work, we present a novel hybrid
framework, specific to reservoir analysis, that couples a physics-based non-local
multiscale modeling framework with data-driven clustering techniques, which
automatically detect clusters in space from spatial and temporal field data, to
provide fast and accurate multiscale modeling of compartmentalized reservoirs.
This research also adds to the literature by presenting a comprehensive study of
spatio-temporal clustering for reservoir applications that carefully considers
the clustering complexities, the intrinsically sparse and noisy nature of the
data, and the interpretability of the outcome.
Keywords: Artificial Intelligence; Machine Learning; Spatio-Temporal
Clustering; Physics-Based Data-Driven Formulation; Multiscale Modeling
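The abstract does not detail the clustering step. As a hypothetical sketch of grouping wells using both spatial and temporal field data, the code below blends the Euclidean distance between well locations with the RMS difference between their time series (for example, pressure histories) and merges wells into k compartments by average-linkage agglomerative clustering. The blending weight `alpha`, `spatial_scale`, the well record layout, and all function names are assumptions rather than the paper's formulation, and the physics-based non-local coupling is not modeled.

```python
import math


def combined_distance(a, b, alpha=0.5, spatial_scale=1.0):
    """Blend spatial and temporal dissimilarity (hypothetical weighting).

    Each well is a dict with 'xy' (x, y) and 'series' (equal-length readings).
    """
    spatial = math.dist(a["xy"], b["xy"]) / spatial_scale
    # RMS difference between the two time series.
    temporal = math.dist(a["series"], b["series"]) / math.sqrt(len(a["series"]))
    return alpha * spatial + (1 - alpha) * temporal


def cluster_wells(wells, k, alpha=0.5, spatial_scale=1.0):
    """Average-linkage agglomerative clustering down to k compartments."""
    n = len(wells)
    dist = {(i, j): combined_distance(wells[i], wells[j], alpha, spatial_scale)
            for i in range(n) for j in range(i + 1, n)}

    def d(i, j):
        return dist[(min(i, j), max(i, j))]

    clusters = [[i] for i in range(n)]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average pairwise distance between the two clusters' members.
                avg = sum(d(i, j) for i in clusters[a] for j in clusters[b]) / (
                    len(clusters[a]) * len(clusters[b]))
                if best is None or avg < best[0]:
                    best = (avg, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a stays valid

    labels = [0] * n
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels
```

Raising `alpha` makes compartments follow geography more closely; lowering it lets wells with similar dynamic behavior group together even when they are far apart.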
Multiple Imputation Ensembles (MIE) for dealing with missing data
Missing data is a significant issue in many real-world datasets, yet there are no robust methods for dealing with it appropriately. In this paper, we propose a robust approach to dealing with missing data in classification problems: Multiple Imputation Ensembles (MIE). Our method integrates two approaches, multiple imputation and ensemble methods, and compares two types of ensembles: bagging and stacking. We also propose a robust experimental set-up using 20 benchmark datasets from the UCI machine learning repository. For each dataset, we introduce increasing amounts of data missing completely at random (MCAR). First, we use a number of single and multiple imputation methods to recover the missing values; we then ensemble a number of different classifiers built on the imputed data. We assess the quality of the imputation using dissimilarity measures. We also evaluate MIE performance by comparing classification accuracy on the complete and imputed data. Furthermore, we use the accuracy of simple imputation as a benchmark for comparison. We find that our proposed approach, which combines multiple imputation with ensemble techniques, outperforms the others, particularly as the amount of missing data increases.
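A minimal sketch of the MIE idea, under loudly stated assumptions: missingness is injected MCAR cell by cell; each of the `m` imputed copies comes from a simple random hot-deck draw from the same column (standing in for the paper's single/multiple imputation methods); a nearest-centroid classifier is trained on each copy; and predictions are combined by majority vote, a bagging-like combination rather than stacking. All function names are hypothetical. Test rows are assumed complete, and every training column must have at least one observed value.

```python
import random
from collections import Counter


def inject_mcar(rows, rate, rng):
    """Blank out each cell independently with probability `rate` (MCAR)."""
    return [[None if rng.random() < rate else v for v in row] for row in rows]


def hot_deck_impute(rows, rng):
    """Stochastic imputation (assumption): fill each gap with a random observed
    value from the same column. Requires >= 1 observed value per column."""
    observed = [[v for v in col if v is not None] for col in zip(*rows)]
    return [[v if v is not None else rng.choice(observed[i]) for i, v in enumerate(row)]
            for row in rows]


def fit_centroids(X, y):
    """Nearest-centroid classifier: one mean vector per class label."""
    groups = {}
    for x, label in zip(X, y):
        groups.setdefault(label, []).append(x)
    return {label: [sum(c) / len(c) for c in zip(*xs)] for label, xs in groups.items()}


def predict_centroid(centroids, x):
    return min(centroids, key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])))


def mie_fit_predict(X_train, y_train, X_test, m=5, seed=0):
    """Train one model per imputed copy of X_train, then majority-vote."""
    rng = random.Random(seed)
    models = [fit_centroids(hot_deck_impute(X_train, rng), y_train) for _ in range(m)]
    preds = []
    for x in X_test:
        votes = Counter(predict_centroid(model, x) for model in models)
        preds.append(votes.most_common(1)[0][0])
    return preds
```

Because each imputed copy fills the gaps differently, the ensemble averages over imputation uncertainty instead of committing to a single filled-in dataset.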
A linear model based on Kalman filter for improving neural network classification performance
Neural networks have been applied to several classification problems, such as medical diagnosis, handwriting recognition, and product inspection, with good classification performance. The performance of a neural network is characterized by its structure, transfer function, and learning algorithm. However, a neural network classifier tends to perform poorly if it uses an inappropriate structure. The appropriate structure depends on the complexity of the relationship between the input and the output, and there are no exact rules for determining it. Therefore, improving neural network classification performance without changing the network's structure is a challenging problem. This paper proposes a method to improve neural network classification performance by constructing a linear model based on the Kalman filter as a post-processing step. The linear model transforms the predicted output of the neural network into a value close to the desired output by using a linear combination of the object features and the predicted output. This simple transformation reduces the error of the neural network and improves classification performance. The Kalman filter iteration is used to estimate the parameters of the linear model. Five datasets from various domains with different characteristics, such as attribute types, the number of attributes, the number of samples, and the number of classes, were used for empirical validation. The validation results show that the linear model based on the Kalman filter can improve the performance of the original neural network.
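The abstract does not give the state-space model, so the following is a minimal sketch under stated assumptions: the linear-model weights w form a constant hidden state (no process noise), and each training sample is one scalar measurement y = w · [x..., nn_output, 1] + noise with variance `r`. Under these assumptions the Kalman filter reduces to recursive least squares. The names `kalman_linear_fit` and `corrected_output` and the parameters `r`, `p0`, and `passes` are hypothetical.

```python
def kalman_linear_fit(features, nn_outputs, targets, r=0.1, p0=100.0, passes=1):
    """Estimate w in  y ~ w . [x..., nn_output, 1]  with a Kalman filter.

    Assumption: w is a constant state (no process noise); each sample is a
    scalar measurement with noise variance r; prior covariance is p0 * I.
    """
    n = len(features[0]) + 2
    w = [0.0] * n
    P = [[p0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(passes):
        for x, yhat, y in zip(features, nn_outputs, targets):
            h = list(x) + [yhat, 1.0]                      # measurement row: features, NN output, bias
            Ph = [sum(P[i][j] * h[j] for j in range(n)) for i in range(n)]  # P h
            s = sum(h[i] * Ph[i] for i in range(n)) + r    # innovation variance h^T P h + r
            K = [v / s for v in Ph]                        # Kalman gain
            innov = y - sum(h[i] * w[i] for i in range(n))  # measurement residual
            w = [w[i] + K[i] * innov for i in range(n)]
            # Covariance update P <- P - K (P h)^T  (P is symmetric, so h^T P = (P h)^T).
            P = [[P[i][j] - K[i] * Ph[j] for j in range(n)] for i in range(n)]
    return w


def corrected_output(w, x, yhat):
    """Post-processed prediction: linear combination of features and NN output."""
    h = list(x) + [yhat, 1.0]
    return sum(a * b for a, b in zip(w, h))
```

For a classifier, the corrected output would then be thresholded or rounded to a class label; that final step is left out here because the abstract does not specify it.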
Clustering Data of Mixed Categorical and Numerical Type with Unsupervised Feature Learning
Mixed-type categorical and numerical data are a challenge in many applications. This general area of mixed-type data is among the frontier areas where computational intelligence approaches are often brittle compared with the capabilities of living creatures. In this paper, unsupervised feature learning (UFL) is applied to mixed-type data to achieve a sparse representation, which makes it easier for clustering algorithms to separate the data. Unlike other UFL methods that work with homogeneous data, such as image and video data, the presented UFL works with mixed-type data using fuzzy adaptive resonance theory (ART). UFL with fuzzy ART (UFLA) obtains better clustering results by removing the differences in how categorical and numeric features are treated. The advantages of doing this are demonstrated on several real-world data sets with ground truth, including heart disease, teaching assistant evaluation, and credit approval. The approach is also demonstrated on noisy, mixed-type petroleum industry data. UFLA is compared with several alternative methods. To the best of our knowledge, this is the first time UFL has been extended to accomplish the fusion of mixed data types.
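The abstract does not say how UFLA encodes its inputs or configures fuzzy ART. The sketch below assumes min-max scaling for numeric columns and one-hot encoding for categorical ones, so that both land in [0, 1], and then applies the standard fuzzy ART steps: complement coding, the choice function T_j = |I ∧ w_j| / (alpha + |w_j|), the vigilance test |I ∧ w_j| / |I| ≥ rho, and learning with rate `beta` (fast learning when beta = 1). It clusters directly rather than learning a sparse feature layer, so it illustrates only the fuzzy ART building block, not UFLA itself. The function names and parameter defaults are hypothetical.

```python
def encode_mixed(rows, numeric_idx, categorical_idx):
    """Assumed encoding: min-max scale numeric columns, one-hot categorical ones."""
    lo = {i: min(r[i] for r in rows) for i in numeric_idx}
    hi = {i: max(r[i] for r in rows) for i in numeric_idx}
    cats = {i: sorted({r[i] for r in rows}) for i in categorical_idx}
    out = []
    for r in rows:
        vec = [(r[i] - lo[i]) / (hi[i] - lo[i]) if hi[i] > lo[i] else 0.0 for i in numeric_idx]
        for i in categorical_idx:
            vec += [1.0 if r[i] == c else 0.0 for c in cats[i]]
        out.append(vec)
    return out


def fuzzy_art(inputs, rho=0.75, alpha=0.001, beta=1.0):
    """Standard fuzzy ART clustering over inputs already scaled to [0, 1]."""
    weights, labels = [], []
    for a in inputs:
        I = list(a) + [1.0 - v for v in a]  # complement coding, so |I| = len(a)
        norm_I = sum(I)
        # Try categories in descending order of the choice function T_j.
        order = sorted(range(len(weights)),
                       key=lambda j: -sum(map(min, I, weights[j])) / (alpha + sum(weights[j])))
        chosen = None
        for j in order:
            if sum(map(min, I, weights[j])) / norm_I >= rho:  # vigilance test: resonance
                chosen = j
                break
        if chosen is None:  # no category resonates: commit a new one
            weights.append(I[:])
            chosen = len(weights) - 1
        else:  # learning: move w_j toward the fuzzy AND of I and w_j
            fuzzy_and = [min(x, w) for x, w in zip(I, weights[chosen])]
            weights[chosen] = [beta * f + (1 - beta) * w
                               for f, w in zip(fuzzy_and, weights[chosen])]
        labels.append(chosen)
    return labels, weights
```

Higher `rho` demands a closer match before an input joins an existing category, which yields more and finer clusters.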