7,175 research outputs found
A Survey on Soft Subspace Clustering
Subspace clustering (SC) is a promising clustering technology to identify
clusters based on their associations with subspaces in high dimensional spaces.
SC can be classified into hard subspace clustering (HSC) and soft subspace
clustering (SSC). While HSC algorithms have been extensively studied and well
accepted by the scientific community, SSC algorithms are relatively new but
gaining more attention in recent years due to better adaptability. In the
paper, a comprehensive survey on existing SSC algorithms and the recent
development are presented. The SSC algorithms are classified systematically
into three main categories, namely, conventional SSC (CSSC), independent SSC
(ISSC) and extended SSC (XSSC). The characteristics of these algorithms are
highlighted and the potential future development of SSC is also discussed.Comment: This paper has been published in Information Sciences Journal in 201
Input variable selection in time-critical knowledge integration applications: A review, analysis, and recommendation paper
This is the post-print version of the final paper published in Advanced Engineering Informatics. The published article is available from the link below. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. Copyright @ 2013 Elsevier B.V.The purpose of this research is twofold: first, to undertake a thorough appraisal of existing Input Variable Selection (IVS) methods within the context of time-critical and computation resource-limited dimensionality reduction problems; second, to demonstrate improvements to, and the application of, a recently proposed time-critical sensitivity analysis method called EventTracker to an environment science industrial use-case, i.e., sub-surface drilling.
Producing time-critical accurate knowledge about the state of a system (effect) under computational and data acquisition (cause) constraints is a major challenge, especially if the knowledge required is critical to the system operation where the safety of operators or integrity of costly equipment is at stake. Understanding and interpreting, a chain of interrelated events, predicted or unpredicted, that may or may not result in a specific state of the system, is the core challenge of this research. The main objective is then to identify which set of input data signals has a significant impact on the set of system state information (i.e. output). Through a cause-effect analysis technique, the proposed technique supports the filtering of unsolicited data that can otherwise clog up the communication and computational capabilities of a standard supervisory control and data acquisition system.
The paper analyzes the performance of input variable selection techniques from a series of perspectives. It then expands the categorization and assessment of sensitivity analysis methods in a structured framework that takes into account the relationship between inputs and outputs, the nature of their time series, and the computational effort required. The outcome of this analysis is that established methods have a limited suitability for use by time-critical variable selection applications. By way of a geological drilling monitoring scenario, the suitability of the proposed EventTracker Sensitivity Analysis method for use in high volume and time critical input variable selection problems is demonstrated.E
One-class classifiers based on entropic spanning graphs
One-class classifiers offer valuable tools to assess the presence of outliers
in data. In this paper, we propose a design methodology for one-class
classifiers based on entropic spanning graphs. Our approach takes into account
the possibility to process also non-numeric data by means of an embedding
procedure. The spanning graph is learned on the embedded input data and the
outcoming partition of vertices defines the classifier. The final partition is
derived by exploiting a criterion based on mutual information minimization.
Here, we compute the mutual information by using a convenient formulation
provided in terms of the -Jensen difference. Once training is
completed, in order to associate a confidence level with the classifier
decision, a graph-based fuzzy model is constructed. The fuzzification process
is based only on topological information of the vertices of the entropic
spanning graph. As such, the proposed one-class classifier is suitable also for
data characterized by complex geometric structures. We provide experiments on
well-known benchmarks containing both feature vectors and labeled graphs. In
addition, we apply the method to the protein solubility recognition problem by
considering several representations for the input samples. Experimental results
demonstrate the effectiveness and versatility of the proposed method with
respect to other state-of-the-art approaches.Comment: Extended and revised version of the paper "One-Class Classification
Through Mutual Information Minimization" presented at the 2016 IEEE IJCNN,
Vancouver, Canad
- âŠ