Dynamic feature selection for clustering high dimensional data streams
Change in a data stream can occur at the concept level and at the feature level. Change at the feature level can occur if new, additional features appear in the stream or if the importance and relevance of a feature changes as the stream progresses. This type of change has not received as much attention as concept-level change. Furthermore, many of the methods proposed for clustering streams (density-based, graph-based, and grid-based) rely on some form of distance as a similarity metric. This is problematic for high-dimensional data, where the curse of dimensionality makes distance measurements, and any notion of “density”, less meaningful. To address these two challenges, we propose combining them and framing the problem as a feature selection problem, specifically a dynamic feature selection problem. We propose a dynamic feature mask for clustering high-dimensional data streams. Redundant features are masked, and clustering is performed along the unmasked, relevant features. If a feature's perceived importance changes, the mask is updated accordingly: previously unimportant features are unmasked, and features that lose relevance become masked. The proposed method is algorithm-independent and can be used with any of the existing density-based clustering algorithms, which typically lack a mechanism for dealing with feature drift and struggle with high-dimensional data. We evaluate the proposed method on four density-based clustering algorithms across four high-dimensional streams: two text streams and two image streams. In each case, the proposed dynamic feature mask improves clustering performance and reduces the processing time required by the underlying algorithm. Furthermore, change at the feature level can be observed and tracked.
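The abstract does not say how a feature's importance is measured. As a minimal sketch, assuming per-feature variance over a sliding window stands in for importance (our assumption, not necessarily the paper's criterion), a dynamic mask could be maintained as below. The class name `DynamicFeatureMask` and the `window` and `threshold` parameters are hypothetical.

```python
from collections import deque
from math import sqrt
from statistics import pvariance


class DynamicFeatureMask:
    """Hypothetical sliding-window feature mask (an assumption, not the paper's method).

    A feature's importance is approximated by its population variance over the
    last `window` points; features whose variance falls below `threshold`
    are masked out of the distance computation.
    """

    def __init__(self, n_features: int, window: int = 100, threshold: float = 1e-3):
        self.n_features = n_features
        self.window = deque(maxlen=window)  # oldest points drop out automatically
        self.threshold = threshold
        self.mask = [True] * n_features  # True = feature is used for clustering

    def update(self, point):
        """Add one stream point and recompute which features are unmasked."""
        if len(point) != self.n_features:
            raise ValueError("point has wrong dimensionality")
        self.window.append(list(point))
        # With fewer than two points the variance is uninformative; keep the old mask.
        if len(self.window) >= 2:
            columns = zip(*self.window)
            self.mask = [pvariance(col) >= self.threshold for col in columns]
        return self.mask

    def distance(self, a, b):
        """Euclidean distance restricted to the currently unmasked features."""
        return sqrt(sum((x - y) ** 2 for x, y, keep in zip(a, b, self.mask) if keep))
```

Because the mask only changes which coordinates enter `distance`, any density-based clusterer that accepts a custom distance function could, in principle, use it unchanged, which is in the spirit of the algorithm independence described above. A feature that was constant becomes unmasked once it starts to vary, and a feature that stops varying is masked again after it has filled the window.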
FSL-BM: Fuzzy Supervised Learning with Binary Meta-Feature for Classification
This paper introduces a novel real-time Fuzzy Supervised Learning with Binary
Meta-Feature (FSL-BM) method for big data classification tasks. The study of
real-time algorithms addresses several major concerns, namely accuracy, memory
consumption, time complexity, and the ability to relax assumptions. Attaining a
fast computational model that provides fuzzy logic and supervised learning is
one of the main challenges in machine learning. In this paper, we present the
FSL-BM algorithm as an efficient solution for supervised learning with fuzzy
logic processing, using a binary meta-feature representation together with
Hamming distance and a hash function to relax assumptions. While many studies
during the last decade focused on reducing time complexity and increasing
accuracy, the novel contribution of this solution is the integration of
Hamming distance, a hash function, binary meta-features, and binary
classification to provide a real-time supervised method. The hash table (HT)
component gives fast access to existing indices and therefore allows new
indices to be generated in constant time; the resulting method outperforms or
matches existing fuzzy supervised algorithms. To summarize, the main
contribution of this technique for real-time Fuzzy Supervised Learning is to
represent the hypothesis through binary inputs as a meta-feature space and to
create a Fuzzy Supervised Hash table to train and validate the model.
Comment: FICC201
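The abstract names the ingredients (binary meta-features, a hash table, Hamming distance, fuzzy output) but not how they fit together. The following is a minimal sketch under our own assumptions, not the FSL-BM algorithm itself: each feature is binarized against its training mean, each resulting bit pattern is a hash-table key holding label counts, an unseen pattern falls back to the stored patterns at minimal Hamming distance, and the normalized counts serve as fuzzy class memberships. `binarize`, `hamming`, and `FuzzyHashClassifier` are hypothetical names.

```python
from collections import Counter, defaultdict


def binarize(x, thresholds):
    """Map a real-valued vector to a tuple of bits (1 if above the threshold).
    Hypothetical stand-in for the paper's binary meta-feature encoding."""
    return tuple(int(v > t) for v, t in zip(x, thresholds))


def hamming(a, b):
    """Number of positions at which two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


class FuzzyHashClassifier:
    """Sketch: hash table from binary pattern -> label counts; the fuzzy output
    is the normalized label distribution of the matched pattern(s)."""

    def fit(self, X, y):
        n = len(X[0])
        # Per-feature mean as the binarization threshold (an assumption).
        self.thresholds = [sum(row[j] for row in X) / len(X) for j in range(n)]
        self.table = defaultdict(Counter)
        for row, label in zip(X, y):
            self.table[binarize(row, self.thresholds)][label] += 1
        return self

    def predict_proba(self, x):
        key = binarize(x, self.thresholds)
        counts = self.table.get(key)  # .get does not insert into the defaultdict
        if counts is None:
            # No exact match: pool counts from all stored patterns at minimal Hamming distance.
            best = min(hamming(key, k) for k in self.table)
            counts = Counter()
            for k, c in self.table.items():
                if hamming(key, k) == best:
                    counts.update(c)
        total = sum(counts.values())
        return {label: n / total for label, n in counts.items()}

    def predict(self, x):
        proba = self.predict_proba(x)
        return max(proba, key=proba.get)
```

Exact-match lookups are average O(1) dictionary operations, which echoes the constant-time claim; the Hamming-distance fallback in this sketch scans every stored pattern, so it is linear in the table size.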