2 research outputs found
Novel Metaknowledge-based Processing Technique for Multimedia Big Data clustering challenges
Past research has challenged us with the task of showing relational patterns
between text-based data and then clustering for predictive analysis using Golay
Code technique. We focus on a novel approach to extract metaknowledge in
multimedia datasets. Our collaboration has been an on-going task of studying
the relational patterns between datapoints based on metafeatures extracted from
metaknowledge in multimedia datasets. Those selected are significant to suit
the mining technique we applied, Golay Code algorithm. In this research paper
we summarize findings in optimization of metaknowledge representation for
23-bit representation of structured and unstructured multimedia data in order
toComment: IEEE Multimedia Big Data (BigMM 2015
Clustering-based binary-class classification for imbalanced data sets
In this paper, we propose a new clustering-based binary-class classification framework that integrates the clustering technique into a binary-class classification approach to handle the imbalanced data sets. A binary-class classifier is designed to classify a set of data instances into two classes; while the clustering technique partitions the data instances into groups according to their similarity to each other. After applying a clustering algorithm, the data instances within the same group usually have a higher similarity, and the differences among the data instances between different groups should be larger. In our proposed framework, all negative data instances are first clustered into a set of negative groups. Next, the negative data instances in each negative group are combined with all positive data instances to construct a balanced binary-class data set. Finally, subspace models trained on these balanced binary-class data sets are integrated with the subspace model trained on the original imbalanced data set to form the proposed classification model. Experimental results demonstrate that our proposed classification framework performs better than the comparative classification approaches as well as the subspace modeling method trained on the original data set alone