Labeling large scale social media data using budget-driven One-class SVM classification
Social media classification problems have drawn increasing attention over the
past few years. With the rapid development of the Internet and the popularity of computers,
there is an astronomical amount of information on social networks (social
media platforms). The datasets are generally large scale and are often corrupted by
noise. The presence of noise in the training set has a strong impact on the performance
of supervised learning (classification) techniques. This thesis presents a budget-driven
One-class SVM approach that is suitable for large-scale social media data
classification.
Our approach is based on an existing online One-class SVM learning algorithm,
referred to as the STOCS (Self-Tuning One-Class SVM) algorithm. To justify this choice,
we first analyze the noise resilience of STOCS using synthetic data. The experiments
suggest that STOCS is more robust against label noise than several other
existing approaches. Next, to handle the big data classification problem for social media
data, we introduce several budget-driven features, which allow the algorithm to be
trained within limited time and with limited memory. In addition, the
resulting algorithm can be easily adapted to changes in dynamic data at minimal
computational cost. Compared with two state-of-the-art approaches, LIBLINEAR and
kNN, our approach is shown to be competitive while requiring less memory
and time.
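
To make the idea of a budget concrete, the following is a minimal sketch of an online kernel one-class learner that never stores more than a fixed number of support vectors. It is not the STOCS algorithm and not the thesis's method: the class name `BudgetedOneClass`, the fixed threshold `rho`, the step size `eta`, the `decay` factor, and the rule of evicting the smallest coefficient are all assumptions chosen for illustration.

```python
import math
import random


def rbf(a, b, gamma=1.0):
    """Gaussian (RBF) kernel between two equal-length sequences."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


class BudgetedOneClass:
    """Toy online one-class learner with a hard cap on stored points.

    Hypothetical illustration only; this is not STOCS.
    """

    def __init__(self, budget=20, rho=0.5, eta=0.2, decay=0.01, gamma=1.0):
        self.budget = budget  # maximum number of support vectors kept
        self.rho = rho        # fixed decision threshold (assumption)
        self.eta = eta        # coefficient given to a newly stored point
        self.decay = decay    # controls how fast old coefficients shrink
        self.gamma = gamma    # RBF kernel width parameter
        self.sv = []          # list of [coefficient, point]

    def score(self, x):
        """Weighted kernel similarity of x to the stored points."""
        return sum(a * rbf(p, x, self.gamma) for a, p in self.sv)

    def partial_fit(self, x):
        """Process one example; memory stays bounded by the budget."""
        # Shrink existing coefficients so stale points gradually fade.
        for item in self.sv:
            item[0] *= 1.0 - self.eta * self.decay
        # Store the point only if the current model scores it below rho.
        if self.score(x) < self.rho:
            self.sv.append([self.eta, tuple(x)])
            if len(self.sv) > self.budget:
                # Enforce the budget: evict the smallest coefficient.
                self.sv.remove(min(self.sv, key=lambda item: item[0]))
        return self

    def predict(self, x):
        """Return 1 for an inlier and -1 for an outlier."""
        return 1 if self.score(x) >= self.rho else -1


# Stream 200 synthetic 2-D points through a model with a budget of 5.
random.seed(0)
model = BudgetedOneClass(budget=5)
for _ in range(200):
    model.partial_fit((random.gauss(0, 1), random.gauss(0, 1)))
```

Because each update touches at most `budget` stored points, the per-example cost and the memory footprint are both bounded regardless of the stream length, and the decay step lets the model drift with changing data. A real budgeted method would typically choose the threshold and the eviction rule more carefully than this sketch does.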