
The Majority Report - Can we use big data to secure a better future?

Abstract

With the wide adoption of social media, it has become a common platform for rallying supporters to civil unrest events. Despite the noble aims of these events, they sometimes turn violent and disrupt the daily lives of the general public. This paper proposes a conceptual framework for using online social media data to predict offline civil unrest events. We propose using time-series metrics as the prediction attributes instead of analyzing message contents, because message contents on social media are usually noisy, informal and difficult to interpret. In a data set containing both civil unrest event dates and normal dates, we found many more samples from the normal dates class than from the civil unrest event dates class, which creates an imbalanced class problem. We show that using accuracy as the performance metric can be misleading because civil unrest events are the minority class: a classifier can score high accuracy simply by favoring the majority class. We therefore suggest additional tactics for the imbalanced class prediction problem, namely a combination of oversampling the minority class and feature selection. The current results demonstrate that using time-series metrics to predict civil unrest events is a possible solution to the noise and unstructured format of social media content during analysis and prediction. In addition, we show that the combination of techniques for handling imbalanced classes outperformed classifiers that did not use such techniques.
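To make the imbalanced class argument concrete, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical data set of 95 normal dates and 5 civil unrest event dates, and it uses plain random oversampling because the abstract does not say which oversampling method was used. The function names and the placeholder feature values are illustrative only.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of truly positive samples that were predicted positive."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    if not preds_for_positives:
        return 0.0
    return sum(p == positive for p in preds_for_positives) / len(preds_for_positives)


def random_oversample(rows, labels, minority=1, seed=0):
    """Duplicate randomly chosen minority rows until both classes are equal in size.

    Assumption: simple random oversampling with replacement; the paper may
    use a different oversampling technique.
    """
    rng = random.Random(seed)
    minority_rows = [r for r, l in zip(rows, labels) if l == minority]
    if not minority_rows:
        raise ValueError("no minority samples to oversample")
    majority_count = sum(1 for l in labels if l != minority)
    new_rows, new_labels = list(rows), list(labels)
    for _ in range(max(majority_count - len(minority_rows), 0)):
        new_rows.append(rng.choice(minority_rows))
        new_labels.append(minority)
    return new_rows, new_labels


# Hypothetical labels: 0 = normal date, 1 = civil unrest event date.
labels = [0] * 95 + [1] * 5
# Placeholder one-dimensional feature per date (stands in for a time-series metric).
rows = [[float(i)] for i in range(100)]

# A trivial classifier that always predicts "normal date".
always_normal = [0] * len(labels)
acc = accuracy(labels, always_normal)
rec = recall(labels, always_normal)
print(f"accuracy={acc:.2f}, recall on unrest dates={rec:.2f}")

# Balance the classes before training a real classifier.
balanced_rows, balanced_labels = random_oversample(rows, labels)
print(balanced_labels.count(0), balanced_labels.count(1))
```

Under these assumptions, the always-normal classifier reaches high accuracy while detecting none of the unrest dates, which is why a measure such as recall on the minority class is more informative than accuracy alone. Oversampling should be applied to the training split only, so that duplicated samples do not leak into the evaluation data.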
