Curating Social Media Data
Social media platforms have democratized access to the pulse of the people in
the modern era. Due to the immense popularity and high usage of these platforms, data
published on social media sites (e.g., Twitter, Facebook and Tumblr) is a rich
ocean of information. Therefore, data-driven analytics of social imprints has
become a vital asset for organisations and governments to further improve their
products and services. However, due to the dynamic and noisy nature of social
media data, performing accurate analysis on raw data is a challenging task. A
key requirement is to curate the raw data before it is fed into analytics pipelines.
This curation process transforms the raw data into contextualized data and
knowledge. We propose a data curation pipeline, namely CrowdCorrect, to enable
analysts to cleanse and curate social data and prepare it for reliable
analytics. Our pipeline provides automatic feature extraction from a corpus
of social media data using existing in-house tools. Further, we offer a
dual-correction mechanism using both automated and crowd-sourced approaches.
The implementation of this pipeline also includes a set of tools for
automatically creating micro-tasks to facilitate the contribution of crowd
users in curating the raw data. For the purposes of this research, we use
Twitter as our motivating social media data platform due to its popularity.

Comment: Masters by Research Thesis