Event-based clustering for reducing labeling costs of event-related microposts

Abstract

The sheer volume of social media data makes machine learning necessary for automatically identifying the event type of event-related information. However, such learning is highly dependent on (1) the number of correctly labeled instances and (2) labeling costs. Active learning has been proposed to reduce the number of instances to label. Although the thematic dimension is already used, other metadata, such as spatial and temporal information that would help achieve a more fine-grained clustering, is currently not taken into account. In this paper, we present a novel event-based clustering strategy that uses temporal, spatial, and thematic metadata to determine which instances to label. An evaluation on incident-related tweets shows that our selection strategy for active learning outperforms current state-of-the-art approaches, even with few labeled instances.
