2 research outputs found

Heuristic Feature Selection for Clickbait Detection

Authors: Hagen Matthias; Potthast Martin; Stein Benno; Völske Michael; Wiegmann Matti
Publication date: 04/02/2018

We study feature selection as a means to optimize the baseline clickbait detector employed at the Clickbait Challenge 2017. The challenge's task is to score the "clickbaitiness" of a given Twitter tweet on a scale from 0 (no clickbait) to 1 (strong clickbait). Unlike most other approaches submitted to the challenge, the baseline approach is based on manual feature engineering and does not compete out of the box with many of the deep learning-based approaches. We show that scaling up feature selection efforts to heuristically identify better-performing feature subsets catapults the performance of the baseline classifier to second rank overall, beating 12 other competing approaches and improving over the baseline performance by 20%. This demonstrates that traditional classification approaches can still keep up with deep learning on this task.

Comment: Clickbait Challenge 201

arXiv.org e-Print Archive

The Clickbait Challenge 2017: Towards a Regression Model for Clickbait Strength

Authors: Gollub Tim; Hagen Matthias; Potthast Martin; Stein Benno
Publication date: 27/12/2018

Clickbait has grown to become a nuisance to social media users and social media operators alike. Malicious content publishers misuse social media to manipulate as many users as possible to visit their websites using clickbait messages. Machine learning technology may help to handle this problem, giving rise to automatic clickbait detection. To accelerate progress in this direction, we organized the Clickbait Challenge 2017, a shared task inviting the submission of clickbait detectors for a comparative evaluation. A total of 13 detectors have been submitted, achieving significant improvements over the previous state of the art in terms of detection performance. Also, many of the submitted approaches have been published open source, rendering them reproducible, and a good starting point for newcomers. While the 2017 challenge has passed, we maintain the evaluation system and respond to new registrations in support of the ongoing research on better clickbait detectors.

arXiv.org e-Print Archive