We introduce an automatic classification system that eliminates irrelevant source material before political event data are coded from global newswires. The system relies on a high-performance supervised heterogeneous ensemble classifier trained on extremely imbalanced classes. The classifier's output is then supplied to human coders for further information extraction, creating a semi-automatic pipeline.
The package includes the software required to train and test the classifier, as well as documentation on how to use it.