This paper aims to categorize bank transactions using weak supervision,
natural language processing, and deep neural network techniques. Our approach
minimizes the reliance on expensive and difficult-to-obtain manual annotations
by leveraging heuristics and domain knowledge to train accurate transaction
classifiers. We present an effective and scalable end-to-end data pipeline that
covers data preprocessing, transaction text embedding, anchoring, label
generation, and discriminative neural network training, and we give an overview
of the system architecture. We demonstrate the effectiveness of our method by showing that
it outperforms existing market-leading solutions, achieves accurate
categorization, and can be quickly extended to novel and composite use cases.
These capabilities can in turn unlock financial applications such as financial
health reporting and credit risk assessment.