Automated classification of web contents in B2B marketing

Abstract

Recent growth in digitization has changed how customers seek the information they need to make a purchase decision, and a growing share of customers now base their purchase decisions on information they collect online. To accommodate this change in purchase behavior, companies tend to share as much information as possible about themselves and their products online, which in turn drives up the amount of unstructured data produced. To extract value from this large volume of data, the unstructured data needs to be processed before it is used in digital marketing applications. For companies serving consumers (business-to-consumer, B2C), plenty of research exists on how digital content can be used for marketing, but for companies serving other businesses (business-to-business, B2B) a large research gap persists. B2C and B2B marketing may share some analytical concepts, but they are different domains, and little research has been done on using machine learning in B2B digital marketing. Although several methods have been proposed and used to classify unstructured text data in marketing and other domains, the lack of labeled text data from the B2B domain makes it challenging for researchers to experiment with text classification models. This thesis reviews previous work on text classification, both in general and in the marketing domain, and compares those methods on the dataset available for this research. Text classification methods such as Random Forest, Linear SVM, KNN, Multinomial Naïve Bayes, and Multinomial Logistic Regression dominate the research field; hence, these methods are tested in this research. Surprisingly, on the dataset used, the Random Forest classifier performed best, with an average accuracy of 0.85 in the designed five-class classification task.
