Determining industry and product/service codes for a company is an important
real-world task and is typically very expensive as it involves manual curation
of data about the companies. Building an AI agent that can predict these codes
automatically can significantly reduce costs and eliminate human biases
and errors. However, the unavailability of labeled datasets and the need for
high-precision results within the financial domain make this a challenging
problem. In this work, we propose a hierarchical multi-class industry code
classifier with a targeted multi-label product/service code classifier
leveraging advances in unsupervised representation learning techniques. We
demonstrate how a high-quality industry and product/service code classification
system can be built using an extremely limited labeled dataset. We evaluate our
approach on a dataset of more than 20,000 companies and achieve a
classification accuracy of more than 92\%. Additionally, we compare our
approach against a dataset of 350 manually labeled product/service codes provided
by Subject Matter Experts (SMEs) and obtain an accuracy of more than 96\%,
which has led to real-life adoption within the financial domain.