5 research outputs found
LDA-Based Industry Classification
Industry classification is a crucial step in financial analysis. However, existing industry classification schemes have several limitations. To overcome them, we propose an industry classification methodology based on business commonalities, using topic features learned by Latent Dirichlet Allocation (LDA) from firms’ business descriptions. Two types of classification, firm-centric and industry-centric, were explored. Preliminary evaluation results showed the effectiveness of our method.
Industry Classification Based on Labor Mobility Network Mining
Industry classification is important for industry analysis and competitive intelligence. However, existing schemes and methods are limited by their small number of industry categories and by lagging information about firms’ businesses. In this paper, we propose a novel industry classification method that constructs a labor mobility network from LinkedIn profiles. We also propose a hierarchical extension of the community detection algorithm to better discover latent industry clusters on the constructed network. An evaluation on real-world datasets shows that our method outperforms the best existing industry classification scheme and the state-of-the-art method, improving on their explanatory power by 8.31% and 3.97%, respectively. Moreover, our method reveals firms’ entry into new industries earlier.
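The idea of a labor mobility network can be illustrated with a minimal sketch. It assumes each profile reduces to an ordered list of employers, and it swaps the paper's hierarchical community detection for a much simpler stand-in: connected components over edges with enough observed moves. The firm names, the `min_moves` cutoff, and both function names are hypothetical.

```python
from collections import Counter, defaultdict


def build_mobility_network(career_paths):
    """Count moves between consecutive employers as undirected weighted edges."""
    weights = Counter()
    for path in career_paths:
        for src, dst in zip(path, path[1:]):
            if src != dst:
                weights[frozenset((src, dst))] += 1
    return weights


def cluster_firms(weights, min_moves=2):
    """Group firms linked by at least `min_moves` moves (connected components).

    This is a simple stand-in, not the hierarchical community detection
    described in the abstract.
    """
    neighbors = defaultdict(set)
    firms = set()
    for edge, count in weights.items():
        a, b = sorted(edge)
        firms.update((a, b))
        if count >= min_moves:
            neighbors[a].add(b)
            neighbors[b].add(a)

    clusters, seen = [], set()
    for firm in sorted(firms):
        if firm in seen:
            continue
        stack, component = [firm], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


# Invented career paths: each list is one person's employers in order.
career_paths = [
    ["BankA", "BankB"],
    ["BankB", "BankA"],
    ["BankA", "BankB", "SoftCo"],
    ["SoftCo", "CloudCo"],
    ["CloudCo", "SoftCo"],
    ["CloudCo", "SoftCo", "BankA"],
]

network = build_mobility_network(career_paths)
print(cluster_firms(network, min_moves=2))
```

With the cutoff at two moves, the single transitions between the banking and software firms are ignored and two groups remain. Lowering the cutoff to one merges all four firms into a single group.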
Multi-Industry Simplex: A Probabilistic Extension of GICS
Accurate industry classification is a critical tool for many asset management
applications. While the current industry gold-standard GICS (Global Industry
Classification Standard) has proven to be reliable and robust in many settings,
it has limitations that cannot be ignored. Fundamentally, GICS is a
single-industry model, in which every firm is assigned to exactly one group -
regardless of how diversified that firm may be. This approach breaks down for
large conglomerates like Amazon, which have risk exposure spread out across
multiple sectors. We attempt to overcome these limitations by developing MIS
(Multi-Industry Simplex), a probabilistic model that can flexibly assign a firm
to as many industries as can be supported by the data. In particular, we
utilize topic modeling, a natural language processing approach that utilizes
business descriptions to extract and identify corresponding industries. Each
identified industry comes with a relevance probability, allowing for high
interpretability and easy auditing, circumventing the black-box nature of
alternative machine learning approaches. We describe this model in detail and
provide two use-cases that are relevant to asset management - thematic
portfolios and nearest neighbor identification. While our approach has
limitations of its own, we demonstrate the viability of probabilistic industry
classification and hope to inspire future research in this field.
Comment: 17 pages, 10 figures
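To make the "relevance probability" idea concrete, here is a minimal sketch of what a multi-industry assignment on the probability simplex could look like. It is not the MIS model itself: the topic weights are invented, and the industry names, the `min_probability` cutoff, and both function names are assumptions chosen for illustration.

```python
def to_simplex(topic_weights):
    """Normalize non-negative topic weights so that they sum to 1."""
    total = sum(topic_weights.values())
    if total <= 0:
        raise ValueError("topic weights must have a positive sum")
    return {industry: w / total for industry, w in topic_weights.items()}


def relevant_industries(probabilities, min_probability=0.10):
    """Return (industry, probability) pairs at or above the cutoff, largest first."""
    kept = [(i, p) for i, p in probabilities.items() if p >= min_probability]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical topic weights for one diversified firm (invented numbers).
weights = {"retail": 6.0, "cloud computing": 3.0, "media": 1.0, "logistics": 0.0}

probs = to_simplex(weights)
for industry, p in relevant_industries(probs, min_probability=0.2):
    print(f"{industry}: {p:.2f}")
```

A single-industry scheme would keep only the largest entry. Keeping every entry above a cutoff is one simple way to let a diversified firm belong to several industries while leaving each weight open to inspection.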
Company2Vec -- German Company Embeddings based on Corporate Websites
With Company2Vec, the paper proposes a novel application in representation
learning. The model analyzes business activities from unstructured company
website data using Word2Vec and dimensionality reduction. Company2Vec maintains
semantic language structures and thus creates efficient company embeddings in
fine-granular industries. These semantic embeddings can be used for various
applications in banking. Direct relations between companies and words allow
semantic business analytics (e.g. top-n words for a company). Furthermore,
industry prediction is presented as a supervised learning application and
evaluation method. The vectorized structure of the embeddings allows measuring
company similarities with the cosine distance. Company2Vec hence offers a
more fine-grained comparison of companies than the standard industry labels
(NACE). This property is relevant for unsupervised learning tasks, such as
clustering. An alternative industry segmentation is shown with k-means
clustering on the company embeddings. Finally, this paper proposes three
algorithms for (1) firm-centric, (2) industry-centric and (3) portfolio-centric
peer-firm identification.
Comment: Accepted for publication in: International Journal of Information
Technology & Decision Making (2023)
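As a rough illustration of comparing companies through their embeddings, the sketch below ranks peers of one company by cosine similarity (one minus the cosine distance). The three-dimensional vectors and company names are invented, and `top_peers` is a hypothetical helper, not one of the three algorithms proposed in the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_peers(target, embeddings, n=2):
    """Rank every other company by similarity to `target` and keep the top n."""
    scores = [
        (name, cosine_similarity(embeddings[target], vector))
        for name, vector in embeddings.items()
        if name != target
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:n]


# Invented toy embeddings; real embeddings would have many more dimensions.
embeddings = {
    "bakery_a": [0.9, 0.1, 0.0],
    "bakery_b": [0.8, 0.2, 0.1],
    "software_c": [0.0, 0.2, 0.9],
    "logistics_d": [0.1, 0.9, 0.2],
}

for name, score in top_peers("bakery_a", embeddings):
    print(f"{name}: {score:.3f}")
```

Because the ranking is continuous, two companies that share a coarse industry label can still be told apart, which is the kind of finer-grained comparison the abstract describes.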
Studies on Machine Learning for Data Analytics in Business Application
Ph.D. (Doctor of Philosophy)