    LDA-Based Industry Classification

    Industry classification is a crucial step in financial analysis. However, existing industry classification schemes have several limitations. To overcome them, we propose in this paper an industry classification methodology based on business commonalities, using the topic features that Latent Dirichlet Allocation (LDA) learns from firms' business descriptions. Two types of classification were explored: firm-centric and industry-centric. Preliminary evaluation results showed the effectiveness of our method.
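
    The abstract does not say how the two classification types are computed. As a minimal sketch, assuming each firm's LDA topic distribution has already been inferred (the firm names and values below are invented), an industry-centric grouping can bucket firms by their dominant topic, and a firm-centric view can rank a target firm's peers by the Hellinger distance between topic distributions. The distance measure and all names here are illustrative assumptions, not the paper's method.

```python
import math
from collections import defaultdict

# Hypothetical topic distributions over K = 3 topics, as an LDA model might
# infer them from business descriptions (values invented for illustration).
THETA = {
    "FirmA": [0.70, 0.20, 0.10],
    "FirmB": [0.65, 0.25, 0.10],
    "FirmC": [0.10, 0.80, 0.10],
    "FirmD": [0.05, 0.15, 0.80],
}

def hellinger(p, q):
    """Hellinger distance between two discrete distributions (0 = identical, max 1)."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))

def industry_centric(theta):
    """Group firms by their dominant topic, treating each topic as a candidate industry."""
    groups = defaultdict(list)
    for firm, dist in theta.items():
        k = max(range(len(dist)), key=dist.__getitem__)
        groups[k].append(firm)
    return dict(groups)

def firm_centric(theta, target, n=2):
    """Return the n firms whose topic mix is closest to the target firm's."""
    others = [f for f in theta if f != target]
    return sorted(others, key=lambda f: hellinger(theta[target], theta[f]))[:n]

print(industry_centric(THETA))       # {0: ['FirmA', 'FirmB'], 1: ['FirmC'], 2: ['FirmD']}
print(firm_centric(THETA, "FirmA"))  # ['FirmB', 'FirmC']
```

    Hellinger distance is one common choice for comparing topic distributions; cosine or Jensen-Shannon distance would slot into `firm_centric` the same way.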

    Industry Classification Based on Labor Mobility Network Mining

    Industry classification is important for industry analysis and competitive intelligence. However, existing schemes and methods are limited by the small number of industry categories and by lagged information about firms' businesses. In this paper, we propose a novel industry classification method that constructs a labor mobility network from LinkedIn profiles. We also propose a hierarchical extension of the community detection algorithm to better discover latent industry clusters in the constructed network. An evaluation on real-world datasets shows that our method outperforms the best existing industry classification scheme and the state-of-the-art method, improving their explanatory power by 8.31% and 3.97%, respectively. Moreover, our method is effective at revealing firms' entry into new industries earlier.
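
    A rough sketch of the pipeline, assuming each profile has been reduced to an ordered list of employers (the career paths below are hypothetical): every consecutive job change adds weight to an undirected edge between the two firms, and communities in the resulting network are read as candidate industry clusters. The abstract does not describe the hierarchical extension, so this sketch swaps in plain weighted label propagation with deterministic tie-breaking as a stand-in; it is not the authors' algorithm.

```python
from collections import defaultdict

# Hypothetical career paths: each list is one person's employers in order.
PATHS = [
    ["Alpha", "Beta", "Alpha"],
    ["Beta", "Gamma"],
    ["Alpha", "Gamma"],
    ["Delta", "Epsilon"],
    ["Epsilon", "Zeta", "Delta"],
]

def build_mobility_network(career_paths):
    """Undirected weighted graph: edge weight = number of observed moves between two firms."""
    graph = defaultdict(lambda: defaultdict(int))
    for path in career_paths:
        for src, dst in zip(path, path[1:]):
            if src != dst:
                graph[src][dst] += 1
                graph[dst][src] += 1
    return graph

def label_propagation(graph, max_iter=100):
    """Plain (non-hierarchical) weighted label propagation; ties go to the smallest label."""
    labels = {node: node for node in graph}
    for _ in range(max_iter):
        changed = False
        for node in sorted(graph):
            # Sum edge weights per neighbouring label and adopt the heaviest one.
            scores = defaultdict(int)
            for nbr, weight in graph[node].items():
                scores[labels[nbr]] += weight
            best = max(scores.values())
            new_label = min(lab for lab, s in scores.items() if s == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    clusters = defaultdict(list)
    for node, lab in labels.items():
        clusters[lab].append(node)
    return sorted(sorted(members) for members in clusters.values())

print(label_propagation(build_mobility_network(PATHS)))
# [['Alpha', 'Beta', 'Gamma'], ['Delta', 'Epsilon', 'Zeta']]
```

    A hierarchical variant could, for instance, re-run community detection inside each cluster to split coarse industries into finer ones, but that is a guess about the general idea rather than the paper's procedure.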

    Multi-Industry Simplex: A Probabilistic Extension of GICS

    Accurate industry classification is a critical tool for many asset management applications. While the current industry gold standard, GICS (Global Industry Classification Standard), has proven reliable and robust in many settings, it has limitations that cannot be ignored. Fundamentally, GICS is a single-industry model in which every firm is assigned to exactly one group, regardless of how diversified that firm may be. This approach breaks down for large conglomerates like Amazon, whose risk exposure is spread across multiple sectors. We attempt to overcome these limitations by developing MIS (Multi-Industry Simplex), a probabilistic model that can flexibly assign a firm to as many industries as the data support. In particular, we use topic modeling, a natural language processing approach that draws on business descriptions to extract and identify the corresponding industries. Each identified industry comes with a relevance probability, allowing high interpretability and easy auditing and circumventing the black-box nature of alternative machine learning approaches. We describe this model in detail and provide two use cases relevant to asset management: thematic portfolios and nearest-neighbor identification. While our approach has limitations of its own, we demonstrate the viability of probabilistic industry classification and hope to inspire future research in this field.
    Comment: 17 pages, 10 figures
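
    The abstract does not give MIS's internals. Assuming a topic model has already mapped a firm's business description to per-industry probabilities, the sketch below keeps every industry above a hypothetical `threshold` and renormalizes so the kept weights lie on the probability simplex, then derives thematic-portfolio weights proportional to each firm's relevance probability for a theme. The industry names, probabilities, and thresholding rule are illustrative assumptions.

```python
import math

INDUSTRIES = ["Retail", "Cloud", "Media", "Logistics"]  # hypothetical labels

def assign_industries(topic_probs, industry_names, threshold=0.1):
    """Keep every industry whose probability clears `threshold`, then renormalize
    so the kept weights sum to 1 again (a point on the probability simplex)."""
    kept = {name: p for name, p in zip(industry_names, topic_probs) if p >= threshold}
    total = math.fsum(kept.values())
    if total == 0:
        return {}
    ranked = sorted(kept.items(), key=lambda kv: kv[1], reverse=True)
    return {name: p / total for name, p in ranked}

def thematic_weights(firm_exposures, theme):
    """Portfolio weights proportional to each firm's relevance probability for `theme`."""
    raw = {firm: probs.get(theme, 0.0) for firm, probs in firm_exposures.items()}
    total = math.fsum(raw.values())
    if total == 0:
        return {}
    return {firm: w / total for firm, w in raw.items() if w > 0}

# A diversified (hypothetical) conglomerate: Logistics falls below the threshold.
conglomerate = assign_industries([0.5, 0.3, 0.12, 0.08], INDUSTRIES)
print({k: round(v, 3) for k, v in conglomerate.items()})
# {'Retail': 0.543, 'Cloud': 0.326, 'Media': 0.13}

exposures = {"Conglomerate": conglomerate, "PureCloud": {"Cloud": 1.0}, "Grocer": {"Retail": 1.0}}
print({k: round(v, 3) for k, v in thematic_weights(exposures, "Cloud").items()})
# {'Conglomerate': 0.246, 'PureCloud': 0.754}
```

    Unlike a single-label scheme, the conglomerate here contributes to a cloud-themed portfolio in proportion to its estimated cloud exposure instead of being included or excluded outright.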

    Company2Vec -- German Company Embeddings based on Corporate Websites

    With Company2Vec, the paper proposes a novel application in representation learning. The model analyzes business activities from unstructured company website data using Word2Vec and dimensionality reduction. Company2Vec preserves semantic language structures and thus creates efficient company embeddings for fine-grained industries. These semantic embeddings can be used for various applications in banking. Direct relations between companies and words allow semantic business analytics (e.g., the top-n words for a company). Furthermore, industry prediction is presented as a supervised learning application and evaluation method. The vectorized structure of the embeddings allows the similarity between companies to be measured with the cosine distance. Company2Vec hence offers a more fine-grained comparison of companies than the standard industry labels (NACE). This property is relevant for unsupervised learning tasks such as clustering. An alternative industry segmentation is shown with k-means clustering on the company embeddings. Finally, the paper proposes three algorithms for (1) firm-centric, (2) industry-centric, and (3) portfolio-centric peer-firm identification.
    Comment: Accepted for publication in International Journal of Information Technology & Decision Making (2023)
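
    A minimal sketch of the cosine-based analyses, assuming word vectors from an already trained Word2Vec model (replaced here by tiny hypothetical 2-D vectors) and a company vector formed as the mean of its website tokens' vectors. The averaging step, the company names, and the helper functions are assumptions for illustration; the paper's dimensionality reduction and k-means steps are omitted.

```python
import math

# Hypothetical 2-D word vectors standing in for a trained Word2Vec model.
WORD_VECS = {
    "bank": [0.9, 0.1], "loan": [0.8, 0.2], "credit": [0.85, 0.15],
    "software": [0.1, 0.9], "cloud": [0.2, 0.8], "app": [0.15, 0.85],
}
# Hypothetical tokenized website text per company; unknown tokens are skipped.
SITES = {
    "KreditHaus": ["bank", "loan", "credit", "impressum"],
    "CloudWerk": ["software", "cloud", "app"],
    "FinApp": ["bank", "app", "credit"],
}

def embed(tokens, word_vecs):
    """Company vector = mean of its known tokens' vectors (an assumed aggregation)."""
    vecs = [word_vecs[t] for t in tokens if t in word_vecs]
    if not vecs:
        raise ValueError("no known tokens")
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def cosine(u, v):
    """Cosine similarity; cosine distance is 1 minus this value."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def peers(company, embeddings, n=3):
    """Firm-centric peers: other companies ranked by cosine similarity."""
    target = embeddings[company]
    others = [c for c in embeddings if c != company]
    return sorted(others, key=lambda c: cosine(target, embeddings[c]), reverse=True)[:n]

def top_words(company, embeddings, word_vecs, n=3):
    """Vocabulary words whose vectors point most nearly in the company's direction."""
    target = embeddings[company]
    return sorted(word_vecs, key=lambda w: cosine(target, word_vecs[w]), reverse=True)[:n]

EMB = {name: embed(tokens, WORD_VECS) for name, tokens in SITES.items()}
print(peers("KreditHaus", EMB))                      # ['FinApp', 'CloudWerk']
print(top_words("KreditHaus", EMB, WORD_VECS, n=2))  # ['credit', 'bank']
```

    With real embeddings the same two functions cover the "top-n words for a company" and firm-centric peer lookups the abstract mentions; FinApp ranks as KreditHaus's closest peer because two of its three tokens are finance words.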

    Studies on Machine Learning for Data Analytics in Business Application

    Ph.D. (Doctor of Philosophy)