2 research outputs found

Learning Embedded Representation of the Stock Correlation Matrix using Graph Machine Learning

Authors: Mehta Dhagash, Nair Nayana, Pasquali Stefano, Sarmah Bhaskarjit
Publication date: 14/07/2022

Understanding non-linear relationships among financial instruments has various applications in investment processes, ranging from risk management and portfolio construction to trading strategies. Here, we focus on interconnectedness among stocks based on their correlation matrix, which we represent as a network with the nodes representing individual stocks and the weighted links between pairs of nodes representing the corresponding pair-wise correlation coefficients. Traditional network science techniques, which are extensively utilized in the financial literature, require handcrafted features such as centrality measures to understand such correlation networks. However, manually listing all such handcrafted features may quickly turn out to be a daunting task. Instead, we propose a new approach for studying nuances and relationships within the correlation network in an algorithmic way using a graph machine learning algorithm called Node2Vec. In particular, the algorithm compresses the network into a lower-dimensional continuous space, called an embedding, where pairs of nodes that are identified as similar by the algorithm are placed closer to each other. By using log returns of S&P 500 stock data, we show that our proposed algorithm can learn such an embedding from its correlation network. We define various domain-specific quantitative (and objective) and qualitative metrics, inspired by metrics used in the field of Natural Language Processing (NLP), to evaluate the embeddings in order to identify the optimal one. Further, we discuss various applications of the embeddings in investment management.

Comment: 8 pages, 2-column format, 3 figures, 7 tables
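
The pipeline described in this abstract (log returns, then a correlation network, then Node2Vec) can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation. The tickers and prices are made up, using the absolute correlation as the edge weight is an assumption made here so that weights can serve as transition probabilities, and only the biased random-walk stage of Node2Vec is shown. In the full algorithm the walks would then be fed to a skip-gram model to obtain the embedding; that step is omitted.

```python
import math
import random

def log_returns(prices):
    """Log returns r_t = ln(p_t / p_{t-1}) for one price series."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def correlation_graph(returns):
    """Weighted graph with graph[u][v] = |corr(u, v)| (assumed weighting)."""
    names = list(returns)
    graph = {u: {} for u in names}
    for i, u in enumerate(names):
        for v in names[i + 1:]:
            w = abs(pearson(returns[u], returns[v]))
            if w > 0:
                graph[u][v] = w
                graph[v][u] = w
    return graph

def biased_walk(graph, start, length, p=1.0, q=1.0, rng=random):
    """Node2Vec-style second-order walk with return parameter p and in-out parameter q."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = list(graph[cur])
        if not nbrs:
            break
        if len(walk) == 1:
            weights = [graph[cur][n] for n in nbrs]
        else:
            prev = walk[-2]
            weights = []
            for n in nbrs:
                if n == prev:
                    bias = 1.0 / p      # step back to the previous node
                elif n in graph[prev]:
                    bias = 1.0          # stay within the previous node's neighborhood
                else:
                    bias = 1.0 / q      # move further away
                weights.append(graph[cur][n] * bias)
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk

# Hypothetical toy prices; real input would be S&P 500 price histories.
prices = {
    "AAA": [100, 101, 103, 102, 105, 107],
    "BBB": [50, 50.6, 51.4, 51.0, 52.6, 53.5],
    "CCC": [80, 79, 80.5, 81, 80, 82],
}
returns = {name: log_returns(series) for name, series in prices.items()}
graph = correlation_graph(returns)
walks = [biased_walk(graph, name, 8, p=0.5, q=2.0) for name in graph]
```

Each entry of `walks` is a sequence of tickers that plays the role of a "sentence" for the embedding step, which is where the analogy to NLP comes from.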

arXiv.org e-Print Archive

Towards Enhanced Local Explainability of Random Forests: a Proximity-Based Approach

Authors: Desai Dhruv, Mehta Dhagash, Onay Deran, Pasquali Stefano, Rosaler Joshua, Sarmah Bhaskarjit, Vamvourellis Dimitrios
Publication date: 18/10/2023

We initiate a novel approach to explain the out-of-sample performance of random forest (RF) models by exploiting the fact that any RF can be formulated as an adaptive weighted K-nearest-neighbors model. Specifically, we use the proximity between points in the feature space learned by the RF to rewrite random forest predictions exactly as a weighted average of the target labels of training data points. This linearity facilitates a local notion of explainability of RF predictions that generates attributions for any model prediction across observations in the training set, and thereby complements established methods like SHAP, which instead generate attributions for a model prediction across dimensions of the feature space. We demonstrate this approach in the context of a bond pricing model trained on US corporate bond trades, and compare our approach to various existing approaches to model explainability.

Comment: 5 pages, 6 figures
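
The identity this abstract relies on, that a forest prediction equals a weighted average of training labels, can be checked on a toy example. The sketch below is not the authors' code. It assumes a small hand-rolled regression forest with randomized splits in which every tree is grown on the full training set (no bootstrap resampling) and each leaf predicts the mean label of the training points it contains. The data and all function names are hypothetical. Under those assumptions, the weight of training point i for a query x is the average over trees of 1/|leaf| whenever i shares a leaf with x.

```python
import random

def sse(y, idx):
    """Sum of squared errors of labels y[idx] around their mean."""
    m = sum(y[i] for i in idx) / len(idx)
    return sum((y[i] - m) ** 2 for i in idx)

def build_tree(X, y, idx, depth, rng, n_candidates=5):
    """Grow a randomized regression tree; leaves store training indices."""
    if depth == 0 or len(idx) <= 2:
        return {"leaf": idx}
    best = None
    for _ in range(n_candidates):
        f = rng.randrange(len(X[0]))
        vals = [X[i][f] for i in idx]
        t = rng.uniform(min(vals), max(vals))
        left = [i for i in idx if X[i][f] <= t]
        right = [i for i in idx if X[i][f] > t]
        if left and right:
            score = sse(y, left) + sse(y, right)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:
        return {"leaf": idx}
    _, f, t, left, right = best
    return {
        "f": f,
        "t": t,
        "L": build_tree(X, y, left, depth - 1, rng, n_candidates),
        "R": build_tree(X, y, right, depth - 1, rng, n_candidates),
    }

def leaf_members(tree, x):
    """Training indices that share a leaf with query point x."""
    while "leaf" not in tree:
        tree = tree["L"] if x[tree["f"]] <= tree["t"] else tree["R"]
    return tree["leaf"]

def forest_predict(forest, y, x):
    """Usual forest prediction: average of per-tree leaf means."""
    leaf_means = []
    for tree in forest:
        members = leaf_members(tree, x)
        leaf_means.append(sum(y[i] for i in members) / len(members))
    return sum(leaf_means) / len(forest)

def proximity_weights(forest, n_train, x):
    """Weight of each training point in the prediction for x."""
    w = [0.0] * n_train
    for tree in forest:
        members = leaf_members(tree, x)
        for i in members:
            w[i] += 1.0 / (len(members) * len(forest))
    return w

# Hypothetical synthetic data standing in for a real training set.
rng = random.Random(0)
X = [[rng.random(), rng.random()] for _ in range(40)]
y = [3 * a - 2 * b + rng.gauss(0, 0.1) for a, b in X]
forest = [build_tree(X, y, list(range(len(X))), 4, rng) for _ in range(25)]

x_new = [0.3, 0.7]
weights = proximity_weights(forest, len(X), x_new)
as_weighted_average = sum(w * t for w, t in zip(weights, y))
# Training points that contribute most to this one prediction.
top_neighbors = sorted(range(len(X)), key=lambda i: weights[i], reverse=True)[:3]
```

Here `as_weighted_average` matches `forest_predict(forest, y, x_new)` up to floating-point error, and `weights` is non-negative and sums to one, so `top_neighbors` gives a per-observation attribution for the prediction. With bootstrapped trees the weights would additionally have to account for how often each training point was drawn, which this sketch deliberately sidesteps.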

arXiv.org e-Print Archive