Learning Embedded Representation of the Stock Correlation Matrix using Graph Machine Learning
Understanding non-linear relationships among financial instruments has
various applications in investment processes, ranging from risk management
and portfolio construction to trading strategies. Here, we focus on
interconnectedness among stocks based on their correlation matrix which we
represent as a network with the nodes representing individual stocks and the
weighted links between pairs of nodes representing the corresponding pairwise
correlation coefficients. Traditional network science techniques, which are
extensively utilized in the financial literature, require handcrafted features
such as centrality measures to analyze such correlation networks. However,
manually enlisting all such handcrafted features may quickly turn out to be a
daunting task. Instead, we propose a new approach for studying nuances and
relationships within the correlation network in an algorithmic way using a
graph machine learning algorithm called Node2Vec. In particular, the algorithm
compresses the network into a lower dimensional continuous space, called an
embedding, where pairs of nodes that are identified as similar by the algorithm
are placed closer to each other. By using log returns of S&P 500 stock data, we
show that our proposed algorithm can learn such an embedding from its
correlation network. We define various domain specific quantitative (and
objective) and qualitative metrics that are inspired by metrics used in the
field of Natural Language Processing (NLP) to evaluate the embeddings in order
to identify the optimal one. Further, we discuss various applications of the
embeddings in investment management.
Comment: 8 pages, 2-column format, 3 figures, 7 tables
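The pipeline the abstract describes — log returns, a weighted correlation network, then a low-dimensional node embedding — can be sketched as follows. This is a minimal stand-in on synthetic data, not the paper's implementation: Node2Vec itself runs biased random walks plus skip-gram training (typically via the `node2vec` package), so a dependency-free spectral embedding of the correlation matrix is used here purely to illustrate the compression step; the toy returns and factor structure are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for S&P 500 log returns: 20 "stocks", 250 trading days.
# Two latent factors induce a block-correlation structure.
factors = rng.normal(size=(250, 2))
loadings = np.repeat(np.eye(2), 10, axis=0)   # stocks 0-9 load on factor 0, 10-19 on factor 1
log_returns = factors @ loadings.T + 0.5 * rng.normal(size=(250, 20))

# Weighted correlation network: nodes = stocks, edge weights = pairwise correlations.
corr = np.corrcoef(log_returns, rowvar=False)  # 20 x 20 correlation matrix

# Illustrative stand-in for Node2Vec: the top-k eigenvectors of the correlation
# matrix give each node a k-dimensional coordinate, placing strongly connected
# stocks near each other in the embedded space.
eigvals, eigvecs = np.linalg.eigh(corr)
k = 2
embedding = eigvecs[:, -k:] * np.sqrt(eigvals[-k:])  # one k-dim vector per stock

# Stocks sharing a latent factor should sit closer together than stocks that don't.
d_same = np.linalg.norm(embedding[0] - embedding[1])    # both load on factor 0
d_diff = np.linalg.norm(embedding[0] - embedding[10])   # different factors
```

The evaluation metrics in the paper (NLP-inspired, quantitative and qualitative) would then be computed on such an embedding; here the block structure simply makes same-factor stocks nearest neighbors.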
Towards Enhanced Local Explainability of Random Forests: a Proximity-Based Approach
We initiate a novel approach to explain the out-of-sample performance of
random forest (RF) models by exploiting the fact that any RF can be formulated
as an adaptive weighted K-nearest-neighbors model. Specifically, we use the
proximity between points in the feature space learned by the RF to re-write
random forest predictions exactly as a weighted average of the target labels of
training data points. This linearity facilitates a local notion of
explainability of RF predictions that generates attributions for any model
prediction across observations in the training set, and thereby complements
established methods like SHAP, which instead generate attributions for a model
prediction across dimensions of the feature space. We demonstrate this approach
in the context of a bond pricing model trained on US corporate bond trades, and
compare our approach to various existing approaches to model explainability.
Comment: 5 pages, 6 figures
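The exact rewrite the abstract relies on — RF predictions as a weighted average of training labels, with weights given by leaf co-occurrence proximities — can be demonstrated on synthetic data (the paper's US corporate bond setting is swapped for a toy regression problem, an assumption of this sketch). With `bootstrap=False`, each leaf value is exactly the mean of the training labels it contains, so the proximity-weighted reconstruction matches `rf.predict` to machine precision.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Toy regression data standing in for bond trades.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X[:, 0] - 2 * X[:, 1] + 0.1 * rng.normal(size=200)
X_test = rng.normal(size=(10, 5))

# bootstrap=False => each leaf prediction is an exact mean of training labels,
# making the linear rewrite below exact rather than approximate.
rf = RandomForestRegressor(n_estimators=50, bootstrap=False, random_state=0).fit(X, y)

# Leaf membership per tree for training and test points.
train_leaves = rf.apply(X)        # shape (n_train, n_trees)
test_leaves = rf.apply(X_test)    # shape (n_test, n_trees)

# Proximity weights: w[j, i] = (1/T) * sum_t 1[same leaf] / |leaf_t(x_j)|,
# so each test prediction is a weighted average of the training labels.
n_trees = train_leaves.shape[1]
weights = np.zeros((len(X_test), len(X)))
for t in range(n_trees):
    same_leaf = test_leaves[:, [t]] == train_leaves[:, t]   # (n_test, n_train) co-occurrence
    weights += same_leaf / same_leaf.sum(axis=1, keepdims=True)
weights /= n_trees

reconstructed = weights @ y       # RF predictions as a linear function of training labels
```

Each row of `weights` is a local attribution over training observations: it names exactly which past trades drive a given prediction and by how much, which is the per-observation complement to SHAP's per-feature view.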