123 research outputs found
Pair-copula constructions of multiple dependence
Building on the work of Bedford, Cooke and Joe, we show how multivariate data, which exhibit complex patterns of dependence in the tails, can be modelled using a cascade of pair-copulae, acting on two variables at a time. We use the pair-copula decomposition of a general multivariate distribution and propose a method to perform inference. The model construction is hierarchical in nature, the various levels corresponding to the incorporation of more variables in the conditioning sets, using pair-copulae as simple building blocks. Pair-copula decomposed models also represent a very flexible way to construct higher-dimensional copulae. We apply the methodology to a financial data set. Our approach represents the first step towards the development of an unsupervised algorithm that explores the space of possible pair-copula models and that can also be applied automatically to huge data sets.
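As an illustration of the decomposition the abstract refers to, the three-dimensional case can be written out as below. This is the standard textbook form with variable 2 as the conditioning variable; the ordering is an assumption chosen for illustration and is not taken from the abstract.

```latex
f(x_1, x_2, x_3) = f_1(x_1)\, f_2(x_2)\, f_3(x_3)
  \cdot c_{12}\bigl(F_1(x_1), F_2(x_2)\bigr)
  \cdot c_{23}\bigl(F_2(x_2), F_3(x_3)\bigr)
  \cdot c_{13|2}\bigl(F(x_1 \mid x_2), F(x_3 \mid x_2)\bigr)
```

Each factor c is a bivariate (pair-) copula density. The first two act on the marginal distribution functions, and the third acts on conditional distribution functions, which is the sense in which deeper levels bring more variables into the conditioning sets.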
Models for construction of multivariate dependence
In this article we review models for the construction of higher-dimensional dependence that have arisen in recent years. A multivariate data set that exhibits complex patterns of dependence, particularly in the tails, can be modelled using a cascade of lower-dimensional copulae. We examine two such models that differ in their construction of the dependency structure, namely the nested Archimedean constructions and the pair-copula constructions (also referred to as vines). The constructions are compared, and estimation and simulation techniques are examined. The fit of the two constructions is tested on two different four-dimensional data sets, precipitation values and equity returns, using a state-of-the-art copula goodness-of-fit procedure. The nested Archimedean construction is strongly rejected for both our data sets, while the pair-copula construction provides an appropriate fit. Through VaR calculations, we show that the latter does not overfit the data, but works very well even out-of-sample.
Learning Latent Representations of Bank Customers With The Variational Autoencoder
Learning data representations that reflect the customers' creditworthiness
can improve marketing campaigns, customer relationship management, data and
process management or the credit risk assessment in retail banks. In this
research, we adopt the Variational Autoencoder (VAE), which has the ability to
learn latent representations that contain useful information. We show that it
is possible to steer the latent representations in the latent space of the VAE
using the Weight of Evidence and forming a specific grouping of the data that
reflects the customers' creditworthiness. Our proposed method learns a latent
representation of the data, which shows a well-defined clustering structure
capturing the customers' creditworthiness. These clusters are well suited for
the aforementioned banks' activities. Further, our methodology generalizes to
new customers, captures high-dimensional and complex financial data, and scales
to large data sets.
Comment: arXiv admin note: substantial text overlap with arXiv:1806.0253
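For readers unfamiliar with the Weight of Evidence mentioned in this abstract, a minimal sketch of the usual per-group calculation follows. The function name, the sign convention (log of the non-default share over the default share) and the 0.5 smoothing term are assumptions made for illustration; the abstract does not specify how the authors compute or apply it.

```python
import math
from collections import defaultdict


def weight_of_evidence(groups, defaults, smoothing=0.5):
    """Return {group: WoE}, with WoE = ln(share of non-defaults / share of defaults).

    groups    -- iterable of group labels, one per customer (hypothetical input)
    defaults  -- iterable of 0/1 flags, 1 meaning the customer defaulted
    smoothing -- count added to each cell to avoid log(0); an assumed choice
    """
    good = defaultdict(int)
    bad = defaultdict(int)
    for group, flag in zip(groups, defaults):
        if flag:
            bad[group] += 1
        else:
            good[group] += 1

    total_good = sum(good.values())
    total_bad = sum(bad.values())
    if total_good == 0 or total_bad == 0:
        raise ValueError("need at least one default and one non-default")

    labels = set(good) | set(bad)
    woe = {}
    for group in labels:
        good_share = (good[group] + smoothing) / total_good
        bad_share = (bad[group] + smoothing) / total_bad
        woe[group] = math.log(good_share / bad_share)
    return woe


if __name__ == "__main__":
    groups = ["low", "low", "low", "low", "high", "high", "high", "high"]
    defaults = [0, 0, 0, 1, 0, 1, 1, 1]
    print(weight_of_evidence(groups, defaults))
```

Under this convention a positive value marks a group with proportionally more non-defaulting customers, and a negative value marks a riskier group.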
Deep Generative Models for Reject Inference in Credit Scoring
Credit scoring models based on accepted applications may be biased and their
consequences can have a statistical and economic impact. Reject inference is
the process of attempting to infer the creditworthiness status of the rejected
applications. In this research, we use deep generative models to develop two
new semi-supervised Bayesian models for reject inference in credit scoring, in
which we model the data generating process to be dependent on a Gaussian
mixture. The goal is to improve the classification accuracy in credit scoring
models by adding rejected applications. Our proposed models infer the unknown
creditworthiness of the rejected applications by exact enumeration of the two
possible outcomes of the loan (default or non-default). The efficient
stochastic gradient optimization technique used in deep generative models makes
our models suitable for large data sets. Finally, the experiments in this
research show that our proposed models perform better than classical and
alternative machine learning models for reject inference in credit scoring.
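The phrase "exact enumeration of the two possible outcomes" can be illustrated with the bound commonly used for unlabelled data in semi-supervised deep generative models. The sketch below is not the authors' objective, which the abstract does not state. It assumes that a labelled-case lower bound has already been evaluated with the label fixed to each outcome, and all names are hypothetical.

```python
import math


def unlabeled_bound(q_default, bound_default, bound_non_default):
    """Lower bound for one rejected application whose outcome is unknown.

    q_default         -- q(y = default | x) from the classifier, in [0, 1]
    bound_default     -- labelled-case bound evaluated with y fixed to default
    bound_non_default -- labelled-case bound evaluated with y fixed to non-default

    The unknown label is summed out over both outcomes rather than sampled,
    and the entropy of q(y | x) is added.
    """
    if not 0.0 <= q_default <= 1.0:
        raise ValueError("q_default must be a probability")

    probs = (q_default, 1.0 - q_default)
    bounds = (bound_default, bound_non_default)

    expected = sum(p * b for p, b in zip(probs, bounds))
    # Terms with p == 0 contribute nothing to the entropy.
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return expected + entropy


if __name__ == "__main__":
    print(unlabeled_bound(0.2, -5.1, -3.4))
```

Because there are only two outcomes, the sum is exact and cheap to compute, so no sampling over the label is needed.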
Multimodal Generative Models for Bankruptcy Prediction Using Textual Data
Textual data from financial filings, e.g., the Management's Discussion &
Analysis (MDA) section in Form 10-K, has been used to improve the prediction
accuracy of bankruptcy models. In practice, however, we cannot obtain the MDA
section for all public companies, which limits the use of MDA data in
traditional bankruptcy models, as they need complete data to make predictions.
The two main reasons for the lack of MDA are: (i) not all companies are obliged
to submit the MDA and (ii) technical problems arise when crawling and scraping
the MDA section. To solve this limitation, this research introduces the
Conditional Multimodal Discriminative (CMMD) model that learns multimodal
representations that embed information from accounting, market, and textual
data modalities. The CMMD model needs a sample with all data modalities for
model training. At test time, the CMMD model only needs access to accounting
and market modalities to generate multimodal representations, which are further
used to make bankruptcy predictions and to generate words from the missing MDA
modality. With this novel methodology, it is realistic to use textual data in
bankruptcy prediction models, since accounting and market data are available
for all companies, unlike textual data. The empirical results of this research
show that if financial regulators, or investors, were to use traditional models
using MDA data, they would only be able to make predictions for 60% of the
companies. Furthermore, the classification performance of our proposed
methodology is superior to that of a large number of traditional classifier
models, taking into account all the companies in our sample.
Discriminative Multimodal Learning via Conditional Priors in Generative Models
Deep generative models with latent variables have been used lately to learn
joint representations and generative processes from multi-modal data. These two
learning mechanisms can, however, conflict with each other and representations
can fail to embed information on the data modalities. This research studies the
realistic scenario in which all modalities and class labels are available for
model training, but where some modalities and labels required for downstream
tasks are missing. We show, in this scenario, that the variational lower bound
limits mutual information between joint representations and missing modalities.
To counteract these problems, we introduce a novel conditional multi-modal
discriminative model that uses an informative prior distribution and optimizes
a likelihood-free objective function that maximizes mutual information between
joint representations and missing modalities. Extensive experimentation
demonstrates the benefits of our proposed model; empirical results show that
our model achieves state-of-the-art results in representative problems such as
downstream classification, acoustic inversion, and image and annotation
generation.