Variational Inference of Joint Models using Multivariate Gaussian Convolution Processes
We present a non-parametric prognostic framework for individualized event
prediction based on joint modeling of both longitudinal and time-to-event data.
Our approach exploits a multivariate Gaussian convolution process (MGCP) to
model the evolution of longitudinal signals and a Cox model to map
time-to-event data with longitudinal data modeled through the MGCP. Taking
advantage of the unique structure imposed by convolved processes, we provide a
variational inference framework to simultaneously estimate parameters in the
joint MGCP-Cox model. This significantly reduces computational complexity and
safeguards against model overfitting. Experiments on synthetic and real world
data show that the proposed framework outperforms state-of-the-art approaches
built on two-stage inference and strong parametric assumptions.
Decomposing feature-level variation with Covariate Gaussian Process Latent Variable Models
The interpretation of complex high-dimensional data typically requires the
use of dimensionality reduction techniques to extract explanatory
low-dimensional representations. However, in many real-world problems these
representations may not be sufficient to aid interpretation on their own, and
it would be desirable to interpret the model in terms of the original features
themselves. Our goal is to characterise how feature-level variation depends on
latent low-dimensional representations, external covariates, and non-linear
interactions between the two. In this paper, we propose to achieve this through
a structured kernel decomposition in a hybrid Gaussian Process model which we
call the Covariate Gaussian Process Latent Variable Model (c-GPLVM). We
demonstrate the utility of our model on simulated examples and applications in
disease progression modelling from high-dimensional gene expression data in the
presence of additional phenotypes. In each setting we show how the c-GPLVM can
extract low-dimensional structures from high-dimensional data sets whilst
allowing a breakdown of feature-level variability that is not present in other
commonly used dimensionality reduction approaches.
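The structured kernel decomposition described in this abstract can be illustrated with a small sketch. It assumes RBF kernels for the latent and covariate parts and a product kernel for their interaction; these choices, and the names `rbf` and `decomposed_kernel`, are hypothetical and not taken from the paper.

```python
import math


def rbf(a, b, lengthscale=1.0):
    """Squared-exponential kernel between two points given as sequences of floats."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-0.5 * sq_dist / lengthscale ** 2)


def decomposed_kernel(z1, c1, z2, c2):
    """Additive kernel over latent inputs z and covariates c (illustrative assumption).

    Returns each component separately so that the contribution of the latent
    part, the covariate part, and their interaction can be inspected.
    """
    k_latent = rbf(z1, z2)          # variation explained by the latent representation
    k_covariate = rbf(c1, c2)       # variation explained by the external covariate
    k_interaction = k_latent * k_covariate  # non-linear interaction of the two
    return {
        "latent": k_latent,
        "covariate": k_covariate,
        "interaction": k_interaction,
        "total": k_latent + k_covariate + k_interaction,
    }


parts = decomposed_kernel([0.0, 1.0], [0.5], [0.2, 0.8], [0.7])
print(parts)
```

Because the total kernel is a sum, a Gaussian process with this covariance can be read as a sum of independent functions, one per component, which is what makes a feature-level breakdown possible in this kind of model.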
Analysis of overfitting in the regularized Cox model
The Cox proportional hazards model is ubiquitous in the analysis of
time-to-event data. However, when the data dimension p is comparable to the
sample size N, maximum likelihood estimates for its regression parameters are
known to be biased or break down entirely due to overfitting. This prompted the
introduction of the so-called regularized Cox model. In this paper we use the
replica method from statistical physics to investigate the relationship between
the true and inferred regression parameters in regularized multivariate Cox
regression with L2 regularization, in the regime where both p and N are large
but with p/N ~ O(1). We thereby generalize a recent study from maximum
likelihood to maximum a posteriori inference. We also establish a relationship
between the optimal regularization parameter and p/N, allowing for
straightforward overfitting corrections in time-to-event analysis.
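As a concrete reference for the objective being analysed, the following is a minimal sketch of an L2-regularized Cox objective: the negative log partial likelihood plus a ridge penalty. It assumes no tied event times and a penalty of the form (lam / 2) * ||beta||^2; the function name `penalized_cox_nll` and this scaling convention are assumptions, not the paper's notation.

```python
import math


def penalized_cox_nll(beta, X, times, events, lam):
    """Negative log partial likelihood of the Cox model plus an L2 penalty.

    beta   : list of p regression coefficients
    X      : list of N covariate rows, each of length p
    times  : list of N observed times (event or censoring)
    events : list of N indicators, 1 if the event was observed, 0 if censored
    lam    : regularization strength (assumed convention: lam / 2 * ||beta||^2)
    """
    # Linear predictor for every subject.
    eta = [sum(b * x for b, x in zip(beta, row)) for row in X]
    nll = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored subjects contribute only through risk sets
        # Risk set at t_i: every subject whose observed time is at least t_i.
        risk_sum = sum(math.exp(eta[j]) for j, t_j in enumerate(times) if t_j >= t_i)
        nll -= eta[i] - math.log(risk_sum)
    penalty = 0.5 * lam * sum(b * b for b in beta)
    return nll + penalty


X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(penalized_cox_nll([0.0, 0.0], X, [1.0, 2.0, 3.0], [1, 1, 0], lam=1.0))
```

Minimizing this objective over beta corresponds to maximum a posteriori inference under a Gaussian prior; setting lam to zero recovers the maximum likelihood case.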
Deep Generative Models for Reject Inference in Credit Scoring
Credit scoring models based on accepted applications may be biased and their
consequences can have a statistical and economic impact. Reject inference is
the process of attempting to infer the creditworthiness status of the rejected
applications. In this research, we use deep generative models to develop two
new semi-supervised Bayesian models for reject inference in credit scoring, in
which we model the data generating process to be dependent on a Gaussian
mixture. The goal is to improve the classification accuracy in credit scoring
models by adding reject applications. Our proposed models infer the unknown
creditworthiness of the rejected applications by exact enumeration of the two
possible outcomes of the loan (default or non-default). The efficient
stochastic gradient optimization technique used in deep generative models makes
our models suitable for large data sets. Finally, the experiments in this
research show that our proposed models perform better than classical and
alternative machine learning models for reject inference in credit scoring.
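The idea of handling an unknown label by exact enumeration of its two possible values can be sketched in a much simpler setting than the deep generative models of the paper. The example below assumes a one-dimensional feature with one Gaussian per class and sums over both outcomes to obtain the log-likelihood of an unlabeled (rejected) application; all names and parameter values are hypothetical.

```python
import math


def log_gauss(x, mean, std):
    """Log density of a univariate normal distribution."""
    return -0.5 * math.log(2.0 * math.pi * std ** 2) - 0.5 * ((x - mean) / std) ** 2


def log_marginal_unlabeled(x, prior_default, params):
    """Log-likelihood of x when the outcome y is unobserved.

    Enumerates both outcomes (0 = non-default, 1 = default) and sums them out:
    log p(x) = log sum_y p(y) p(x | y), computed with the log-sum-exp trick.
    params maps each outcome to a (mean, std) pair (illustrative assumption).
    """
    terms = [
        math.log(1.0 - prior_default) + log_gauss(x, *params[0]),
        math.log(prior_default) + log_gauss(x, *params[1]),
    ]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))


params = {0: (1.0, 0.5), 1: (-1.0, 0.8)}
print(log_marginal_unlabeled(0.3, prior_default=0.2, params=params))
```

Accepted applications, whose outcome is observed, would instead contribute the single term for their known label, so labeled and unlabeled data can be combined in one objective.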