Learning Space-Time Continuous Neural PDEs from Partially Observed States
We introduce a novel grid-independent model for learning partial differential
equations (PDEs) from noisy and partial observations on irregular
spatiotemporal grids. We propose a space-time continuous latent neural PDE
model with an efficient probabilistic framework and a novel encoder design for
improved data efficiency and grid independence. The latent state dynamics are
governed by a PDE model that combines the collocation method and the method of
lines. We employ amortized variational inference for approximate posterior
estimation and utilize a multiple shooting technique for enhanced training
speed and stability. Our model demonstrates state-of-the-art performance on
complex synthetic and real-world datasets, overcoming limitations of previous
approaches and effectively handling partially-observed data. The proposed model
outperforms recent methods, showing its potential to advance data-driven PDE
modeling and enabling robust, grid-independent modeling of complex
partially observed dynamic processes.
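As a point of reference for the method of lines mentioned in the abstract above, here is a minimal sketch of the classical version of that idea: discretise space, then integrate the resulting ODE system in time. It assumes a 1D heat equation on a uniform grid with fixed end values and explicit Euler time stepping. It is not the paper's model, which works on irregular grids with a learned latent PDE; the names `heat_rhs` and `integrate` are hypothetical.

```python
import math


def heat_rhs(u, dx, alpha):
    """Semi-discretised right-hand side of u_t = alpha * u_xx.

    Interior points use a central second difference; the two end
    points get zero derivative, so their values stay fixed.
    """
    du = [0.0] * len(u)
    for i in range(1, len(u) - 1):
        du[i] = alpha * (u[i - 1] - 2.0 * u[i] + u[i + 1]) / dx ** 2
    return du


def integrate(u0, dx, alpha, dt, n_steps):
    """Advance the spatially discretised system with explicit Euler."""
    u = list(u0)
    for _ in range(n_steps):
        du = heat_rhs(u, dx, alpha)
        u = [ui + dt * dui for ui, dui in zip(u, du)]
    return u


# Assumed toy setup: sine initial condition on [0, 1], 21 grid points.
n = 21
dx = 1.0 / (n - 1)
u0 = [math.sin(math.pi * i * dx) for i in range(n)]
# dt * alpha / dx**2 = 0.4, inside the explicit Euler stability limit of 0.5.
u_final = integrate(u0, dx, alpha=1.0, dt=0.001, n_steps=100)
```

For this initial condition the exact solution decays as exp(-pi^2 * t), so the midpoint value at t = 0.1 can be compared against that factor as a sanity check.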
Modeling binding specificities of transcription factor pairs with random forests
Background: Transcription factors (TFs) bind regulatory DNA regions with sequence specificity, form complexes and regulate gene expression. In cooperative TF-TF binding, two transcription factors bind a shared DNA binding site as a pair. Previous work has modeled pairwise TF-TF-DNA interactions with position weight matrices (PWMs), which may not sufficiently capture the complexity and flexibility of pairwise binding. Results: We propose two random forest (RF) methods for joint TF-TF binding site prediction: ComBind and JointRF. We train models with previously published large-scale CAP-SELEX DNA libraries, which comprise DNA sequences enriched for binding of a selected TF pair. JointRF builds a random forest with sub-sequences selected from CAP-SELEX DNA reads using a previously proposed pairwise PWM. JointRF outperforms (area under the receiver operating characteristic curve, AUROC, 0.75) the current state-of-the-art method, i.e., orientation- and spacing-specific pairwise PWMs (AUROC 0.59). Thus, JointRF may be used to improve prediction accuracy for pre-determined binding preferences. However, pairwise TF binding is currently considered flexible; a pair may bind DNA with different orientations and with different amounts of dinucleotide gaps or overlap between the two motifs. We therefore developed ComBind, which uses random forests while simultaneously considering multiple orientations and spacings of the two factors. Our approach outperforms (AUROC 0.78) PWMs, as well as JointRF (p…
Peer reviewed
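The AUROC figures quoted in the abstract above can be read as the probability that a randomly chosen positive example is scored above a randomly chosen negative one. A minimal sketch of that pairwise definition follows; the function name `auroc` and the toy scores are illustrative assumptions, not data from the paper.

```python
def auroc(scores, labels):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: three of the four positive/negative pairs are ordered correctly.
toy_auc = auroc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])  # 0.75
```

The quadratic pair loop is fine for small illustrations; for large datasets a rank-based computation gives the same value more efficiently.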
Latent Neural ODEs with Sparse Bayesian Multiple Shooting
Training dynamic models, such as neural ODEs, on long trajectories is a hard
problem that requires using various tricks, such as trajectory splitting, to
make model training work in practice. These methods are often heuristics with
poor theoretical justifications, and require iterative manual tuning. We
propose a principled multiple shooting technique for neural ODEs that splits
the trajectories into manageable short segments, which are optimised in
parallel, while ensuring probabilistic control on continuity over consecutive
segments. We derive variational inference for our shooting-based latent neural
ODE models and propose amortized encodings of irregularly sampled trajectories
with a transformer-based recognition network with temporal attention and
relative positional encoding. We demonstrate efficient and stable training, and
state-of-the-art performance on multiple large-scale benchmark datasets.
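The trajectory-splitting idea described in the abstract above can be sketched as follows: each short segment gets its own initial state, segments are rolled out independently, and a term ties the end of one segment to the start of the next. This sketch assumes a scalar ODE dx/dt = -k x, explicit Euler integration, and a plain quadratic continuity penalty in place of the paper's probabilistic continuity control; all function and parameter names are hypothetical.

```python
def euler_rollout(x0, k, dt, n_steps):
    """Integrate dx/dt = -k * x with explicit Euler; returns n_steps + 1 states."""
    xs = [x0]
    for _ in range(n_steps):
        xs.append(xs[-1] + dt * (-k * xs[-1]))
    return xs


def multiple_shooting_loss(obs, shooting_states, k, dt, seg_len, penalty):
    """Per-segment data misfit plus a quadratic penalty on gaps between segments."""
    data_term = 0.0
    gap_term = 0.0
    for s, x0 in enumerate(shooting_states):
        start = s * seg_len
        pred = euler_rollout(x0, k, dt, seg_len)
        # Compare this segment's rollout with its observations.
        for j in range(seg_len):
            data_term += (pred[j] - obs[start + j]) ** 2
        # The end of this segment should meet the next segment's initial state.
        if s + 1 < len(shooting_states):
            gap_term += (pred[seg_len] - shooting_states[s + 1]) ** 2
    return data_term + penalty * gap_term


# Assumed toy data: 12 noiseless observations split into 3 segments of length 4.
obs = euler_rollout(1.0, 0.5, 0.1, 11)
consistent = [obs[0], obs[4], obs[8]]
loss_consistent = multiple_shooting_loss(obs, consistent, 0.5, 0.1, 4, 10.0)
loss_shifted = multiple_shooting_loss(obs, [obs[0], 0.0, obs[8]], 0.5, 0.1, 4, 10.0)
```

Because the segments do not depend on each other's rollouts, the inner loop over segments could be evaluated in parallel, which is the practical appeal of the approach for long trajectories.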
A Variational Autoencoder for Heterogeneous Temporal and Longitudinal Data
The variational autoencoder (VAE) is a popular deep latent variable model
used to analyse high-dimensional datasets by learning a low-dimensional latent
representation of the data. It simultaneously learns a generative model and an
inference network to perform approximate posterior inference. Recently proposed
extensions to VAEs that can handle temporal and longitudinal data have
applications in healthcare, behavioural modelling, and predictive maintenance.
However, these extensions do not account for heterogeneous data (i.e., data
comprising continuous and discrete attributes), which is common in many
real-life applications. In this work, we propose the heterogeneous longitudinal
VAE (HL-VAE) that extends the existing temporal and longitudinal VAEs to
heterogeneous data. HL-VAE provides efficient inference for high-dimensional
datasets and includes likelihood models for continuous, count, categorical, and
ordinal data while accounting for missing observations. We demonstrate our
model's efficacy through simulated as well as clinical datasets, and show that
our proposed model achieves competitive performance in missing value imputation
and predictive accuracy.
Comment: Preprint
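To illustrate what per-attribute likelihood models with missing observations can look like, here is a minimal sketch that sums log-likelihoods over the attributes of one record and skips missing entries. It covers Gaussian, Poisson, and categorical attributes only (ordinal is omitted for brevity), and the function names and parameter layout are assumptions, not HL-VAE's actual interface.

```python
import math


def gaussian_logpdf(x, mean, std):
    """Log-density of a normal distribution for a continuous attribute."""
    return -0.5 * math.log(2.0 * math.pi * std ** 2) - (x - mean) ** 2 / (2.0 * std ** 2)


def poisson_logpmf(k, rate):
    """Log-probability of a Poisson distribution for a count attribute."""
    return k * math.log(rate) - rate - math.lgamma(k + 1)


def categorical_logpmf(c, probs):
    """Log-probability of class index c under a categorical distribution."""
    return math.log(probs[c])


def heterogeneous_loglik(row, params, kinds):
    """Sum per-attribute log-likelihoods, skipping entries marked missing (None)."""
    total = 0.0
    for x, p, kind in zip(row, params, kinds):
        if x is None:
            continue  # missing observation contributes nothing
        if kind == "gaussian":
            total += gaussian_logpdf(x, p[0], p[1])
        elif kind == "poisson":
            total += poisson_logpmf(x, p)
        elif kind == "categorical":
            total += categorical_logpmf(x, p)
        else:
            raise ValueError(f"unknown attribute kind: {kind}")
    return total


# Toy record: a continuous value, a missing count, and a class label.
kinds = ["gaussian", "poisson", "categorical"]
params = [(0.0, 1.0), 2.0, [0.5, 0.5]]
ll = heterogeneous_loglik([0.0, None, 1], params, kinds)
```

In a VAE setting the parameters would come from a decoder network rather than being fixed constants as they are here.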
Bayesian inference of ODEs with Gaussian processes
Recent machine learning advances have proposed black-box estimation of
unknown continuous-time system dynamics directly from data. However, earlier
works are based on approximate ODE solutions or point estimates. We propose a
novel Bayesian nonparametric model that uses Gaussian processes to infer
posteriors of unknown ODE systems directly from data. We derive sparse
variational inference with decoupled functional sampling to represent vector
field posteriors. We also introduce a probabilistic shooting augmentation to
enable efficient inference from arbitrarily long trajectories. The method
demonstrates the benefit of computing vector field posteriors, with predictive
uncertainty scores outperforming alternative methods on multiple ODE learning
tasks.