Conditioning on Time is All You Need for Synthetic Survival Data Generation
Synthetic data generation holds considerable promise, offering avenues to
enhance privacy, fairness, and data accessibility. Despite the availability of
various methods for generating synthetic tabular data, challenges persist,
particularly in specialized applications such as survival analysis. One
significant obstacle in survival data generation is censoring, which manifests
as not knowing the precise timing of observed (target) events for certain
instances. Existing methods face difficulties in accurately reproducing the
real distribution of event times for both observed (uncensored) events and
censored events, i.e., the generated event-time distributions do not accurately
match the underlying distributions of the real data. So motivated, we propose a
simple paradigm to produce synthetic survival data by generating covariates
conditioned on event times (and censoring indicators), thus allowing one to
reuse existing conditional generative models for tabular data without
significant computational overhead, and without making assumptions about the
(usually unknown) generation mechanism underlying censoring. We evaluate this
method via extensive experiments on real-world datasets. Our methodology
outperforms multiple competitive baselines at generating survival data, while
improving the performance of downstream survival models trained on it and
tested on real data.
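The paradigm in this abstract (draw event times and censoring indicators first, then generate covariates conditioned on them) can be illustrated with a minimal sketch. This is not the authors' implementation: the linear-Gaussian conditional model is a deliberately simple stand-in for "an existing conditional generative model for tabular data", and the names `fit_conditional_gaussian` and `sample_synthetic` are hypothetical. It assumes both censored and uncensored rows are present in the training data.

```python
import numpy as np

def fit_conditional_gaussian(X, t, delta):
    """Fit x | (t, delta) with one linear-Gaussian model per censoring group.

    Stand-in (assumption) for any conditional tabular generator.
    X: (n, p) covariates, t: (n,) positive times, delta: (n,) 0/1 indicators.
    """
    models = {}
    for d in (0, 1):
        mask = delta == d
        # Design matrix: intercept and log-time.
        A = np.column_stack([np.ones(mask.sum()), np.log(t[mask])])
        coef, *_ = np.linalg.lstsq(A, X[mask], rcond=None)
        resid = X[mask] - A @ coef
        cov = np.atleast_2d(np.cov(resid, rowvar=False))
        models[d] = (coef, cov)
    return models

def sample_synthetic(models, t, delta, n, rng):
    """Resample (time, indicator) pairs jointly, then draw covariates given them."""
    idx = rng.integers(0, len(t), size=n)
    t_syn, d_syn = t[idx], delta[idx]
    p = models[0][0].shape[1]
    X_syn = np.empty((n, p))
    for d in (0, 1):
        mask = d_syn == d
        if not mask.any():
            continue
        coef, cov = models[d]
        A = np.column_stack([np.ones(mask.sum()), np.log(t_syn[mask])])
        noise = rng.multivariate_normal(np.zeros(p), cov, size=mask.sum())
        X_syn[mask] = A @ coef + noise
    return X_syn, t_syn, d_syn
```

Because the (time, indicator) pairs are resampled directly from the real data, this sketch never has to model the censoring mechanism, which is the point the abstract makes. Only the covariates are generated.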
Deconvolutional Latent-Variable Model for Text Sequence Matching
A latent-variable model is introduced for text matching, inferring sentence
representations by jointly optimizing generative and discriminative objectives.
To alleviate typical optimization challenges in latent-variable models for
text, we employ deconvolutional networks as the sequence decoder (generator),
providing learned latent codes with more semantic information and better
generalization. Our model, trained in an unsupervised manner, yields stronger
empirical predictive performance than a decoder based on Long Short-Term Memory
(LSTM), with fewer parameters and considerably faster training. Further, we
apply it to text sequence-matching problems. The proposed model significantly
outperforms several strong sentence-encoding baselines, especially in the
semi-supervised setting.
Comment: Accepted by AAAI-201
Fully digital encryption technique
We propose an alternative fully digital encryption technique that uses the Fourier transform of the object to be processed and a speckled reference wave as the encryption mask. Once encrypted, the Fourier transform spectrum of the object is holographically stored. The original data are recovered by digital reconstruction using the same encryption mask, which is also holographically stored. The quality of the reconstructed data is evaluated as a function of the sensed encrypted data. Computer simulations and experimental results are presented to demonstrate the method.
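The encrypt-then-recover idea in this abstract can be sketched numerically. The sketch below is a strong simplification and not the authors' optical setup: the speckled reference wave is modelled as a unit-amplitude random-phase mask (an assumption), the holographic recording step is not modelled, and the function names are hypothetical.

```python
import numpy as np

def make_mask(shape, rng):
    """Unit-amplitude random-phase mask (simplified stand-in for a speckle field)."""
    return np.exp(2j * np.pi * rng.random(shape))

def encrypt(obj, mask):
    """Multiply the object's Fourier spectrum by the mask."""
    return np.fft.fft2(obj) * mask

def decrypt(encrypted, mask):
    """Undo the mask with its complex conjugate, then invert the transform."""
    return np.fft.ifft2(encrypted * np.conj(mask))
```

With the correct mask, `decrypt(encrypt(obj, mask), mask)` returns the object up to floating-point error, because the mask has unit modulus. With a different mask the spectrum phases stay scrambled and the reconstruction does not match the object.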
Sparse Linear Identifiable Multivariate Modeling
In this paper we consider sparse and identifiable linear latent variable
(factor) and linear Bayesian network models for parsimonious analysis of
multivariate data. We propose a computationally efficient method for joint
parameter and model inference, and model comparison. It consists of a fully
Bayesian hierarchy for sparse models using slab and spike priors (two-component
delta-function and continuous mixtures), non-Gaussian latent factors and a
stochastic search over the ordering of the variables. The framework, which we
call SLIM (Sparse Linear Identifiable Multivariate modeling), is validated and
benchmarked on artificial and real biological data sets. SLIM is closest in
spirit to LiNGAM (Shimizu et al., 2006), but differs substantially in
inference, Bayesian network structure learning and model comparison.
Experimentally, SLIM performs as well as or better than LiNGAM, with
comparable computational complexity. We attribute this mainly to the stochastic
search strategy used, and to parsimony (sparsity and identifiability), which is
an explicit part of the model. We propose two extensions to the basic i.i.d.
linear framework: non-linear dependence on observed variables, called SNIM
(Sparse Non-linear Identifiable Multivariate modeling) and allowing for
correlations between latent variables, called CSLIM (Correlated SLIM), for
temporal and/or spatial data. The source code and scripts are available from
http://cogsys.imm.dtu.dk/slim/.
Comment: 45 pages, 17 figures
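To make the ingredients named in this abstract concrete, the sketch below draws a sparse loading matrix from a spike-and-slab prior (delta function at zero mixed with a continuous slab) and simulates data from a linear model with non-Gaussian latent factors. It only illustrates the generative side and is not the SLIM inference procedure. The Gaussian slab, the Laplace factors, and all function and parameter names are assumptions chosen for illustration.

```python
import numpy as np

def sample_spike_slab(shape, pi, slab_std, rng):
    """Each entry is exactly 0 with probability 1 - pi, else drawn from N(0, slab_std^2)."""
    active = rng.random(shape) < pi
    slab = rng.normal(0.0, slab_std, size=shape)
    return np.where(active, slab, 0.0)

def simulate_sparse_factor_data(n, d, k, pi, rng, noise_std=0.1):
    """Simulate X = W Z + noise with sparse W and Laplace (non-Gaussian) factors Z."""
    W = sample_spike_slab((d, k), pi, 1.0, rng)
    Z = rng.laplace(size=(k, n))
    X = W @ Z + noise_std * rng.normal(size=(d, n))
    return X, W, Z
```

Here `pi` controls the expected fraction of non-zero loadings, so smaller values give a more parsimonious model.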