UniVG: Towards UNIfied-modal Video Generation
Diffusion-based video generation has received extensive attention and
achieved considerable success within both the academic and industrial
communities. However, current efforts are mainly concentrated on
single-objective or single-task video generation, such as generation driven by
text, by image, or by a combination of text and image. This cannot fully meet
the needs of real-world application scenarios, as users are likely to input
images and text conditions in a flexible manner, either individually or in
combination. To address this, we propose a Unified-modal Video Generation
system that is capable of handling multiple video generation tasks across text
and image modalities. To this end, we revisit the various video generation
tasks within our system from the perspective of generative freedom, and
classify them into high-freedom and low-freedom video generation categories.
For high-freedom video generation, we employ Multi-condition Cross Attention to
generate videos that align with the semantics of the input images or text. For
low-freedom video generation, we introduce Biased Gaussian Noise to replace the
pure random Gaussian Noise, which helps to better preserve the content of the
input conditions. Our method achieves the lowest Fréchet Video Distance (FVD)
on the public academic benchmark MSR-VTT, surpasses the current open-source
methods in human evaluations, and is on par with the current closed-source
method Gen2. For more samples, visit https://univg-baidu.github.io
MUST-CNN: A Multilayer Shift-and-Stitch Deep Convolutional Architecture for Sequence-based Protein Structure Prediction
Predicting protein properties such as solvent accessibility and secondary
structure from its primary amino acid sequence is an important task in
bioinformatics. Recently, a few deep learning models have surpassed the
traditional window-based multilayer perceptron. Taking inspiration from the
image classification domain, we propose a deep convolutional neural network
architecture, MUST-CNN, to predict protein properties. This architecture uses a
novel multilayer shift-and-stitch (MUST) technique to generate fully dense
per-position predictions on protein sequences. Our model is significantly
simpler than the state-of-the-art, yet achieves better results. By combining
MUST and the efficient convolution operation, we can consider far more
parameters while retaining very fast prediction speeds. We beat the
state-of-the-art performance on two large protein property prediction datasets.
Comment: 8 pages; 3 figures; deep learning based sequence-sequence
prediction. In AAAI 201
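The shift-and-stitch idea mentioned in the abstract above can be illustrated on a toy 1D example. The sketch below is not the MUST-CNN architecture: it assumes a single non-overlapping max-pooling layer as a stand-in for the network, and the names `pool_net` and `shift_and_stitch` are hypothetical. It only shows how running a strided model on shifted copies of the input and interleaving the coarse outputs recovers one prediction per position.

```python
import math


def pool_net(x, stride):
    """Toy stand-in for a strided network (assumption): non-overlapping max pooling."""
    n_out = len(x) // stride
    return [max(x[j * stride:(j + 1) * stride]) for j in range(n_out)]


def shift_and_stitch(x, stride, pad_value=-math.inf):
    """Return one output per input position by stitching shifted coarse outputs."""
    n = len(x)
    m = math.ceil(n / stride)  # coarse outputs produced per shifted copy
    # Pad on the right so every shifted copy contains exactly m full windows.
    padded = list(x) + [pad_value] * (m * stride + stride - 1 - n)
    dense = [None] * (m * stride)
    for shift in range(stride):
        coarse = pool_net(padded[shift:shift + m * stride], stride)
        # Output j of this copy describes the window starting at shift + j * stride.
        dense[shift::stride] = coarse
    return dense[:n]


if __name__ == "__main__":
    seq = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0]
    print(pool_net(seq, 3))          # coarse: one output per 3 positions
    print(shift_and_stitch(seq, 3))  # dense: one output per position
```

With a stride of 3, the plain pooled output has two entries for a sequence of seven positions, while the stitched output has seven, at the cost of three forward passes.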
Copula-like Variational Inference
This paper considers a new family of variational distributions motivated by
Sklar's theorem. This family is based on new copula-like densities on the
hypercube with non-uniform marginals which can be sampled efficiently, i.e.
with a complexity linear in the dimension of the state space. The proposed
variational densities can be seen as arising from these
copula-like densities used as base distributions on the hypercube with Gaussian
quantile functions and sparse rotation matrices as normalizing flows. The
latter correspond to a rotation of the marginals with complexity $\mathcal{O}(d \log d)$. We provide some empirical evidence that such a variational family can
also approximate non-Gaussian posteriors and can be beneficial compared to
Gaussian approximations. Our method performs largely comparably to
state-of-the-art variational approximations on standard regression and
classification benchmarks for Bayesian Neural Networks.
Comment: 33rd Conference on Neural Information Processing Systems (NeurIPS
2019), Vancouver, Canada
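One ingredient named in the abstract above, mapping samples on the hypercube through Gaussian quantile functions, can be sketched with the standard library. This is only an illustration under stated assumptions: independent uniform draws stand in for the paper's copula-like base density, the sparse rotation step is omitted, and the function name `gaussian_quantile_transform` is hypothetical.

```python
import random
from statistics import NormalDist


def gaussian_quantile_transform(u, eps=1e-12):
    """Map points in (0, 1)^d to R^d with the standard normal quantile function."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Clamp away from 0 and 1, where the quantile function is undefined.
    return [std_normal.inv_cdf(min(max(ui, eps), 1.0 - eps)) for ui in u]


if __name__ == "__main__":
    rng = random.Random(0)
    # Assumption: independent uniforms as a placeholder base sample on the hypercube.
    u = [rng.random() for _ in range(5)]
    z = gaussian_quantile_transform(u)
    print(u)
    print(z)
```

The cost of this step is linear in the dimension, since each coordinate is transformed independently.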