154 research outputs found
Belief propagation generative adversarial networks
Generative adversarial networks (GANs) are a class of generative models based on a minimax game. They have led to significant improvements in unsupervised learning, especially image generation. However, most GANs learn the distribution of the input dataset through a multi-layer neural network that does not explicitly model the structure of the input variables. This can work well on large, low-noise datasets, where averaging over many inputs lets the learning procedure assign relatively small weights to occasional noise. When the dataset is small or noisy, however, this approach can pick up spurious structure, reducing the quality of the generated samples.
In this thesis we propose a technique to model the structure of the variable interactions by incorporating graphical models into the generative adversarial network. The proposed framework produces samples by passing random inputs through a neural network to construct the local potentials of a graphical model; performing probabilistic inference in this graphical model then yields the marginal distribution. Message passing over discrete variables maintains a table of local potential values whose size becomes prohibitive for natural images. We present a solution based on continuous variables with unary and pairwise Gaussian potentials, and perform probabilistic inference using loopy belief propagation on continuous Markov random fields. Experiments on the MNIST dataset show that our model is able to outperform vanilla GANs with more than two iterations of belief propagation.
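A minimal sketch of scalar Gaussian loopy belief propagation in information form, in the spirit of the continuous message passing described above. The potentials and graph here are illustrative values chosen for the example, not the thesis's learned potentials; on a tree-structured graph (a 3-node chain), Gaussian BP recovers the exact marginal means.

```python
import numpy as np

def gaussian_bp(J, h, edges, iters=10):
    """Loopy Gaussian BP in information form.
    J: dict of unary precisions J[i]; h: dict of unary potentials h[i];
    edges: dict {(i, j): Jij} of pairwise precisions (symmetric)."""
    nodes = list(J)
    # messages msgs[(i, j)] = (precision, potential) sent from i to j
    msgs = {}
    for (i, j) in edges:
        msgs[(i, j)] = (0.0, 0.0)
        msgs[(j, i)] = (0.0, 0.0)
    nbrs = {n: [] for n in nodes}
    for (i, j) in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    for _ in range(iters):
        new = {}
        for (i, j) in msgs:
            Jij = edges.get((i, j), edges.get((j, i)))
            # collect beliefs at i from all neighbors except the recipient j
            Jhat = J[i] + sum(msgs[(k, i)][0] for k in nbrs[i] if k != j)
            hhat = h[i] + sum(msgs[(k, i)][1] for k in nbrs[i] if k != j)
            new[(i, j)] = (-Jij**2 / Jhat, -Jij * hhat / Jhat)
        msgs = new  # synchronous update
    means = {}
    for i in nodes:
        Ji = J[i] + sum(msgs[(k, i)][0] for k in nbrs[i])
        hi = h[i] + sum(msgs[(k, i)][1] for k in nbrs[i])
        means[i] = hi / Ji
    return means

# On a 3-node chain (a tree), BP is exact; compare against the dense solve.
Jd = {0: 2.0, 1: 2.0, 2: 2.0}
hd = {0: 1.0, 1: 0.0, 2: -1.0}
ed = {(0, 1): 0.5, (1, 2): 0.5}
means = gaussian_bp(Jd, hd, ed)
P = np.array([[2.0, 0.5, 0.0], [0.5, 2.0, 0.5], [0.0, 0.5, 2.0]])
exact = np.linalg.solve(P, np.array([1.0, 0.0, -1.0]))
print(np.allclose([means[i] for i in range(3)], exact))  # prints True
```

On loopy graphs (as in the grid-structured MRFs of natural images) the same updates are run for a fixed number of iterations without an exactness guarantee, which is why the number of BP iterations matters in the experiments.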
Learning Only On Boundaries: a Physics-Informed Neural operator for Solving Parametric Partial Differential Equations in Complex Geometries
Recently deep learning surrogates and neural operators have shown promise in
solving partial differential equations (PDEs). However, they often require a
large amount of training data and are limited to bounded domains. In this work,
we present a novel physics-informed neural operator method to solve
parametrized boundary value problems without labeled data. By reformulating the
PDEs into boundary integral equations (BIEs), we can train the operator network
solely on the boundary of the domain. This approach reduces the number of
required sample points from O(N^d) to O(N^(d-1)), where d is the domain's
dimension, leading to a significant acceleration of the training process.
Additionally, our method can handle unbounded problems, which are unattainable
for existing physics-informed neural networks (PINNs) and neural operators. Our
numerical experiments demonstrate the method's effectiveness on parametrized
complex geometries and unbounded problems.
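The sample-count reduction above can be illustrated directly: collocating a PDE residual over the interior of a d-dimensional box requires on the order of N^d points, while a boundary integral formulation samples only the (d-1)-dimensional boundary. A hypothetical sketch (unit cube, tensor-product grids; face points overlap at edges, so counts are up to constants):

```python
import numpy as np

def interior_points(N, d):
    """Tensor-product grid over the unit d-cube: N**d points."""
    axes = [np.linspace(0.0, 1.0, N)] * d
    return np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, d)

def boundary_points(N, d):
    """Grids on the 2*d faces of the unit d-cube: 2*d*N**(d-1) points."""
    pts = []
    for axis in range(d):
        face = interior_points(N, d - 1) if d > 1 else np.zeros((1, 0))
        for val in (0.0, 1.0):  # the two faces orthogonal to this axis
            col = np.full((face.shape[0], 1), val)
            pts.append(np.concatenate([face[:, :axis], col, face[:, axis:]], axis=1))
    return np.concatenate(pts, axis=0)

N, d = 32, 3
print(interior_points(N, d).shape[0])   # 32**3 = 32768 interior samples
print(boundary_points(N, d).shape[0])   # 6 * 32**2 = 6144 boundary samples
```

For d = 3 the boundary-only training set is already smaller by a factor of roughly N/6, and the gap widens with resolution and dimension.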
An Expert's Guide to Training Physics-informed Neural Networks
Physics-informed neural networks (PINNs) have been popularized as a deep
learning framework that can seamlessly synthesize observational data and
partial differential equation (PDE) constraints. However, their practical
effectiveness can be hampered by training pathologies, and oftentimes by poor
choices made by users who lack deep learning expertise. In this paper we
present a series of best practices that can significantly improve the training
efficiency and overall accuracy of PINNs. We also put forth a series of
challenging benchmark problems that highlight some of the most prominent
difficulties in training PINNs, and present comprehensive and fully
reproducible ablation studies that demonstrate how different architecture
choices and training strategies affect the test accuracy of the resulting
models. We show that the methods and guiding principles put forth in this study
lead to state-of-the-art results and provide strong baselines that future
studies should use for comparison purposes. To this end, we also release a
highly optimized library in JAX that can be used to reproduce all results
reported in this paper, enable future research studies, as well as facilitate
easy adaptation to new use-case scenarios.
Comment: 36 pages, 25 figures, 13 tables
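One best practice in this line of work is to bake boundary or initial conditions into the network ansatz ("hard constraints") so the loss contains only the PDE residual. A minimal NumPy-only sketch of that idea on a toy problem (u' + u = 0 on [0, 1] with u(0) = 1, exact solution exp(-x)); the architecture, sizes, and optimizer are hypothetical, and a real PINN would use autodiff (the paper's library is written in JAX) rather than the finite-difference gradients used here for self-containedness:

```python
import numpy as np

rng = np.random.default_rng(0)
H = 8                                     # width of a 1-hidden-layer tanh net
params = rng.normal(0, 0.5, size=3 * H)   # flattened [w, b, a]
xs = np.linspace(0.0, 1.0, 32)            # collocation points

def residual_loss(p):
    w, b, a = p[:H], p[H:2 * H], p[2 * H:]
    z = np.outer(xs, w) + b               # (32, H) pre-activations
    s = np.tanh(z) @ a                    # s(x)
    ds = ((1.0 - np.tanh(z) ** 2) * w) @ a  # s'(x), analytic derivative
    u = 1.0 + xs * s                      # hard constraint: u(0) = 1 exactly
    du = s + xs * ds
    return np.mean((du + u) ** 2)         # ODE residual u' + u

def fd_grad(f, p, eps=1e-5):
    """Central finite-difference gradient (stand-in for autodiff)."""
    g = np.zeros_like(p)
    for i in range(p.size):
        d = np.zeros_like(p); d[i] = eps
        g[i] = (f(p + d) - f(p - d)) / (2 * eps)
    return g

lr = 0.1
loss0 = residual_loss(params)
loss = loss0
for _ in range(300):
    step = params - lr * fd_grad(residual_loss, params)
    if residual_loss(step) < loss:        # backtracking: accept only descent
        params, loss = step, residual_loss(step)
    else:
        lr *= 0.5
print(loss0, loss)  # the residual loss should decrease
```

Because the initial condition is enforced exactly by the ansatz, there is no need to balance a data-fit term against the residual term, which sidesteps one of the loss-weighting pathologies such papers discuss.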
Beyond Attentive Tokens: Incorporating Token Importance and Diversity for Efficient Vision Transformers
Vision transformers have achieved significant improvements on various vision
tasks, but the quadratic interactions between tokens severely reduce their
computational efficiency. Many pruning methods have recently been proposed to
remove redundant tokens and make vision transformers more efficient. However,
existing studies mainly focus on token importance to preserve locally attentive
tokens, while completely ignoring global token diversity. In this paper, we
emphasize the importance of diverse global semantics and propose an efficient
token decoupling and merging method that jointly considers token importance
and diversity for token pruning. Based on the class-token attention, we
decouple the attentive and inattentive tokens. In addition to preserving the
most discriminative local tokens, we merge similar inattentive tokens and match
homogeneous attentive tokens to maximize token diversity. Despite its
simplicity, our method obtains a promising trade-off between model complexity
and classification accuracy. On DeiT-S, our method reduces the FLOPs by 35%
with only a 0.2% accuracy drop. Notably, by maintaining token diversity, our
method even improves the accuracy of DeiT-T by 0.1% after reducing its FLOPs
by 40%.
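The decouple-and-merge idea above can be sketched as follows: rank patch tokens by the class token's attention, keep the top-k "attentive" tokens, and merge the remaining inattentive tokens into a single token by attention-weighted averaging. This is a hedged illustration only; the paper's actual matching and merging rules for homogeneous attentive tokens are richer, and the function name and threshold here are hypothetical.

```python
import numpy as np

def decouple_and_merge(tokens, cls_attn, keep):
    """tokens: (n, d) patch tokens; cls_attn: (n,) attention weights from
    the class token; keep: number of attentive tokens to preserve."""
    order = np.argsort(cls_attn)[::-1]        # most-attended first
    att, inatt = order[:keep], order[keep:]   # decouple by class attention
    kept = tokens[att]
    if inatt.size:
        # fuse all inattentive tokens into one attention-weighted token
        w = cls_attn[inatt] / cls_attn[inatt].sum()
        fused = (w[:, None] * tokens[inatt]).sum(axis=0, keepdims=True)
        kept = np.concatenate([kept, fused], axis=0)
    return kept

rng = np.random.default_rng(0)
x = rng.normal(size=(16, 4))   # 16 patch tokens of dimension 4
attn = rng.random(16)          # class-token attention over the patches
out = decouple_and_merge(x, attn, keep=8)
print(out.shape)               # (9, 4): 8 attentive tokens + 1 merged token
```

Keeping a fused summary token instead of discarding inattentive tokens outright is what preserves some of the global information that importance-only pruning loses.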
Bilateral-Fuser: A Novel Multi-cue Fusion Architecture with Anatomical-aware Tokens for Fovea Localization
Accurate localization of the fovea is one of the primary steps in analyzing
retinal diseases, since it helps prevent irreversible vision loss. Although
current deep learning-based methods achieve better performance than traditional
ones, challenges remain, such as insufficient use of anatomical landmarks and
sensitivity to diseased retinal images and varied imaging
conditions. In this paper, we propose a novel transformer-based architecture
(Bilateral-Fuser) for multi-cue fusion. This architecture explicitly
incorporates long-range connections and global features using retina and vessel
distributions for robust fovea localization. We introduce a spatial attention
mechanism in the dual-stream encoder for extracting and fusing self-learned
anatomical information. This design focuses more on features distributed along
blood vessels and significantly decreases computational costs by reducing token
numbers. Our comprehensive experiments show that the proposed architecture
achieves state-of-the-art performance on two public datasets and one
large-scale private dataset. We also show that the Bilateral-Fuser is more
robust on both normal and diseased retina images and has better generalization
capacity in cross-dataset experiments.
Comment: This paper is prepared for IEEE TRANSACTIONS ON MEDICAL IMAGING