Nonradial solutions for semilinear Schrödinger equations with sign-changing potential
In this paper, we investigate the existence of infinitely many nonradial solutions for the Schrödinger equation
\begin{equation*}
\begin{cases}
-\triangle u+b(|x|)u=f(|x|, u), &\quad x\in {\mathbb{R}}^{N},\\
u\in H^{1}({\mathbb{R}}^{N}),
\end{cases}
\end{equation*}
where $b$ is allowed to be sign-changing. Under suitable assumptions on $b$ and $f$, we show that the above problem possesses infinitely many nonradial solutions. The proof relies on a critical point theorem.
Multiplicity of Periodic Solutions for Third-Order Nonlinear Differential Equations
We study the existence of periodic solutions for third-order nonlinear differential equations. The method of proof relies on Schauder's fixed point theorem applied in a novel way: the original equation is transformed into a second-order integrodifferential equation through a linear integral operator. Finally, examples are presented to illustrate applications of the main results.
Long-Range Grouping Transformer for Multi-View 3D Reconstruction
Transformer networks have demonstrated superior performance in many computer
vision tasks. In multi-view 3D reconstruction algorithms following this
paradigm, however, self-attention must process a large set of intricate image
tokens when many views are given as input, and this glut of information makes
the model extremely difficult to train. To alleviate the problem, recent
methods either compress the number of tokens representing each view or discard
the attention operations between tokens from different views; both strategies
inevitably degrade performance.
Therefore, we propose long-range grouping attention (LGA) based on the
divide-and-conquer principle. Tokens from all views are partitioned into
groups, and attention is computed separately within each group. Because the
tokens in each group are sampled from all views, each group provides a macro
representation of every view, and the diversity among groups guarantees rich
feature learning. On this basis we build an effective and efficient encoder
that connects inter-view features using LGA and extracts intra-view features
using standard self-attention layers. Moreover, we design a novel progressive
upsampling decoder for voxel generation at relatively high resolution.
Combining these components, we construct a powerful transformer-based network
called LRGT. Experimental results on ShapeNet verify that our method achieves
SOTA accuracy in multi-view reconstruction. Code will be available at
https://github.com/LiyingCV/Long-Range-Grouping-Transformer.
Comment: Accepted to ICCV 202
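The grouping idea described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the function name, the use of plain single-head attention with identity projections, and all tensor shapes are assumptions made for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    # x: (N, C); plain single-head self-attention with identity projections
    scores = x @ x.T / np.sqrt(x.shape[-1])
    return softmax(scores) @ x

def long_range_grouping_attention(tokens, num_groups):
    """Sketch of the grouping idea: tokens has shape (V, T, C) for V views
    with T tokens each. Each group draws tokens from every view, and
    attention runs separately inside each group (divide and conquer)."""
    V, T, C = tokens.shape
    assert T % num_groups == 0
    per = T // num_groups
    # (V, G, per, C) -> (G, V, per, C): every group now spans all views
    g = tokens.reshape(V, num_groups, per, C).transpose(1, 0, 2, 3)
    g = g.reshape(num_groups, V * per, C)
    # separate attention operation per group
    out = np.stack([self_attention(g[i]) for i in range(num_groups)])
    # undo the grouping to restore the per-view layout (V, T, C)
    out = out.reshape(num_groups, V, per, C).transpose(1, 0, 2, 3)
    return out.reshape(V, T, C)
```

The cost advantage is that each attention call sees only `V * T / num_groups` tokens instead of all `V * T`, while still mixing information across views.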
Expressive Whole-Body Control for Humanoid Robots
Can we enable humanoid robots to generate rich, diverse, and expressive
motions in the real world? We propose to learn a whole-body control policy on
a human-sized robot that mimics human motions as realistically as possible. To
train such a policy, we leverage large-scale human motion capture data from
the graphics community in a reinforcement learning framework. However,
directly performing imitation learning with the motion capture dataset does
not work on the real humanoid robot, given the large gap in degrees of freedom
and physical capabilities. Our method, Expressive Whole-Body Control (Exbody),
tackles this problem by encouraging the upper humanoid body to imitate a
reference motion while relaxing the imitation constraint on the two legs,
which are only required to robustly follow a given velocity. With training in
simulation and Sim2Real transfer, our policy can control a humanoid robot to
walk in different styles, shake hands with humans, and even dance with a human
in the real world. We conduct extensive studies and comparisons on diverse
motions in both simulation and the real world to show the effectiveness of our
approach.
Comment: Website: https://expressive-humanoid.github.i
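The reward decomposition described above, where the upper body tracks a reference motion while the legs only track a commanded velocity, can be sketched as a simple shaped reward. All names, weights, and the exponential-kernel form are illustrative assumptions, not the paper's exact reward terms.

```python
import numpy as np

def exbody_style_reward(upper_pose, ref_upper_pose, base_vel, cmd_vel,
                        w_imitate=0.5, w_vel=0.5):
    """Hedged sketch of the Exbody-style reward split: imitation is enforced
    only on the upper-body pose, while the base (legs) is rewarded purely for
    velocity tracking. Weights and kernels are illustrative."""
    # upper-body imitation term: 1.0 when the reference pose is matched exactly
    imitation = np.exp(-np.sum((upper_pose - ref_upper_pose) ** 2))
    # velocity-tracking term: 1.0 when the commanded velocity is followed
    velocity = np.exp(-np.sum((base_vel - cmd_vel) ** 2))
    return w_imitate * imitation + w_vel * velocity
```

Relaxing the leg constraint this way lets the policy choose its own feasible gait on hardware whose degrees of freedom differ from the human performer's.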
MultiPoint BVPs for Second-Order Functional Differential Equations with Impulses
This paper is concerned with the existence of extreme solutions of a multipoint boundary value problem for a class of second-order impulsive functional differential equations. We introduce a new concept of lower and upper solutions. Then, using this method of lower and upper solutions together with the monotone iterative technique, we obtain existence results for extreme solutions.
GARNet: Global-Aware Multi-View 3D Reconstruction Network and the Cost-Performance Tradeoff
Deep learning technology has made great progress in multi-view 3D
reconstruction tasks. At present, most mainstream solutions establish the
mapping between views and the shape of an object by assembling a 2D encoder
and a 3D decoder as the basic structure, while adopting different approaches
to aggregate features from several views. Among them, methods using
attention-based fusion perform better and more stably than the others;
however, they still have an obvious shortcoming: each view predicts its
merging weight independently, so the fusion cannot adapt to the global state.
In this paper, we propose a global-aware attention-based fusion approach that
builds the correlation between each branch and the global state to provide a
comprehensive foundation for weight inference. To further enhance the network,
we introduce a novel loss function to supervise the overall shape and propose
a dynamic two-stage training strategy that adapts effectively to all
reconstructors with attention-based fusion. Experiments on ShapeNet verify
that our method outperforms existing SOTA methods while using far fewer
parameters than comparable algorithms such as Pix2Vox++. Furthermore, we
propose a view-reduction method based on maximizing diversity and discuss the
cost-performance tradeoff of our model for achieving better performance when
the number of input views is large and the computational budget is limited.
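The global-aware fusion idea above can be sketched as follows: instead of scoring each view in isolation, every view's merging weight is computed from the view's feature concatenated with a global descriptor. The mean-pooled global state, the linear scoring vector `w`, and all shapes are assumptions for illustration only.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def global_aware_fusion(features, w):
    """Sketch of global-aware attention fusion.

    features: (V, C) - one feature vector per view branch.
    w: (2C,) - hypothetical linear scoring vector.
    Each view is scored from [view_feature, global_state], so the merging
    weights can adapt to the global state instead of each view alone."""
    V, C = features.shape
    g = features.mean(axis=0)                         # global state (C,)
    scores = np.array([np.concatenate([f, g]) @ w for f in features])
    weights = softmax(scores)                         # (V,), sums to 1
    return (weights[:, None] * features).sum(axis=0)  # fused feature (C,)
```

Because the weights are a softmax, the fused feature is a convex combination of the per-view features; the global term only redistributes that weight with awareness of all branches.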
Unify, Align and Refine: Multi-Level Semantic Alignment for Radiology Report Generation
Automatic radiology report generation has attracted enormous research
interest due to its practical value in reducing the workload of radiologists.
However, simultaneously establishing global correspondences between an image
(e.g., a chest X-ray) and its report and local alignments between image
patches and keywords remains challenging. To this end, we propose a Unify,
Align and then Refine (UAR) approach to learn multi-level cross-modal
alignments, introducing three novel modules: a Latent Space Unifier (LSU), a
Cross-modal Representation Aligner (CRA), and a Text-to-Image Refiner (TIR).
Specifically, the LSU unifies multimodal data into discrete tokens, making it
possible to learn common knowledge across modalities with a shared network.
The modality-agnostic CRA first learns discriminative features via a set of
orthonormal bases and a dual-gate mechanism, and then globally aligns visual
and textual representations under a triplet contrastive loss. The TIR boosts
token-level local alignment by calibrating text-to-image attention with a
learnable mask. Additionally, we design a two-stage training procedure so that
UAR gradually grasps cross-modal alignments at different levels, imitating the
radiologists' workflow: writing sentence by sentence first and then checking
word by word. Extensive experiments and analyses on the IU-Xray and MIMIC-CXR
benchmark datasets demonstrate the superiority of our UAR over various
state-of-the-art methods.
Comment: 8 pages, 6 figures, 4 table
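The TIR's mask calibration can be sketched as additive logits on a cross-attention map: a learnable mask with the same shape as the attention scores is added before the softmax, letting the model suppress or boost individual word-to-patch links. The function name, identity projections, and shapes below are illustrative assumptions, not the paper's exact module.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def calibrated_text_to_image_attention(text_q, img_k, img_v, mask_logits):
    """Sketch of text-to-image attention calibrated by a learnable mask.

    text_q: (T, C) word queries; img_k, img_v: (P, C) patch keys/values;
    mask_logits: (T, P) learnable calibration added to the raw scores."""
    scores = text_q @ img_k.T / np.sqrt(text_q.shape[-1])  # (T, P)
    scores = scores + mask_logits     # calibration: shift word-patch logits
    attn = softmax(scores, axis=-1)   # each word attends over all patches
    return attn @ img_v               # (T, C) image-grounded word features
```

With `mask_logits` fixed at zero this reduces to ordinary cross-attention, so the mask only needs to learn corrections to the default alignment.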