Anti-charmed pentaquark from B decays
We explore the possibility of observing the anti-charmed pentaquark state in the decay of B mesons produced at B-factory experiments. We first show that the observed branching ratio of the B meson into the relevant charmed baryonic final state, as well as its open histograms, can be remarkably well explained by assuming that the decay proceeds first through a B decay whose branching ratio is known, and then through the subsequent decay of the virtual D or D* mesons, whose strengths are calculated using previously fitted hadronic parameters. We then note that the pentaquark can be similarly produced when the virtual D or D* decays into an anti-nucleon and a pentaquark. Combining the present theoretical estimates for these ratios, we find that the anti-charmed pentaquark, which was predicted to be bound by several model calculations, can be produced in such B decays and observed in B-factory experiments through its weak decay.
Comment: 4 pages, 4 figures. Revised version to be published in Physical Review Letters
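Schematically, the two-step mechanism factorizes as below, where the spectator system $X$, the coupling $g$, and the propagator form are generic placeholders rather than the paper's specific channels:

```latex
% Two-step factorization: a known B -> D(*) + X branching ratio, followed by
% the virtual D(*) converting to an anti-nucleon and the pentaquark Theta_c.
% X and the coupling g are placeholders for the channel-specific details.
\begin{equation*}
  \mathcal{B}\bigl(B \to \bar{N}\,\Theta_c\,X\bigr) \;\sim\;
  \mathcal{B}\bigl(B \to D^{(*)} X\bigr)\times
  \left|\frac{g_{D^{(*)} N \Theta_c}}{q^2 - m_{D^{(*)}}^2}\right|^2
  \times (\text{phase space}).
\end{equation*}
```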
Dynamic Graph Generation Network: Generating Relational Knowledge from Diagrams
In this work, we introduce a new algorithm for analyzing a diagram, which
contains visual and textual information in an abstract and integrated way.
Whereas diagrams contain richer information compared with individual
image-based or language-based data, proper solutions for automatically
understanding them have not been proposed due to their innate characteristics
of multi-modality and arbitrariness of layouts. To tackle this problem, we
propose a unified diagram-parsing network for generating knowledge from
diagrams based on an object detector and a recurrent neural network designed
for a graphical structure. Specifically, we propose a dynamic graph-generation
network that is based on dynamic memory and graph theory. We explore the
dynamics of information in a diagram with activation of gates in gated
recurrent unit (GRU) cells. On publicly available diagram datasets, our model
demonstrates state-of-the-art results, outperforming other baselines. Moreover, further experiments on question answering show the potential of the proposed method for various applications.
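A minimal sketch of the idea, treating detected diagram elements as graph nodes; the class name, dimensions, and update rule are illustrative assumptions, not the paper's implementation:

```python
import torch
import torch.nn as nn

class DynamicGraphGeneration(nn.Module):
    """Toy dynamic graph generation: node states (detected diagram elements)
    are refined by a GRU cell while a soft adjacency is re-estimated each step."""

    def __init__(self, dim: int):
        super().__init__()
        self.edge_scorer = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1)
        )
        self.cell = nn.GRUCell(dim, dim)  # gates control information flow

    def forward(self, nodes: torch.Tensor, steps: int = 3):
        # nodes: (N, dim) features of detected visual/textual elements
        adj = None
        for _ in range(steps):
            n = nodes.size(0)
            pairs = torch.cat(
                [nodes.unsqueeze(1).expand(n, n, -1),
                 nodes.unsqueeze(0).expand(n, n, -1)], dim=-1)
            adj = torch.sigmoid(self.edge_scorer(pairs)).squeeze(-1)  # (N, N) soft edges
            msg = adj @ nodes / (adj.sum(-1, keepdim=True) + 1e-6)    # aggregate messages
            nodes = self.cell(msg, nodes)                             # gated memory update
        return nodes, adj  # relational knowledge: node states + edge strengths

# states, edges = DynamicGraphGeneration(128)(torch.randn(7, 128))
```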
Advancing Adversarial Training by Injecting Booster Signal
Recent works have demonstrated that deep neural networks (DNNs) are highly
vulnerable to adversarial attacks. To defend against adversarial attacks, many
defense strategies have been proposed, among which adversarial training has
been demonstrated to be the most effective strategy. However, it has been known
that adversarial training sometimes hurts natural accuracy. Many works therefore focus on optimizing model parameters to handle this problem. Different from the
previous approaches, in this paper, we propose a new approach to improve the
adversarial robustness by using an external signal rather than model
parameters. In the proposed method, a well-optimized universal external signal, called a booster signal, is injected into the area outside the image so that it does not overlap with the original content, boosting both adversarial robustness and natural accuracy. The booster signal and the model parameters are optimized jointly, step by step. Experimental results show that
the booster signal can improve both the natural and robust accuracies over the
recent state-of-the-art adversarial training methods. Moreover, booster-signal optimization is general and flexible enough to be adopted by any existing adversarial training method.
Comment: Accepted at IEEE Transactions on Neural Networks and Learning Systems
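A minimal sketch of the training loop under stated assumptions (ResNet-18 on 32x32 inputs, a PGD adversary, SGD; all names and hyperparameters are illustrative, not the authors' configuration):

```python
import torch
import torch.nn.functional as F
import torchvision

# Sketch only: the booster signal is a trainable frame padded around the
# image (so it never overlaps the content) and is updated collaboratively
# with the model parameters.
model = torchvision.models.resnet18(num_classes=10)
pad = 8                                    # illustrative frame width
booster = torch.zeros(1, 3, 32 + 2 * pad, 32 + 2 * pad, requires_grad=True)
mask = torch.ones_like(booster)
mask[..., pad:-pad, pad:-pad] = 0          # keep only the outer frame trainable

def compose(x):
    """Center the image; add the booster signal on the surrounding frame."""
    return F.pad(x, (pad, pad, pad, pad)) + booster * mask

def adv_example(x, y, eps=8 / 255, alpha=2 / 255, steps=7):
    """PGD on the image region; the booster is part of the forward pass."""
    delta = torch.zeros_like(x).requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(compose(x + delta)), y)
        (g,) = torch.autograd.grad(loss, delta)
        delta = (delta + alpha * g.sign()).clamp(-eps, eps).detach().requires_grad_(True)
    return (x + delta).detach()

opt_model = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
opt_boost = torch.optim.SGD([booster], lr=0.01)

for x, y in loader:                        # assumes a CIFAR-10-style loader
    loss = F.cross_entropy(model(compose(adv_example(x, y))), y)
    opt_model.zero_grad(); opt_boost.zero_grad()
    loss.backward()                        # gradients reach model AND booster
    opt_model.step(); opt_boost.step()     # parallel, collaborative updates
```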
Probing sterile neutrino in B (D) meson decays at Belle II (BESIII)
We present how a systematic study of B (D) meson decays into final states containing a heavy neutrino N, at Belle II (BESIII), can provide an unambiguous signature of a heavy neutrino and/or constrain its mixing with the active neutrinos, which is parameterized by $|V_{\ell N}|^2$. Our constraint on $|V_{\ell N}|^2$ that can be achieved from the full Belle II data set is comparable with what can be obtained from the much larger data set of the upgraded LHCb. Additionally, our method offers a better constraint on $|V_{\ell N}|^2$ for sterile neutrino masses of a few GeV. We can also probe the Dirac or Majorana nature of N by observing its sequential decay, including the suppression, for a Majorana N, that follows from the observation of a displaced vertex as well as from helicity flip.
Comment: 9 pages, 6 figures. This is a pre-print of an article published in European Physical Journal C. The final authenticated version is available online at https://doi.org/10.1140/epjc/s10052-020-8310-
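For orientation, the standard two-body leptonic width that such searches rely on scales with the active-sterile mixing; this generic formula illustrates the dependence and is not necessarily the paper's exact channel set:

```latex
% Production of a heavy neutrino N in a leptonic pseudoscalar-meson decay:
% G_F is the Fermi constant, f_M the decay constant, V_{qq'} the CKM factor,
% and lambda the Kallen triangle function. The heavy-neutrino mass lifts
% the usual helicity suppression of the light-lepton mode.
\begin{equation*}
  \Gamma(M^+ \to \ell^+ N) =
  \frac{G_F^2 f_M^2 m_M^3}{8\pi}\,|V_{qq'}|^2\,|V_{\ell N}|^2\,
  \lambda^{1/2}\!\left(1, x_\ell^2, x_N^2\right)
  \left[x_\ell^2 + x_N^2 - \left(x_\ell^2 - x_N^2\right)^2\right],
  \qquad x_i \equiv m_i/m_M .
\end{equation*}
```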
Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image Synthesis
Addressing the limitations of text as a source of accurate layout
representation in text-conditional diffusion models, many works incorporate
additional signals to condition certain attributes within a generated image.
Although successful, previous works do not account for the specific
localization of said attributes extended into the three-dimensional plane. In
this context, we present a conditional diffusion model that integrates control
over three-dimensional object placement with disentangled representations of
global stylistic semantics from multiple exemplar images. Specifically, we
first introduce \textit{depth disentanglement training} to leverage the
relative depth of objects as an estimator, allowing the model to identify the
absolute positions of unseen objects through the use of synthetic image
triplets. We also introduce \textit{soft guidance}, a method for imposing
global semantics onto targeted regions without the use of any additional
localization cues. Our integrated framework, \textsc{Compose and Conquer
(CnC)}, unifies these techniques to localize multiple conditions in a
disentangled manner. We demonstrate that our approach allows perception of
objects at varying depths while offering a versatile framework for composing
localized objects with different global semantics.
Code: https://github.com/tomtom1103/compose-and-conquer/
Comment: ICLR 2024
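A minimal sketch of the soft-guidance idea, assuming flattened image tokens and exemplar semantic tokens; the class name, shapes, and blending rule are illustrative assumptions, not the paper's API:

```python
import torch
import torch.nn as nn

class SoftGuidance(nn.Module):
    """Sketch of region-restricted cross-attention: global semantics from an
    exemplar are blended only into masked image tokens, with no extra
    localization network."""

    def __init__(self, dim: int):
        super().__init__()
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)

    def forward(self, latent, exemplar, region_mask):
        # latent: (B, N, C) image tokens; exemplar: (B, M, C) semantic tokens
        # region_mask: (B, N) with 1 inside the targeted region, 0 elsewhere
        q, k, v = self.to_q(latent), self.to_k(exemplar), self.to_v(exemplar)
        attn = torch.softmax(q @ k.transpose(1, 2) / q.size(-1) ** 0.5, dim=-1)
        injected = attn @ v                       # exemplar semantics per token
        return latent + region_mask.unsqueeze(-1) * injected  # soft, local blend

# sg = SoftGuidance(320)
# out = sg(torch.randn(1, 64 * 64, 320), torch.randn(1, 77, 320), torch.ones(1, 64 * 64))
```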
Seeing Through the Conversation: Audio-Visual Speech Separation based on Diffusion Model
The objective of this work is to extract a target speaker's voice from a
mixture of voices using visual cues. Existing works on audio-visual speech
separation have demonstrated their performance with promising intelligibility,
but maintaining naturalness remains a challenge. To address this issue, we
propose AVDiffuSS, an audio-visual speech separation model based on a diffusion
mechanism known for its capability in generating natural samples. For an
effective fusion of the two modalities for diffusion, we also propose a
cross-attention-based feature fusion mechanism. This mechanism is specifically
tailored for the speech domain to integrate the phonetic information from
audio-visual correspondence in speech generation. In this way, the fusion
process maintains the high temporal resolution of the features, without
excessive computational requirements. We demonstrate that the proposed
framework achieves state-of-the-art results on two benchmarks, VoxCeleb2 and LRS3, producing speech with notably better naturalness.
Comment: Project page with demo: https://mm.kaist.ac.kr/projects/avdiffuss
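A minimal sketch of the described cross-attention fusion, with audio frames as queries so the output keeps the audio's temporal resolution; dimensions and names are illustrative assumptions, not AVDiffuSS's exact configuration:

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Sketch: high-rate audio features (queries) attend to video-rate lip
    features (keys/values); the residual keeps the audio's temporal grid."""

    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, audio, visual):
        # audio: (B, T_audio, dim); visual: (B, T_video, dim), T_audio >> T_video
        fused, _ = self.attn(query=audio, key=visual, value=visual)
        return self.norm(audio + fused)  # phonetic cues injected per audio frame

# fusion = CrossModalFusion()
# out = fusion(torch.randn(2, 400, 256), torch.randn(2, 100, 256))
```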
Physics-Informed Convolutional Transformer for Predicting Volatility Surface
Predicting volatility is important for asset price prediction, option pricing, and hedging strategies because volatility cannot be directly observed in the financial
market. The Black-Scholes option pricing model is one of the most widely used
models by market participants. The Black-Scholes model is nonetheless based on heavily criticized theoretical premises, one of which is the constant-volatility assumption, and the dynamics of the volatility surface are difficult to
estimate. In this paper, we establish a novel architecture based on
physics-informed neural networks and convolutional transformers. The
performance of the new architecture is directly compared to other well-known
deep-learning architectures, such as standard physics-informed neural networks,
convolutional long-short term memory (ConvLSTM), and self-attention ConvLSTM.
Numerical evidence indicates that the proposed physics-informed convolutional transformer network achieves superior performance compared with the other methods.
Comment: Submitted to Quantitative Finance
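A minimal sketch of the physics-informed ingredient, assuming a Black-Scholes-type PDE residual penalized alongside a data-fitting term; the toy MLP, rates, and weights below are illustrative stand-ins for the paper's convolutional transformer and its exact constraint:

```python
import torch
import torch.nn as nn

# Toy surrogate: net maps (S, t) -> option value V. Illustrative only.
net = nn.Sequential(nn.Linear(2, 64), nn.Tanh(),
                    nn.Linear(64, 64), nn.Tanh(), nn.Linear(64, 1))

def black_scholes_residual(S, t, r=0.02, sigma=0.2):
    """PDE residual  dV/dt + 0.5*sigma^2*S^2*d2V/dS2 + r*S*dV/dS - r*V
    computed with autograd; a physics-informed loss drives it toward zero."""
    S = S.clone().requires_grad_(True)
    t = t.clone().requires_grad_(True)
    V = net(torch.stack([S, t], dim=-1)).squeeze(-1)
    V_S, V_t = torch.autograd.grad(V.sum(), (S, t), create_graph=True)
    (V_SS,) = torch.autograd.grad(V_S.sum(), S, create_graph=True)
    return V_t + 0.5 * sigma**2 * S**2 * V_SS + r * S * V_S - r * V

# physics-informed loss: fit observed prices and penalize the PDE residual
S_c, t_c = torch.rand(256) * 200, torch.rand(256)        # collocation points
pde_loss = black_scholes_residual(S_c, t_c).pow(2).mean()
# total = F.mse_loss(net(x_obs), y_obs) + lam * pde_loss  (x_obs, y_obs: market data)
```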
VLANet: Video-Language Alignment Network for Weakly-Supervised Video Moment Retrieval
Video Moment Retrieval (VMR) is the task of localizing the temporal moment in an untrimmed video that is specified by a natural language query. For VMR, several methods
that require full supervision for training have been proposed. Unfortunately,
acquiring a large number of training videos with labeled temporal boundaries
for each query is a labor-intensive process. This paper explores methods for
performing VMR in a weakly-supervised manner (wVMR): training is performed
without temporal moment labels but only with the text query that describes a
segment of the video. Existing methods on wVMR generate multi-scale proposals
and apply query-guided attention mechanisms to highlight the most relevant
proposal. To leverage the weak supervision, contrastive learning is used, which assigns higher scores to correct video-query pairs than to incorrect
pairs. It has been observed that a large number of candidate proposals, coarse
query representation, and one-way attention mechanism lead to blurry attention
maps which limit the localization performance. To handle this issue,
Video-Language Alignment Network (VLANet) is proposed that learns sharper
attention by pruning out spurious candidate proposals and applying a
multi-directional attention mechanism with fine-grained query representation.
The Surrogate Proposal Selection module selects a proposal based on the
proximity to the query in the joint embedding space, and thus substantially
reduces candidate proposals which leads to lower computation load and sharper
attention. Next, the Cascaded Cross-modal Attention module considers dense
feature interactions and multi-directional attention flow to learn the
multi-modal alignment. VLANet is trained end-to-end using a contrastive loss that pulls semantically matching videos and queries close together. The
experiments show that the method achieves state-of-the-art performance on
Charades-STA and DiDeMo datasets.
Comment: 16 pages, 6 figures, European Conference on Computer Vision, 2020
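A minimal sketch of the contrastive objective described above, using in-batch negatives and a hinge margin; the margin and scoring function are illustrative assumptions rather than VLANet's exact loss:

```python
import torch
import torch.nn.functional as F

def contrastive_loss(video_emb, query_emb, margin=0.2):
    """Weak-supervision signal: matched video-query pairs (the diagonal)
    must score higher than mismatched in-batch pairs by a margin."""
    scores = F.normalize(video_emb, dim=-1) @ F.normalize(query_emb, dim=-1).T
    pos = scores.diag().unsqueeze(1)                 # (B, 1) correct pairs
    neg_mask = ~torch.eye(scores.size(0), dtype=torch.bool)
    hinge = (margin - pos + scores).clamp(min=0)     # rank pos above each neg
    return hinge[neg_mask].mean()

# loss = contrastive_loss(video_vec, query_vec)  # both (B, D), aligned by index
```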