109 research outputs found
The Possibility of Inflation in Asymptotically Safe Gravity
We examine the inflationary modes in cubic curvature theories in the
context of asymptotically safe gravity. In the phase space of the Hubble
parameter, there exists a critical point that corresponds to slow-roll
inflation in the Einstein frame. For each inflationary trajectory, most of the
e-foldings are attained around the critical point. If the coupling constants
satisfy parametric relations generated as powers of the ratio of the energy
scale of inflation to the ultraviolet cutoff, a successful inflation with more
than 60 e-foldings occurs near the critical point. Comment: 14 pages, 4 figures
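The e-folding count above is N = ∫ H dt along a trajectory. Here is a minimal sketch for estimating N, and the share of N collected in a given time window, from a sampled Hubble history. It assumes the (t, H) samples come from some integrator of the cubic-curvature field equations (not shown, and not the authors' code). `efolds` and `efold_fraction` are hypothetical helper names.

```python
from typing import Sequence


def efolds(times: Sequence[float], hubble: Sequence[float]) -> float:
    """Trapezoidal estimate of N = ∫ H dt from samples H(t_k)."""
    if len(times) != len(hubble) or len(times) < 2:
        raise ValueError("need at least two matching (t, H) samples")
    n = 0.0
    for k in range(1, len(times)):
        dt = times[k] - times[k - 1]
        n += 0.5 * (hubble[k] + hubble[k - 1]) * dt
    return n


def efold_fraction(times: Sequence[float], hubble: Sequence[float],
                   t_start: float, t_end: float) -> float:
    """Fraction of the total e-folds accumulated while t_start <= t <= t_end."""
    window = [(t, h) for t, h in zip(times, hubble) if t_start <= t <= t_end]
    if len(window) < 2:
        return 0.0
    ts, hs = zip(*window)
    return efolds(ts, hs) / efolds(times, hubble)


# Toy history (hypothetical units): a plateau with H = 1 up to t = 65,
# then H falls linearly to zero over two time units.
ts = [0.01 * i for i in range(8001)]
hs = [1.0 if t <= 65.0 else max(0.0, 1.0 - (t - 65.0) / 2.0) for t in ts]
print(efolds(ts, hs), efold_fraction(ts, hs, 0.0, 65.0))
```

In this toy history the plateau, which stands in for the neighbourhood of a critical point, supplies almost all of the roughly 66 e-folds. The numbers only illustrate the bookkeeping and are not results from the paper.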
Panoramic Vision Transformer for Saliency Detection in 360° Videos
360° video saliency detection is one of the challenging benchmarks for
360° video understanding, since non-negligible distortion and discontinuity
occur when 360° videos are projected into any format, and the capture-worthy
viewpoint on the omnidirectional sphere is ambiguous by nature. We present a
new framework named Panoramic Vision Transformer (PAVER). We design the encoder
as a Vision Transformer with deformable convolution, which lets us not only
plug pretrained models from normal videos into our architecture without
additional modules or finetuning but also perform geometric approximation only
once, unlike previous deep CNN-based approaches. Thanks to its powerful
encoder, PAVER can learn saliency from three simple relative relations among
local patch features, outperforming state-of-the-art models on the Wild360
benchmark by large margins without supervision or auxiliary information such
as class activation. We demonstrate the utility of our saliency prediction
model on the omnidirectional video quality assessment task in VQA-ODV, where we
consistently improve performance without any form of supervision, including
head movement. Comment: Published to ECCV202
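The "geometric approximation" can be illustrated with the equirectangular distortion that a 360° encoder must handle. A row at latitude φ is stretched horizontally by roughly 1/cos φ, so a kernel meant to cover a fixed angular patch needs wider horizontal sampling near the poles. The sketch below precomputes horizontal offsets for a k × k deformable kernel once per image height under this 1/cos φ assumption. It is a simplified stand-in, not PAVER's actual offset computation, and the function names and clamp value are hypothetical.

```python
import math
from typing import List, Tuple


def row_latitude(v: int, height: int) -> float:
    """Latitude (radians) of the centre of pixel row v in an equirectangular frame."""
    return math.pi * (0.5 - (v + 0.5) / height)


def horizontal_stretch(v: int, height: int, max_stretch: float = 10.0) -> float:
    """Approximate horizontal stretch 1/cos(latitude), clamped near the poles."""
    lat = row_latitude(v, height)
    return min(1.0 / max(math.cos(lat), 1e-6), max_stretch)


def kernel_offsets(height: int, k: int = 3,
                   max_stretch: float = 10.0) -> List[List[Tuple[float, float]]]:
    """Per-row (dx, dy) offsets for a k x k deformable kernel, computed once.

    A regular kernel samples horizontal tap j at column offset j; the stretched
    kernel samples it at j * s, so the learned-offset slot receives j * s - j.
    Vertical spacing is left unchanged (dy = 0) in this simplification.
    """
    half = k // 2
    table = []
    for v in range(height):
        s = horizontal_stretch(v, height, max_stretch)
        row = []
        for _ in range(k):  # vertical taps share the same horizontal stretch
            for j in range(-half, half + 1):
                row.append((j * s - j, 0.0))
        table.append(row)
    return table


offsets = kernel_offsets(9)
print(offsets[4][:3], offsets[0][:3])  # equator row vs. top row
```

Because the table depends only on the image height, it can be built once and reused for every frame, which mirrors the "only once" property claimed in the abstract.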
CMB Spectral μ-Distortion of Multiple Inflation Scenario
In a multiple inflation scenario with two inflationary phases separated by an
intermediate matter-dominated phase, the power spectrum is estimated to be
enhanced on scales smaller than the horizon size at the beginning of the second
inflation. We must make sure that the enhanced power spectrum is consistent
with large-scale observations of the cosmic microwave background (CMB). We
consider the CMB spectral distortions generated by the dissipation of acoustic
waves to constrain the power spectrum. The μ-distortion value can be
considerably larger than the expectation of the standard ΛCDM model, depending
on the horizon scale at the beginning of the second inflation, while the
y-distortion is hardly affected by the enhancement of the power spectrum.
Comment: 16 pages, 5 figures
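To see how an enhanced small-scale spectrum feeds into the μ-distortion, here is a sketch based on a commonly quoted approximate window function: μ ≈ ∫ Δ²_ζ(k) W_μ(k) d ln k, with W_μ(k) ≈ 2.2 [exp(−k/5400) − exp(−(k/31.6)²)] for k in Mpc⁻¹. The window constants are an assumption taken from the general literature, not from this paper. The step-like spectrum (`step_spectrum`, `k_b`, `boost`) is purely hypothetical and is not the multiple-inflation spectrum the authors derive.

```python
import math
from typing import Callable


def mu_window(k: float) -> float:
    """Approximate mu-distortion window for k in Mpc^-1 (assumed fit constants)."""
    return 2.2 * (math.exp(-k / 5400.0) - math.exp(-(k / 31.6) ** 2))


def mu_distortion(power: Callable[[float], float], k_min: float = 1.0,
                  k_max: float = 1.0e6, steps: int = 20000) -> float:
    """Trapezoidal estimate of mu = ∫ Δ²(k) W_mu(k) d ln k on a log-spaced grid."""
    a, b = math.log(k_min), math.log(k_max)
    h = (b - a) / steps
    total = 0.0
    for i in range(steps + 1):
        k = math.exp(a + i * h)
        weight = 0.5 if i in (0, steps) else 1.0  # trapezoid end-point weights
        total += weight * power(k) * mu_window(k)
    return total * h


def step_spectrum(amplitude: float = 2.2e-9, k_b: float = 50.0,
                  boost: float = 10.0) -> Callable[[float], float]:
    """Hypothetical toy spectrum: scale invariant, multiplied by `boost` for k > k_b."""
    return lambda k: amplitude * (boost if k > k_b else 1.0)


mu_flat = mu_distortion(lambda k: 2.2e-9)
mu_boosted = mu_distortion(step_spectrum())
print(mu_flat, mu_boosted)
```

With a scale-invariant amplitude of 2.2 × 10⁻⁹, the integral comes out at order 10⁻⁸. A boost above `k_b` raises μ roughly in proportion to how much of the window it covers.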
Edit-A-Video: Single Video Editing with Object-Aware Consistency
Although text-to-video (TTV) models have recently achieved remarkable success,
few approaches have extended TTV to video editing. Motivated by TTV models that
adapt diffusion-based text-to-image (TTI) models, we propose a video editing
framework, which we term Edit-A-Video, that requires only a pretrained TTI
model and a single text-video pair. The framework consists of two stages:
(1) inflating the 2D model into a 3D model by appending temporal modules and
tuning it on the source video, and (2) inverting the source video into noise
and editing it with the target text prompt and attention map injection. These
stages enable temporal modeling and preservation of the semantic attributes of
the source video. One of the key challenges in video editing is the background
inconsistency problem, where regions not targeted by the edit suffer from
undesirable and inconsistent temporal alterations. To mitigate this issue, we
also introduce a novel mask blending method, termed sparse-causal blending
(SC Blending). We improve previous mask blending methods to reflect temporal
consistency, so that the edited area exhibits smooth transitions while the
unedited regions remain spatio-temporally consistent. We present extensive
experimental results over various types of text and videos, and demonstrate
the superiority of the proposed method over baselines in terms of background
consistency, text alignment, and video editing quality.
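The abstract does not spell out the SC Blending rule, so the following is only a hypothetical sketch of mask blending with a sparse-causal mask. The blending mask for frame i is assumed to be the union of the edit masks of the first frame, the previous frame, and frame i, borrowing the frame pattern of sparse-causal attention. Each output value is then m · edited + (1 − m) · source. The function names and the mask rule are assumptions, and frames are flattened lists rather than real latent tensors.

```python
from typing import List

Frame = List[float]  # flattened latent values for one frame
Mask = List[int]     # 1 = region to edit, 0 = keep the source content


def sparse_causal_mask(masks: List[Mask], i: int) -> Mask:
    """Union of the masks of frame 0, frame i-1 and frame i (assumed rule)."""
    frames = {0, max(i - 1, 0), i}
    return [int(any(masks[j][p] for j in frames)) for p in range(len(masks[i]))]


def blend(edited: List[Frame], source: List[Frame], masks: List[Mask]) -> List[Frame]:
    """Keep edited values inside the blended mask and source values elsewhere."""
    out = []
    for i in range(len(edited)):
        m = sparse_causal_mask(masks, i)
        out.append([mp * e + (1 - mp) * s
                    for mp, e, s in zip(m, edited[i], source[i])])
    return out


masks = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(blend([[9.0] * 3] * 3, [[0.0] * 3] * 3, masks))
```

Taking the union over earlier frames means a pixel that was edited in the first or previous frame stays editable, which is one simple way to avoid abrupt switching between edited and source content from frame to frame.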
Synchronizing Vision and Language: Bidirectional Token-Masking AutoEncoder for Referring Image Segmentation
Referring Image Segmentation (RIS) aims to segment target objects expressed
in natural language within a scene at the pixel level. Various recent RIS
models have achieved state-of-the-art performance by generating contextual
tokens to model multimodal features from pretrained encoders and effectively
fusing them using transformer-based cross-modal attention. While these methods
match language features with image features to effectively identify likely
target objects, they often struggle to correctly understand contextual
information in complex and ambiguous sentences and scenes. To address this
issue, we propose a novel bidirectional token-masking autoencoder (BTMAE)
inspired by the masked autoencoder (MAE). The proposed model learns
image-to-language and language-to-image context by reconstructing masked image
and language features at the token level. In other words, the image and
language features complement each other, enabling the network to understand
the deep contextual information that interconnects the two modalities. This
learning method enhances the robustness of RIS performance on complex sentences
and scenes. Our BTMAE achieves state-of-the-art performance on three popular
datasets, and we demonstrate the effectiveness of the proposed method through
various ablation studies.
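The MAE-style ingredient that BTMAE builds on can be sketched as masking a random subset of tokens in each modality and scoring reconstruction only on the masked positions. This is a minimal sketch, not the BTMAE architecture: the cross-modal reconstruction network is omitted. The masking ratios (0.75 for image tokens, as in the original MAE, and 0.15 for text tokens, as in BERT) and all function names are assumptions.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def random_mask(num_tokens: int, ratio: float, rng: random.Random) -> List[int]:
    """Sorted indices of tokens to hide, chosen uniformly at random (MAE-style)."""
    k = round(num_tokens * ratio)
    return sorted(rng.sample(range(num_tokens), k))


def bidirectional_masks(n_img: int, n_txt: int, img_ratio: float = 0.75,
                        txt_ratio: float = 0.15,
                        seed: int = 0) -> Tuple[List[int], List[int]]:
    """Independent masks for image and text tokens (assumed ratios)."""
    rng = random.Random(seed)
    return random_mask(n_img, img_ratio, rng), random_mask(n_txt, txt_ratio, rng)


def masked_mse(pred: Sequence[Vector], target: Sequence[Vector],
               masked: Sequence[int]) -> float:
    """Mean squared error averaged over masked tokens only, as in MAE."""
    if not masked:
        return 0.0
    total = 0.0
    for t in masked:
        total += sum((p - q) ** 2 for p, q in zip(pred[t], target[t])) / len(target[t])
    return total / len(masked)


img_idx, txt_idx = bidirectional_masks(196, 20)
print(len(img_idx), len(txt_idx))
```

In a bidirectional setup, the reconstruction of masked image tokens would be conditioned on the visible text tokens and vice versa. That conditioning lives in the omitted network, so the sketch only fixes which positions are hidden and how the loss is counted.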
Before the Page time: maximum entanglements or the return of the monster?
The entropy of Hawking radiation is approximately equal to the maximum
entanglement entropy if a black hole is in a state before the Page time, i.e.,
when the entropy of Hawking radiation is smaller than the entropy of the black
hole. However, if some process generates less-than-maximal entanglement, the
entropy of Hawking radiation will become smaller than the maximum entanglement
entropy before the Page time. If this process accumulates, even with small
probability, the emitted radiation can eventually be distinguished from an
exactly thermal state. In this paper, we provide several interpretations of
this phenomenon: (1) information of the collapsed matter is emitted before the
Page time, (2) a firewall or a non-local effect exists before the Page time, or
(3) the statistical entropy is greater than the areal entropy, i.e., a monster
is formed. Our conclusion will help resolve the information loss paradox by
providing groundwork for further research. Comment: 19 pages, 8 figures
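The "maximum of entanglement entropy" before the Page time can be made quantitative with Page's formula for the average entanglement entropy of a random pure state on a Hilbert space of dimension m·n (m ≤ n): S = Σ_{k=n+1}^{mn} 1/k − (m − 1)/(2n). This is close to the maximum ln m, with a deficit of roughly m/(2n). In the sketch, m plays the role of the radiation dimension and n that of the remaining black hole. The mapping and the function name are illustrative, and the sketch does not model the less-than-maximal entangling process discussed in the paper.

```python
import math


def page_entropy(m: int, n: int) -> float:
    """Page's average entanglement entropy (in nats) for a random pure state on C^m ⊗ C^n."""
    if m > n:
        m, n = n, m  # both subsystems have the same entropy; use the smaller as m
    harmonic = sum(1.0 / k for k in range(n + 1, m * n + 1))
    return harmonic - (m - 1) / (2 * n)


# Before the Page time: 3 radiation qubits (m = 8) and 7 black-hole qubits (n = 128).
s = page_entropy(8, 128)
print(s, math.log(8) - s)  # deficit from the maximum ln m is close to m / (2n)
```

The exact sum is practical only for small dimensions. For macroscopic black holes one would use the asymptotic form ln m − m/(2n) instead.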
- …