Magnetic reconnection at the earliest stage of solar flux emergence
On 2016 September 20, the Interface Region Imaging Spectrograph observed an
active region during its earliest emerging phase for almost 7 hours. The
Helioseismic and Magnetic Imager on board the Solar Dynamics Observatory
observed continuous emergence of small-scale magnetic bipoles at a rate of
10~Mx~s$^{-1}$. The emergence of magnetic flux and the interactions
between different polarities lead to the frequent occurrence of ultraviolet (UV)
bursts, which appear as intense transient brightenings in the 1400 \AA{}
images. In the meantime, discrete small patches with the same magnetic polarity
tend to move together and merge, leading to enhancement of the magnetic field
and thus the formation of pores (small sunspots) at some locations. The spectra of
these UV bursts are characterized by the superposition of several chromospheric
absorption lines on the greatly broadened profiles of some emission lines
formed at typical transition region temperatures, suggesting that magnetic
reconnection heats the local material in the lower atmosphere to a few tens of
thousands of kelvin. Some bursts reveal blue and red shifts of
100~km~s$^{-1}$ at neighboring pixels, indicating spatially resolved
bidirectional reconnection outflows. Many such bursts appear to be associated
with the cancellation of magnetic flux at a rate of the order of
10~Mx~s$^{-1}$. We also investigate the three-dimensional magnetic
field topology through a magneto-hydrostatic model and find that a small
fraction of the bursts are associated with bald patches (magnetic dips).
Finally, we find that almost all bursts are located in regions of large
squashing factor at a height of 1 Mm, reinforcing our conclusion that
these bursts are produced through reconnection in the lower atmosphere.

Comment: ApJ, 10 figures
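Since that conclusion rests on the squashing factor, it may help to recall its
standard definition (following Titov et al.). For a field-line mapping
$(x, y) \mapsto (X(x, y), Y(x, y))$ between footpoints, the squashing factor is

\[
Q = \frac{a^2 + b^2 + c^2 + d^2}{\lvert ad - bc \rvert}, \qquad
a = \frac{\partial X}{\partial x},\quad
b = \frac{\partial X}{\partial y},\quad
c = \frac{\partial Y}{\partial x},\quad
d = \frac{\partial Y}{\partial y}.
\]

Large $Q$ marks quasi-separatrix layers, where field-line connectivity changes
steeply and current sheets, and hence reconnection, preferentially develop.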
G.O.G: A Versatile Gripper-On-Gripper Design for Bimanual Cloth Manipulation with a Single Robotic Arm
The manipulation of garments poses research challenges due to their
deformable nature and the extensive variability in shapes and sizes. Despite
numerous attempts to address these challenges through robot perception and
control, there has been relatively limited interest in resolving them through
the co-development of robot hardware. Consequently, the
majority of studies employ off-the-shelf grippers in conjunction with dual
robot arms to enable bimanual manipulation and high dexterity. However, this
dual-arm system increases the overall cost of the robotic system as well as its
control complexity, which must handle collisions and other coordination issues
between the arms. As an alternative, we propose to enable bimanual
cloth manipulation with a single robot arm via a novel end-effector design --
sharing dexterity skills between manipulator and gripper rather than relying
entirely on robot arm coordination. To this end, we introduce a new gripper,
called G.O.G., based on a gripper-on-gripper structure where the first gripper
independently regulates the span, up to 500 mm, between its fingers, which are
in turn also grippers. These finger grippers incorporate a variable friction
module that enables two grasping modes: firm and sliding grasps. Household item and
cloth object benchmarks are employed to evaluate the performance of the
proposed design, encompassing both experiments on the gripper design itself and
on cloth manipulation. Experimental results demonstrate the potential of the
introduced ideas to undertake a range of bimanual cloth manipulation tasks with
a single robot arm. Supplementary material is available at
https://sites.google.com/view/gripperongripper.

Comment: Accepted for IEEE Robotics and Automation Letters in January 2024.
Dongmyoung Lee and Wei Chen contributed equally to this research.
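As a rough illustration of how the two grasping modes might be exposed to task
code, here is a minimal, purely hypothetical command sketch in Python; the
class names, fields, and the span check are ours for illustration, not the
G.O.G software interface:

    from dataclasses import dataclass
    from enum import Enum

    class GraspMode(Enum):
        FIRM = "firm"        # high-friction surface engaged: hold the fabric
        SLIDING = "sliding"  # low-friction surface engaged: fabric slips through

    @dataclass
    class FingerGripperCommand:
        mode: GraspMode
        closed: bool

    def edge_tracing_commands(span_mm: float):
        """One finger gripper pins the cloth while the other slides along it."""
        assert 0.0 <= span_mm <= 500.0, "span limited to 500 mm per the paper"
        holder = FingerGripperCommand(GraspMode.FIRM, closed=True)
        slider = FingerGripperCommand(GraspMode.SLIDING, closed=True)
        return span_mm, holder, slider

The point of the sketch is the division of labor: switching a friction mode is
a per-finger command, so bimanual behaviors need no second arm to coordinate.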
Solar Ultraviolet Bursts in a Coordinated Observation of IRIS, Hinode and SDO
Solar ultraviolet (UV) bursts are small-scale compact brightenings in
transition region images. The spectral profiles of transition region lines in
these bursts are significantly enhanced and broadened, often with chromospheric
absorption lines such as Ni~{\sc{ii}} 1335.203 and 1393.330 {\AA} superimposed.
We investigate the properties of several UV bursts using a coordinated
observation of the Interface Region Imaging Spectrograph (IRIS), Solar Dynamics
Observatory (SDO), and \textit{Hinode} on 2015 February 7. We have identified
12 UV bursts, and 11 of them reveal small blueshifts of the Ni~{\sc{ii}}
absorption lines. However, the Ni~{\sc{ii}} lines in one UV burst exhibit
obvious redshifts of 20~km~s$^{-1}$, which appear to be related to the
cold plasma downflows observed in the IRIS slit-jaw images. We also examine the
three-dimensional magnetic field topology using a magnetohydrostatic model, and
find that some UV bursts are associated with magnetic null points or bald
patches. In addition, we find that these UV bursts reveal no obvious coronal
signatures from the observations of the Atmospheric Imaging Assembly (AIA) on
board SDO and the EUV Imaging Spectrometer (EIS) on board \textit{Hinode}.

Comment: will appear in Science China Technological Science
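For scale, a 20~km~s$^{-1}$ redshift corresponds to the wavelength displacement
given by the non-relativistic Doppler relation $\Delta\lambda = \lambda_0 v/c$;
a minimal Python check using the Ni~{\sc{ii}} 1393.330 {\AA} line quoted above
(the snippet is illustrative only):

    # Non-relativistic Doppler shift: delta_lambda = lambda_0 * v / c
    C_KM_S = 2.99792458e5  # speed of light in km/s

    def doppler_shift_angstrom(lambda_0: float, v_km_s: float) -> float:
        """Wavelength shift (Angstrom) for line-of-sight velocity v (km/s)."""
        return lambda_0 * v_km_s / C_KM_S

    # Ni II 1393.330 A with the ~20 km/s redshift reported above:
    print(doppler_shift_angstrom(1393.330, 20.0))  # ~0.093 Angstrom

So the reported downflows shift the absorption line by roughly 0.1 {\AA}, well
within the spectral resolution of IRIS.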
Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models
Recently, interest has grown in extending the multimodal capability of large
language models (LLMs), e.g., to vision-language (VL) learning, which is
regarded as the next milestone of artificial general intelligence. However,
existing solutions are prohibitively expensive: they not only need to optimize
an excessive number of parameters but also require another round of
large-scale pre-training before VL instruction tuning. In this paper, we
propose a novel and affordable solution for the effective VL adaptation of LLMs,
called Mixture-of-Modality Adaptation (MMA). Instead of using large neural
networks to connect the image encoder and LLM, MMA adopts lightweight modules,
i.e., adapters, to bridge the gap between LLMs and VL tasks, which also enables
the joint optimization of the image and language models. Meanwhile, MMA is also
equipped with a routing algorithm to help LLMs achieve an automatic shift
between single- and multi-modal instructions without compromising their
natural language understanding ability. We apply MMA to a recent LLM called
LLaMA and call the resulting large vision-language instructed model LaVIN. To
validate MMA and LaVIN, we conduct extensive experiments under two
setups, namely multimodal science question answering and multimodal dialogue.
The experimental results not only demonstrate the competitive performance and
superior training efficiency of LaVIN compared with existing multimodal LLMs,
but also confirm its great potential as a general-purpose chatbot. More
importantly, the actual training cost of LaVIN is remarkably low, e.g., only
1.4 training hours with 3.8M trainable parameters, further confirming the
effectiveness of MMA. Our project is released at
https://luogen1996.github.io/lavin
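A minimal PyTorch sketch may make the adapter-plus-routing mechanism concrete.
The bottleneck width, the soft router over mean-pooled features, and all names
below are our assumptions for illustration, not the released LaVIN code:

    import torch
    import torch.nn as nn

    class Adapter(nn.Module):
        """Lightweight bottleneck adapter with a residual connection."""
        def __init__(self, dim: int, bottleneck: int = 64):
            super().__init__()
            self.down = nn.Linear(dim, bottleneck)
            self.up = nn.Linear(bottleneck, dim)
            self.act = nn.SiLU()

        def forward(self, x):
            return x + self.up(self.act(self.down(x)))

    class MixtureOfModalityAdapter(nn.Module):
        """Soft-routes inputs between a text adapter and a multimodal adapter."""
        def __init__(self, dim: int):
            super().__init__()
            self.text_adapter = Adapter(dim)
            self.mm_adapter = Adapter(dim)
            self.router = nn.Linear(dim, 2)  # 2 routes: text-only / multimodal

        def forward(self, x):
            # x: (batch, seq_len, dim); gate computed from pooled features
            gate = torch.softmax(self.router(x.mean(dim=1)), dim=-1)
            return (gate[:, 0, None, None] * self.text_adapter(x)
                    + gate[:, 1, None, None] * self.mm_adapter(x))

Only the adapters and the router would be trained, with the LLM and image
encoder frozen, which is what keeps the trainable-parameter count in the
few-million range quoted above.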
MovingParts: Motion-based 3D Part Discovery in Dynamic Radiance Field
We present MovingParts, a NeRF-based method for dynamic scene reconstruction
and part discovery. We consider motion an important cue for identifying parts,
since all particles on the same part share a common motion pattern.
From the perspective of fluid simulation, existing deformation-based methods
for dynamic NeRF can be seen as parameterizing the scene motion under the
Eulerian view, i.e., focusing on specific locations in space through which the
fluid flows as time passes. However, it is intractable to extract the motion of
constituting objects or parts using the Eulerian view representation. In this
work, we introduce the dual Lagrangian view and enforce representations under
the Eulerian/Lagrangian views to be cycle-consistent. Under the Lagrangian
view, we parameterize the scene motion by tracking the trajectory of particles
on objects. The Lagrangian view makes it convenient to discover parts by
factorizing the scene motion as a composition of part-level rigid motions.
Experimentally, our method can achieve fast and high-quality dynamic scene
reconstruction from even a single moving camera, and the induced part-based
representation allows direct applications such as part tracking, animation, and
3D scene editing.

Comment: 10 pages
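A minimal sketch of the Eulerian/Lagrangian cycle-consistency constraint, under
the assumption that the Eulerian branch maps an observed point back to a
canonical frame and the Lagrangian branch maps it forward again (the function
handles are hypothetical, not the MovingParts API):

    import torch

    def cycle_consistency_loss(eulerian_net, lagrangian_net, x_t, t):
        """Penalize disagreement between the two motion parameterizations.

        eulerian_net(x_t, t)   -> x_0: warp an observed point back to the
                                  canonical (Lagrangian) frame
        lagrangian_net(x_0, t) -> x_t: warp a canonical point forward to time t
        """
        x_0 = eulerian_net(x_t, t)        # backward warp
        x_t_rec = lagrangian_net(x_0, t)  # forward warp
        return torch.mean((x_t_rec - x_t) ** 2)

Enforcing the two views to be mutual inverses is what lets the Lagrangian
trajectories, and hence the part decomposition, stay faithful to the Eulerian
scene representation.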
SSAH: Semi-supervised Adversarial Deep Hashing with Self-paced Hard Sample Generation
Deep hashing methods have proved to be effective and efficient for
large-scale Web media search. The success of these data-driven methods largely
depends on collecting sufficient labeled data, which is usually a crucial
limitation in practical cases. The current solutions to this issue utilize
Generative Adversarial Network (GAN) to augment data in semi-supervised
learning. However, existing GAN-based methods treat image generation and
hashing learning as two isolated processes, which limits the effectiveness of
the generated samples. Besides, most works fail to exploit the semantic
information in unlabeled data. In this paper, we propose a novel
Semi-supervised Self-paced Adversarial Hashing method, named SSAH, to solve
the above problems in a unified
framework. The SSAH method consists of an adversarial network (A-Net) and a
hashing network (H-Net). To improve the quality of generated images, the A-Net
first learns hard samples with multi-scale occlusions and multi-angle rotated
deformations, which compete against the learning of accurate hashing
codes. Second, we design a novel self-paced hard generation policy to gradually
increase the hashing difficulty of generated samples. To make use of the
semantic information in unlabeled data, we propose a semi-supervised
consistency loss. The experimental results show that our method significantly
improves over state-of-the-art models on both widely used hashing datasets and
fine-grained datasets.
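The self-paced policy can be pictured as a difficulty schedule over the
deformation parameters; a minimal sketch, assuming occlusion ratio and rotation
angle are the knobs and a linear ramp (both are our assumptions; the paper's
exact schedule may differ):

    def self_paced_difficulty(step: int, total_steps: int,
                              max_occlusion: float = 0.5,
                              max_rotation_deg: float = 45.0):
        """Linearly ramp up the difficulty of generated hard samples."""
        progress = min(step / total_steps, 1.0)
        occlusion_ratio = progress * max_occlusion   # fraction of image occluded
        rotation_deg = progress * max_rotation_deg   # rotation magnitude
        return occlusion_ratio, rotation_deg

Early in training the generated samples are nearly clean, so H-Net can learn
stable codes; the adversarial samples only become hard once the hashing network
can benefit from them.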
Adapting Pre-trained Language Models to Vision-Language Tasks via Dynamic Visual Prompting
Pre-trained language models (PLMs) have played an increasing role in
multimedia research. In vision-language (VL) tasks, they often serve
as a language encoder and still require an additional fusion network for VL
reasoning, resulting in excessive memory overhead. In this paper, we focus on
exploring PLMs as a stand-alone model for VL reasoning tasks. Inspired by the
recently popular prompt tuning, we first show that processed visual
features can also be projected onto the semantic space of PLMs and act as
prompt tokens to bridge the gap between single- and multi-modal learning.
However, this solution exhibits obvious redundancy in visual information and
model inference, and the placement of prompt tokens also greatly affects the
final performance. Based on these observations, we further propose a novel
transfer learning approach for PLMs, termed Dynamic Visual Prompting (DVP).
Concretely, DVP first deploys a cross-attention module to obtain text-related
and compact visual prompt tokens, thereby greatly reducing the input length of
PLMs. To obtain the optimal placement, we also equip DVP with a
reinforcement-learning based search algorithm, which can automatically merge
DVP with PLMs for different VL tasks via a very short search process. In
addition, we also combine DVP with the recently popular adapter approach to
keep most parameters of the PLMs intact when adapting to VL tasks, helping
PLMs switch quickly between single- and multi-modal tasks. We apply DVP to
two representative PLMs, namely BERT and T5, and conduct extensive experiments
on a set of VL reasoning benchmarks including VQA2.0, GQA and SNLI-VE. The
experimental results not only show the advantages of DVP in efficiency and
performance, but also confirm its superiority in adapting pre-trained language
models to VL tasks.
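A minimal sketch of the cross-attention step that condenses many patch features
into a handful of prompt tokens; for simplicity it uses learnable queries
rather than explicit text conditioning, and all dimensions are assumed, so it
should be read as an illustration rather than the paper's exact module:

    import torch
    import torch.nn as nn

    class VisualPromptGenerator(nn.Module):
        """Condense patch features into a few compact prompt tokens."""
        def __init__(self, dim: int = 768, num_prompts: int = 4,
                     num_heads: int = 8):
            super().__init__()
            self.queries = nn.Parameter(0.02 * torch.randn(num_prompts, dim))
            self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

        def forward(self, patch_features):
            # patch_features: (batch, n_patches, dim)
            b = patch_features.size(0)
            q = self.queries.unsqueeze(0).expand(b, -1, -1)  # (batch, k, dim)
            prompts, _ = self.attn(q, patch_features, patch_features)
            return prompts  # (batch, k, dim), inserted into the PLM input

Because only k prompt tokens (rather than hundreds of patch tokens) enter the
PLM, the input length, and with it the inference cost, drops sharply.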
PixelFolder: An Efficient Progressive Pixel Synthesis Network for Image Generation
Pixel synthesis is a promising research paradigm for image generation, as it
can exploit pixel-wise prior knowledge. However, existing
methods still suffer from excessive memory footprint and computation overhead.
In this paper, we propose a progressive pixel synthesis network for
efficient image generation, coined PixelFolder. Specifically, PixelFolder
formulates image generation as a progressive pixel regression problem and
synthesizes images by a multi-stage paradigm, which can greatly reduce the
overhead caused by large tensor transformations. In addition, we introduce
novel pixel folding operations to further improve model efficiency while
maintaining pixel-wise prior knowledge for end-to-end regression. With these
innovative designs, we greatly reduce the cost of pixel synthesis, e.g.,
cutting computation by 90% and parameters by 57% compared to the latest pixel
synthesis method, CIPS. To validate our approach, we conduct extensive
experiments on two benchmark datasets, namely FFHQ and LSUN Church. The
experimental results show that, with much lower cost, PixelFolder obtains new
state-of-the-art (SOTA) performance on both datasets, i.e., 3.77 FID and 2.45
FID on FFHQ and LSUN Church, respectively. Meanwhile, PixelFolder is also more
efficient than SOTA methods such as StyleGAN2, cutting computation by about 74%
and parameters by 36%. These results strongly validate the effectiveness of the
proposed PixelFolder.

Comment: 11 pages, 7 figures
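The folding operation can be read as a space-to-depth rearrangement; a minimal
sketch with PyTorch's built-in shuffle ops (whether PixelFolder uses exactly
these primitives is our assumption):

    import torch
    import torch.nn.functional as F

    x = torch.randn(1, 16, 64, 64)  # (batch, channels, height, width)

    # Fold: trade spatial size for channels, shrinking the large tensors
    folded = F.pixel_unshuffle(x, downscale_factor=2)    # (1, 64, 32, 32)

    # Unfold: restore resolution at a later stage of the progressive model
    unfolded = F.pixel_shuffle(folded, upscale_factor=2)  # (1, 16, 64, 64)

    assert torch.equal(x, unfolded)  # lossless, parameter-free rearrangement

Because the rearrangement is lossless, pixel-wise priors survive the folding
while the intermediate tensors stay small, which is where the reported savings
come from.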