
    Magnetic reconnection at the earliest stage of solar flux emergence

    On 2016 September 20, the Interface Region Imaging Spectrograph observed an active region during its earliest emerging phase for almost 7 hours. The Helioseismic and Magnetic Imager on board the Solar Dynamics Observatory observed continuous emergence of small-scale magnetic bipoles at a rate of ∼10^16 Mx s^-1. The emergence of magnetic flux and the interactions between different polarities lead to frequent ultraviolet (UV) bursts, which appear as intense transient brightenings in the 1400 Å images. In the meantime, discrete small patches with the same magnetic polarity tend to move together and merge, leading to enhancement of the magnetic fields and thus the formation of pores (small sunspots) at some locations. The spectra of these UV bursts are characterized by the superposition of several chromospheric absorption lines on the greatly broadened profiles of emission lines formed at typical transition-region temperatures, suggesting that the local material in the lower atmosphere is heated to a few tens of thousands of kelvin by magnetic reconnection. Some bursts reveal blueshifts and redshifts of ∼100 km s^-1 at neighboring pixels, indicating spatially resolved bidirectional reconnection outflows. Many such bursts appear to be associated with the cancellation of magnetic flux at a rate of the order of ∼10^15 Mx s^-1. We also investigate the three-dimensional magnetic field topology through a magneto-hydrostatic model and find that a small fraction of the bursts are associated with bald patches (magnetic dips). Finally, we find that almost all bursts are located in regions of large squashing factor at a height of ∼1 Mm, reinforcing our conclusion that these bursts are produced through reconnection in the lower atmosphere. Comment: ApJ, 10 figures
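
    As a rough illustration of how an emergence or cancellation rate of this kind can be estimated from line-of-sight magnetograms, the sketch below sums the unsigned flux over a cutout and differences it in time. The function names, pixel scale, cadence, and placeholder data are illustrative assumptions, not the paper's analysis pipeline.

        import numpy as np

        def unsigned_flux(bz, pixel_area_cm2):
            """Total unsigned line-of-sight flux (Mx) in a magnetogram cutout.
            bz: 2-D array of line-of-sight field strength in Gauss."""
            return np.sum(np.abs(bz)) * pixel_area_cm2

        def flux_change_rate(bz_t1, bz_t2, dt_s, pixel_area_cm2):
            """Average rate of change of unsigned flux (Mx/s) between two frames."""
            return (unsigned_flux(bz_t2, pixel_area_cm2) -
                    unsigned_flux(bz_t1, pixel_area_cm2)) / dt_s

        # Illustrative values only: HMI-like 0.5 arcsec pixels (~3.6e7 cm on the Sun)
        # and a 45 s cadence; the rates quoted in the abstract are ~1e16 Mx/s
        # (emergence) and ~1e15 Mx/s (cancellation).
        pixel_area = (3.6e7) ** 2                      # cm^2 per pixel (assumed)
        bz_a = np.random.normal(0, 20, (100, 100))     # placeholder magnetograms
        bz_b = np.random.normal(0, 25, (100, 100))
        print(flux_change_rate(bz_a, bz_b, 45.0, pixel_area))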

    G.O.G: A Versatile Gripper-On-Gripper Design for Bimanual Cloth Manipulation with a Single Robotic Arm

    The manipulation of garments poses research challenges due to their deformable nature and the extensive variability in their shapes and sizes. Despite numerous attempts to address these challenges through robot perception and control, there has been relatively limited interest in resolving them through the co-development of robot hardware. Consequently, the majority of studies employ off-the-shelf grippers in conjunction with dual robot arms to enable bimanual manipulation and high dexterity. However, a dual-arm system increases the overall cost of the robotic system as well as its control complexity, since robot collisions and other coordination issues must be handled. As an alternative, we propose to enable bimanual cloth manipulation using a single robot arm via a novel end-effector design, sharing dexterity between the manipulator and the gripper rather than relying entirely on robot arm coordination. To this end, we introduce a new gripper, called G.O.G., based on a gripper-on-gripper structure in which the first gripper independently regulates the span, up to 500 mm, between its fingers, which are in turn also grippers. These finger grippers contain a variable-friction module that enables two grasping modes: firm and sliding grasps. Household-item and cloth-object benchmarks are employed to evaluate the proposed design, encompassing experiments on both the gripper itself and cloth manipulation. Experimental results demonstrate the potential of the introduced ideas to undertake a range of bimanual cloth manipulation tasks with a single robot arm. Supplementary material is available at https://sites.google.com/view/gripperongripper. Comment: Accepted for IEEE Robotics and Automation Letters in January 2024. Dongmyoung Lee and Wei Chen contributed equally to this research.
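
    To make the two-level design more concrete, here is a minimal control-abstraction sketch in Python. All class and method names (GraspMode, GripperOnGripper, set_span, bimanual_grasp) are hypothetical illustrations of the behavior described in the abstract, not the authors' actual software interface.

        from enum import Enum

        class GraspMode(Enum):
            FIRM = "firm"        # high-friction surface engaged, cloth held rigidly
            SLIDING = "sliding"  # low-friction surface engaged, cloth can slide through

        class FingerGripper:
            """One of the two finger grippers with a variable-friction module."""
            def grasp(self, mode: GraspMode) -> None:
                print(f"finger gripper closing in {mode.value} mode")

        class GripperOnGripper:
            """Outer gripper regulates the span between its two finger grippers."""
            MAX_SPAN_MM = 500  # span limit quoted in the abstract

            def __init__(self):
                self.left = FingerGripper()
                self.right = FingerGripper()

            def set_span(self, span_mm: float) -> None:
                span_mm = max(0.0, min(span_mm, self.MAX_SPAN_MM))
                print(f"setting finger-gripper span to {span_mm} mm")

            def bimanual_grasp(self, left_mode: GraspMode, right_mode: GraspMode) -> None:
                self.left.grasp(left_mode)
                self.right.grasp(right_mode)

        # Example: hold one cloth edge firmly while letting the other edge slide,
        # a typical bimanual primitive performed here by a single arm.
        g = GripperOnGripper()
        g.set_span(300)
        g.bimanual_grasp(GraspMode.FIRM, GraspMode.SLIDING)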

    Solar Ultraviolet Bursts in a Coordinated Observation of IRIS, Hinode and SDO

    Solar ultraviolet (UV) bursts are small-scale compact brightenings in transition-region images. The spectral profiles of transition-region lines in these bursts are significantly enhanced and broadened, often with chromospheric absorption lines such as Ni II 1335.203 and 1393.330 Å superimposed. We investigate the properties of several UV bursts using a coordinated observation of the Interface Region Imaging Spectrograph (IRIS), the Solar Dynamics Observatory (SDO), and Hinode on 2015 February 7. We have identified 12 UV bursts, and 11 of them reveal small blueshifts of the Ni II absorption lines. However, the Ni II lines in one UV burst exhibit obvious redshifts of ∼20 km s^-1, which appear to be related to the cold plasma downflows observed in the IRIS slit-jaw images. We also examine the three-dimensional magnetic field topology using a magnetohydrostatic model, and find that some UV bursts are associated with magnetic null points or bald patches. In addition, these UV bursts show no obvious coronal signatures in the observations of the Atmospheric Imaging Assembly (AIA) on board SDO and the EUV Imaging Spectrometer (EIS) on board Hinode. Comment: to appear in Science China Technological Science
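
    As a worked illustration of how such line shifts translate into velocities, the sketch below locates the minimum of an absorption line in a spectrum and converts the wavelength offset into a Doppler velocity via v = c (λ_obs − λ_rest) / λ_rest. The synthetic spectrum and the simple parabolic line-finding step are assumptions for illustration, not the paper's actual fitting procedure.

        import numpy as np

        C_KM_S = 2.998e5          # speed of light in km/s
        NI_II_REST = 1393.330     # Ni II rest wavelength in Angstrom (from the abstract)

        def doppler_velocity(wave, intensity, rest_wave):
            """Estimate the line-of-sight velocity of an absorption line.
            Positive result = redshift (downflow), negative = blueshift (upflow)."""
            # Refine the location of the intensity minimum with a parabolic fit
            i = np.argmin(intensity)
            coeffs = np.polyfit(wave[i - 2:i + 3], intensity[i - 2:i + 3], 2)
            line_center = -coeffs[1] / (2.0 * coeffs[0])
            return C_KM_S * (line_center - rest_wave) / rest_wave

        # Synthetic example: an absorption line redshifted by ~20 km/s
        wave = np.linspace(1393.0, 1393.7, 200)
        shift = NI_II_REST * 20.0 / C_KM_S
        intensity = 1.0 - 0.6 * np.exp(-((wave - NI_II_REST - shift) / 0.03) ** 2)
        print(doppler_velocity(wave, intensity, NI_II_REST))   # ~ +20 km/s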

    Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models

    Recently, there has been growing interest in extending the multimodal capability of large language models (LLMs), e.g., vision-language (VL) learning, which is regarded as the next milestone of artificial general intelligence. However, existing solutions are prohibitively expensive: they not only need to optimize an excessive number of parameters, but also require another large-scale pre-training stage before VL instruction tuning. In this paper, we propose a novel and affordable solution for the effective VL adaptation of LLMs, called Mixture-of-Modality Adaptation (MMA). Instead of using large neural networks to connect the image encoder and the LLM, MMA adopts lightweight modules, i.e., adapters, to bridge the gap between LLMs and VL tasks, which also enables the joint optimization of the image and language models. Meanwhile, MMA is equipped with a routing algorithm that helps the LLM shift automatically between single- and multi-modal instructions without compromising its natural language understanding. To validate MMA, we apply it to a recent LLM called LLaMA and term the resulting large vision-language instructed model LaVIN. We conduct extensive experiments under two setups, namely multimodal science question answering and multimodal dialogue. The experimental results not only demonstrate the competitive performance and superior training efficiency of LaVIN compared with existing multimodal LLMs, but also confirm its great potential as a general-purpose chatbot. More importantly, the actual cost of LaVIN is extremely low, e.g., only 1.4 training hours with 3.8M trainable parameters, which strongly confirms the effectiveness of MMA. Our project is released at https://luogen1996.github.io/lavin
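
    A minimal sketch of the adapter-plus-routing idea, written in PyTorch: a lightweight bottleneck adapter is inserted into a frozen transformer layer, and a learned router softly mixes a text-only branch and a multimodal branch. The architecture, dimensions, and names below are assumptions for illustration, not the released LaVIN code.

        import torch
        import torch.nn as nn

        class ModalityRoutedAdapter(nn.Module):
            """Lightweight adapter whose two bottleneck branches are mixed by a router."""
            def __init__(self, hidden_dim=768, bottleneck=64):
                super().__init__()
                def branch():
                    return nn.Sequential(
                        nn.Linear(hidden_dim, bottleneck),
                        nn.GELU(),
                        nn.Linear(bottleneck, hidden_dim),
                    )
                self.text_branch = branch()    # path for single-modal (text) inputs
                self.multi_branch = branch()   # path for multimodal (text+image) inputs
                self.router = nn.Linear(hidden_dim, 2)

            def forward(self, hidden_states):
                # Route on the mean token representation of the sequence.
                weights = torch.softmax(self.router(hidden_states.mean(dim=1)), dim=-1)
                out_text = self.text_branch(hidden_states)
                out_multi = self.multi_branch(hidden_states)
                mixed = (weights[:, 0, None, None] * out_text +
                         weights[:, 1, None, None] * out_multi)
                return hidden_states + mixed   # residual keeps the frozen LLM features intact

        # Example: adapt a batch of 2 sequences of 16 tokens with hidden size 768.
        adapter = ModalityRoutedAdapter()
        x = torch.randn(2, 16, 768)
        print(adapter(x).shape)   # torch.Size([2, 16, 768])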

    MovingParts: Motion-based 3D Part Discovery in Dynamic Radiance Field

    We present MovingParts, a NeRF-based method for dynamic scene reconstruction and part discovery. We consider motion an important cue for identifying parts, since all particles on the same part share a common motion pattern. From the perspective of fluid simulation, existing deformation-based methods for dynamic NeRF can be seen as parameterizing the scene motion under the Eulerian view, i.e., focusing on specific locations in space through which the fluid flows as time passes. However, it is intractable to extract the motion of the constituent objects or parts from the Eulerian view representation. In this work, we introduce the dual Lagrangian view and enforce the representations under the Eulerian and Lagrangian views to be cycle-consistent. Under the Lagrangian view, we parameterize the scene motion by tracking the trajectories of particles on objects. The Lagrangian view makes it convenient to discover parts by factorizing the scene motion into a composition of part-level rigid motions. Experimentally, our method achieves fast, high-quality dynamic scene reconstruction even from a single moving camera, and the induced part-based representation enables direct applications such as part tracking, animation, and 3D scene editing. Comment: 10 pages
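
    A minimal sketch of the Eulerian/Lagrangian cycle-consistency idea: an "Eulerian" network maps an observed point at time t back to a canonical location, a "Lagrangian" network maps a canonical point forward to its location at time t, and penalizing the round trip keeps the two views consistent. The tiny MLPs and the loss form are illustrative assumptions, not the paper's exact networks.

        import torch
        import torch.nn as nn

        def mlp(in_dim, out_dim, width=64):
            return nn.Sequential(nn.Linear(in_dim, width), nn.ReLU(),
                                 nn.Linear(width, out_dim))

        # Eulerian view: (point at time t, t) -> canonical point
        eulerian_backward = mlp(4, 3)
        # Lagrangian view: (canonical point, t) -> point at time t
        lagrangian_forward = mlp(4, 3)

        def cycle_consistency_loss(x_t, t):
            """Round-trip a batch of observed points through both views."""
            t_col = t.unsqueeze(-1)
            x_canonical = eulerian_backward(torch.cat([x_t, t_col], dim=-1))
            x_t_rec = lagrangian_forward(torch.cat([x_canonical, t_col], dim=-1))
            return ((x_t_rec - x_t) ** 2).mean()

        # Example: 1024 sampled points at random times in [0, 1].
        x_t = torch.rand(1024, 3)
        t = torch.rand(1024)
        print(cycle_consistency_loss(x_t, t))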

    SSAH: Semi-supervised Adversarial Deep Hashing with Self-paced Hard Sample Generation

    Deep hashing methods have proven to be effective and efficient for large-scale Web media search. The success of these data-driven methods largely depends on collecting sufficient labeled data, which is usually a crucial limitation in practice. Current solutions to this issue use Generative Adversarial Networks (GANs) to augment data in semi-supervised learning. However, existing GAN-based methods treat image generation and hashing learning as two isolated processes, leading to ineffective generation. Besides, most works fail to exploit the semantic information in unlabeled data. In this paper, we propose a novel Semi-supervised Self-paced Adversarial Hashing method, named SSAH, to solve the above problems in a unified framework. The SSAH method consists of an adversarial network (A-Net) and a hashing network (H-Net). To improve the quality of the generated images, the A-Net first learns hard samples with multi-scale occlusions and multi-angle rotation deformations, which compete against the learning of accurate hashing codes. Second, we design a novel self-paced hard-generation policy to gradually increase the hashing difficulty of the generated samples. To make use of the semantic information in unlabeled data, we propose a semi-supervised consistency loss. The experimental results show that our method can significantly improve state-of-the-art models on both widely used hashing datasets and fine-grained datasets.
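
    A minimal sketch of what a self-paced hard-generation schedule could look like: the allowed occlusion scale and rotation angle of generated samples grow with training progress, so the hashing network faces gradually harder samples. The schedule shape and parameter ranges below are assumptions for illustration; they are not taken from the paper.

        import random

        def self_paced_hard_params(epoch, max_epochs,
                                   max_occlusion=0.5, max_rotation_deg=45.0):
            """Sample occlusion/rotation parameters whose upper bounds grow
            linearly with training progress (the 'pace')."""
            pace = min(1.0, (epoch + 1) / max_epochs)   # 0 -> easy, 1 -> hardest
            occlusion_ratio = random.uniform(0.0, pace * max_occlusion)
            rotation_deg = random.uniform(-pace * max_rotation_deg,
                                          pace * max_rotation_deg)
            return occlusion_ratio, rotation_deg

        # Early vs. late training: harder deformations appear only as the pace grows.
        print(self_paced_hard_params(epoch=0, max_epochs=100))
        print(self_paced_hard_params(epoch=99, max_epochs=100))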

    Adapting Pre-trained Language Models to Vision-Language Tasks via Dynamic Visual Prompting

    Pre-trained language models (PLMs) have played an increasing role in multimedia research. For vision-language (VL) tasks, they often serve as a language encoder and still require an additional fusion network for VL reasoning, resulting in excessive memory overhead. In this paper, we explore PLMs as stand-alone models for VL reasoning tasks. Inspired by the recently popular prompt tuning, we first show that processed visual features can also be projected onto the semantic space of PLMs and act as prompt tokens to bridge the gap between single- and multi-modal learning. However, this solution exhibits obvious redundancy in visual information and model inference, and the placement of the prompt tokens also greatly affects the final performance. Based on these observations, we further propose a novel transfer learning approach for PLMs, termed Dynamic Visual Prompting (DVP). Concretely, DVP first deploys a cross-attention module to obtain text-related and compact visual prompt tokens, thereby greatly reducing the input length of PLMs. To obtain the optimal placement, we also equip DVP with a reinforcement-learning-based search algorithm, which can automatically merge DVP with PLMs for different VL tasks via a very short search process. In addition, we combine DVP with the recently popular adapter approach to keep most parameters of the PLMs intact when adapting to VL tasks, helping PLMs achieve a quick shift between single- and multi-modal tasks. We apply DVP to two representative PLMs, namely BERT and T5, and conduct extensive experiments on a set of VL reasoning benchmarks including VQA2.0, GQA and SNLI-VE. The experimental results not only show the advantage of DVP in efficiency and performance, but also confirm its superiority in adapting pre-trained language models to VL tasks.
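
    A minimal sketch of the compact visual-prompting step in PyTorch: a small set of query tokens cross-attends over the patch features from an image encoder, producing a handful of visual prompt tokens that can be prepended to the PLM's token embeddings. For simplicity the queries here are learnable parameters rather than text-conditioned, and the dimensions, token counts, and names are illustrative assumptions rather than the DVP implementation.

        import torch
        import torch.nn as nn

        class CompactVisualPrompt(nn.Module):
            """Compress many patch features into a few visual prompt tokens."""
            def __init__(self, vis_dim=1024, plm_dim=768, num_prompts=4, num_heads=8):
                super().__init__()
                self.queries = nn.Parameter(torch.randn(num_prompts, plm_dim))
                self.proj = nn.Linear(vis_dim, plm_dim)   # map visual features to PLM space
                self.attn = nn.MultiheadAttention(plm_dim, num_heads, batch_first=True)

            def forward(self, patch_feats):
                # patch_feats: (batch, num_patches, vis_dim) from a frozen image encoder
                kv = self.proj(patch_feats)
                q = self.queries.unsqueeze(0).expand(patch_feats.size(0), -1, -1)
                prompts, _ = self.attn(q, kv, kv)
                return prompts                            # (batch, num_prompts, plm_dim)

        # Example: turn 196 patch features into 4 prompt tokens, then prepend them
        # to the PLM's word embeddings.
        prompt_gen = CompactVisualPrompt()
        patches = torch.randn(2, 196, 1024)
        word_embeds = torch.randn(2, 20, 768)
        plm_input = torch.cat([prompt_gen(patches), word_embeds], dim=1)
        print(plm_input.shape)   # torch.Size([2, 24, 768])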

    PixelFolder: An Efficient Progressive Pixel Synthesis Network for Image Generation

    Pixel synthesis is a promising research paradigm for image generation that can exploit pixel-wise prior knowledge. However, existing methods still suffer from an excessive memory footprint and computation overhead. In this paper, we propose a progressive pixel synthesis network for efficient image generation, coined PixelFolder. Specifically, PixelFolder formulates image generation as a progressive pixel regression problem and synthesizes images in a multi-stage paradigm, which greatly reduces the overhead caused by large tensor transformations. In addition, we introduce novel pixel folding operations to further improve model efficiency while maintaining pixel-wise prior knowledge for end-to-end regression. With these designs, we greatly reduce the cost of pixel synthesis, e.g., reducing computation by 90% and the number of parameters by 57% compared with the latest pixel synthesis method, CIPS. To validate our approach, we conduct extensive experiments on two benchmark datasets, namely FFHQ and LSUN Church. The experimental results show that with much less expenditure, PixelFolder obtains new state-of-the-art (SOTA) performance on both datasets, i.e., 3.77 FID and 2.45 FID on FFHQ and LSUN Church, respectively. Meanwhile, PixelFolder is also more efficient than SOTA methods like StyleGAN2, reducing computation by about 74% and parameters by about 36%. These results validate the effectiveness of the proposed PixelFolder. Comment: 11 pages, 7 figures
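
    A minimal sketch of a space-to-depth style folding operation of the kind the abstract describes, using PyTorch's pixel_unshuffle/pixel_shuffle: folding trades spatial resolution for channels so early stages can work on small tensors, and unfolding restores the resolution later. Whether PixelFolder's actual folding operation matches this exact formulation is an assumption here.

        import torch
        import torch.nn.functional as F

        def fold_pixels(x, factor=2):
            """Space-to-depth: (B, C, H, W) -> (B, C*factor^2, H/factor, W/factor)."""
            return F.pixel_unshuffle(x, factor)

        def unfold_pixels(x, factor=2):
            """Depth-to-space: inverse of fold_pixels."""
            return F.pixel_shuffle(x, factor)

        # Example: a 3x256x256 tensor is folded to 12x128x128, processed cheaply
        # at low resolution, and then unfolded back without losing any pixel values.
        x = torch.randn(1, 3, 256, 256)
        folded = fold_pixels(x)
        print(folded.shape)                   # torch.Size([1, 12, 128, 128])
        restored = unfold_pixels(folded)
        print(torch.allclose(restored, x))    # True: the operation is lossless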