A Fully Convolutional Tri-branch Network (FCTN) for Domain Adaptation
A domain adaptation method for urban scene segmentation is proposed in this
work. We develop a fully convolutional tri-branch network, where two branches
assign pseudo labels to images in the unlabeled target domain while the third
branch is trained with supervision based on images in the pseudo-labeled target
domain. The re-labeling and re-training processes alternate. With this design,
the tri-branch network learns target-specific discriminative representations
progressively and, as a result, the cross-domain capability of the segmenter
improves. We evaluate the proposed network on large-scale domain adaptation
experiments using both synthetic (GTA) and real (Cityscapes) images. It is
shown that our solution achieves state-of-the-art performance and outperforms
previous methods by a significant margin.
Comment: Accepted by ICASSP 201
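The alternating re-labeling/re-training loop described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the rule that a pixel receives a pseudo label only when the two labeling branches agree with high confidence, and the threshold value, are assumptions introduced for the sketch.

```python
import numpy as np

def assign_pseudo_labels(probs_a, probs_b, threshold=0.9):
    """Combine the two labeling branches' softmax outputs into a
    pseudo-label map for the unlabeled target domain.

    probs_a, probs_b: (H, W, C) per-pixel class probabilities.
    Returns an (H, W) map; pixels where the branches disagree or are
    not confident enough are marked -1 (ignored during re-training).
    """
    labels_a = probs_a.argmax(axis=-1)
    labels_b = probs_b.argmax(axis=-1)
    # Confidence of the joint decision: the weaker branch's max probability.
    conf = np.minimum(probs_a.max(axis=-1), probs_b.max(axis=-1))
    agree = (labels_a == labels_b) & (conf >= threshold)
    return np.where(agree, labels_a, -1)
```

The third branch would then be trained on the pixels not marked -1, after which all branches are updated and the labeling step repeats.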
Experimental Results of Underwater Sound Speed Profile Inversion by Few-shot Multi-task Learning
Underwater Sound Speed Profile (SSP) distribution has great influence on the
propagation mode of acoustic signal, thus the fast and accurate estimation of
SSP is of great importance in building underwater observation systems. The
state-of-the-art SSP inversion methods include frameworks of matched field
processing (MFP), compressive sensing (CS), and feedforward neural networks
(FNN), among which FNN shows better real-time performance while maintaining
the same level of accuracy. However, training an FNN requires a large number
of historical SSP samples, which is difficult to satisfy in many ocean areas.
This situation is called few-shot learning. To tackle this issue, we propose a
multi-task learning (MTL) model with partial parameter sharing among different
training tasks. Through MTL, common features can be extracted across tasks,
accelerating the learning process on a given task and reducing the demand for
reference samples, thereby enhancing generalization ability in few-shot learning. To
verify the feasibility and effectiveness of MTL, a deep-ocean experiment was
conducted in April 2023 in the South China Sea. Results show that MTL
outperforms the state-of-the-art methods in terms of accuracy for SSP
inversion, while inheriting the real-time advantage of FNN during the
inversion stage.
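The partial-parameter-sharing idea can be sketched as a shared feature trunk with one output head per task. This is a hypothetical minimal model, not the paper's architecture: the layer sizes, the tanh activation, and the task names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

class SharedTrunkMTL:
    """Sketch of MTL with partial parameter sharing: one hidden layer
    whose weights are shared by all SSP-inversion tasks, plus a
    task-specific output head per task."""

    def __init__(self, n_in, n_hidden, n_out, task_names):
        # Shared parameters: learn features common to all tasks.
        self.W_shared = rng.normal(scale=0.1, size=(n_in, n_hidden))
        # Task-specific parameters: one inversion head per ocean area/task.
        self.heads = {t: rng.normal(scale=0.1, size=(n_hidden, n_out))
                      for t in task_names}

    def forward(self, x, task):
        h = np.tanh(x @ self.W_shared)   # shared feature extraction
        return h @ self.heads[task]      # task-specific SSP estimate
```

During training, gradients from every task would update `W_shared`, while each head sees only its own task's few samples, which is what reduces the per-task data demand.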
Jahn-Teller distortion driven ferromagnetism in a perovskite fluoride monolayer
The Jahn-Teller distortion and the resulting orbital order usually cause some
fascinating correlated electronic behaviors, and generally lead to
antiferromagnetism in perovskite bulks. Here we demonstrate that the
Jahn-Teller distortion present in the perovskite fluoride KCrF3 bulk can be
retained to the two-dimensional limit, resulting in a staggered orbital order
and ferromagnetism in the perovskite monolayer. Octahedral tilt and rotation
distortion also appear in the ground-state structure of the perovskite
monolayer, which have minor effects on the electronic and magnetic properties
with respect to the Jahn-Teller distortion. In addition, in the prototype phase
without structural distortion, the partial occupation of the orbitals
leads to a ferromagnetic metallic state. This work facilitates the design of
two-dimensional ferromagnets and functional properties based on Jahn-Teller
distortion and orbital order.
Comment: 8 pages, 5 figures, 1 table
Retrieving-to-Answer: Zero-Shot Video Question Answering with Frozen Large Language Models
Video Question Answering (VideoQA) has been significantly advanced from the
scaling of recent Large Language Models (LLMs). The key idea is to convert the
visual information into the language feature space so that the capacity of LLMs
can be fully exploited. Existing VideoQA methods typically take two paradigms:
(1) learning cross-modal alignment, and (2) using an off-the-shelf captioning
model to describe the visual data. However, the first design requires costly
training on large amounts of extra multi-modal data, whilst the second suffers
from limited domain generalization. To address these limitations, a simple yet
effective Retrieving-to-Answer (R2A) framework is proposed. Given an input
video, R2A first retrieves a set of semantically similar texts from a generic
text corpus using a pre-trained multi-modal model (e.g., CLIP). With both the
question and the retrieved texts, an LLM (e.g., DeBERTa) can be directly used to
yield a desired answer. Without the need for cross-modal fine-tuning, R2A
allows all the key components (e.g., the LLM, retrieval model, and text corpus)
to be plug-and-play. Extensive experiments on several VideoQA benchmarks show
that, despite having only 1.3B parameters and no fine-tuning, our R2A can
outperform the 61-times-larger Flamingo-80B model, which was additionally
trained on nearly 2.1B multi-modal data.
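The retrieval step at the heart of R2A can be sketched as a nearest-neighbor search in a joint embedding space. This is an assumption-laden illustration: the cosine-similarity retrieval shown here stands in for the CLIP-based retrieval the abstract describes, and the corpus, embeddings, and `k` are placeholders.

```python
import numpy as np

def retrieve_texts(video_emb, corpus_embs, corpus_texts, k=3):
    """Return the k corpus texts most similar to a video embedding.

    video_emb:   (D,) embedding of the input video (e.g., from CLIP).
    corpus_embs: (N, D) pre-computed embeddings of the text corpus.
    corpus_texts: list of N strings aligned with corpus_embs.
    """
    v = video_emb / np.linalg.norm(video_emb)
    c = corpus_embs / np.linalg.norm(corpus_embs, axis=1, keepdims=True)
    sims = c @ v                     # cosine similarity to every corpus text
    top = np.argsort(-sims)[:k]      # indices of the k best matches
    return [corpus_texts[i] for i in top]
```

The retrieved texts, concatenated with the question, would then form the prompt given to the frozen language model, so no component needs cross-modal fine-tuning.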
Simulation of Solidified Microstructure and Experimental Comparative Study of Twin-Roll Casting Aluminum Alloys