MoviePuzzle: Visual Narrative Reasoning through Multimodal Order Learning
We introduce MoviePuzzle, a novel challenge that targets visual narrative
reasoning and holistic movie understanding. Despite the notable progress that
has been witnessed in the realm of video understanding, most prior works fail
to present tasks and models to address holistic video understanding and the
innate visual narrative structures existing in long-form videos. To tackle this
quandary, we put forth the MoviePuzzle task, which amplifies the temporal feature
learning and structure learning of video models by reshuffling the shot, frame,
and clip layers of movie segments in the presence of video-dialogue
information. We start by establishing a carefully refined dataset based on
MovieNet by dissecting movies into hierarchical layers and randomly permuting
the orders. Besides benchmarking MoviePuzzle against prior art on movie
understanding, we devise a Hierarchical Contrastive Movie Clustering (HCMC)
model that considers the underlying structure and visual semantic orders for
movie reordering. Specifically, through a pairwise and contrastive learning
approach, we train models to predict the correct order of each layer. This
equips them with the knack for deciphering the visual narrative structure of
movies and handling the disorder lurking in video data. Experiments show that
our approach outperforms existing state-of-the-art methods on the MoviePuzzle
benchmark, underscoring its efficacy.
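The pairwise, contrastive order prediction described above can be caricatured as a binary scorer that judges whether segment A precedes segment B from their embeddings. The linear scorer, embedding size, and loss below are illustrative assumptions, not the HCMC architecture itself.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8  # illustrative embedding size

def order_logit(emb_a, emb_b, W, b):
    """Raw score for the claim 'segment A comes before segment B'."""
    return float(W @ np.concatenate([emb_a, emb_b]) + b)

def bce_with_logits(logit, label):
    """Numerically stable binary cross-entropy on a raw logit."""
    return max(logit, 0.0) - logit * label + np.log1p(np.exp(-abs(logit)))

W = rng.normal(size=2 * DIM)                 # toy scorer weights
emb_a, emb_b = rng.normal(size=DIM), rng.normal(size=DIM)
logit = order_logit(emb_a, emb_b, W, 0.0)
loss = bce_with_logits(logit, label=1.0)     # label 1: A precedes B
```

Training such a head on all adjacent pairs of a shuffled layer is one simple way to recover a layer ordering: sort segments by their pairwise win counts.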
Shuo Wen Jie Zi: Rethinking Dictionaries and Glyphs for Chinese Language Pre-training
We introduce CDBERT, a new learning paradigm that enhances the semantic
understanding ability of Chinese PLMs with dictionary knowledge and
structure of Chinese characters. We name the two core modules of CDBERT as
Shuowen and Jiezi, where Shuowen refers to the process of retrieving the most
appropriate meaning from Chinese dictionaries and Jiezi refers to the process
of enhancing characters' glyph representations with structure understanding. To
facilitate dictionary understanding, we propose three pre-training tasks, i.e.,
Masked Entry Modeling, Contrastive Learning for Synonym and Antonym, and
Example Learning. We evaluate our method on both modern Chinese understanding
benchmark CLUE and ancient Chinese benchmark CCLUE. Moreover, we propose a new
polysemy discrimination task PolyMRC based on the collected dictionary of
ancient Chinese. Our paradigm demonstrates consistent improvements over
previous Chinese PLMs across all tasks. Moreover, our approach yields
significant gains in the few-shot setting of ancient Chinese understanding.
Comment: To appear at ACL 2023 Findings.
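The Contrastive Learning for Synonym and Antonym task can be sketched as a margin loss that pulls an entry's embedding toward a synonym and pushes it away from an antonym. The cosine formulation and margin value are illustrative assumptions, not CDBERT's exact objective.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def synonym_antonym_margin_loss(entry, synonym, antonym, margin=0.5):
    """Zero once the synonym is closer to the entry than the antonym is,
    by at least `margin` (a stand-in for the paper's contrastive task)."""
    return max(0.0, margin - cosine(entry, synonym) + cosine(entry, antonym))

e = np.array([1.0, 0.0])
# Synonym aligned with the entry, antonym opposed: loss vanishes.
loss_good = synonym_antonym_margin_loss(e, np.array([1.0, 0.0]), np.array([-1.0, 0.0]))
# Roles swapped: the loss penalizes the misplaced embeddings.
loss_bad = synonym_antonym_margin_loss(e, np.array([-1.0, 0.0]), np.array([1.0, 0.0]))
```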
The Terrestrial Planet Formation around M Dwarfs: In-situ, Inward Migration or Reversed Migration
Terrestrial planets are commonly observed to orbit M dwarfs on close-in
trajectories. In this work, we perform extensive N-body simulations of
planetesimal accretion under three models, in-situ formation, inward
migration, and reversed migration, to explore terrestrial planet formation in
the tightly compact systems of M dwarfs. In the simulations, the solid disks
are assumed to hold 0.01% of the masses of their host stars and to spread from
0.01 to 0.5 AU, with a surface density profile whose power-law slope is chosen
according to observations. Our results show that the in-situ scenario may
produce terrestrial planets around M dwarfs. The number of planets tends to
increase as the disk slope steepens or the stellar mass grows. Moreover, we
show that planets also form in these systems under the inward-migration and
reversed-migration scenarios. Migration can additionally deliver plentiful
water from outside the ice line to the interior owing to more efficient
accretion. The outcomes of the reversed-migration model match observations
best, suggesting a likely mechanism for planetary formation around M dwarfs.
Comment: 13 pages, 9 figures, accepted for publication in MNRAS.
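As a back-of-the-envelope check on such a disk model, one can normalize a power-law surface density so the disk between 0.01 and 0.5 AU holds the quoted 0.01% of the stellar mass. The slope p = 1.5 below is an assumed illustrative value, since the exponent is not given in this excerpt.

```python
import math

M_SUN_G = 1.989e33   # solar mass in grams
AU_CM = 1.496e13     # astronomical unit in centimeters

def sigma0_for_disk(m_star_msun, f_disk=1e-4, p=1.5,
                    r_in_au=0.01, r_out_au=0.5):
    """Return Sigma(1 AU) in g/cm^2 for Sigma(r) = Sigma0 * (r / 1 AU)^-p,
    normalized so the disk between r_in and r_out holds f_disk * M_star.
    The exponent p is assumed here; the abstract omits it."""
    r_in, r_out = r_in_au * AU_CM, r_out_au * AU_CM
    m_disk = f_disk * m_star_msun * M_SUN_G
    # M_disk = integral of 2*pi*r*Sigma(r) dr, closed form valid for p != 2.
    integral = 2 * math.pi * AU_CM**p * (r_out**(2 - p) - r_in**(2 - p)) / (2 - p)
    return m_disk / integral

sigma0 = sigma0_for_disk(0.1)  # a 0.1 M_sun M dwarf
```

For a 0.1 solar-mass star this gives a normalization of order ten g/cm^2 at 1 AU, a plausible scale for a compact solid disk.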
A method for aligning RNA secondary structures and its application to RNA motif detection
BACKGROUND: Alignment of RNA secondary structures is important in studying functional RNA motifs. In recent years, much progress has been made in RNA motif finding and structure alignment. However, existing tools either require a large number of prealigned structures or suffer from high time complexities. This makes it difficult for the tools to process RNAs whose prealigned structures are unavailable or to process very large RNA structure databases.

RESULTS: We present here an efficient tool called RSmatch for aligning RNA secondary structures and for motif detection. Motivated by widely used algorithms for RNA folding, we decompose an RNA secondary structure into a set of atomic structure components that are further organized by a tree model to capture the structural particularities. RSmatch can find the optimal global or local alignment between two RNA secondary structures using two scoring matrices, one for single-stranded regions and the other for double-stranded regions. The time complexity of RSmatch is O(mn), where m is the size of the query structure and n that of the subject structure. When applied to searching a structure database, RSmatch can find similar RNA substructures, and is capable of conducting multiple structure alignment and iterative database search. Therefore it can be used to identify functional RNA motifs. The accuracy of RSmatch is tested by experiments using a number of known RNA structures, including simple stem-loops and complex structures containing junctions.

CONCLUSION: With respect to computing efficiency and accuracy, RSmatch compares favorably with other tools for RNA structure alignment and motif detection. This tool shall be useful to researchers interested in comparing RNA structures obtained from wet lab experiments or RNA folding programs, particularly when the size of the structure dataset is large.
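The O(mn) dynamic program with region-dependent scoring can be illustrated on plain dot-bracket strings. This sketch ignores RSmatch's tree decomposition entirely and uses invented score values; it only shows the idea of scoring double-stranded ('(' and ')') positions by one scheme and single-stranded ('.') positions by another inside a standard alignment recurrence.

```python
def align_structures(q, s, gap=-2):
    """Global alignment score of two dot-bracket strings in O(mn) time,
    with separate (assumed) scoring for paired and unpaired positions."""
    def score(a, b):
        paired = {'(', ')'}
        if a in paired and b in paired:
            return 2 if a == b else -1   # double-stranded scheme (toy values)
        if a == '.' and b == '.':
            return 1                     # single-stranded scheme (toy values)
        return -3                        # mixing paired with unpaired
    m, n = len(q), len(s)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = dp[i - 1][0] + gap
    for j in range(1, n + 1):
        dp[0][j] = dp[0][j - 1] + gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dp[i][j] = max(dp[i - 1][j - 1] + score(q[i - 1], s[j - 1]),
                           dp[i - 1][j] + gap,      # gap in the subject
                           dp[i][j - 1] + gap)      # gap in the query
    return dp[m][n]

# Two similar stem-loops, differing by one loop residue.
best = align_structures("((..))", "((...))")
```

The two hairpins align with a single gap in the shorter loop; local alignment and tree-aware scoring would replace the boundary initialization and the `score` function, respectively.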
Task-Robust Pre-Training for Worst-Case Downstream Adaptation
Pre-training has achieved remarkable success when transferred to downstream
tasks. In machine learning, we care not only about a model's good performance
but also about its behavior under reasonable shifts of conditions. The same
philosophy holds when pre-training a foundation model. However, a foundation
model may not behave uniformly well across a series of related downstream
tasks. This happens, for example, in masked-recovery regression when the
recovery targets or training instances diverge: pattern features dominate
during pre-training, while semantic features are also required by a downstream
task. This paper considers pre-training a model that guarantees uniformly good
performance over the downstream tasks. Our method first separates the upstream
task into several representative ones and applies a simple minimax loss for
pre-training. We then design an efficient algorithm to solve the minimax loss
and prove its convergence in the convex setting. In experiments on both
large-scale natural language processing and computer vision datasets, our
method improves the metrics on worst-case downstream tasks. Additionally, we
provide theoretical explanations for why our loss is beneficial; specifically,
we show that in some cases fewer samples are inherently required for the most
challenging downstream task.
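The "representative tasks plus minimax loss" recipe can be sketched on a toy convex problem: multiplicative weights up-weight the currently worst task, and gradient descent updates the shared parameters against the weighted loss. The task losses, step sizes, and solver below are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def task_losses(theta, targets):
    """One simple convex loss per representative upstream task."""
    return np.array([np.sum((theta - t) ** 2) for t in targets])

def task_grads(theta, targets):
    return np.array([2 * (theta - t) for t in targets])

def minimax_train(targets, dim=2, steps=500, lr=0.05, eta=0.1):
    """min over theta of max over tasks, via multiplicative weights."""
    theta = np.full(dim, 0.5)
    w = np.ones(len(targets)) / len(targets)   # distribution over tasks
    for _ in range(steps):
        losses = task_losses(theta, targets)
        w = w * np.exp(eta * losses)            # up-weight the worst task
        w /= w.sum()
        theta -= lr * (w[:, None] * task_grads(theta, targets)).sum(axis=0)
    return theta, task_losses(theta, targets)

# Two symmetric "tasks" pulling in opposite directions: the minimax
# solution sits between them, equalizing the per-task losses.
targets = [np.array([1.0, 0.0]), np.array([-1.0, 0.0])]
theta, losses = minimax_train(targets)
```

Plain averaging of the task losses would reach the same point here by symmetry; the weighting matters precisely when one task would otherwise be sacrificed, which is the worst-case guarantee the abstract targets.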
- …