MoviePuzzle: Visual Narrative Reasoning through Multimodal Order Learning
We introduce MoviePuzzle, a novel challenge that targets visual narrative
reasoning and holistic movie understanding. Despite the notable progress that
has been witnessed in the realm of video understanding, most prior works fail
to present tasks and models to address holistic video understanding and the
innate visual narrative structures existing in long-form videos. To tackle this
quandary, we put forth the MoviePuzzle task, which amplifies the temporal feature
learning and structure learning of video models by reshuffling the shot, frame,
and clip layers of movie segments in the presence of video-dialogue
information. We start by establishing a carefully refined dataset based on
MovieNet by dissecting movies into hierarchical layers and randomly permuting
the orders. Besides benchmarking MoviePuzzle against prior art on movie
understanding, we devise a Hierarchical Contrastive Movie Clustering (HCMC)
model that considers the underlying structure and visual semantic orders for
movie reordering. Specifically, through a pairwise and contrastive learning
approach, we train models to predict the correct order of each layer. This
equips them with the knack for deciphering the visual narrative structure of
movies and handling the disorder lurking in video data. Experiments show that
our approach outperforms existing state-of-the-art methods on the MoviePuzzle
benchmark, underscoring its efficacy.
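The pairwise, contrastive order prediction described above can be caricatured as a binary scorer that judges whether segment A precedes segment B from their embeddings. The linear scorer, embedding size, and loss below are illustrative assumptions, not the HCMC architecture itself.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8  # illustrative embedding size

def order_logit(emb_a, emb_b, W, b):
    """Raw score for the claim 'segment A comes before segment B'."""
    return float(W @ np.concatenate([emb_a, emb_b]) + b)

def bce_with_logits(logit, label):
    """Numerically stable binary cross-entropy on a raw logit."""
    return max(logit, 0.0) - logit * label + np.log1p(np.exp(-abs(logit)))

W = rng.normal(size=2 * DIM)                 # toy scorer weights
emb_a, emb_b = rng.normal(size=DIM), rng.normal(size=DIM)
logit = order_logit(emb_a, emb_b, W, 0.0)
loss = bce_with_logits(logit, label=1.0)     # label 1: A precedes B
```

Training such a head on all adjacent pairs of a shuffled layer is one simple way to recover a layer ordering: sort segments by their pairwise win counts.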
Shuo Wen Jie Zi: Rethinking Dictionaries and Glyphs for Chinese Language Pre-training
We introduce CDBERT, a new learning paradigm that enhances the semantic
understanding ability of Chinese PLMs with dictionary knowledge and
structure of Chinese characters. We name the two core modules of CDBERT as
Shuowen and Jiezi, where Shuowen refers to the process of retrieving the most
appropriate meaning from Chinese dictionaries and Jiezi refers to the process
of enhancing characters' glyph representations with structure understanding. To
facilitate dictionary understanding, we propose three pre-training tasks, i.e.,
Masked Entry Modeling, Contrastive Learning for Synonym and Antonym, and
Example Learning. We evaluate our method on both modern Chinese understanding
benchmark CLUE and ancient Chinese benchmark CCLUE. Moreover, we propose a new
polysemy discrimination task PolyMRC based on the collected dictionary of
ancient Chinese. Our paradigm demonstrates consistent improvements over
previous Chinese PLMs across all tasks. Moreover, our approach yields
significant gains in the few-shot setting of ancient Chinese understanding.
Comment: To appear at ACL 2023 Findings.
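The Contrastive Learning for Synonym and Antonym task can be sketched as a margin loss that pulls an entry's embedding toward a synonym and pushes it away from an antonym. The cosine formulation and margin value are illustrative assumptions, not CDBERT's exact objective.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def synonym_antonym_margin_loss(entry, synonym, antonym, margin=0.5):
    """Zero once the synonym is closer to the entry than the antonym is,
    by at least `margin` (a stand-in for the paper's contrastive task)."""
    return max(0.0, margin - cosine(entry, synonym) + cosine(entry, antonym))

e = np.array([1.0, 0.0])
# Synonym aligned with the entry, antonym opposed: loss vanishes.
loss_good = synonym_antonym_margin_loss(e, np.array([1.0, 0.0]), np.array([-1.0, 0.0]))
# Roles swapped: the loss penalizes the misplaced embeddings.
loss_bad = synonym_antonym_margin_loss(e, np.array([-1.0, 0.0]), np.array([1.0, 0.0]))
```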
The Terrestrial Planet Formation around M Dwarfs: In-situ, Inward Migration or Reversed Migration
Terrestrial planets are commonly observed to orbit M dwarfs on close-in
trajectories. In this work, we perform extensive N-body simulations of
planetesimal accretion under three models, in-situ formation, inward
migration, and reversed migration, to explore terrestrial planet formation in
the tightly compact systems of M dwarfs. In the simulations, the solid disks
are assumed to hold 0.01% of the masses of their host stars and to spread from
0.01 to 0.5 AU, with a surface density profile whose power-law slope is chosen
according to observations. Our results show that the in-situ scenario may
produce terrestrial planets around M dwarfs. The number of planets tends to
increase as the disk slope steepens or the stellar mass grows. Moreover, we
show that planets also form in these systems under the inward-migration and
reversed-migration scenarios. Migration can additionally deliver plentiful
water from outside the ice line to the interior owing to more efficient
accretion. The outcomes of the reversed-migration model match observations
best, suggesting a likely mechanism for planetary formation around M dwarfs.
Comment: 13 pages, 9 figures, accepted for publication in MNRAS.
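As a back-of-the-envelope check on such a disk model, one can normalize a power-law surface density so the disk between 0.01 and 0.5 AU holds the quoted 0.01% of the stellar mass. The slope p = 1.5 below is an assumed illustrative value, since the exponent is not given in this excerpt.

```python
import math

M_SUN_G = 1.989e33   # solar mass in grams
AU_CM = 1.496e13     # astronomical unit in centimeters

def sigma0_for_disk(m_star_msun, f_disk=1e-4, p=1.5,
                    r_in_au=0.01, r_out_au=0.5):
    """Return Sigma(1 AU) in g/cm^2 for Sigma(r) = Sigma0 * (r / 1 AU)^-p,
    normalized so the disk between r_in and r_out holds f_disk * M_star.
    The exponent p is assumed here; the abstract omits it."""
    r_in, r_out = r_in_au * AU_CM, r_out_au * AU_CM
    m_disk = f_disk * m_star_msun * M_SUN_G
    # M_disk = integral of 2*pi*r*Sigma(r) dr, closed form valid for p != 2.
    integral = 2 * math.pi * AU_CM**p * (r_out**(2 - p) - r_in**(2 - p)) / (2 - p)
    return m_disk / integral

sigma0 = sigma0_for_disk(0.1)  # a 0.1 M_sun M dwarf
```

For a 0.1 solar-mass star this gives a normalization of order ten g/cm^2 at 1 AU, a plausible scale for a compact solid disk.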
A method for aligning RNA secondary structures and its application to RNA motif detection
BACKGROUND: Alignment of RNA secondary structures is important in studying functional RNA motifs. In recent years, much progress has been made in RNA motif finding and structure alignment. However, existing tools either require a large number of prealigned structures or suffer from high time complexities. This makes it difficult for the tools to process RNAs whose prealigned structures are unavailable or to process very large RNA structure databases.

RESULTS: We present here an efficient tool called RSmatch for aligning RNA secondary structures and for motif detection. Motivated by widely used algorithms for RNA folding, we decompose an RNA secondary structure into a set of atomic structure components that are further organized by a tree model to capture the structural particularities. RSmatch can find the optimal global or local alignment between two RNA secondary structures using two scoring matrices, one for single-stranded regions and the other for double-stranded regions. The time complexity of RSmatch is O(mn), where m is the size of the query structure and n that of the subject structure. When applied to searching a structure database, RSmatch can find similar RNA substructures, and is capable of conducting multiple structure alignment and iterative database search. Therefore it can be used to identify functional RNA motifs. The accuracy of RSmatch is tested by experiments using a number of known RNA structures, including simple stem-loops and complex structures containing junctions.

CONCLUSION: With respect to computing efficiency and accuracy, RSmatch compares favorably with other tools for RNA structure alignment and motif detection. This tool shall be useful to researchers interested in comparing RNA structures obtained from wet lab experiments or RNA folding programs, particularly when the size of the structure dataset is large.
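The O(mn) dynamic program with region-dependent scoring can be illustrated on plain dot-bracket strings. This sketch ignores RSmatch's tree decomposition entirely and uses invented score values; it only shows the idea of scoring double-stranded ('(' and ')') positions by one scheme and single-stranded ('.') positions by another inside a standard alignment recurrence.

```python
def align_structures(q, s, gap=-2):
    """Global alignment score of two dot-bracket strings in O(mn) time,
    with separate (assumed) scoring for paired and unpaired positions."""
    def score(a, b):
        paired = {'(', ')'}
        if a in paired and b in paired:
            return 2 if a == b else -1   # double-stranded scheme (toy values)
        if a == '.' and b == '.':
            return 1                     # single-stranded scheme (toy values)
        return -3                        # mixing paired with unpaired
    m, n = len(q), len(s)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = dp[i - 1][0] + gap
    for j in range(1, n + 1):
        dp[0][j] = dp[0][j - 1] + gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dp[i][j] = max(dp[i - 1][j - 1] + score(q[i - 1], s[j - 1]),
                           dp[i - 1][j] + gap,      # gap in the subject
                           dp[i][j - 1] + gap)      # gap in the query
    return dp[m][n]

# Two similar stem-loops, differing by one loop residue.
best = align_structures("((..))", "((...))")
```

The two hairpins align with a single gap in the shorter loop; local alignment and tree-aware scoring would replace the boundary initialization and the `score` function, respectively.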
Task-Robust Pre-Training for Worst-Case Downstream Adaptation
Pre-training has achieved remarkable success when transferred to downstream
tasks. In machine learning, we care not only about a model's good performance
but also about its behavior under reasonable shifts of conditions. The same
philosophy holds when pre-training a foundation model. However, a foundation
model may not behave uniformly well across a series of related downstream
tasks. This happens, for example, in masked-recovery regression when the
recovery targets or training instances diverge: pattern features dominate
during pre-training, while semantic features are also required by a downstream
task. This paper considers pre-training a model that guarantees uniformly good
performance over the downstream tasks. Our method first separates the upstream
task into several representative ones and applies a simple minimax loss for
pre-training. We then design an efficient algorithm to solve the minimax loss
and prove its convergence in the convex setting. In experiments on both
large-scale natural language processing and computer vision datasets, our
method improves the metrics on worst-case downstream tasks. Additionally, we
provide theoretical explanations for why our loss is beneficial; specifically,
we show that in some cases fewer samples are inherently required for the most
challenging downstream task.
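The "representative tasks plus minimax loss" recipe can be sketched on a toy convex problem: multiplicative weights up-weight the currently worst task, and gradient descent updates the shared parameters against the weighted loss. The task losses, step sizes, and solver below are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def task_losses(theta, targets):
    """One simple convex loss per representative upstream task."""
    return np.array([np.sum((theta - t) ** 2) for t in targets])

def task_grads(theta, targets):
    return np.array([2 * (theta - t) for t in targets])

def minimax_train(targets, dim=2, steps=500, lr=0.05, eta=0.1):
    """min over theta of max over tasks, via multiplicative weights."""
    theta = np.full(dim, 0.5)
    w = np.ones(len(targets)) / len(targets)   # distribution over tasks
    for _ in range(steps):
        losses = task_losses(theta, targets)
        w = w * np.exp(eta * losses)            # up-weight the worst task
        w /= w.sum()
        theta -= lr * (w[:, None] * task_grads(theta, targets)).sum(axis=0)
    return theta, task_losses(theta, targets)

# Two symmetric "tasks" pulling in opposite directions: the minimax
# solution sits between them, equalizing the per-task losses.
targets = [np.array([1.0, 0.0]), np.array([-1.0, 0.0])]
theta, losses = minimax_train(targets)
```

Plain averaging of the task losses would reach the same point here by symmetry; the weighting matters precisely when one task would otherwise be sacrificed, which is the worst-case guarantee the abstract targets.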
- …