On the Difference of BERT-style and CLIP-style Text Encoders
Masked language modeling (MLM) has been one of the most popular pretraining
recipes in natural language processing, with BERT as one of its representative
models. Recently, contrastive language-image pretraining (CLIP) has also
attracted attention, especially its vision models that achieve excellent
performance on a broad range of vision tasks. However, few studies are
dedicated to studying the text encoders learned by CLIP. In this paper, we
analyze the difference between BERT-style and CLIP-style text encoders through
three experiments: (i) general text understanding, (ii) vision-centric text
understanding, and (iii) text-to-image generation. Experimental analyses show
that although CLIP-style text encoders underperform BERT-style ones for general
text understanding tasks, they are equipped with a unique ability, i.e.,
synesthesia, for cross-modal association, which is more similar to the
senses of humans.
Comment: Natural Language Processing. 10 pages, 1 figure. Findings of ACL-202
Plum: Prompt Learning using Metaheuristic
Since the emergence of large language models, prompt learning has become a
popular method for optimizing and customizing these models. Special prompts,
such as Chain-of-Thought, have even revealed previously unknown reasoning
capabilities within these models. However, the progress of discovering
effective prompts has been slow, driving a desire for general prompt
optimization methods. Unfortunately, few existing prompt learning methods
satisfy the criteria of being truly "general", i.e., automatic, discrete,
black-box, gradient-free, and interpretable all at once. In this paper, we
introduce metaheuristics, a branch of discrete non-convex optimization methods
with over 100 options, as a promising approach to prompt learning. Within our
paradigm, we test six typical methods: hill climbing, simulated annealing,
genetic algorithms with/without crossover, tabu search, and harmony search,
demonstrating their effectiveness in black-box prompt learning and
Chain-of-Thought prompt tuning. Furthermore, we show that these methods can be
used to discover more human-understandable prompts that were previously
unknown, opening the door to a cornucopia of possibilities in prompt
optimization. We release all the code at
https://github.com/research4pan/Plum
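As a rough illustration of the first metaheuristic named above, hill climbing over a discrete prompt can be sketched as follows. This is a minimal sketch and not the Plum implementation: the function names, the toy vocabulary, and `toy_score` are all assumptions made for illustration. In a real black-box setting the score would come from evaluating a language model on held-out examples.

```python
import random


def hill_climb(prompt_tokens, vocabulary, score, steps=200, seed=0):
    """Greedy local search over a discrete prompt (hypothetical helper).

    Each step replaces one randomly chosen token and keeps the change
    only if the black-box score strictly improves. No gradients are used.
    """
    rng = random.Random(seed)
    best = list(prompt_tokens)
    best_score = score(best)
    for _ in range(steps):
        candidate = list(best)
        position = rng.randrange(len(candidate))
        candidate[position] = rng.choice(vocabulary)
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best, best_score


def toy_score(tokens):
    """Stand-in for a model evaluation: counts matches with a fixed target.

    This is an assumption for the sketch; a real score would query a model.
    """
    target = ["think", "step", "by", "step"]
    return sum(a == b for a, b in zip(tokens, target))


vocabulary = ["think", "step", "by", "answer", "briefly", "carefully"]
start = ["answer", "briefly", "answer", "briefly"]
best_prompt, best_value = hill_climb(start, vocabulary, toy_score)
```

Because a candidate is accepted only on strict improvement, the returned score can never fall below the starting score. Simulated annealing differs mainly in occasionally accepting worse candidates to escape local optima.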
TeViS: Translating Text Synopses to Video Storyboards
A video storyboard is a roadmap for video creation which consists of
shot-by-shot images to visualize key plots in a text synopsis. Creating video
storyboards, however, remains challenging, as it not only requires cross-modal
association between high-level texts and images but also demands long-term
reasoning to make transitions smooth across shots. In this paper, we propose a
new task called Text synopsis to Video Storyboard (TeViS) which aims to
retrieve an ordered sequence of images as the video storyboard to visualize the
text synopsis. We construct a MovieNet-TeViS dataset based on the public
MovieNet dataset. It contains 10K text synopses, each paired with keyframes
manually selected from corresponding movies by considering both relevance and
cinematic coherence. To benchmark the task, we present strong CLIP-based
baselines and a novel VQ-Trans. VQ-Trans first encodes text synopsis and images
into a joint embedding space and uses vector quantization (VQ) to improve the
visual representation. Then, it auto-regressively generates a sequence of
visual features for retrieval and ordering. Experimental results demonstrate
that VQ-Trans significantly outperforms prior methods and the CLIP-based
baselines. Nevertheless, there is still a large gap compared to human
performance, suggesting room for promising future work. The code and data are
available at: https://ruc-aimind.github.io/projects/TeViS/
Comment: Accepted to ACM Multimedia 202
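The vector quantization step mentioned in the TeViS abstract can be illustrated with a generic nearest-codebook lookup. This is a minimal sketch of plain vector quantization, not the VQ-Trans model: the `quantize` function, the two-dimensional features, and the three-entry codebook are assumptions chosen for illustration.

```python
def quantize(vector, codebook):
    """Map a feature vector to its nearest codebook entry (hypothetical helper).

    Returns the index of the closest entry under squared Euclidean
    distance, together with the entry itself.
    """
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    index = min(
        range(len(codebook)),
        key=lambda i: squared_distance(vector, codebook[i]),
    )
    return index, codebook[index]


# Toy codebook of three 2-D entries (assumed values, for illustration only).
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
index, code = quantize([0.9, 1.2], codebook)
```

Here the input `[0.9, 1.2]` is closest to the second entry, so `index` is 1 and `code` is `[1.0, 1.0]`. Replacing continuous features with codebook entries in this way yields a discrete visual vocabulary that a sequence model can predict.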
Resilience-Oriented Planning of Urban Distribution System Source–Network–Load–Storage in the Context of High-Penetrated Building-Integrated Resources
Building-integrated flexible resources can economically accommodate high-penetrated renewable energy sources (RESs) and can potentially be coordinated to achieve cost-effective supply. This paper proposes a resilience-oriented planning model of urban distribution system source–network–load–storage in the context of high-penetrated building-integrated resources. In this model, source–network–load–storage resources are cost-optimally planned, including the lines, soft open points (SOPs), building-integrated photovoltaics (BIPVs), building-integrated wind turbines (BIWTs), building-integrated energy storage systems (ESSs), etc. To enhance fault recovery capability during extreme faults, fault scenarios are incorporated into the distribution system operation via coupled multiple recovery stages. Resilience-oriented planning is a thorny problem due to its source–network–load–storage couplings, normal-fault couplings, etc. The original resilience-oriented planning problem is reformulated as a mixed-integer linear programming (MILP) problem, which can then be solved with a two-stage method and evaluated via a set of multi-dimensional evaluation metrics. The proposed planning methodology is benchmarked on a Portugal 54-node urban distribution system to verify its superiority and effectiveness in terms of system economy and resilience. Case studies show that the proposed methodology can exploit the optimal synergies of different source–network–load–storage components and enhance system dispatchability.