7 research outputs found

    On the Difference of BERT-style and CLIP-style Text Encoders

    Full text link
    Masked language modeling (MLM) has been one of the most popular pretraining recipes in natural language processing, with BERT as a representative model. Recently, contrastive language-image pretraining (CLIP) has also attracted attention, especially its vision models, which achieve excellent performance on a broad range of vision tasks. However, few studies have examined the text encoders learned by CLIP. In this paper, we analyze the difference between BERT-style and CLIP-style text encoders through three experiments: (i) general text understanding, (ii) vision-centric text understanding, and (iii) text-to-image generation. Experimental analyses show that although CLIP-style text encoders underperform BERT-style ones on general text understanding tasks, they are equipped with a unique ability for cross-modal association, i.e., synesthesia, which more closely resembles human perception.
    Comment: Natural Language Processing. 10 pages, 1 figure. Findings of ACL-202
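    To make the comparison concrete (this is a minimal sketch, not the authors' evaluation protocol; the checkpoints and pooling choices below are assumptions), one could obtain sentence embeddings from a BERT-style encoder and a CLIP-style text encoder via Hugging Face Transformers and compare how similarly they score a paraphrase pair:

```python
# Minimal sketch (assumed setup, not the paper's exact protocol):
# embed a paraphrase pair with a BERT-style and a CLIP-style text encoder
# and compare the resulting cosine similarities.
import torch
from transformers import AutoTokenizer, BertModel, CLIPTextModel

sentences = ["a dog running on the beach", "a puppy sprinting along the shore"]

# BERT-style encoder: mean-pool the final hidden states.
bert_tok = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased").eval()

# CLIP-style text encoder: use the pooled (EOS-token) output.
clip_tok = AutoTokenizer.from_pretrained("openai/clip-vit-base-patch32")
clip = CLIPTextModel.from_pretrained("openai/clip-vit-base-patch32").eval()

@torch.no_grad()
def bert_embed(texts):
    batch = bert_tok(texts, padding=True, return_tensors="pt")
    hidden = bert(**batch).last_hidden_state        # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1)    # (B, T, 1)
    return (hidden * mask).sum(1) / mask.sum(1)     # mean pooling

@torch.no_grad()
def clip_embed(texts):
    batch = clip_tok(texts, padding=True, return_tensors="pt")
    return clip(**batch).pooler_output              # EOS-token embedding

for name, embed in [("BERT", bert_embed), ("CLIP", clip_embed)]:
    a, b = embed(sentences)
    sim = torch.nn.functional.cosine_similarity(a, b, dim=0).item()
    print(f"{name}-style similarity: {sim:.3f}")
```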

    Plum: Prompt Learning using Metaheuristic

    Full text link
    Since the emergence of large language models, prompt learning has become a popular method for optimizing and customizing these models. Special prompts, such as Chain-of-Thought, have even revealed previously unknown reasoning capabilities within these models. However, the progress of discovering effective prompts has been slow, driving a desire for general prompt optimization methods. Unfortunately, few existing prompt learning methods satisfy the criteria of being truly "general", i.e., automatic, discrete, black-box, gradient-free, and interpretable all at once. In this paper, we introduce metaheuristics, a branch of discrete non-convex optimization methods with over 100 options, as a promising approach to prompt learning. Within our paradigm, we test six typical methods: hill climbing, simulated annealing, genetic algorithms with/without crossover, tabu search, and harmony search, demonstrating their effectiveness in black-box prompt learning and Chain-of-Thought prompt tuning. Furthermore, we show that these methods can be used to discover more human-understandable prompts that were previously unknown, opening the door to a cornucopia of possibilities in prompt optimization. We release all the code at \url{https://github.com/research4pan/Plum}
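    As one concrete instance of the metaheuristics listed above (hill climbing; this is an illustrative sketch, not the released Plum implementation, and the candidate edits and scoring function are assumptions), a discrete, gradient-free prompt search against a black-box scorer might look like this:

```python
# Minimal hill-climbing sketch for black-box, gradient-free prompt search.
# `score_prompt` is a hypothetical stand-in for any black-box evaluation,
# e.g. accuracy of an LLM on a small validation set given the prompt.
import random

def score_prompt(prompt: str) -> float:
    """Hypothetical black-box scorer; replace with a real LLM evaluation."""
    # Toy objective: prefer prompts that ask for step-by-step reasoning.
    return prompt.lower().count("step") - 0.01 * len(prompt)

def neighbors(prompt: str) -> list[str]:
    """Generate candidate discrete edits: append a phrase or drop a sentence."""
    phrases = ["Let's think step by step.", "Answer concisely.",
               "Explain your reasoning.", "Check the result."]
    sentences = prompt.split(". ")
    cands = [prompt + " " + p for p in phrases]                 # append a phrase
    cands += [". ".join(sentences[:i] + sentences[i + 1:])      # drop a sentence
              for i in range(len(sentences)) if len(sentences) > 1]
    return cands

def hill_climb(seed: str, iters: int = 50, rng=random.Random(0)) -> str:
    best, best_score = seed, score_prompt(seed)
    for _ in range(iters):
        cand = rng.choice(neighbors(best))
        s = score_prompt(cand)
        if s > best_score:              # greedy: keep only improving edits
            best, best_score = cand, s
    return best

print(hill_climb("Solve the problem."))
```

    Simulated annealing or tabu search would reuse the same neighborhood and scorer, differing only in the acceptance rule.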

    TeViS: Translating Text Synopses to Video Storyboards

    Full text link
    A video storyboard is a roadmap for video creation, consisting of shot-by-shot images that visualize the key plots of a text synopsis. Creating video storyboards, however, remains challenging: it not only requires cross-modal association between high-level text and images but also demands long-term reasoning to keep transitions smooth across shots. In this paper, we propose a new task called Text synopsis to Video Storyboard (TeViS), which aims to retrieve an ordered sequence of images as the video storyboard to visualize the text synopsis. We construct the MovieNet-TeViS dataset based on the public MovieNet dataset. It contains 10K text synopses, each paired with keyframes manually selected from the corresponding movies by considering both relevance and cinematic coherence. To benchmark the task, we present strong CLIP-based baselines and a novel model, VQ-Trans. VQ-Trans first encodes the text synopsis and images into a joint embedding space and uses vector quantization (VQ) to improve the visual representation. Then, it auto-regressively generates a sequence of visual features for retrieval and ordering. Experimental results demonstrate that VQ-Trans significantly outperforms prior methods and the CLIP-based baselines. Nevertheless, there is still a large gap compared to human performance, suggesting room for promising future work. The code and data are available at: \url{https://ruc-aimind.github.io/projects/TeViS/}
    Comment: Accepted to ACM Multimedia 202
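    To make the vector-quantization step concrete (a generic VQ lookup as commonly used in VQ models, not the released VQ-Trans code; the codebook size and feature dimensions are assumptions), the sketch below snaps continuous visual features to their nearest codebook entries with a straight-through gradient:

```python
# Generic vector-quantization layer: nearest-codebook lookup with a
# straight-through estimator, illustrating the VQ step described above.
import torch
import torch.nn as nn

class VectorQuantizer(nn.Module):
    def __init__(self, num_codes: int = 512, dim: int = 256):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)
        nn.init.uniform_(self.codebook.weight, -1.0 / num_codes, 1.0 / num_codes)

    def forward(self, z: torch.Tensor):
        # z: (batch, seq, dim) continuous visual features.
        codes = self.codebook.weight[None].expand(z.size(0), -1, -1)
        dists = torch.cdist(z, codes)               # (batch, seq, num_codes)
        idx = dists.argmin(dim=-1)                  # nearest code index
        z_q = self.codebook(idx)                    # quantized features
        z_q = z + (z_q - z).detach()                # straight-through gradient
        return z_q, idx

# Toy usage: quantize a batch of 8-step visual feature sequences.
vq = VectorQuantizer()
feats = torch.randn(2, 8, 256)
quantized, code_ids = vq(feats)
print(quantized.shape, code_ids.shape)  # torch.Size([2, 8, 256]) torch.Size([2, 8])
```

    The resulting discrete code sequence is what an autoregressive decoder could then predict step by step for retrieval and ordering.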

    Resilience-Oriented Planning of Urban Distribution System Source–Network–Load–Storage in the Context of High-Penetrated Building-Integrated Resources

    No full text
    Building-integrated flexible resources offer an economical way to accommodate a high penetration of renewable energy sources (RESs) and can be coordinated to achieve cost-effective supply. This paper proposes a resilience-oriented planning model for the source–network–load–storage of an urban distribution system with a high penetration of building-integrated resources. In this model, source–network–load–storage resources are planned cost-optimally, including lines, soft open points (SOPs), building-integrated photovoltaics (BIPVs), building-integrated wind turbines (BIWTs), and building-integrated energy storage systems (ESSs). To enhance recovery capability during extreme faults, fault scenarios are incorporated into distribution system operation through multiple coupled recovery stages. Resilience-oriented planning is a difficult problem because of its source–network–load–storage couplings and normal–fault couplings. The original planning problem is reformulated as a mixed-integer linear program (MILP), which is then solved with a two-stage method and evaluated using multi-dimensional metrics. The proposed methodology is benchmarked on a Portuguese 54-node urban distribution system to verify its effectiveness in improving system economy and resilience. Case studies show that the proposed methodology can exploit the optimal synergies of different source–network–load–storage components and enhance system dispatchability.
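    To illustrate the two-stage structure in miniature (a toy example with made-up numbers and a brute-force search, not the paper's MILP formulation or solution method), the sketch below enumerates first-stage binary investment decisions and, for each, evaluates expected operation plus load-shedding cost over a normal scenario and a fault scenario:

```python
# Toy two-stage planning sketch (illustrative numbers only, not the paper's model):
# stage 1 chooses binary investments; stage 2 evaluates operation and
# load-shedding cost under a normal scenario and a fault scenario.
from itertools import product

INVESTMENTS = {"new_line": 120.0, "bipv": 80.0, "ess": 60.0}   # capital cost
SCENARIOS = {"normal": 0.95, "fault": 0.05}                    # probabilities

def operation_cost(build: dict, scenario: str) -> float:
    """Hypothetical stage-2 cost: more supply and less shedding with more assets."""
    demand = 100.0
    supply = 70.0 + (25.0 if build["bipv"] else 0.0)
    if scenario == "fault":
        supply *= 0.8 if build["new_line"] else 0.5    # extra line adds redundancy
        supply += 15.0 if build["ess"] else 0.0        # ESS rides through the fault
    shedding = max(0.0, demand - supply)
    return 1.0 * min(demand, supply) + 50.0 * shedding  # energy cost + shed penalty

best = None
for choice in product([0, 1], repeat=len(INVESTMENTS)):
    build = dict(zip(INVESTMENTS, choice))
    capex = sum(cost for name, cost in INVESTMENTS.items() if build[name])
    expected_opex = sum(p * operation_cost(build, s) for s, p in SCENARIOS.items())
    total = capex + expected_opex
    if best is None or total < best[0]:
        best = (total, build)

print(f"best plan: {best[1]}  total cost: {best[0]:.1f}")
```

    A real MILP replaces the enumeration with binary investment variables and continuous dispatch variables linked by linear constraints, so the problem stays tractable at the scale of a 54-node system.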