673 research outputs found

    Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding

    We present Video-LLaMA, a multi-modal framework that empowers Large Language Models (LLMs) with the capability to understand both visual and auditory content in video. Video-LLaMA bootstraps cross-modal training from frozen pre-trained visual and audio encoders and a frozen LLM. Unlike previous works that equip LLMs to process only visual or only audio signals, Video-LLaMA enables video comprehension by tackling two challenges: (1) capturing the temporal changes in visual scenes, and (2) integrating audio-visual signals. To address the first challenge, we propose a Video Q-former that assembles a pre-trained image encoder into our video encoder, and we introduce a video-to-text generation task to learn video-language correspondence. For the second challenge, we leverage ImageBind, a universal embedding model that aligns multiple modalities, as the pre-trained audio encoder, and introduce an Audio Q-former on top of ImageBind to learn reasonable auditory query embeddings for the LLM module. To align the outputs of both the visual and audio encoders with the LLM's embedding space, we first train Video-LLaMA on massive video/image-caption pairs and then tune the model on visual-instruction datasets of moderate size but higher quality. We find that Video-LLaMA can perceive and comprehend video content and generate meaningful responses grounded in the visual and auditory information presented in the videos.
    Comment: Accepted by EMNLP 2023's demo track; Code, Pretrained Model, and Dataset: https://github.com/DAMO-NLP-SG/Video-LLaM
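The Q-former idea described above can be sketched as a small adapter: a fixed set of learnable query embeddings cross-attends to frozen encoder features and projects the result into the LLM's input space. This is a minimal illustrative sketch, not the authors' code; the class name, dimensions, and head count are assumptions.

```python
import torch
import torch.nn as nn

class QFormerAdapter(nn.Module):
    """Hypothetical Q-Former-style adapter: learnable queries attend to frozen features."""

    def __init__(self, enc_dim=1024, llm_dim=4096, num_queries=32):
        super().__init__()
        # Fixed number of learnable query vectors (the only trainable "prompt" tokens).
        self.queries = nn.Parameter(torch.randn(num_queries, enc_dim))
        self.cross_attn = nn.MultiheadAttention(enc_dim, num_heads=8, batch_first=True)
        self.proj = nn.Linear(enc_dim, llm_dim)  # map into the LLM's embedding space

    def forward(self, frozen_features):
        # frozen_features: (batch, seq, enc_dim), e.g. per-frame features from a frozen encoder.
        q = self.queries.unsqueeze(0).expand(frozen_features.size(0), -1, -1)
        out, _ = self.cross_attn(q, frozen_features, frozen_features)
        return self.proj(out)  # (batch, num_queries, llm_dim)

feats = torch.randn(2, 196, 1024)   # stand-in for frozen visual/audio encoder output
tokens = QFormerAdapter()(feats)
print(tokens.shape)                 # torch.Size([2, 32, 4096])
```

Because the encoders and LLM stay frozen, only the queries, attention, and projection are trained, which is what makes this bootstrapping approach comparatively cheap.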

    Surface, size and topological effects for some nematic equilibria on rectangular domains

    We study nematic equilibria on rectangular domains in a reduced two-dimensional Landau–de Gennes framework. These reduced equilibria carry over to the three-dimensional framework at a special temperature. There is one essential model variable, ϵ, which depends on both the geometry and the material. We compute the limiting profiles exactly in two distinguished limits: the ϵ → 0 limit relevant for macroscopic domains and the ϵ → ∞ limit relevant for nanoscale domains. The limiting profile has line defects near the shorter edges in the ϵ → ∞ limit, whereas we observe fractional point defects in the ϵ → 0 limit. The analytical studies are complemented by bifurcation diagrams for these reduced equilibria as a function of ϵ and the rectangular aspect ratio. We also introduce the concept of 'non-trivial' topologies and study the relaxation of non-trivial topologies to trivial ones, mediated by point and line defects, with potential consequences for non-equilibrium phenomena and switching dynamics.
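In reduced Landau–de Gennes models, a single parameter like ϵ typically enters a rescaled energy of the following schematic form (a sketch under standard non-dimensionalization assumptions; the paper's exact functional and scaling may differ):

\[
E[\mathbf{Q}] = \int_{\Omega} \left( \frac{1}{2}\,|\nabla \mathbf{Q}|^{2} + \frac{1}{\epsilon^{2}}\, f_B(\mathbf{Q}) \right) \mathrm{d}A ,
\]

where \( f_B \ge 0 \) is a bulk potential minimized on the set of nematic states. In this scaling, ϵ → 0 forces \(\mathbf{Q}\) toward bulk minima almost everywhere (the macroscopic regime, with localized point defects), while ϵ → ∞ leaves the elastic term dominant so \(\mathbf{Q}\) is approximately harmonic (the nanoscale regime).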

    Exploiting BERT for End-to-End Aspect-based Sentiment Analysis

    In this paper, we investigate the modeling power of contextualized embeddings from pre-trained language models, e.g. BERT, on the E2E-ABSA task. Specifically, we build a series of simple yet insightful neural baselines for E2E-ABSA. The experimental results show that, even with a simple linear classification layer, our BERT-based architecture can outperform state-of-the-art works. In addition, we standardize the comparative study by consistently using a hold-out validation set for model selection, a practice largely ignored by previous works. Our work can therefore serve as a BERT-based benchmark for E2E-ABSA.
    Comment: NUT workshop@EMNLP-IJCNLP-201
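The simplest baseline described above, contextualized token embeddings followed by a single linear layer, can be sketched as below. This is an assumed shape, not the paper's code: the label set, class name, and the use of a random stand-in for BERT's output are all illustrative.

```python
import torch
import torch.nn as nn

# Unified tag set is an assumption, e.g. {O} plus {B, I} x {POS, NEG, NEU}.
NUM_LABELS = 7

class LinearE2EABSA(nn.Module):
    """Hypothetical minimal baseline: one linear layer over BERT token embeddings."""

    def __init__(self, hidden_dim=768, num_labels=NUM_LABELS):
        super().__init__()
        self.classifier = nn.Linear(hidden_dim, num_labels)

    def forward(self, token_embeddings):
        # token_embeddings: (batch, seq, hidden), e.g. BERT's last-layer states.
        return self.classifier(token_embeddings)  # per-token label logits

# Stand-in for encoder output on a batch of 2 sentences of 10 tokens each:
embeddings = torch.randn(2, 10, 768)
logits = LinearE2EABSA()(embeddings)
tags = logits.argmax(dim=-1)   # predicted unified aspect+sentiment tag per token
print(tags.shape)              # torch.Size([2, 10])
```

Tagging each token with a joint aspect-boundary-plus-sentiment label is what makes the task "end-to-end": extraction and classification are folded into one sequence-labeling pass.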

    Multilingual Jailbreak Challenges in Large Language Models

    While large language models (LLMs) exhibit remarkable capabilities across a wide range of tasks, they pose potential safety concerns, such as the "jailbreak" problem, wherein malicious instructions can manipulate LLMs into undesirable behavior. Although several preventive measures have been developed to mitigate the potential risks associated with LLMs, they have primarily focused on English data. In this study, we reveal the presence of multilingual jailbreak challenges within LLMs and consider two risk scenarios: unintentional and intentional. The unintentional scenario involves users querying LLMs with non-English prompts and inadvertently bypassing the safety mechanisms, while the intentional scenario concerns malicious users combining malicious instructions with multilingual prompts to deliberately attack LLMs. The experimental results reveal that, in the unintentional scenario, the rate of unsafe content increases as language availability decreases. Specifically, low-resource languages exhibit three times the likelihood of encountering harmful content compared to high-resource languages, for both ChatGPT and GPT-4. In the intentional scenario, multilingual prompts can exacerbate the negative impact of malicious instructions, with astonishingly high rates of unsafe output: 80.92% for ChatGPT and 40.71% for GPT-4. To handle this challenge in the multilingual context, we propose a novel Self-Defense framework that automatically generates multilingual training data for safety fine-tuning. Experimental results show that ChatGPT fine-tuned with such data achieves a substantial reduction in unsafe content generation. Data is available at https://github.com/DAMO-NLP-SG/multilingual-safety-for-LLMs. Warning: This paper contains examples with potentially harmful content.
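A Self-Defense-style data pipeline, as described above, amounts to pairing unsafe prompts with safe refusals and translating both into many target languages. The sketch below is hypothetical: the function names, the `translate` hook, and the record format are assumptions, not the paper's implementation.

```python
def build_safety_data(unsafe_prompts, refusal, languages, translate):
    """Build multilingual safety fine-tuning rows.

    translate(text, lang) is an assumed hook, e.g. an MT system or an LLM call,
    that renders `text` in language `lang`.
    """
    data = []
    for prompt in unsafe_prompts:
        for lang in languages:
            data.append({
                "lang": lang,
                "prompt": translate(prompt, lang),      # unsafe request in target language
                "response": translate(refusal, lang),   # safe refusal in target language
            })
    return data

# Toy usage with a tagging "translator" just to show the output shape:
rows = build_safety_data(
    ["<unsafe prompt>"],
    "I can't help with that.",
    ["en", "th", "sw"],
    lambda text, lang: f"[{lang}] {text}",
)
print(len(rows))  # 3
```

Fine-tuning on such pairs teaches the model to refuse in low-resource languages too, targeting exactly the gap the unintentional scenario exposes.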