Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
We present Video-LLaMA, a multi-modal framework that empowers Large Language
Models (LLMs) with the capability of understanding both visual and auditory
content in videos. Video-LLaMA bootstraps cross-modal training from
frozen pre-trained visual and audio encoders and frozen LLMs. Unlike
previous works that equip LLMs to process only visual or only audio signals,
Video-LLaMA enables video comprehension by tackling two challenges: (1)
capturing the temporal changes in visual scenes, (2) integrating audio-visual
signals. To address the first challenge, we propose a Video Q-former to
assemble a pre-trained image encoder into our video encoder and introduce a
video-to-text generation task to learn video-language correspondence. For the
second challenge, we leverage ImageBind, a universal embedding model aligning
multiple modalities, as the pre-trained audio encoder and introduce an Audio
Q-former on top of ImageBind to learn reasonable auditory query embeddings for
the LLM module. To align the output of both visual and audio encoders with
LLM's embedding space, we first train Video-LLaMA on massive
video/image-caption pairs and then tune our model on visual-instruction
datasets that are moderate in size but higher in quality. We find that Video-LLaMA shows the
ability to perceive and comprehend video content and generate meaningful
responses grounded in the visual and auditory information presented in the
videos.
Comment: Accepted by EMNLP 2023's demo track; Code, Pretrained Model, and
Dataset: https://github.com/DAMO-NLP-SG/Video-LLaM
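The core Q-former idea, a small set of learned query vectors that cross-attend to frozen per-frame features and emit a fixed-size video representation for the LLM, can be sketched in NumPy. This is an illustrative toy (single head, no layer norm, random weights), not the paper's implementation; all names and shapes are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def qformer_pool(frame_feats, queries, Wk, Wv):
    """Single-head cross-attention pooling: learned queries attend
    over frozen per-frame features to produce q fixed-size embeddings."""
    K = frame_feats @ Wk                                  # (T, d) keys
    V = frame_feats @ Wv                                  # (T, d) values
    attn = softmax(queries @ K.T / np.sqrt(K.shape[1]))   # (q, T) weights over frames
    return attn @ V                                       # (q, d), independent of T

rng = np.random.default_rng(0)
T, d, q = 8, 16, 4                  # 8 frames, 16-dim features, 4 queries
out = qformer_pool(rng.normal(size=(T, d)), rng.normal(size=(q, d)),
                   rng.normal(size=(d, d)), rng.normal(size=(d, d)))
print(out.shape)  # (4, 16)
```

Because the output size depends only on the number of queries, videos of any length map to the same number of embeddings, which is what lets them be projected into the LLM's input space.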
Surface, size and topological effects for some nematic equilibria on rectangular domains
We study nematic equilibria on rectangular domains, in a reduced two-dimensional Landau–de Gennes framework. These reduced equilibria carry over to the three-dimensional framework at a special temperature. There is one essential model variable, ϵ, which depends on both the geometry and the material. We compute the limiting profiles exactly in two distinguished limits: the ϵ → 0 limit relevant for macroscopic domains and the ϵ → ∞ limit relevant for nanoscale domains. The limiting profile has line defects near the shorter edges in the ϵ → ∞ limit, whereas we observe fractional point defects in the ϵ → 0 limit. The analytical studies are complemented by bifurcation diagrams for these reduced equilibria as a function of ϵ and the rectangular aspect ratio. We also introduce the concept of 'non-trivial' topologies and study the relaxation of non-trivial topologies to trivial topologies mediated via point and line defects, with potential consequences for non-equilibrium phenomena and switching dynamics.
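For orientation, reduced two-dimensional Landau–de Gennes models are typically posed as the minimization of an energy with a single small parameter; a generic form (a sketch of the usual setup, not necessarily the paper's exact non-dimensionalization) is

```latex
E_{\epsilon}[\mathbf{Q}] \;=\; \int_{\Omega} \left( \frac{1}{2}\,|\nabla \mathbf{Q}|^{2} \;+\; \frac{1}{\epsilon^{2}}\, f_{B}(\mathbf{Q}) \right) \mathrm{d}A ,
```

where $\mathbf{Q}$ is the reduced order parameter on the rectangle $\Omega$ and $f_{B}$ is a bulk potential minimized on the nematic states. In such models, small $\epsilon$ (large domains) penalizes departures from the bulk minima and confines defects to small cores, while large $\epsilon$ (nanoscale domains) lets boundary effects dominate, consistent with the point-defect versus line-defect behavior described above.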
Exploiting BERT for End-to-End Aspect-based Sentiment Analysis
In this paper, we investigate the modeling power of contextualized embeddings
from pre-trained language models, e.g. BERT, on the E2E-ABSA task.
Specifically, we build a series of simple yet insightful neural baselines to
deal with E2E-ABSA. The experimental results show that even with a simple
linear classification layer, our BERT-based architecture can outperform
state-of-the-art works. In addition, we standardize the comparative study by
consistently using a hold-out validation set for model selection, a practice
largely ignored by previous works. Our work can therefore serve as a
BERT-based benchmark for E2E-ABSA.
Comment: NUT workshop@EMNLP-IJCNLP-201
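E2E-ABSA is commonly cast as sequence tagging with unified tags that mark both the aspect span and its sentiment (e.g. B-POS, I-POS, O); a token classifier on top of BERT predicts one such tag per token. A minimal decoder from a predicted tag sequence back to (aspect, sentiment) pairs might look as follows; the tag set and helper name are illustrative, not taken from the paper's code.

```python
def decode_unified_tags(tokens, tags):
    """Collapse unified E2E-ABSA tags (e.g. 'B-POS', 'I-POS', 'O')
    into a list of (aspect phrase, sentiment) pairs."""
    spans, cur, cur_pol = [], [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):                          # new aspect span begins
            if cur:
                spans.append((" ".join(cur), cur_pol))
            cur, cur_pol = [tok], tag[2:]
        elif tag.startswith("I-") and cur and tag[2:] == cur_pol:
            cur.append(tok)                               # span continues, same polarity
        else:                                             # 'O' or inconsistent tag ends the span
            if cur:
                spans.append((" ".join(cur), cur_pol))
            cur, cur_pol = [], None
    if cur:                                               # flush a span ending at the sentence end
        spans.append((" ".join(cur), cur_pol))
    return spans

toks = ["The", "sushi", "was", "great", "but", "service", "slow"]
tags = ["O", "B-POS", "O", "O", "O", "B-NEG", "O"]
print(decode_unified_tags(toks, tags))  # [('sushi', 'POS'), ('service', 'NEG')]
```

The point of the unified scheme is that a single linear layer over BERT's contextual embeddings suffices to produce these tags, which is the kind of simple baseline the abstract describes.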
Multilingual Jailbreak Challenges in Large Language Models
While large language models (LLMs) exhibit remarkable capabilities across a
wide range of tasks, they pose potential safety concerns, such as the
"jailbreak" problem, wherein malicious instructions can manipulate LLMs to
exhibit undesirable behavior. Although several preventive measures have been
developed to mitigate the potential risks associated with LLMs, they have
primarily focused on English data. In this study, we reveal the presence of
multilingual jailbreak challenges within LLMs and consider two potential risk
scenarios: unintentional and intentional. The unintentional scenario involves
users querying LLMs using non-English prompts and inadvertently bypassing the
safety mechanisms, while the intentional scenario concerns malicious users
combining malicious instructions with multilingual prompts to deliberately
attack LLMs. The experimental results reveal that in the unintentional
scenario, the rate of unsafe content increases as the availability of languages
decreases. Specifically, low-resource languages are about three times as
likely to elicit harmful content as high-resource languages,
for both ChatGPT and GPT-4. In the intentional scenario, multilingual prompts
can exacerbate the negative impact of malicious instructions, with
astonishingly high rates of unsafe output: 80.92% for ChatGPT and 40.71% for
GPT-4. To handle such a challenge in the multilingual context, we propose a
novel Self-Defense framework that automatically generates multilingual
training data for safety fine-tuning. Experimental results show that ChatGPT
fine-tuned with such data can achieve a substantial reduction in unsafe content
generation. Data is available at
https://github.com/DAMO-NLP-SG/multilingual-safety-for-LLMs.
Warning: This paper contains examples with potentially harmful content.
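The per-language-group unsafe rates the abstract reports can be computed with a straightforward aggregation once each model response has been labeled safe or unsafe. The sketch below is purely illustrative of that bookkeeping (the record format and function name are hypothetical, not from the paper's evaluation code).

```python
from collections import defaultdict

def unsafe_rate_by_group(records):
    """records: iterable of (language_group, is_unsafe) pairs,
    where is_unsafe comes from a safety classifier or human label.
    Returns the fraction of unsafe responses per group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [unsafe, total]
    for group, unsafe in records:
        counts[group][0] += int(unsafe)
        counts[group][1] += 1
    return {g: u / n for g, (u, n) in counts.items()}

records = [("high-resource", False), ("high-resource", True),
           ("low-resource", True), ("low-resource", True)]
print(unsafe_rate_by_group(records))
# {'high-resource': 0.5, 'low-resource': 1.0}
```

Grouping languages by resource level before averaging is what surfaces the three-fold gap between low- and high-resource languages described above.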