90 research outputs found
Supervised topic models with word order structure for document classification and retrieval learning
One limitation of most existing probabilistic latent topic models for document classification is that the topic model itself does not consider useful side-information, namely, class labels of documents. Topic models, which in turn consider the side-information, popularly known as supervised topic models, do not consider the word order structure in documents. One of the motivations behind considering the word order structure is to capture the semantic fabric of the document. We investigate a low-dimensional latent topic model for document classification. Class label information and word order structure are integrated into a supervised topic model enabling a more effective interaction among such information for solving document classification. We derive a collapsed Gibbs sampler for our model. Likewise, supervised topic models with word order structure have not been explored in document retrieval learning. We propose a novel supervised topic model for document retrieval learning which can be regarded as a pointwise model for tackling the learning-to-rank task. Available relevance assessments and word order structure are integrated into the topic model itself. We conduct extensive experiments on several publicly available benchmark datasets, and show that our model improves upon the state-of-the-art models
Deep Recurrent Generative Decoder for Abstractive Text Summarization
We propose a new framework for abstractive text summarization based on a
sequence-to-sequence oriented encoder-decoder model equipped with a deep
recurrent generative decoder (DRGN).
Latent structure information implied in the target summaries is learned based
on a recurrent latent random model for improving the summarization quality.
Neural variational inference is employed to address the intractable posterior
inference for the recurrent latent variables.
Abstractive summaries are generated based on both the generative latent
variables and the discriminative deterministic states.
Extensive experiments on some benchmark datasets in different languages show
that DRGN achieves improvements over the state-of-the-art methods.Comment: 10 pages, EMNLP 201
Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
We present Video-LLaMA a multi-modal framework that empowers Large Language
Models (LLMs) with the capability of understanding both visual and auditory
content in the video. Video-LLaMA bootstraps cross-modal training from the
frozen pre-trained visual and audio encoders and the frozen LLMs. Unlike
previous works that complement LLMs to process the visual or audio signals
only, Video-LLaMA enables video comprehension by tackling two challenges: (1)
capturing the temporal changes in visual scenes, (2) integrating audio-visual
signals. To counter the first challenge, we propose a Video Q-former to
assemble a pre-trained image encoder into our video encoder and introduce a
video-to-text generation task to learn video-language correspondence. For the
second challenge, we leverage ImageBind, a universal embedding model aligning
multiple modalities, as the pre-trained audio encoder and introduce an Audio
Q-former on top of ImageBind to learn reasonable auditory query embeddings for
the LLM module. To align the output of both visual and audio encoders with
LLM's embedding space, we first train Video-LLaMA on massive
video/image-caption pairs and then tune our model with visual-instruction
datasets of moderate amount but higher quality. We found Video-LLaMA shows the
ability to perceive and comprehend video content and generate meaningful
responses grounded in the visual and auditory information presented in the
videos.Comment: Accepted by EMNLP 2023's demo track; Code, Pretrained Model, and
Dataset: https://github.com/DAMO-NLP-SG/Video-LLaM
Is GPT-4 a Good Data Analyst?
As large language models (LLMs) have demonstrated their powerful capabilities
in plenty of domains and tasks, including context understanding, code
generation, language generation, data storytelling, etc., many data analysts
may raise concerns if their jobs will be replaced by AI. This controversial
topic has drawn a lot of attention in public. However, we are still at a stage
of divergent opinions without any definitive conclusion. Motivated by this, we
raise the research question of "is GPT-4 a good data analyst?" in this work and
aim to answer it by conducting head-to-head comparative studies. In detail, we
regard GPT-4 as a data analyst to perform end-to-end data analysis with
databases from a wide range of domains. We propose a framework to tackle the
problems by carefully designing the prompts for GPT-4 to conduct experiments.
We also design several task-specific evaluation metrics to systematically
compare the performance between several professional human data analysts and
GPT-4. Experimental results show that GPT-4 can achieve comparable performance
to humans. We also provide in-depth discussions about our results to shed light
on further studies before we reach the conclusion that GPT-4 can replace data
analysts.Comment: 11 pages, 2 figure
- …