40 research outputs found
Adaptive Region Embedding for Text Classification
Deep learning models such as convolutional and recurrent neural networks are
widely applied in text classification. Despite their great
success, most deep learning models neglect the importance of modeling context
information, which is crucial to understanding texts. In this work, we propose
the Adaptive Region Embedding to learn context representation to improve text
classification. Specifically, a metanetwork is learned to generate a context
matrix for each region, and each word interacts with its corresponding context
matrix to produce the regional representation for further classification.
Compared to previous models that are designed to capture context information,
our model contains fewer parameters and is more flexible. We extensively
evaluate our method on 8 benchmark datasets for text classification. The
experimental results show that our method achieves state-of-the-art
performance and effectively avoids word ambiguity. Comment: AAAI 201
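The region-embedding mechanism described above can be sketched in a few lines. This is a toy illustration under assumed dimensions and a linear meta-network, not the paper's implementation: a meta-network maps a region summary to a context matrix, each word vector in the region is transformed by that matrix, and the results are max-pooled into the regional representation.

```python
DIM = 4  # word-embedding dimensionality (illustrative toy size)

def meta_network(region_summary, weights):
    """Map a region summary vector to a DIM x DIM context matrix.

    weights: DIM*DIM rows, each of length DIM (assumed linear meta-network).
    """
    flat = [sum(w * x for w, x in zip(row, region_summary)) for row in weights]
    return [flat[i * DIM:(i + 1) * DIM] for i in range(DIM)]

def region_embedding(words, weights):
    """Compute a regional representation for a window of word vectors."""
    # Summarize the region (mean of word vectors), then build its context matrix.
    summary = [sum(w[d] for w in words) / len(words) for d in range(DIM)]
    ctx = meta_network(summary, weights)
    # Each word interacts with the region's context matrix; max-pool the results.
    transformed = [[sum(ctx[i][j] * w[j] for j in range(DIM)) for i in range(DIM)]
                   for w in words]
    return [max(t[d] for t in transformed) for d in range(DIM)]
```

In a real model the meta-network and pooling would operate on batched tensors, but the per-region flow is the same: summary in, context matrix out, words transformed, region pooled.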
Online Open-set Semi-supervised Object Detection via Semi-supervised Outlier Filtering
Open-set semi-supervised object detection (OSSOD) methods aim to utilize
practical unlabeled datasets with out-of-distribution (OOD) instances for
object detection. The main challenge in OSSOD is distinguishing and filtering
the OOD instances from the in-distribution (ID) instances during
pseudo-labeling. Previous work addresses this problem with an offline OOD
detection network trained only on labeled data; however, the scarcity of
labeled data limits the room for improvement, and training the network
separately from the detector is inefficient. To alleviate these issues, this paper
proposes a novel end-to-end online framework that improves performance and
efficiency by mining more valuable instances from unlabeled data. Specifically,
we first propose a semi-supervised OOD detection strategy to mine valuable ID
and OOD instances in unlabeled datasets for training. Then, we constitute an
online end-to-end trainable OSSOD framework by integrating the OOD detection
head into the object detector, making it jointly trainable with the original
detection task. Our experimental results show that our method works well on
several benchmarks, including the partially labeled COCO dataset with open-set
classes and the fully labeled COCO dataset with the additional large-scale
open-set unlabeled dataset, OpenImages. Compared with previous OSSOD methods,
our approach achieves the best performance on COCO with OpenImages by +0.94
mAP, reaching 44.07 mAP.
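The pseudo-label filtering step described above can be illustrated schematically. The thresholds and field names below are assumptions for illustration, not the paper's actual interface: each predicted box carries a classification confidence and an OOD score from the jointly trained OOD detection head, and only confident in-distribution boxes survive as pseudo-labels.

```python
def filter_pseudo_labels(predictions, conf_thresh=0.7, ood_thresh=0.5):
    """Keep confident, in-distribution predictions as pseudo-labels.

    predictions: dicts with a classification 'score' and an 'ood_score'
    (assumed fields; higher ood_score means more likely out-of-distribution).
    """
    kept = []
    for pred in predictions:
        if pred["score"] >= conf_thresh and pred["ood_score"] < ood_thresh:
            kept.append(pred)
    return kept
```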
CLIP model is an Efficient Online Lifelong Learner
Online Lifelong Learning (OLL) addresses the challenge of learning from
continuous and non-stationary data streams. Existing online lifelong learning
methods based on image classification models often require preset conditions
such as the total number of classes or maximum memory capacity, which hinders
truly never-ending learning and renders them impractical for real-world
scenarios. In this work, we propose that vision-language models,
such as Contrastive Language-Image Pretraining (CLIP), are more suitable
candidates for online lifelong learning. We discover that maintaining symmetry
between image and text is crucial during Parameter-Efficient Tuning (PET) for
the CLIP model in online lifelong learning. To this end, we introduce the Symmetric
Image-Text (SIT) tuning strategy. We conduct extensive experiments on multiple
lifelong learning benchmark datasets and elucidate the effectiveness of SIT
through gradient analysis. Additionally, we assess the impact of lifelong
learning on the generalizability of CLIP and find that tuning the image
encoder is beneficial for lifelong learning, while tuning the text encoder aids
in zero-shot learning.
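The image-text symmetry at the heart of this abstract can be illustrated with the standard CLIP-style contrastive objective, which averages the image-to-text and text-to-image directions. This toy sketch shows only that symmetric loss; the paper's SIT strategy concerns which parameters are tuned during PET, which is not modeled here.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def symmetric_contrastive_loss(img_feats, txt_feats, temperature=0.07):
    """Average of image->text and text->image cross-entropy losses,
    with the matching pair (the diagonal) as the target."""
    n = len(img_feats)
    logits = [[dot(i, t) / temperature for t in txt_feats] for i in img_feats]

    def xent(rows):
        total = 0.0
        for k, row in enumerate(rows):
            m = max(row)  # subtract max for numerical stability
            log_z = m + math.log(sum(math.exp(v - m) for v in row))
            total += log_z - row[k]
        return total / n

    cols = [list(col) for col in zip(*logits)]  # text->image direction
    return 0.5 * (xent(logits) + xent(cols))
```

Because the two directions are averaged, swapping the roles of the image and text features leaves the loss unchanged, which is the symmetry property in miniature.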
Rethinking Class-Incremental Learning from a Dynamic Imbalanced Learning Perspective
Deep neural networks suffer from catastrophic forgetting when continually
learning new concepts. In this paper, we analyze this problem from a data
imbalance point of view. We argue that the imbalance between old task and new
task data contributes to forgetting of the old tasks. Moreover, the increasing
imbalance ratio during incremental learning further aggravates the problem. To
address the dynamic imbalance issue, we propose Uniform Prototype Contrastive
Learning (UPCL), where uniform and compact features are learned. Specifically,
we generate a set of non-learnable uniform prototypes before each task starts.
Then we assign these uniform prototypes to each class and guide the feature
learning through prototype contrastive learning. We also dynamically adjust the
relative margin between old and new classes so that the feature distribution
remains balanced and compact. Finally, we demonstrate through
extensive experiments that the proposed method achieves state-of-the-art
performance on several benchmark datasets including CIFAR100, ImageNet100 and
TinyImageNet.
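The prototype idea above can be sketched with a toy simplification: fixed, non-learnable prototypes spread uniformly on the unit circle (standing in for a high-dimensional hypersphere), one assigned per class, with features pulled toward their class prototype by a contrastive cross-entropy over prototype similarities. The 2-D geometry and the loss form are illustrative assumptions, not the paper's exact construction.

```python
import math

def uniform_prototypes(num_classes):
    """Evenly spaced unit vectors on the circle, one per class (non-learnable)."""
    return [(math.cos(2 * math.pi * k / num_classes),
             math.sin(2 * math.pi * k / num_classes))
            for k in range(num_classes)]

def prototype_contrastive_loss(feature, label, prototypes, temperature=0.1):
    """Cross-entropy over prototype similarities; target is the class prototype."""
    sims = [(feature[0] * p[0] + feature[1] * p[1]) / temperature
            for p in prototypes]
    m = max(sims)  # subtract max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in sims))
    return log_z - sims[label]
```

A feature aligned with its assigned prototype incurs low loss, while the same feature assigned to a distant prototype incurs high loss, which is what drives features toward a uniform, compact layout.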
Dynamic Generation of Personalities with Large Language Models
In the realm of mimicking human deliberation, large language models (LLMs)
show promising performance, thereby amplifying the importance of this research
area. Deliberation is influenced by both logic and personality. However,
previous studies predominantly focused on the logic of LLMs, neglecting the
exploration of personality aspects. In this work, we introduce Dynamic
Personality Generation (DPG), a dynamic personality generation method based on
Hypernetworks. Initially, we embed the Big Five personality theory into GPT-4
to form a personality assessment machine, enabling it to evaluate characters'
personality traits from dialogues automatically. We propose a new metric to
assess personality generation capability based on this evaluation method. Then,
we use this personality assessment machine to evaluate dialogues in script
data, resulting in a personality-dialogue dataset. Finally, we fine-tune DPG on
the personality-dialogue dataset. Experiments show that after fine-tuning on
this dataset, DPG's personality generation capability is stronger than that of
traditional fine-tuning methods, and it surpasses prompt-based GPT-4.
Deep Reinforcement Learning with Multitask Episodic Memory Based on Task-Conditioned Hypernetwork
Deep reinforcement learning algorithms are usually impeded by sampling
inefficiency, heavily depending on multiple interactions with the environment
to acquire accurate decision-making capabilities. In contrast, humans rely on
their hippocampus to retrieve relevant information from past experiences of
relevant tasks, which guides their decision-making when learning a new task,
rather than exclusively depending on environmental interactions. Nevertheless,
designing a hippocampus-like module for an agent to incorporate past
experiences into established reinforcement learning algorithms presents two
challenges. The first challenge involves selecting the most relevant past
experiences for the current task, and the second challenge is integrating such
experiences into the decision network. To address these challenges, we propose
a novel method that utilizes a retrieval network based on task-conditioned
hypernetwork, which adapts the retrieval network's parameters depending on the
task. At the same time, a dynamic modification mechanism enhances the
collaborative efforts between the retrieval and decision networks. We evaluate
the proposed method on the MiniGrid environment. The experimental results
demonstrate that our method significantly outperforms strong baselines.
Robo360: A 3D Omnispective Multi-Material Robotic Manipulation Dataset
Building robots that can automate labor-intensive tasks has long been the
core motivation behind the advancements in computer vision and the robotics
community. Recent interest in leveraging 3D algorithms, particularly neural
fields, has led to advancements in robot perception and physical understanding
in manipulation scenarios. However, the real world's complexity poses
significant challenges. To tackle them, we present Robo360, a dataset of
robotic manipulation with dense view coverage, which enables high-quality 3D
neural representation learning, and a diverse set of objects with varied
physical and optical properties, facilitating research in object manipulation
and physical-world modeling tasks. We confirm the effectiveness of our dataset
using existing dynamic NeRF methods and evaluate its
potential in learning multi-view policies. We hope that Robo360 can open new
research directions yet to be explored at the intersection of understanding the
physical world in 3D and robot control.
GIEBench: Towards Holistic Evaluation of Group Identity-based Empathy for Large Language Models
As large language models (LLMs) continue to develop and gain widespread
application, the ability of LLMs to exhibit empathy towards diverse group
identities and understand their perspectives is increasingly recognized as
critical. Most existing benchmarks for empathy evaluation of LLMs focus
primarily on universal human emotions, such as sadness and pain, often
overlooking the context of individuals' group identities. To address this gap,
we introduce GIEBench, a comprehensive benchmark that includes 11 identity
dimensions, covering 97 group identities with a total of 999 single-choice
questions related to specific group identities. GIEBench is designed to
evaluate the empathy of LLMs when presented with specific group identities such
as gender, age, occupation, and race, emphasizing their ability to respond from
the standpoint of the identified group. This supports the ongoing development
of empathetic LLM applications tailored to users with different identities. Our
evaluation of 23 LLMs revealed that while these LLMs understand different
identity standpoints, they fail to consistently exhibit equal empathy across
these identities without explicit instructions to adopt those perspectives.
This highlights the need for improved alignment of LLMs with diverse values to
better accommodate the multifaceted nature of human identities. Our datasets
are available at https://github.com/GIEBench/GIEBench
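The per-identity evaluation the abstract describes can be sketched as a simple aggregation: accuracy on the single-choice questions is computed within each identity dimension, so uneven empathy across groups becomes visible. The field names below are hypothetical, not the benchmark's actual schema.

```python
from collections import defaultdict

def accuracy_by_identity(results):
    """Compute per-identity accuracy.

    results: dicts with 'identity' (group dimension), 'pred' (model's
    choice), and 'answer' (gold choice) -- assumed field names.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in results:
        total[r["identity"]] += 1
        correct[r["identity"]] += int(r["pred"] == r["answer"])
    return {k: correct[k] / total[k] for k in total}
```

A large spread between the per-identity accuracies would indicate exactly the inconsistency across group standpoints that the benchmark is designed to surface.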
