Search CORE


A Hierarchical Framework for Relation Extraction with Reinforcement Learning

Authors: Huang Minlie, Liu Jiexi, Takanobu Ryuichi, Zhang Tianyang
Publication date: 09/11/2018

Most existing methods determine relation types only after all the entities have been recognized, so the interaction between relation types and entity mentions is not fully modeled. This paper presents a novel paradigm for relation extraction that regards the related entities as the arguments of a relation. We apply a hierarchical reinforcement learning (HRL) framework within this paradigm to enhance the interaction between entity mentions and relation types. The whole extraction process is decomposed into a two-level hierarchy of RL policies, one for relation detection and one for entity extraction, which makes it more feasible and natural to handle overlapping relations. Our model was evaluated on public datasets collected via distant supervision, and the results show that it outperforms existing methods and is more effective at extracting overlapping relations. Comment: To appear in AAAI 1

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding

Authors: Cao Xiaochun, Jin Peng, Takanobu Ryuichi, Yuan Li, Zhang Caiwan
Publication date: 14/11/2023

Large language models have demonstrated impressive general capabilities across a wide range of open-ended tasks and have extended their utility to multimodal conversations. However, existing methods struggle to handle both image and video understanding effectively, particularly with a limited number of visual tokens. In this work, we introduce Chat-UniVi, a unified vision-language model that can comprehend and hold conversations about both images and videos through a unified visual representation. Specifically, we employ a set of dynamic visual tokens to represent images and videos uniformly. This representation allows the model to use a limited number of visual tokens efficiently, capturing both the spatial details needed for images and the comprehensive temporal relationships needed for videos. Moreover, we leverage a multi-scale representation that enables the model to perceive both high-level semantic concepts and low-level visual details. Notably, Chat-UniVi is trained on a mixed dataset containing both images and videos, so it can be applied directly to tasks involving either medium without any modifications. Extensive experimental results demonstrate that Chat-UniVi, as a unified model, consistently outperforms even existing methods designed exclusively for either images or videos. Comment: 26 page

arXiv.org e-Print Archive