39 research outputs found

    MPMQA: Multimodal Question Answering on Product Manuals

    Visual contents, such as illustrations and images, play an important role in product manual understanding. Existing Product Manual Question Answering (PMQA) datasets tend to ignore visual contents and retain only the textual parts. In this work, to emphasize the importance of multimodal contents, we propose a Multimodal Product Manual Question Answering (MPMQA) task. For each question, MPMQA requires the model not only to process multimodal contents but also to provide multimodal answers. To support MPMQA, a large-scale dataset, PM209, is constructed with human annotations; it contains 209 product manuals from 27 well-known consumer electronics brands. The human annotations include 6 types of semantic regions for manual contents and 22,021 question-answer pairs. Notably, each answer consists of a textual sentence and related visual regions from the manuals. Given the length of product manuals and the fact that a question is always related to a small number of pages, MPMQA can be naturally split into two subtasks: retrieving the most relevant pages and then generating multimodal answers. We further propose a unified model that performs both subtasks jointly and achieves performance comparable to multiple task-specific models. The PM209 dataset is available at https://github.com/AIM3-RUC/MPMQA
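
    The abstract describes a two-subtask pipeline: retrieve the most relevant manual pages, then generate a multimodal answer. The sketch below illustrates that decomposition; the PageRetriever and AnswerGenerator interfaces and the Page fields are hypothetical placeholders, not the authors' unified model or the PM209 schema.

```python
# Minimal sketch of the two-subtask pipeline (page retrieval, then multimodal
# answer generation). PageRetriever, AnswerGenerator, and the Page fields are
# hypothetical placeholders, not the authors' model or the PM209 schema.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Page:
    page_id: int
    text: str                                   # extracted text of the manual page
    regions: List[Tuple[int, int, int, int]]    # candidate visual regions (x, y, w, h)


class PageRetriever:
    def score(self, question: str, page: Page) -> float:
        """Relevance score for (question, page); to be implemented by a real model."""
        raise NotImplementedError


class AnswerGenerator:
    def generate(self, question: str, pages: List[Page]):
        """Return (answer_sentence, supporting_regions); to be implemented."""
        raise NotImplementedError


def answer_question(question: str, manual: List[Page],
                    retriever: PageRetriever, generator: AnswerGenerator,
                    top_k: int = 2):
    # Subtask 1: retrieve the pages most relevant to the question.
    ranked = sorted(manual, key=lambda p: retriever.score(question, p), reverse=True)
    # Subtask 2: generate a textual answer plus its supporting visual regions.
    return generator.generate(question, ranked[:top_k])
```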

    Explore and Tell: Embodied Visual Captioning in 3D Environments

    While current visual captioning models have achieved impressive performance, they often assume that the image is well captured and provides a complete view of the scene. In real-world scenarios, however, a single image may not offer a good viewpoint, hindering fine-grained scene understanding. To overcome this limitation, we propose a novel task called Embodied Captioning, which equips visual captioning models with navigation capabilities, enabling them to actively explore the scene and reduce the visual ambiguity of suboptimal viewpoints. Specifically, starting at a random viewpoint, an agent must navigate the environment to gather information from different viewpoints and generate a comprehensive paragraph describing all objects in the scene. To support this task, we build the ET-Cap dataset with the Kubric simulator, consisting of 10K 3D scenes with cluttered objects and three annotated paragraphs per scene. We propose a Cascade Embodied Captioning model (CaBOT), which comprises a navigator and a captioner, to tackle this task. The navigator predicts which actions to take in the environment, while the captioner generates a paragraph description based on the whole navigation trajectory. Extensive experiments demonstrate that our model outperforms other carefully designed baselines. Our dataset, code, and models are available at https://aim3-ruc.github.io/ExploreAndTell. Comment: 12 pages; 10 figures; ICCV 2023.
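
    Below is a minimal sketch of the navigator-then-captioner cascade described above, assuming a hypothetical environment interface with reset() and step(); it is illustrative only, not the CaBOT implementation.

```python
# Minimal sketch of a navigator -> captioner cascade. The Navigator, Captioner,
# and env interfaces (reset/step) are hypothetical placeholders, not CaBOT itself.
from typing import List

ACTIONS = ["move_forward", "turn_left", "turn_right", "look_up", "look_down", "stop"]


class Navigator:
    def next_action(self, observation, history: List[str]) -> str:
        """Choose the next action given the current view and the action history."""
        raise NotImplementedError


class Captioner:
    def describe(self, trajectory) -> str:
        """Generate a paragraph describing the scene from the whole trajectory."""
        raise NotImplementedError


def embodied_captioning(env, navigator: Navigator, captioner: Captioner,
                        max_steps: int = 20) -> str:
    obs = env.reset()                      # start from a random viewpoint
    history, trajectory = [], [obs]
    for _ in range(max_steps):
        action = navigator.next_action(obs, history)
        if action == "stop":               # the navigator decides when to stop exploring
            break
        obs = env.step(action)             # move to gather a new viewpoint
        history.append(action)
        trajectory.append(obs)
    # The captioner conditions on observations from the whole navigation trajectory.
    return captioner.describe(trajectory)
```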

    Movie101: A New Movie Understanding Benchmark

    To help the visually impaired enjoy movies, automatic movie narrating systems are expected to narrate accurate, coherent, and role-aware plots when no actors are speaking. Existing works benchmark this challenge as a normal video captioning task via simplifications such as removing role names and evaluating narrations with n-gram-based metrics, which makes it difficult for automatic systems to meet the needs of real application scenarios. To narrow this gap, we construct a large-scale Chinese movie benchmark named Movie101. Closer to real scenarios, the Movie Clip Narrating (MCN) task in our benchmark asks models to generate role-aware narration paragraphs for complete movie clips in which no actors are speaking. External knowledge, such as role information and movie genres, is also provided for better movie understanding. Besides, we propose a new metric called Movie Narration Score (MNScore) for movie narration evaluation, which achieves the best correlation with human evaluation. Our benchmark also supports the Temporal Narration Grounding (TNG) task, which investigates clip localization given text descriptions. For both tasks, our proposed methods leverage external knowledge well and outperform carefully designed baselines. The dataset and codes are released at https://github.com/yuezih/Movie101. Comment: Accepted to ACL 2023.
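
    As a rough illustration of the two benchmark tasks, the sketch below shows hypothetical interfaces for Movie Clip Narrating (generate a role-aware narration for a clip) and Temporal Narration Grounding (localize a clip from a narration). The field names and model methods are assumptions, not the Movie101 schema or the authors' models.

```python
# Hypothetical sketch of the two benchmark task interfaces. Field names and the
# model methods (generate/localize) are illustrative assumptions, not the
# Movie101 schema or the authors' models.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class MovieClip:
    video_path: str
    start: float                 # clip boundaries (seconds) where no actor is speaking
    end: float
    roles: List[str]             # external knowledge: role names in the movie
    genres: List[str]            # external knowledge: movie genres


def narrate_clip(model, clip: MovieClip) -> str:
    """MCN: generate a role-aware narration paragraph for a complete clip."""
    return model.generate(clip.video_path, (clip.start, clip.end),
                          roles=clip.roles, genres=clip.genres)


def ground_narration(model, video_path: str, narration: str) -> Tuple[float, float]:
    """TNG: localize the (start, end) segment described by the narration."""
    return model.localize(video_path, narration)
```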

    mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration

    Multi-modal Large Language Models (MLLMs) have demonstrated impressive instruction abilities across various open-ended tasks. However, previous methods have primarily focused on enhancing multi-modal capabilities. In this work, we introduce a versatile multi-modal large language model, mPLUG-Owl2, which effectively leverages modality collaboration to improve performance on both text and multi-modal tasks. mPLUG-Owl2 utilizes a modularized network design, with the language decoder acting as a universal interface for managing different modalities. Specifically, mPLUG-Owl2 incorporates shared functional modules to facilitate modality collaboration and introduces a modality-adaptive module that preserves modality-specific features. Extensive experiments reveal that mPLUG-Owl2 generalizes to both text tasks and multi-modal tasks, achieving state-of-the-art performance with a single generic model. Notably, mPLUG-Owl2 is the first MLLM to demonstrate the modality collaboration phenomenon in both pure-text and multi-modal scenarios, setting a pioneering path in the development of future multi-modal foundation models.
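
    The following PyTorch sketch illustrates the general idea of combining shared modules with modality-specific components, in the spirit of a modality-adaptive module: each modality keeps its own normalization while a shared projection supports collaboration in a common space. It is a simplified approximation for illustration, not the mPLUG-Owl2 implementation.

```python
# Simplified PyTorch illustration of shared modules plus a modality-adaptive
# component: each modality keeps its own LayerNorm while a shared projection
# operates in a common space. This is an approximation for illustration, not
# the mPLUG-Owl2 implementation.
import torch
import torch.nn as nn


class ModalityAdaptiveLayer(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.shared_proj = nn.Linear(dim, dim)   # shared across modalities
        self.norm = nn.ModuleDict({              # modality-specific normalization
            "text": nn.LayerNorm(dim),
            "image": nn.LayerNorm(dim),
        })

    def forward(self, x: torch.Tensor, modality: str) -> torch.Tensor:
        # Normalize with the modality's own parameters, then apply the shared
        # transformation so both modalities collaborate in one space.
        return self.shared_proj(self.norm[modality](x))


layer = ModalityAdaptiveLayer(dim=64)
text_tokens = torch.randn(2, 16, 64)             # (batch, text length, dim)
image_tokens = torch.randn(2, 49, 64)            # (batch, visual tokens, dim)
fused = torch.cat([layer(text_tokens, "text"), layer(image_tokens, "image")], dim=1)
```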

    Understanding health and social challenges for aging and long-term care in China

    The second King’s College London Symposium on Ageing and Long-term Care in China was convened on 4-5 July 2019 at King’s College London. The aim of the symposium was to develop a better understanding of the health and social challenges for aging and long-term care in China. The symposium drew research insights from a wide range of disciplines, including economics, public policy, demography, gerontology, public health, and sociology. A total of 20 participants from eight countries sought to identify the key issues and research priorities in the area of aging and long-term care in China. The results published here are a synthesis of the top four research areas, representing the perspectives of some of the leading researchers in the field. © The Author(s) 2020

    Mechanisms and Therapeutic Targets of Depression After Intracerebral Hemorrhage

    The relationship between depression and intracerebral hemorrhage (ICH) is complicated. Post-ICH depression is one of the most common neuropsychiatric comorbidities of hemorrhagic stroke. Depression, as a neuropsychiatric symptom, also negatively impacts the outcome of ICH by increasing morbidity, disability, and mortality. However, the ICH outcome can be improved by antidepressants such as the frequently used selective serotonin reuptake inhibitors. This review therefore presents the mechanisms of post-ICH depression, grouped into inflammation, oxidative stress (OS), apoptosis, and autophagy, and explains them through their associated signaling pathways. Inflammation is mainly related to Toll-like receptors (TLRs), the NF-kB-mediated signaling pathway, the PPAR-γ-dependent pathway, and other signaling pathways. OS is associated with nuclear factor erythroid 2-related factor 2 (Nrf2), the PI3K/Akt pathway, and the MAPK/P38 pathway. Moreover, autophagy is associated with the mTOR signaling cascade and the NF-kB-mediated signaling pathway, while apoptosis is correlated with the death receptor-mediated apoptosis pathway, the mitochondrial apoptosis pathway, caspase-independent pathways, and others. Furthermore, we find that neuroinflammation, oxidative stress, autophagy, and apoptosis interact with one another. These mechanisms may provide several potential therapeutic targets for patients who suffer from depression after ICH.

    The Role of lncRNAs in the Distant Metastasis of Breast Cancer

    Breast cancer (BC) remains the most frequently diagnosed cancer worldwide. Among breast cancer patients, distant metastasis and invasion are the leading causes of BC-related death. Recently, long non-coding RNAs (lncRNAs), which were once considered genetic byproducts (owing to their unknown biological functions), have been reported to be highly implicated in the development and progression of BC. In this review, we summarize the functions and mechanisms of lncRNAs implicated in the different distant metastases of BC. The functions of lncRNAs are divided into two types: oncogenic and tumor-suppressive. The majority of them exert their roles through the regulation of invasion, migration, epithelial-mesenchymal transition (EMT), and the metastasis process. In the final part, we briefly address future research prospects for lncRNAs, especially methods for detecting lncRNAs in clinical work, and introduce several tools for detecting lncRNAs more conveniently. Although lncRNA research is still in its initial stages, lncRNAs are promising prognosticators and novel therapeutic targets for BC metastasis, which requires more research in the future.

    InfoMetIC: An Informative Metric for Reference-free Image Caption Evaluation

    Automatic image captioning evaluation is critical for benchmarking and promoting advances in image captioning research. Existing metrics only provide a single score to measure caption quality, which is less explainable and informative. In contrast, humans can easily identify the problems of a caption in detail, e.g., which words are inaccurate and which salient objects are not described, and then rate the caption quality. To support such informative feedback, we propose an Informative Metric for Reference-free Image Caption evaluation (InfoMetIC). Given an image and a caption, InfoMetIC is able to report incorrect words and unmentioned image regions at a fine-grained level, and also provides a text precision score, a vision recall score, and an overall quality score at a coarse-grained level. The coarse-grained score of InfoMetIC achieves significantly better correlation with human judgements than existing metrics on multiple benchmarks. We also construct a token-level evaluation dataset and demonstrate the effectiveness of InfoMetIC in fine-grained evaluation. Our code and datasets are publicly available at https://github.com/HAWLYQ/InfoMetIC. Comment: Accepted by ACL 2023 main conference.
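
    To make the coarse-grained scores concrete, the sketch below shows one simple way fine-grained judgements (per-word correctness, per-region coverage) could be aggregated into a text precision score, a vision recall score, and an overall score. The plain averaging here is purely illustrative and is not InfoMetIC's actual scoring function.

```python
# Illustrative aggregation of fine-grained judgements into coarse-grained scores
# (text precision, vision recall, overall). The simple averaging below is an
# assumption for illustration, not InfoMetIC's actual scoring function.
from typing import Dict, List


def aggregate_scores(word_correct: List[bool],
                     region_mentioned: List[bool]) -> Dict[str, float]:
    # Text precision: fraction of caption words judged correct.
    precision = sum(word_correct) / max(len(word_correct), 1)
    # Vision recall: fraction of salient image regions covered by the caption.
    recall = sum(region_mentioned) / max(len(region_mentioned), 1)
    # Overall quality: here simply the mean of the two coarse-grained scores.
    return {"text_precision": precision,
            "vision_recall": recall,
            "overall": (precision + recall) / 2}


print(aggregate_scores([True, True, False, True], [True, False, True]))
```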