19 research outputs found

    Audio-Driven Dubbing for User Generated Contents via Style-Aware Semi-Parametric Synthesis

    Full text link
    Existing automated dubbing methods are usually designed for Professionally Generated Content (PGC) production, which requires massive training data and long training time to learn a person-specific audio-video mapping. In this paper, we investigate an audio-driven dubbing method that is more feasible for User Generated Content (UGC) production. Designing a method for UGC poses two unique challenges: 1) speaker appearances are diverse and arbitrary, as the method must generalize across users; 2) the available video data for any one speaker are very limited. To tackle these challenges, we first introduce a new Style Translation Network that integrates the speaking style of the target and the speaking content of the source via a cross-modal AdaIN module, enabling our model to adapt quickly to a new speaker. We then develop a semi-parametric video renderer, which takes full advantage of the limited training data of the unseen speaker via a video-level retrieve-warp-refine pipeline. Finally, we propose a temporal regularization for the semi-parametric renderer that generates more continuous videos. Extensive experiments show that our method generates videos that accurately preserve various speaking styles, yet with a considerably smaller amount of training data and training time than existing methods. Moreover, our method achieves a faster testing speed than most recent methods. Comment: TCSVT 202
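    The cross-modal AdaIN module can be pictured as ordinary adaptive instance normalization, except that the affine parameters come from a target-speaker style code rather than a style image. Below is a minimal PyTorch sketch of that idea; the class name, shapes, and dimensions are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class CrossModalAdaIN(nn.Module):
    """Illustrative cross-modal AdaIN: normalize audio-driven content
    features, then re-scale/shift them with affine parameters predicted
    from a target-speaker style code (hypothetical shapes)."""
    def __init__(self, content_dim: int, style_dim: int):
        super().__init__()
        # Predict per-channel scale (gamma) and shift (beta) from the style code.
        self.affine = nn.Linear(style_dim, 2 * content_dim)

    def forward(self, content: torch.Tensor, style: torch.Tensor) -> torch.Tensor:
        # content: (B, T, C) audio-driven content features over T frames
        # style:   (B, S)    target-speaker style embedding
        mean = content.mean(dim=1, keepdim=True)
        std = content.std(dim=1, keepdim=True) + 1e-5
        normalized = (content - mean) / std
        gamma, beta = self.affine(style).chunk(2, dim=-1)
        # Broadcast the (B, C) affine parameters over the time dimension.
        return normalized * (1 + gamma.unsqueeze(1)) + beta.unsqueeze(1)

# Usage: inject a new speaker's style into source speech content.
adain = CrossModalAdaIN(content_dim=256, style_dim=128)
content = torch.randn(2, 40, 256)  # two clips, 40 frames each
style = torch.randn(2, 128)
stylized = adain(content, style)   # (2, 40, 256)
```

    Because only the style code changes per speaker, a module of this shape is what lets a model adapt to a new user without retraining the whole audio-video mapping.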

    Multi-modal Queried Object Detection in the Wild

    Full text link
    We introduce MQ-Det, an efficient architecture and pre-training strategy designed to utilize both textual descriptions, with their open-set generalization, and visual exemplars, with their rich description granularity, as category queries, namely Multi-modal Queried object Detection, for real-world detection with both open-vocabulary categories and various granularities. MQ-Det incorporates vision queries into existing well-established language-queried-only detectors. A plug-and-play gated class-scalable perceiver module on top of the frozen detector is proposed to augment category text with class-wise visual information. To address the learning inertia problem introduced by the frozen detector, a vision-conditioned masked language prediction strategy is proposed. MQ-Det's simple yet effective architecture and training strategy are compatible with most language-queried object detectors, yielding versatile applications. Experimental results demonstrate that multi-modal queries largely boost open-world detection. For instance, MQ-Det significantly improves the state-of-the-art open-set detector GLIP by +7.8% zero-shot AP on the LVIS benchmark and by an average of +6.3% AP on 13 few-shot downstream tasks, with merely 3% of the pre-training time required by GLIP. Code is available at https://github.com/YifanXu74/MQ-Det. Comment: Under review
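    The gated class-scalable perceiver can be thought of as cross-attention from each category's text query to its visual exemplars, with a learned gate initialized near zero so the frozen language-queried detector is not perturbed early in training. A hedged sketch follows; every name and shape here is an assumption for illustration, not MQ-Det's released code.

```python
import torch
import torch.nn as nn

class GatedVisionPerceiver(nn.Module):
    """Illustrative gated fusion: augment each category's text query with
    information attended from its visual exemplars (shapes are assumptions)."""
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Zero-initialized gate: at the start of training the frozen
        # detector sees its original, unmodified text queries.
        self.gate = nn.Parameter(torch.zeros(1))

    def forward(self, text_q: torch.Tensor, vis_ex: torch.Tensor) -> torch.Tensor:
        # text_q: (B, K, D)     one query per category
        # vis_ex: (B, K * E, D) visual exemplar tokens, E per category
        attended, _ = self.cross_attn(text_q, vis_ex, vis_ex)
        return text_q + torch.tanh(self.gate) * attended

perceiver = GatedVisionPerceiver(dim=256)
text_q = torch.randn(1, 80, 256)      # 80 category text queries
vis_ex = torch.randn(1, 80 * 5, 256)  # 5 exemplars per category
queries = perceiver(text_q, vis_ex)
```

    The residual-plus-gate design is one standard way to bolt a new modality onto a frozen model without destabilizing it, which matches the plug-and-play framing in the abstract.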

    A Survey on Multimodal Large Language Models

    Full text link
    Multimodal Large Language Models (MLLMs) have recently emerged as a research hotspot; they use powerful Large Language Models (LLMs) as a brain to perform multimodal tasks. The surprising emergent capabilities of MLLMs, such as writing stories based on images and OCR-free math reasoning, are rare in traditional methods, suggesting a potential path to artificial general intelligence. In this paper, we aim to trace and summarize the recent progress of MLLMs. First, we present the formulation of the MLLM and delineate its related concepts. Then, we discuss the key techniques and applications, including Multimodal Instruction Tuning (M-IT), Multimodal In-Context Learning (M-ICL), Multimodal Chain of Thought (M-CoT), and LLM-Aided Visual Reasoning (LAVR). Finally, we discuss existing challenges and point out promising research directions. Given that the era of the MLLM has only just begun, we will keep updating this survey and hope it can inspire more research. An associated GitHub repository collecting the latest papers is available at https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models. Comment: Project page: https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models
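    As a concrete illustration of what Multimodal In-Context Learning (M-ICL) means in practice, the sketch below assembles an interleaved image-text prompt from a few demonstrations so a frozen MLLM can infer the task format in context. The message schema is a generic assumption, not tied to any specific model in the survey.

```python
from dataclasses import dataclass

@dataclass
class Demo:
    image_path: str
    question: str
    answer: str

def build_micl_prompt(demos: list[Demo], query_image: str,
                      query_question: str) -> list[dict]:
    """Interleave (image, question, answer) demonstrations before the
    final query; the model imitates the demonstrated format in context."""
    messages = []
    for d in demos:
        messages.append({"role": "user", "content": [
            {"type": "image", "path": d.image_path},
            {"type": "text", "text": d.question},
        ]})
        messages.append({"role": "assistant", "content": d.answer})
    messages.append({"role": "user", "content": [
        {"type": "image", "path": query_image},
        {"type": "text", "text": query_question},
    ]})
    return messages

prompt = build_micl_prompt(
    demos=[Demo("cat.jpg", "How many animals?", "One.")],
    query_image="dogs.jpg",
    query_question="How many animals?",
)
```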

    Woodpecker: Hallucination Correction for Multimodal Large Language Models

    Full text link
    Hallucination is a big shadow hanging over the rapidly evolving Multimodal Large Language Models (MLLMs), referring to the phenomenon that the generated text is inconsistent with the image content. To mitigate hallucinations, existing studies mainly resort to instruction tuning, which requires retraining the models with specific data. In this paper, we pave a different way, introducing a training-free method named Woodpecker. Like a woodpecker healing trees, it picks out and corrects hallucinations in the generated text. Concretely, Woodpecker consists of five stages: key concept extraction, question formulation, visual knowledge validation, visual claim generation, and hallucination correction. Implemented in a post-remedy manner, Woodpecker can easily serve different MLLMs while remaining interpretable through access to the intermediate outputs of the five stages. We evaluate Woodpecker both quantitatively and qualitatively and show the huge potential of this new paradigm. On the POPE benchmark, our method obtains a 30.66%/24.33% improvement in accuracy over the baseline MiniGPT-4/mPLUG-Owl. The source code is released at https://github.com/BradyFU/Woodpecker. Comment: 16 pages, 7 figures. Code website: https://github.com/BradyFU/Woodpecker
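    Because Woodpecker is training-free, the whole method can be read as a five-stage post-processing pipeline around frozen models. The sketch below shows that control flow only; the prompts and the `llm`/`vqa` callables are hypothetical stand-ins (in practice the visual stage would use, e.g., an open-vocabulary detector or VQA model), not the released implementation.

```python
from typing import Callable

def woodpecker_correct(
    image: object,
    generated_text: str,
    llm: Callable[[str], str],            # text-only LLM for text stages
    vqa: Callable[[object, str], str],    # answers a question about the image
) -> str:
    """Illustrative control flow of Woodpecker's five stages."""
    # 1. Key concept extraction: objects/attributes mentioned in the answer.
    concepts = llm(f"List the objects mentioned in: {generated_text}")
    # 2. Question formulation: probing questions about each concept.
    questions = llm(f"Write one verification question per line about: {concepts}")
    # 3. Visual knowledge validation: answer the questions against the image.
    evidence = "\n".join(vqa(image, q) for q in questions.splitlines() if q)
    # 4. Visual claim generation: turn the evidence into grounded claims.
    claims = llm(f"Turn these answers into factual claims:\n{evidence}")
    # 5. Hallucination correction: rewrite the answer to match the claims.
    return llm(
        f"Rewrite the text so it is consistent with the claims.\n"
        f"Text: {generated_text}\nClaims: {claims}"
    )
```

    Keeping each stage's output explicit is what makes the pipeline interpretable: a wrong correction can be traced back to the stage that produced it.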

    MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models

    Full text link
    A Multimodal Large Language Model (MLLM) relies on a powerful LLM to perform multimodal tasks, showing amazing emergent abilities in recent studies, such as writing poems based on an image. However, such case studies can hardly reflect the full performance of MLLMs in the absence of a comprehensive evaluation. In this paper, we fill this gap, presenting the first comprehensive MLLM evaluation benchmark, MME. It measures both perception and cognition abilities across a total of 14 subtasks. To avoid the data leakage that may arise from direct use of public datasets for evaluation, the annotations of the instruction-answer pairs are all manually designed. The concise instruction design allows us to compare MLLMs fairly, instead of struggling with prompt engineering. Moreover, with such instructions, we can also easily carry out quantitative statistics. A total of 10 advanced MLLMs are comprehensively evaluated on MME, which not only shows that existing MLLMs still have considerable room for improvement, but also reveals potential directions for subsequent model optimization. Comment: https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models
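    When every instruction is answered with a plain "yes" or "no", the quantitative statistics mentioned above reduce to string comparison. A hedged sketch of such scoring, computing per-question accuracy plus a stricter per-image score where all questions about one image must be correct; the exact metric definitions here are an assumption, and the MME repository remains the authoritative reference.

```python
from collections import defaultdict

def score_yes_no(records):
    """records: iterable of (image_id, ground_truth, prediction),
    where both answers are the strings 'yes' or 'no'."""
    records = list(records)
    correct = 0
    per_image = defaultdict(list)
    for image_id, gt, pred in records:
        hit = pred.strip().lower() == gt.strip().lower()
        correct += hit
        per_image[image_id].append(hit)
    acc = correct / len(records)
    # Stricter score: every question about an image must be answered correctly.
    acc_plus = sum(all(hits) for hits in per_image.values()) / len(per_image)
    return acc, acc_plus

acc, acc_plus = score_yes_no([
    ("img1", "yes", "yes"),
    ("img1", "no", "yes"),   # one miss drops img1 from the stricter score
    ("img2", "no", "no"),
])
print(acc, acc_plus)  # 0.666..., 0.5
```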

    Clinical utilization of multiple antibodies of Mycobacterium tuberculosis for serodiagnosis evaluation of tuberculosis: a retrospective observational cohort study

    No full text
    Objectives: We aimed to address clinical uncertainties by characterizing the accuracy and utility of commercially available antibodies against Mycobacterium tuberculosis in the diagnostic assessment of suspected tuberculosis in high-burden countries.
    Methods: We conducted a retrospective, descriptive cohort study among participants aged ≥ 18 years with suspected tuberculosis in Nanning, Guangxi, China. Participants were tested for M. tuberculosis infection using commercially available antibodies against M. tuberculosis. The specificity, sensitivity, negative and positive predictive values, and negative and positive likelihood ratios of the tests were determined. Sputum specimens and bronchoalveolar lavage fluid were sent for mycobacterial culture, the Xpert MTB/RIF assay, and cell-free M. tuberculosis DNA or RNA assays. Blood samples were used for IGRAs, T-cell counts (CD3+CD4+ and CD3+CD8+), and the tuberculosis antibody tests.
    Results: Of the 1857 participants enrolled in this study, 1772 were included in the analyses, among whom 1311 were diagnosed with active tuberculosis. The specificity of the antibody against 16kD for active tuberculosis was 92.7% (95% confidence interval [CI]: 89.3–95.4), with a positive likelihood ratio of 3.1 (95% CI: 2.1–4.7), which was higher than that of the antibodies against Rv1636 (90.5% [95% CI: 86.6–93.5]), 38kD (89.5% [95% CI: 85.5–92.7]), CFP-10 (82.6% [95% CI: 77.9–86.7]), and LAM (79.3% [95% CI: 74.3–83.7]). Sensitivity ranged from 15.8% (95% CI: 13.9–17.9) for the antibody against Rv1636 to 32.9% (95% CI: 30.4–35.6) for the antibody against LAM.
    Conclusions: Commercially available antibodies against Mycobacterium tuberculosis do not have sufficient sensitivity for the diagnostic evaluation of active tuberculosis. However, the antibodies against Rv1636 and 16kD may have sufficiently high specificities, high positive likelihood ratios, and correspondingly high positive predictive values to facilitate ruling in active tuberculosis.
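    All of the reported quantities derive from a standard 2x2 table of test result against disease status. The sketch below shows those generic formulas, not the study's statistical code; the counts in the usage example are hypothetical, chosen only to mimic the high-specificity, low-sensitivity pattern the study reports.

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard screening-test metrics from a 2x2 table:
    rows = test positive/negative, columns = disease present/absent."""
    sensitivity = tp / (tp + fn)               # P(test+ | disease)
    specificity = tn / (tn + fp)               # P(test- | no disease)
    ppv = tp / (tp + fp)                       # P(disease | test+)
    npv = tn / (tn + fn)                       # P(no disease | test-)
    lr_pos = sensitivity / (1 - specificity)   # positive likelihood ratio
    lr_neg = (1 - sensitivity) / specificity   # negative likelihood ratio
    return {
        "sensitivity": sensitivity, "specificity": specificity,
        "PPV": ppv, "NPV": npv, "LR+": lr_pos, "LR-": lr_neg,
    }

# Hypothetical counts illustrating a test that is specific but insensitive.
print(diagnostic_metrics(tp=300, fp=30, fn=1011, tn=431))
```

    A high LR+ with a low sensitivity is exactly the rule-in-but-not-rule-out pattern the conclusions describe: a positive result meaningfully raises the probability of disease, while a negative result excludes little.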

    Optimizing nitrogen application rate and plant density for improving cotton yield and nitrogen use efficiency in the North China Plain - Fig 1

    No full text
    Leaf area index (LAI) of cotton at different growth periods in 2013 (A) and 2014 (B). Note: D1, D2, and D3 indicate planting densities of 3.00, 5.25, and 7.50 plants m⁻², respectively; N0, N1, N2, N3, N4 indicate nitrogen application rates of 0, 112.5, 225.0, 337.5 kg ha⁻¹, respectively. A and B indicate 2013 and 2014. Numbers at the same growth stage followed by the same lowercase letter are not significantly different at the 5% level.