117 research outputs found

InterDiff: Generating 3D Human-Object Interactions with Physics-Informed Diffusion

Author: Gui Liang-Yan
Li Zhengyuan
Wang Yu-Xiong
Xu Sirui
Publication date: 31/08/2023

This paper addresses a novel task of anticipating 3D human-object interactions (HOIs). Most existing research on HOI synthesis lacks comprehensive whole-body interactions with dynamic objects, e.g., often limited to manipulating small or static objects. Our task is significantly more challenging, as it requires modeling dynamic objects with various shapes, capturing whole-body motion, and ensuring physically valid interactions. To this end, we propose InterDiff, a framework comprising two key steps: (i) interaction diffusion, where we leverage a diffusion model to encode the distribution of future human-object interactions; (ii) interaction correction, where we introduce a physics-informed predictor to correct denoised HOIs in a diffusion step. Our key insight is to inject prior knowledge that the interactions under reference with respect to contact points follow a simple pattern and are easily predictable. Experiments on multiple human-object interaction datasets demonstrate the effectiveness of our method for this task, capable of producing realistic, vivid, and remarkably long-term 3D HOI predictions.

Comment: ICCV 2023; Project Page: https://sirui-xu.github.io/InterDiff

arXiv.org e-Print Archive

Orthogonal Spatial Coding with Stimulated Parametric Down-Conversion

Author: Black A. Nicholas
Boyd Robert W.
Tang Sirui
Xu Yang
Publication date: 20/07/2023

Orthogonal optical coding is widely used in classical multiuser communication networks. Using the phase conjugation property of stimulated parametric down-conversion, we extend the current orthogonal optical coding scheme to the spatial domain to encode and decode image information. In this process, the idler beam inherits the complex conjugate of the field information encoded in the seed beam. An encoding phase mask introduced to the input seed beam blurs the image transferred to the idler. The original image is restored by passing the coded transferred image through a corrective phase mask placed in the momentum space of the idler beam. We expect that this scheme can also inspire new techniques in aberration cancellation and frequency conversion imaging.

arXiv.org e-Print Archive

Association between -238 but not -308 polymorphism of Tumor necrosis factor alpha (TNF-alpha) and unexplained recurrent spontaneous abortion (URSA) in Chinese population

Author: Liu Chunmei
Ma Xu
Wang Binbin
Wang Jing
Zhou Sirui
Publication venue: BioMed Central
Publication date: 01/01/2010

Objectives: TNF-alpha is a critical cytokine produced by Th1 cells, while an altered T helper 1 (Th1)-Th2 balance is found crucial for a successful pregnancy. Study Design: A cohort of 132 Southern Chinese Han RSA patients and 152 controls constituted the subjects of this study. Two functional polymorphisms, -308 and -238, of TNF-alpha were studied by association analysis. Results: A lack of association was found for the TNF-alpha -308 SNP, yet a significant difference was discovered for the -238 polymorphism. Conclusion: This study suggested that TNF-alpha may be a risk factor in Chinese RSA patients. However, ethnic differences may also contribute to the results.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

A Secure and Efficient Multi-Object Grasping Detection Approach for Robotic Arms

Author: Cheng Jieren
Li Jiangpeng
Ni Sirui
Wang Hui
Xu Yichen
Yang Zaijia
Publication date: 07/09/2022

Robotic arms are widely used in automated industries. However, with the wide application of deep learning in robotic arms, there are new challenges such as the allocation of grasping computing power and the growing demand for security. In this work, we propose a robotic arm grasping approach based on deep learning and edge-cloud collaboration. This approach realizes arbitrary grasp planning for the robot arm and considers both grasp efficiency and information security. In addition, the encoder and decoder trained by a GAN enable the images to be encrypted while being compressed, which protects privacy. The model achieves 92% accuracy on the OCID dataset, the image compression ratio reaches 0.03%, and the structural difference value is higher than 0.91.

arXiv.org e-Print Archive

A Survey on Multimodal Large Language Models

Author: Chen Enhong
Fu Chaoyou
Li Ke
Sun Xing
Xu Tong
Yin Shukang
Zhao Sirui
Publication date: 23/06/2023

The Multimodal Large Language Model (MLLM) has recently become a rising research hotspot; it uses powerful Large Language Models (LLMs) as a brain to perform multimodal tasks. The surprising emergent capabilities of MLLM, such as writing stories based on images and OCR-free math reasoning, are rare in traditional methods, suggesting a potential path to artificial general intelligence. In this paper, we aim to trace and summarize the recent progress of MLLM. First of all, we present the formulation of MLLM and delineate its related concepts. Then, we discuss the key techniques and applications, including Multimodal Instruction Tuning (M-IT), Multimodal In-Context Learning (M-ICL), Multimodal Chain of Thought (M-CoT), and LLM-Aided Visual Reasoning (LAVR). Finally, we discuss existing challenges and point out promising research directions. In light of the fact that the era of MLLM has only just begun, we will keep updating this survey and hope it can inspire more research. An associated GitHub link collecting the latest papers is available at https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models.

Comment: Project page: https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Model

arXiv.org e-Print Archive

Controllable Multi-Objective Re-ranking with Policy Hypernetworks

Author: Chen Sirui
Li Zhiyu
Lin Quan
Wang Yuan
Wen Zijing
Xu Jun
Zhang Changshuo
Zhu Cheng
Publication venue: Association for Computing Machinery (ACM)
Publication date: 08/06/2023

Multi-stage ranking pipelines have become widely used strategies in modern recommender systems, where the final stage aims to return a ranked list of items that balances a number of requirements such as user preference, diversity, novelty, etc. Linear scalarization is arguably the most widely used technique to merge multiple requirements into one optimization objective, by summing up the requirements with certain preference weights. Existing final-stage ranking methods often adopt a static model where the preference weights are determined during offline training and kept unchanged during online serving. Whenever a modification of the preference weights is needed, the model has to be re-trained, which is inefficient in both time and resources. Meanwhile, the most appropriate weights may vary greatly for different groups of target users or at different time periods (e.g., during holiday promotions). In this paper, we propose a framework called controllable multi-objective re-ranking (CMR) which incorporates a hypernetwork to generate parameters for a re-ranking model according to different preference weights. In this way, CMR can adapt the preference weights to environment changes in an online manner, without retraining the models. Moreover, we classify practical business-oriented tasks into four main categories and seamlessly incorporate them into a newly proposed re-ranking model based on an Actor-Evaluator framework, which serves as a reliable real-world testbed for CMR. Offline experiments based on a dataset collected from the Taobao App showed that CMR improved several popular re-ranking models when they were used as underlying models. Online A/B tests also demonstrated the effectiveness and trustworthiness of CMR.
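
The linear scalarization mentioned in this abstract can be illustrated with a minimal sketch. It shows only the weighted-sum baseline the abstract describes, not CMR or its hypernetwork; the item names, the three objectives, and all scores and weights are hypothetical values chosen for illustration.

```python
from typing import Dict, List, Sequence


def scalarize(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Merge several objective scores into one value by a weighted sum."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    return sum(w * s for w, s in zip(weights, scores))


def rank(items: Dict[str, Sequence[float]], weights: Sequence[float]) -> List[str]:
    """Order item names by descending scalarized score."""
    return sorted(items, key=lambda name: scalarize(items[name], weights), reverse=True)


# Hypothetical items scored on (user preference, diversity, novelty).
items = {
    "item_a": (0.9, 0.2, 0.1),
    "item_b": (0.6, 0.7, 0.5),
    "item_c": (0.3, 0.9, 0.8),
}

# Changing the preference weights changes the ranking without touching the scores.
preference_first = rank(items, (1.0, 0.0, 0.0))
diversity_first = rank(items, (0.0, 1.0, 0.0))
print(preference_first)
print(diversity_first)
```

In a static model the weights are fixed at training time, so each new weight vector would require retraining; the sketch sidesteps that only because the objective scores are given directly.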

arXiv.org e-Print Archive

Mutation analysis of the WNT4 gene in Han Chinese women with premature ovarian failure

Author: Beili Chen
Binbin Wang
Jing Wang
Lu Yang
Peisu Suo
Sirui Zhou
Xu Ma
Ying Zhu
Yunxia Cao
Publication venue: Springer Nature
Publication date: 30/05/2011

BACKGROUND: The WNT4 gene plays an important role in female sex determination and differentiation. It also contributes to the maintenance of the ovaries and the survival of follicles. METHODS: We sequenced the coding region and splice sites of WNT4 in 145 Han Chinese women with premature ovarian failure (POF) and 200 healthy controls. RESULTS: Only one novel variation, in Exon 2 (195C > T), was detected among the women with POF. However, this synonymous variation did not result in a change in amino acid sequence (65 Asp > Asp). No further variants were found in any of the samples. CONCLUSION: Although we cannot provide any evidence that it is a possible disease-causing gene, this study is the first attempt to investigate the possible role of WNT4 in Han Chinese women with POF.
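
The arithmetic behind the reported variant (195C > T, 65 Asp > Asp) can be checked with a short sketch. It assumes position 195 is counted from the first base of the coding sequence, which the abstract does not state explicitly. The reference codon `GAC` is not given in the abstract; it is inferred from the standard genetic code, in which GAC and GAT are the only aspartate codons.

```python
from typing import Tuple

# Partial standard genetic code (DNA codons); only the GA* family is needed here.
CODON_TO_AA = {"GAT": "Asp", "GAC": "Asp", "GAA": "Glu", "GAG": "Glu"}


def codon_position(nt_pos: int) -> Tuple[int, int]:
    """Map a 1-based coding-sequence position to (codon number, base within codon)."""
    return (nt_pos - 1) // 3 + 1, (nt_pos - 1) % 3 + 1


def substitute(codon: str, base_in_codon: int, new_base: str) -> str:
    """Return the codon with the 1-based base position replaced by new_base."""
    i = base_in_codon - 1
    return codon[:i] + new_base + codon[i + 1:]


codon_number, base_in_codon = codon_position(195)
ref_codon = "GAC"  # inferred, not stated in the abstract: an Asp codon ending in C
alt_codon = substitute(ref_codon, base_in_codon, "T")
is_synonymous = CODON_TO_AA[ref_codon] == CODON_TO_AA[alt_codon]
print(codon_number, base_in_codon, ref_codon, alt_codon, is_synonymous)
```

Position 195 is the third base of codon 65, so the C-to-T change turns one aspartate codon into the other and leaves the protein sequence unchanged, consistent with the abstract.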

Springer - Publisher Connector

PubMed Central

Direct-Current Generator Based on Dynamic Water-Semiconductor Junction with Polarized Water as Moving Dielectric Medium

Author: Feng Sirui
Li Linjun
Lin Shisheng
Liu Kaihui
Lu Yanghua
Xu Chi
Yan Yanfei
Yang Zunshan
Yu Xutao
Zheng Haonan
Zhou Xu
Publication date: 12/08/2020

There is rising interest in harvesting energy from water droplets, as microscale energy is required for the distributed sensors in the interconnected human society. However, a sustainable direct-current generating device driven by water flow is rarely reported, and the quantum polarization principle of water molecules remains to be uncovered. Herein, we propose a dynamic water-semiconductor junction with moving water sandwiched between two semiconductors as a moving dielectric medium, which outputs a sustainable direct-current voltage of 0.3 V and a current of 0.64 μA with a low internal resistance of 390 kΩ. The sustainable direct-current electricity originates from the dynamic water polarization process in the water-semiconductor junction, in which water molecules are continuously polarized and depolarized, driven by the mechanical force and the Fermi level difference, during the movement of the water on silicon. We further demonstrate an encapsulated portable power-generating device with a simple structure and continuous direct-current voltage, which shows promising potential for application in the field of wearable electronic generators.

arXiv.org e-Print Archive

Directory of Open Access Journals

Evaluating Open-QA Evaluation

Author: Cheng Sirui
Ding Bowen
Guo Qipeng
Hu Xiangkun
Wang Cunxiang
Wang Yidong
Xu Zhikun
Zhang Yue
Zhang Zheng
Publication date: 18/07/2023

This study focuses on the evaluation of the Open Question Answering (Open-QA) task, which can directly estimate the factuality of large language models (LLMs). Current automatic evaluation methods have shown limitations, indicating that human evaluation still remains the most reliable approach. We introduce a new task, Evaluating QA Evaluation (QA-Eval) and the corresponding dataset EVOUNA, designed to assess the accuracy of AI-generated answers in relation to standard answers within Open-QA. Our evaluation of these methods utilizes human-annotated results to measure their performance. Specifically, the work investigates methods that show high correlation with human evaluations, deeming them more reliable. We also discuss the pitfalls of current methods and methods to improve LLM-based evaluators. We believe this new QA-Eval task and corresponding dataset EVOUNA will facilitate the development of more effective automatic evaluation tools and prove valuable for future research in this area. All resources are available at https://github.com/wangcunxiang/QA-Eval under the Apache-2.0 License.

arXiv.org e-Print Archive

Woodpecker: Hallucination Correction for Multimodal Large Language Models

Author: Chen Enhong
Fu Chaoyou
Li Ke
Shen Yunhang
Sui Dianbo
Sun Xing
Wang Hao
Xu Tong
Yin Shukang
Zhao Sirui
Publication date: 24/10/2023

Hallucination is a big shadow hanging over the rapidly evolving Multimodal Large Language Models (MLLMs), referring to the phenomenon that the generated text is inconsistent with the image content. In order to mitigate hallucinations, existing studies mainly resort to an instruction-tuning manner that requires retraining the models with specific data. In this paper, we pave a different way, introducing a training-free method named Woodpecker. Like a woodpecker heals trees, it picks out and corrects hallucinations from the generated text. Concretely, Woodpecker consists of five stages: key concept extraction, question formulation, visual knowledge validation, visual claim generation, and hallucination correction. Implemented in a post-remedy manner, Woodpecker can easily serve different MLLMs, while being interpretable by accessing intermediate outputs of the five stages. We evaluate Woodpecker both quantitatively and qualitatively and show the huge potential of this new paradigm. On the POPE benchmark, our method obtains a 30.66%/24.33% improvement in accuracy over the baseline MiniGPT-4/mPLUG-Owl. The source code is released at https://github.com/BradyFU/Woodpecker.

Comment: 16 pages, 7 figures. Code Website: https://github.com/BradyFU/Woodpecke

arXiv.org e-Print Archive