117 research outputs found

    InterDiff: Generating 3D Human-Object Interactions with Physics-Informed Diffusion

    This paper addresses a novel task of anticipating 3D human-object interactions (HOIs). Most existing research on HOI synthesis lacks comprehensive whole-body interactions with dynamic objects and is often limited to manipulating small or static objects. Our task is significantly more challenging, as it requires modeling dynamic objects with various shapes, capturing whole-body motion, and ensuring physically valid interactions. To this end, we propose InterDiff, a framework comprising two key steps: (i) interaction diffusion, where we leverage a diffusion model to encode the distribution of future human-object interactions; (ii) interaction correction, where we introduce a physics-informed predictor to correct denoised HOIs in a diffusion step. Our key insight is to inject prior knowledge that the interactions under reference with respect to contact points follow a simple pattern and are easily predictable. Experiments on multiple human-object interaction datasets demonstrate the effectiveness of our method for this task, which is capable of producing realistic, vivid, and remarkably long-term 3D HOI predictions. Comment: ICCV 2023; Project Page: https://sirui-xu.github.io/InterDiff

    Orthogonal Spatial Coding with Stimulated Parametric Down-Conversion

    Orthogonal optical coding is widely used in classical multiuser communication networks. Using the phase conjugation property of stimulated parametric down-conversion, we extend the current orthogonal optical coding scheme to the spatial domain to encode and decode image information. In this process, the idler beam inherits the complex conjugate of the field information encoded in the seed beam. An encoding phase mask introduced to the input seed beam blurs the image transferred to the idler. The original image is restored by passing the coded transferred image through a corrective phase mask placed in the momentum space of the idler beam. We expect that this scheme can also inspire new techniques in aberration cancellation and frequency conversion imaging.

    Association between -238 but not -308 polymorphism of Tumor necrosis factor alpha (TNF-alpha) and unexplained recurrent spontaneous abortion (URSA) in Chinese population

    Objectives: TNF-alpha is a critical cytokine produced by Th1 cells, while an altered T helper 1 (Th1)-Th2 balance has been found to be crucial for a successful pregnancy. Study Design: A cohort of 132 Southern Chinese Han RSA patients and 152 controls constituted the subjects of this study. Two functional polymorphisms of TNF-alpha, -308 and -238, were studied by association analysis. Results: No association was found for the TNF-alpha -308 SNP, yet a significant difference was found for the -238 polymorphism. Conclusion: This study suggests that TNF-alpha may be a risk factor in Chinese RSA patients. However, ethnic differences may also contribute to the results.

    A Secure and Efficient Multi-Object Grasping Detection Approach for Robotic Arms

    Robotic arms are widely used in automated industries. However, with the wide application of deep learning in robotic arms, new challenges arise, such as the allocation of computing power for grasping and the growing demand for security. In this work, we propose a robotic arm grasping approach based on deep learning and edge-cloud collaboration. This approach realizes arbitrary grasp planning for the robot arm and considers both grasp efficiency and information security. In addition, the encoder and decoder trained by a GAN enable the images to be encrypted while being compressed, which protects privacy. The model achieves 92% accuracy on the OCID dataset, the image compression ratio reaches 0.03%, and the structural difference value is higher than 0.91.

    A Survey on Multimodal Large Language Models

    The Multimodal Large Language Model (MLLM) has recently become a rising research hotspot, using powerful Large Language Models (LLMs) as a brain to perform multimodal tasks. The surprising emergent capabilities of MLLMs, such as writing stories based on images and OCR-free math reasoning, are rare in traditional methods, suggesting a potential path to artificial general intelligence. In this paper, we aim to trace and summarize the recent progress of MLLMs. First of all, we present the formulation of MLLM and delineate its related concepts. Then, we discuss the key techniques and applications, including Multimodal Instruction Tuning (M-IT), Multimodal In-Context Learning (M-ICL), Multimodal Chain of Thought (M-CoT), and LLM-Aided Visual Reasoning (LAVR). Finally, we discuss existing challenges and point out promising research directions. Given that the era of MLLMs has only just begun, we will keep updating this survey and hope it can inspire more research. An associated GitHub link collecting the latest papers is available at https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models. Comment: Project page: https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Model

    Controllable Multi-Objective Re-ranking with Policy Hypernetworks

    Multi-stage ranking pipelines have become widely used strategies in modern recommender systems, where the final stage aims to return a ranked list of items that balances a number of requirements such as user preference, diversity, novelty, etc. Linear scalarization is arguably the most widely used technique to merge multiple requirements into one optimization objective, by summing up the requirements with certain preference weights. Existing final-stage ranking methods often adopt a static model where the preference weights are determined during offline training and kept unchanged during online serving. Whenever the preference weights need to be modified, the model has to be retrained, which is inefficient in both time and resources. Meanwhile, the most appropriate weights may vary greatly for different groups of target users or at different time periods (e.g., during holiday promotions). In this paper, we propose a framework called controllable multi-objective re-ranking (CMR), which incorporates a hypernetwork to generate parameters for a re-ranking model according to different preference weights. In this way, CMR can adapt the preference weights to changes in the environment in an online manner, without retraining the models. Moreover, we classify practical business-oriented tasks into four main categories and seamlessly incorporate them into a newly proposed re-ranking model based on an Actor-Evaluator framework, which serves as a reliable real-world testbed for CMR. Offline experiments based on a dataset collected from the Taobao App showed that CMR improved several popular re-ranking models by using them as underlying models. Online A/B tests also demonstrated the effectiveness and trustworthiness of CMR.
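    The linear scalarization described in this abstract can be illustrated with a minimal sketch. This is not the CMR method or its hypernetwork; it only shows how summing per-objective scores with preference weights yields one ranking score, and how changing the weights changes the order. The function names, item names, and scores are hypothetical values made up for illustration.

```python
from typing import Dict, List, Sequence


def scalarize(objective_scores: Sequence[float], weights: Sequence[float]) -> float:
    """Merge several objective scores into one value by a weighted sum."""
    if len(objective_scores) != len(weights):
        raise ValueError("need exactly one weight per objective")
    return sum(w * s for w, s in zip(weights, objective_scores))


def rerank(items: Dict[str, Sequence[float]], weights: Sequence[float]) -> List[str]:
    """Sort item names by their scalarized score, highest first."""
    return sorted(items, key=lambda name: scalarize(items[name], weights), reverse=True)


# Hypothetical candidates: (user preference, diversity, novelty) scores.
candidates = {
    "item_a": (0.9, 0.1, 0.2),
    "item_b": (0.5, 0.8, 0.4),
    "item_c": (0.3, 0.4, 0.9),
}

# Weights that only care about user preference.
preference_only = rerank(candidates, (1.0, 0.0, 0.0))

# Weights that emphasize diversity, e.g. for a different user group.
diversity_heavy = rerank(candidates, (0.2, 1.0, 0.2))

print(preference_only)
print(diversity_heavy)
```

    In this sketch the weights are applied at scoring time, so changing them is free. The abstract's point is that a trained re-ranking model normally has the weights baked in during offline training, which is what motivates generating model parameters from the weights instead.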

    Mutation analysis of the WNT4 gene in Han Chinese women with premature ovarian failure

    BACKGROUND: The WNT4 gene plays an important role in female sex determination and differentiation. It also contributes to the maintenance of the ovaries and the survival of follicles. METHODS: We sequenced the coding region and splice sites of WNT4 in 145 Han Chinese women with premature ovarian failure (POF) and 200 healthy controls. RESULTS: Only one novel variation, in Exon 2 (195C > T), was detected among the women with POF. However, this synonymous variation did not result in a change in amino acid sequence (65 Asp > Asp). No further variants were found in any of the samples. CONCLUSION: Although we cannot provide any evidence that it is a possible disease-causing gene, this study is the first attempt to investigate the possible role of WNT4 in Han Chinese women with POF.
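    The reported variant (195C > T, 65 Asp > Asp) can be checked with a short sketch, assuming position 195 counts coding-sequence bases from 1. The reference codon GAC is not stated in the abstract; it is inferred here from the standard genetic code, where aspartate is encoded by GAT and GAC and only GAC carries a C. The helper names and the partial codon table are illustrative.

```python
# Partial standard genetic code (DNA codons), enough for this example.
CODON_TABLE = {"GAT": "Asp", "GAC": "Asp", "GAA": "Glu", "GAG": "Glu"}


def codon_number(cds_position: int) -> int:
    """1-based codon index for a 1-based coding-sequence position."""
    return (cds_position - 1) // 3 + 1


def position_in_codon(cds_position: int) -> int:
    """1-based position (1, 2 or 3) of the base inside its codon."""
    return (cds_position - 1) % 3 + 1


def substitute(codon: str, pos_in_codon: int, new_base: str) -> str:
    """Return the codon with one base replaced."""
    index = pos_in_codon - 1
    return codon[:index] + new_base + codon[index + 1:]


def is_synonymous(ref_codon: str, alt_codon: str) -> bool:
    """True when both codons encode the same amino acid."""
    return CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]


ref_codon = "GAC"  # assumed reference codon (inferred, not from the abstract)
alt_codon = substitute(ref_codon, position_in_codon(195), "T")

print(codon_number(195), position_in_codon(195))
print(ref_codon, "->", alt_codon, "synonymous:", is_synonymous(ref_codon, alt_codon))
```

    Position 195 falls on the third base of codon 65, which is consistent with the abstract: a change at the wobble position of an aspartate codon leaves the amino acid unchanged.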

    Direct-Current Generator Based on Dynamic Water-Semiconductor Junction with Polarized Water as Moving Dielectric Medium

    There is a growing prospect of harvesting energy from water droplets, as microscale energy is required for the distributed sensors in an interconnected human society. However, a sustainable direct-current generating device driven by water flow has rarely been reported, and the quantum polarization principle of water molecules has yet to be uncovered. Herein, we propose a dynamic water-semiconductor junction with moving water sandwiched between two semiconductors as a moving dielectric medium, which outputs a sustainable direct-current voltage of 0.3 V and a current of 0.64 μA with a low internal resistance of 390 kΩ. The sustainable direct-current electricity originates from the dynamic water polarization process in the water-semiconductor junction, in which water molecules are continuously polarized and depolarized, driven by the mechanical force and the Fermi level difference, during the movement of the water on silicon. We further demonstrate an encapsulated portable power-generating device with a simple structure and continuous direct-current voltage, which shows promising potential for application in the field of wearable electronic generators.

    Evaluating Open-QA Evaluation

    This study focuses on the evaluation of the Open Question Answering (Open-QA) task, which can directly estimate the factuality of large language models (LLMs). Current automatic evaluation methods have shown limitations, indicating that human evaluation still remains the most reliable approach. We introduce a new task, Evaluating QA Evaluation (QA-Eval) and the corresponding dataset EVOUNA, designed to assess the accuracy of AI-generated answers in relation to standard answers within Open-QA. Our evaluation of these methods utilizes human-annotated results to measure their performance. Specifically, the work investigates methods that show high correlation with human evaluations, deeming them more reliable. We also discuss the pitfalls of current methods and methods to improve LLM-based evaluators. We believe this new QA-Eval task and corresponding dataset EVOUNA will facilitate the development of more effective automatic evaluation tools and prove valuable for future research in this area. All resources are available at https://github.com/wangcunxiang/QA-Eval and it is under the Apache-2.0 License.

    Woodpecker: Hallucination Correction for Multimodal Large Language Models

    Hallucination is a big shadow hanging over the rapidly evolving Multimodal Large Language Models (MLLMs), referring to the phenomenon that the generated text is inconsistent with the image content. To mitigate hallucinations, existing studies mainly resort to instruction tuning, which requires retraining the models with specific data. In this paper, we take a different path, introducing a training-free method named Woodpecker. Just as a woodpecker heals trees, it picks out and corrects hallucinations in the generated text. Concretely, Woodpecker consists of five stages: key concept extraction, question formulation, visual knowledge validation, visual claim generation, and hallucination correction. Implemented in a post-remedy manner, Woodpecker can easily serve different MLLMs, while being interpretable by accessing intermediate outputs of the five stages. We evaluate Woodpecker both quantitatively and qualitatively and show the huge potential of this new paradigm. On the POPE benchmark, our method obtains a 30.66%/24.33% improvement in accuracy over the baseline MiniGPT-4/mPLUG-Owl. The source code is released at https://github.com/BradyFU/Woodpecker. Comment: 16 pages, 7 figures. Code Website: https://github.com/BradyFU/Woodpecke