117 research outputs found
InterDiff: Generating 3D Human-Object Interactions with Physics-Informed Diffusion
This paper addresses a novel task of anticipating 3D human-object
interactions (HOIs). Most existing research on HOI synthesis lacks
comprehensive whole-body interactions with dynamic objects, e.g., often limited
to manipulating small or static objects. Our task is significantly more
challenging, as it requires modeling dynamic objects with various shapes,
capturing whole-body motion, and ensuring physically valid interactions. To
this end, we propose InterDiff, a framework comprising two key steps: (i)
interaction diffusion, where we leverage a diffusion model to encode the
distribution of future human-object interactions; (ii) interaction correction,
where we introduce a physics-informed predictor to correct denoised HOIs in a
diffusion step. Our key insight is to inject prior knowledge that the
interactions under reference with respect to contact points follow a simple
pattern and are easily predictable. Experiments on multiple human-object
interaction datasets demonstrate the effectiveness of our method for this task,
capable of producing realistic, vivid, and remarkably long-term 3D HOI
predictions.
Comment: ICCV 2023; Project Page: https://sirui-xu.github.io/InterDiff
Orthogonal Spatial Coding with Stimulated Parametric Down-Conversion
Orthogonal optical coding is widely used in classical multiuser communication
networks. Using the phase conjugation property of stimulated parametric
down-conversion, we extend the current orthogonal optical coding scheme to the
spatial domain to encode and decode image information. In this process, the
idler beam inherits the complex conjugate of the field information encoded in
the seed beam. An encoding phase mask introduced to the input seed beam blurs
the image transferred to the idler. The original image is restored by passing
the coded transferred image through a corrective phase mask placed in the
momentum space of the idler beam. We expect that this scheme can also inspire
new techniques in aberration cancellation and frequency conversion imaging.
Association between -238 but not -308 polymorphism of Tumor necrosis factor alpha (TNF-alpha) and unexplained recurrent spontaneous abortion (URSA) in Chinese population
Objectives: TNF-alpha is a critical cytokine produced by T helper 1 (Th1) cells, and an altered Th1-Th2 balance has been found to be crucial for a successful pregnancy. Study Design: A cohort of 132 Southern Chinese Han RSA patients and 152 controls constituted the subjects of this study. Two functional polymorphisms of TNF-alpha, -308 and -238, were studied by association analysis. Results: No association was found for the TNF-alpha -308 SNP, yet a significant difference was found for the -238 polymorphism. Conclusion: This study suggests that TNF-alpha may be a risk factor in Chinese RSA patients. However, ethnic differences may also contribute to the results.
A Secure and Efficient Multi-Object Grasping Detection Approach for Robotic Arms
Robotic arms are widely used in automatic industries. However, with wide
applications of deep learning in robotic arms, there are new challenges such as
the allocation of grasping computing power and the growing demand for security.
In this work, we propose a robotic arm grasping approach based on deep learning
and edge-cloud collaboration. This approach realizes the arbitrary grasp
planning of the robot arm and considers the grasp efficiency and information
security. In addition, an encoder and decoder trained with a GAN encrypt the images
while compressing them, which protects privacy. The
model achieves 92% accuracy on the OCID dataset, the image compression ratio
reaches 0.03%, and the structural difference value is higher than 0.91.
A Survey on Multimodal Large Language Models
The Multimodal Large Language Model (MLLM) has recently become a rising
research hotspot; it uses powerful Large Language Models (LLMs) as a brain
to perform multimodal tasks. The surprising emergent capabilities of MLLM, such
as writing stories based on images and OCR-free math reasoning, are rare in
traditional methods, suggesting a potential path to artificial general
intelligence. In this paper, we aim to trace and summarize the recent progress
of MLLM. First of all, we present the formulation of MLLM and delineate its
related concepts. Then, we discuss the key techniques and applications,
including Multimodal Instruction Tuning (M-IT), Multimodal In-Context Learning
(M-ICL), Multimodal Chain of Thought (M-CoT), and LLM-Aided Visual Reasoning
(LAVR). Finally, we discuss existing challenges and point out promising
research directions. In light of the fact that the era of MLLM has only just
begun, we will keep updating this survey and hope it can inspire more research.
An associated GitHub link collecting the latest papers is available at
https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models.
Comment: Project page: https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Model
Controllable Multi-Objective Re-ranking with Policy Hypernetworks
Multi-stage ranking pipelines have become widely used strategies in modern
recommender systems, where the final stage aims to return a ranked list of
items that balances a number of requirements such as user preference,
diversity, and novelty. Linear scalarization is arguably the most widely used
technique to merge multiple requirements into one optimization objective, by
summing up the requirements with certain preference weights. Existing
final-stage ranking methods often adopt a static model where the preference
weights are determined during offline training and kept unchanged during online
serving. Whenever the preference weights need to be modified, the model
has to be retrained, which is inefficient in both time and resources. Meanwhile, the
most appropriate weights may vary greatly across different groups of target
users or across time periods (e.g., during holiday promotions). In this
paper, we propose a framework called controllable multi-objective re-ranking
(CMR) which incorporates a hypernetwork to generate parameters for a re-ranking
model according to different preference weights. In this way, CMR can
adapt the preference weights to environment changes in an online
manner, without retraining the models. Moreover, we classify practical
business-oriented tasks into four main categories and seamlessly incorporate
them into a newly proposed re-ranking model based on an Actor-Evaluator framework,
which serves as a reliable real-world testbed for CMR. Offline experiments
based on the dataset collected from Taobao App showed that CMR improved several
popular re-ranking models by using them as underlying models. Online A/B tests
also demonstrated the effectiveness and trustworthiness of CMR.
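The linear scalarization mentioned in the abstract above can be illustrated with a minimal sketch. This is not the CMR method or its hypernetwork: it only shows a weighted sum of per-requirement scores used to order items, and how changing the weights changes the ranking. The item scores, the requirement names, and the functions `scalarize` and `rerank` are hypothetical and chosen purely for illustration.

```python
from typing import Dict, List


def scalarize(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Merge several requirement scores into one objective by a weighted sum."""
    return sum(weights[name] * scores[name] for name in weights)


def rerank(items: List[dict], weights: Dict[str, float]) -> List[str]:
    """Return item ids sorted by descending scalarized score."""
    ordered = sorted(
        items,
        key=lambda item: scalarize(item["scores"], weights),
        reverse=True,
    )
    return [item["id"] for item in ordered]


# Hypothetical candidate items with made-up per-requirement scores.
ITEMS = [
    {"id": "a", "scores": {"preference": 0.9, "diversity": 0.1, "novelty": 0.2}},
    {"id": "b", "scores": {"preference": 0.5, "diversity": 0.8, "novelty": 0.6}},
    {"id": "c", "scores": {"preference": 0.7, "diversity": 0.4, "novelty": 0.9}},
]

# Two hypothetical weight settings, e.g. ordinary days vs. a promotion period.
PREFERENCE_HEAVY = {"preference": 1.0, "diversity": 0.1, "novelty": 0.1}
DIVERSITY_HEAVY = {"preference": 0.2, "diversity": 1.0, "novelty": 0.2}

if __name__ == "__main__":
    print(rerank(ITEMS, PREFERENCE_HEAVY))  # ['a', 'c', 'b']
    print(rerank(ITEMS, DIVERSITY_HEAVY))   # ['b', 'c', 'a']
```

In this toy setting the weights are applied at scoring time, so changing them is free. The abstract's point is that a learned re-ranking model usually bakes the weights in during training, which is the limitation a weight-conditioned hypernetwork is meant to remove.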
Mutation analysis of the WNT4 gene in Han Chinese women with premature ovarian failure
BACKGROUND: The WNT4 gene plays an important role in female sex determination and differentiation. It also contributes to the maintenance of the ovaries and the survival of follicles. METHODS: We sequenced the coding region and splice sites of WNT4 in 145 Han Chinese women with premature ovarian failure (POF) and 200 healthy controls. RESULTS: Only one novel variation, in Exon 2 (195C > T), was detected among the women with POF. However, this synonymous variation did not result in a change in amino acid sequence (65 Asp > Asp). No further variants were found in any of the samples. CONCLUSION: Although we cannot provide any evidence that it is a possible disease-causing gene, this study is the first attempt to investigate the possible role of WNT4 in Han Chinese women with POF.
Direct-Current Generator Based on Dynamic Water-Semiconductor Junction with Polarized Water as Moving Dielectric Medium
There are rising prospects for harvesting energy from water droplets, as
microscale energy is required for the distributed sensors in the interconnected
human society. However, a sustainable direct-current generating
device driven by water flow has rarely been reported, and the quantum polarization
principle of the water molecule remains to be uncovered. Herein, we propose a
dynamic water-semiconductor junction with moving water sandwiched between two
semiconductors as a moving dielectric medium, which outputs a sustainable
direct-current voltage of 0.3 V and a current of 0.64 μA with a low internal
resistance of 390 kΩ. The sustainable direct-current electricity
originates from the dynamic water polarization process in the water-semiconductor
junction, in which water molecules are continuously polarized and depolarized
driven by the mechanical force and Fermi level difference, during the movement
of the water on silicon. We further demonstrated an encapsulated portable
power-generating device with a simple structure and continuous direct-current
voltage, which shows promising potential for application in the field of
wearable electronic generators.
Evaluating Open-QA Evaluation
This study focuses on the evaluation of the Open Question Answering (Open-QA)
task, which can directly estimate the factuality of large language models
(LLMs). Current automatic evaluation methods have shown limitations, indicating
that human evaluation still remains the most reliable approach. We introduce a
new task, Evaluating QA Evaluation (QA-Eval) and the corresponding dataset
EVOUNA, designed to assess the accuracy of AI-generated answers in relation to
standard answers within Open-QA. Our evaluation of these methods utilizes
human-annotated results to measure their performance. Specifically, the work
investigates methods that show high correlation with human evaluations, deeming
them more reliable. We also discuss the pitfalls of current methods and methods
to improve LLM-based evaluators. We believe this new QA-Eval task and
corresponding dataset EVOUNA will facilitate the development of more effective
automatic evaluation tools and prove valuable for future research in this area.
All resources are available at https://github.com/wangcunxiang/QA-Eval
under the Apache-2.0 License.
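The idea of scoring automatic evaluators against human annotations, as described in the abstract above, can be sketched as a simple agreement rate. This is an illustrative assumption, not the metric or code used in the QA-Eval work: the function `agreement` and the toy judgment lists are hypothetical, and each list entry stands for a verdict on whether one AI-generated answer is correct.

```python
from typing import Sequence


def agreement(auto: Sequence[bool], human: Sequence[bool]) -> float:
    """Fraction of answers on which an automatic evaluator matches the human label."""
    if len(auto) != len(human) or len(auto) == 0:
        raise ValueError("judgment lists must be non-empty and of equal length")
    matches = sum(a == h for a, h in zip(auto, human))
    return matches / len(auto)


# Hypothetical verdicts for five answers (True means "judged correct").
HUMAN = [True, True, False, True, False]
STRICT_MATCHER = [True, False, False, False, False]   # e.g. an overly strict string match
LENIENT_JUDGE = [True, True, True, True, False]       # e.g. an overly lenient judge

if __name__ == "__main__":
    print(agreement(STRICT_MATCHER, HUMAN))  # 0.6
    print(agreement(LENIENT_JUDGE, HUMAN))   # 0.8
```

Under this toy measure, the evaluator with the higher agreement rate would be treated as the more reliable proxy for human evaluation.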
Woodpecker: Hallucination Correction for Multimodal Large Language Models
Hallucination is a big shadow hanging over the rapidly evolving Multimodal
Large Language Models (MLLMs), referring to the phenomenon that the generated
text is inconsistent with the image content. In order to mitigate
hallucinations, existing studies mainly resort to an instruction-tuning manner
that requires retraining the models with specific data. In this paper, we pave
a different way, introducing a training-free method named Woodpecker. Like a
woodpecker heals trees, it picks out and corrects hallucinations from the
generated text. Concretely, Woodpecker consists of five stages: key concept
extraction, question formulation, visual knowledge validation, visual claim
generation, and hallucination correction. Implemented in a post-remedy manner,
Woodpecker can easily serve different MLLMs, while being interpretable by
accessing intermediate outputs of the five stages. We evaluate Woodpecker both
quantitatively and qualitatively and show the huge potential of this new
paradigm. On the POPE benchmark, our method obtains a 30.66%/24.33% improvement
in accuracy over the baseline MiniGPT-4/mPLUG-Owl. The source code is released
at https://github.com/BradyFU/Woodpecker.
Comment: 16 pages, 7 figures. Code Website: https://github.com/BradyFU/Woodpecke