InterDiff: Generating 3D Human-Object Interactions with Physics-Informed Diffusion
This paper addresses a novel task of anticipating 3D human-object
interactions (HOIs). Most existing research on HOI synthesis lacks
comprehensive whole-body interactions with dynamic objects, e.g., often limited
to manipulating small or static objects. Our task is significantly more
challenging, as it requires modeling dynamic objects with various shapes,
capturing whole-body motion, and ensuring physically valid interactions. To
this end, we propose InterDiff, a framework comprising two key steps: (i)
interaction diffusion, where we leverage a diffusion model to encode the
distribution of future human-object interactions; (ii) interaction correction,
where we introduce a physics-informed predictor to correct denoised HOIs in a
diffusion step. Our key insight is to inject prior knowledge that the
interactions under reference with respect to contact points follow a simple
pattern and are easily predictable. Experiments on multiple human-object
interaction datasets demonstrate the effectiveness of our method for this task,
capable of producing realistic, vivid, and remarkably long-term 3D HOI
predictions.
Comment: ICCV 2023; Project Page: https://sirui-xu.github.io/InterDiff
Learning Lightweight Object Detectors via Multi-Teacher Progressive Distillation
Resource-constrained perception systems such as edge computing and
vision-for-robotics require vision models to be both accurate and lightweight
in computation and memory usage. While knowledge distillation is a proven
strategy to enhance the performance of lightweight classification models, its
application to structured outputs like object detection and instance
segmentation remains a complicated task, due to the variability in outputs and
complex internal network modules involved in the distillation process. In this
paper, we propose a simple yet surprisingly effective sequential approach to
knowledge distillation that progressively transfers the knowledge of a set of
teacher detectors to a given lightweight student. To distill knowledge from a
highly accurate but complex teacher model, we construct a sequence of teachers
to help the student gradually adapt. Our progressive strategy can be easily
combined with existing detection distillation mechanisms to consistently
maximize student performance in various settings. To the best of our knowledge,
we are the first to successfully distill knowledge from Transformer-based
teacher detectors to convolution-based students, and unprecedentedly boost the
performance of ResNet-50 based RetinaNet from 36.5% to 42.0% AP and Mask R-CNN
from 38.2% to 42.5% AP on the MS COCO benchmark.
Comment: ICML 202
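The sequential idea in this abstract can be illustrated with a toy sketch. The abstract does not describe the paper's actual detection losses or how the teacher sequence is chosen, so everything here is an assumption: a classification-style, temperature-scaled KL loss stands in for the detection distillation objective, and teachers are ordered by a made-up `capacity` score. The names `distillation_loss` and `progressive_schedule` are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Soft-label distillation loss (assumed stand-in, not the paper's loss)."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return temperature ** 2 * kl_divergence(p_teacher, p_student)


def progressive_schedule(teachers):
    """Order teachers from lowest to highest capacity (assumed criterion)."""
    return sorted(teachers, key=lambda t: t["capacity"])


# Hypothetical teachers; the capacity scores are illustrative, not measured.
teachers = [
    {"name": "large_teacher", "capacity": 3},
    {"name": "small_teacher", "capacity": 1},
    {"name": "medium_teacher", "capacity": 2},
]

# The student would be distilled from each teacher in turn, in this order.
for teacher in progressive_schedule(teachers):
    print("distill from", teacher["name"])
```

In a real training loop, each stage would minimize a distillation loss against the current teacher before moving on to the next one in the sequence.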
Pose Guided Attention for Multi-label Fashion Image Classification
We propose a compact framework with guided attention for multi-label
classification in the fashion domain. Our visual semantic attention model
(VSAM) is supervised by automatic pose extraction creating a discriminative
feature space. VSAM outperforms the state of the art on an in-house dataset
and performs on par with previous works on the DeepFashion dataset, even
without using any landmark annotations. Additionally, we show that our semantic
attention module brings robustness to large quantities of wrong annotations and
provides more interpretable results.
Comment: Published at ICCV 2019 Workshop on Computer Vision for Fashion, Art and Desig
Multifunctional luminescent nanomaterials from NaLa(MoO₄)₂:Eu³⁺/Tb³⁺ with tunable decay lifetimes, emission colors, and enhanced cell viability
A facile but effective method has been developed for large-scale preparation of NaLa(MoO₄)₂ nanorods and microflowers co-doped with Eu³⁺ and Tb³⁺ ions (abbreviated as NLM:Ln³⁺). The as-synthesized nanomaterials possess a pure tetragonal phase, with morphologies that vary from shuttle-like nanorods to microflowers depending on the reaction temperature and the amount of ethylene glycol used. The resulting nanomaterials exhibit superb luminescent emissions over the visible region, from red through yellow to green, simply by changing the relative doping ratio of Eu³⁺ to Tb³⁺ ions. A biocompatibility study indicates that the addition of NLM:Ln³⁺ nanomaterials can stimulate the growth of normal human retinal pigment epithelium (ARPE-19) cells. Therefore, the newly developed NaLa(MoO₄)₂ nanomaterials hold potential for a wide range of multifunctional applications, including bioimaging, security protection, optical display, optoelectronics for information storage, and cell stimulation.
Aligning Large Multimodal Models with Factually Augmented RLHF
Large Multimodal Models (LMM) are built across modalities and the
misalignment between two modalities can result in "hallucination", generating
textual outputs that are not grounded by the multimodal information in context.
To address the multimodal misalignment issue, we adapt the Reinforcement
Learning from Human Feedback (RLHF) from the text domain to the task of
vision-language alignment, where human annotators are asked to compare two
responses and pinpoint the more hallucinated one, and the vision-language model
is trained to maximize the simulated human rewards. We propose a new alignment
algorithm called Factually Augmented RLHF that augments the reward model with
additional factual information such as image captions and ground-truth
multi-choice options, which alleviates the reward hacking phenomenon in RLHF
and further improves the performance. We also enhance the GPT-4-generated
training data (for vision instruction tuning) with previously available
human-written image-text pairs to improve the general capabilities of our
model. To evaluate the proposed approach in real-world scenarios, we develop a
new evaluation benchmark MMHAL-BENCH with a special focus on penalizing
hallucinations. As the first LMM trained with RLHF, our approach achieves
remarkable improvement on the LLaVA-Bench dataset with the 94% performance
level of the text-only GPT-4 (while previous best methods can only achieve the
87% level), and an improvement by 60% on MMHAL-BENCH over other baselines. We
open-source our code, model, and data at https://llava-rlhf.github.io.
Comment: Preprin
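The reward-model step described in this abstract can be sketched as follows. The abstract does not specify the loss or the input format, so both are assumptions: the pairwise Bradley-Terry loss commonly used for RLHF reward models is shown, and the prompt layout in `build_reward_input` is purely illustrative. Both function names are hypothetical.

```python
import math


def preference_loss(reward_preferred, reward_rejected):
    """Pairwise loss -log(sigmoid(r_preferred - r_rejected)), computed stably.

    Assumed objective: the abstract only says annotators compare two responses.
    """
    margin = reward_preferred - reward_rejected
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def build_reward_input(prompt, response, facts):
    """Prepend factual context (e.g. image captions) to the reward model input.

    The text layout is an illustrative assumption, not the authors' format.
    """
    fact_block = "\n".join(f"- {fact}" for fact in facts)
    return f"Facts:\n{fact_block}\n\nPrompt: {prompt}\nResponse: {response}"


# Toy usage with made-up strings and scalar rewards.
text = build_reward_input(
    prompt="What is on the table?",
    response="A red mug.",
    facts=["A red mug sits on a wooden table."],
)
print(text)
print(preference_loss(reward_preferred=1.2, reward_rejected=-0.3))
```

The loss shrinks as the reward model scores the less hallucinated response further above the other one; giving the model the extra facts is what the abstract credits with reducing reward hacking.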
Salidroside Inhibits HMGB1 Acetylation and Release through Upregulation of SirT1 during Inflammation
HMGB1, a highly conserved nonhistone DNA-binding protein, plays an important role in inflammatory diseases. Once released into the extracellular space, HMGB1 acts as a proinflammatory cytokine that triggers an inflammatory reaction. Our previous study showed that salidroside exerts an anti-inflammatory effect by inhibiting the JAK2-STAT3 signalling pathway. However, whether salidroside inhibits the release of HMGB1 is still unclear. In this study, we examine the effects of salidroside on HMGB1 release and investigate the potential molecular mechanisms. In an experimental rat model of sepsis caused by CLP, salidroside administration significantly attenuated lung injury and reduced the serum HMGB1 level. In RAW264.7 cells, we investigated the effects of salidroside on LPS-induced HMGB1 release and explored the underlying molecular mechanisms. We found that salidroside significantly inhibited LPS-induced HMGB1 release, and the inhibitory effect was correlated with HMGB1 acetylation levels. Mechanistically, salidroside inhibits HMGB1 acetylation through the AMPK-SirT1 pathway. In addition, SirT1 overexpression attenuated LPS-induced HMGB1 acetylation and nucleocytoplasmic translocation. Furthermore, in SirT1 shRNA plasmid-transfected cells, salidroside treatment enhanced SirT1 expression and reduced LPS-activated HMGB1 acetylation and nucleocytoplasmic translocation. Collectively, these results demonstrate that salidroside might reduce HMGB1 release through the AMPK-SirT1 signalling pathway and suppress HMGB1 acetylation and nucleocytoplasmic translocation.