
    InterDiff: Generating 3D Human-Object Interactions with Physics-Informed Diffusion

    This paper addresses a novel task of anticipating 3D human-object interactions (HOIs). Most existing research on HOI synthesis lacks comprehensive whole-body interactions with dynamic objects, e.g., it is often limited to manipulating small or static objects. Our task is significantly more challenging, as it requires modeling dynamic objects with various shapes, capturing whole-body motion, and ensuring physically valid interactions. To this end, we propose InterDiff, a framework comprising two key steps: (i) interaction diffusion, where we leverage a diffusion model to encode the distribution of future human-object interactions; (ii) interaction correction, where we introduce a physics-informed predictor to correct denoised HOIs in a diffusion step. Our key insight is to inject the prior knowledge that interactions, expressed relative to contact points, follow a simple pattern and are easily predictable. Experiments on multiple human-object interaction datasets demonstrate the effectiveness of our method for this task, capable of producing realistic, vivid, and remarkably long-term 3D HOI predictions.
    Comment: ICCV 2023; Project Page: https://sirui-xu.github.io/InterDiff
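
    The two-step design above (a diffusion model over future HOIs, then a physics-informed correction applied to each denoised prediction) can be pictured with a minimal PyTorch-style sketch. Everything below is an assumption made for illustration, including the module names, the tensor shapes (a 75-D body state plus a 12-D object state), and the simplified re-noising schedule; it is not the authors' released implementation, which is linked from the project page above.

    import torch
    import torch.nn as nn

    class InteractionDenoiser(nn.Module):
        """Placeholder diffusion denoiser over concatenated human + object states."""
        def __init__(self, dim: int = 256, state_dim: int = 75 + 12):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(state_dim + 1, dim), nn.SiLU(), nn.Linear(dim, state_dim)
            )

        def forward(self, x_t: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
            # x_t: (batch, frames, state_dim); the normalized step t is appended as a feature
            t_feat = t.view(-1, 1, 1).expand(x_t.shape[0], x_t.shape[1], 1)
            return self.net(torch.cat([x_t, t_feat], dim=-1))

    def physics_correction(x0_pred: torch.Tensor, contact_points: torch.Tensor) -> torch.Tensor:
        """Toy stand-in for the interaction-correction step: re-express the object
        trajectory relative to the contact points and smooth it, reflecting the prior
        that contact-relative motion follows a simple, predictable pattern."""
        human, obj = x0_pred[..., :75], x0_pred[..., 75:]
        rel = obj - contact_points                        # object motion in the contact frame
        rel = 0.5 * (rel + rel.roll(1, dims=1))           # crude temporal smoothing
        return torch.cat([human, rel + contact_points], dim=-1)

    @torch.no_grad()
    def sample(denoiser, contact_points, frames=60, state_dim=87, steps=50):
        x_t = torch.randn(1, frames, state_dim)
        for t in reversed(range(steps)):
            x0_pred = denoiser(x_t, torch.full((1,), t / steps))
            x0_pred = physics_correction(x0_pred, contact_points)      # correct the denoised HOI
            x_t = x0_pred + 0.1 * (t / steps) * torch.randn_like(x_t)  # simplified re-noising
        return x_t

    Under these assumed shapes, a call such as sample(InteractionDenoiser(), torch.zeros(1, 60, 12)) returns one 60-frame rollout; the correction step is where contact-relative priors would actually be enforced in a full implementation.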

    Learning Lightweight Object Detectors via Multi-Teacher Progressive Distillation

    Resource-constrained perception systems, such as edge computing and vision-for-robotics, require vision models that are both accurate and lightweight in computation and memory usage. While knowledge distillation is a proven strategy for enhancing the performance of lightweight classification models, its application to structured outputs such as object detection and instance segmentation remains complicated, due to the variability of the outputs and the complex internal network modules involved in the distillation process. In this paper, we propose a simple yet surprisingly effective sequential approach to knowledge distillation that progressively transfers the knowledge of a set of teacher detectors to a given lightweight student. To distill knowledge from a highly accurate but complex teacher model, we construct a sequence of teachers to help the student gradually adapt. Our progressive strategy can be easily combined with existing detection distillation mechanisms to consistently maximize student performance in various settings. To the best of our knowledge, we are the first to successfully distill knowledge from Transformer-based teacher detectors to convolution-based students, boosting the performance of ResNet-50-based RetinaNet from 36.5% to 42.0% AP and Mask R-CNN from 38.2% to 42.5% AP on the MS COCO benchmark.
    Comment: ICML 2023
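
    As a rough illustration of the progressive strategy described above, the sketch below distills a student against a sequence of teachers one stage at a time, ordered from weakest to strongest. The detection-specific distillation loss is collapsed into a plain MSE between model outputs, and all models, shapes, and hyperparameters are placeholders rather than the paper's actual setup.

    import torch
    import torch.nn as nn

    def distill_one_stage(student: nn.Module, teacher: nn.Module,
                          loader, epochs: int = 1, lr: float = 1e-3) -> None:
        """Distill the student toward a single (frozen) teacher's outputs."""
        teacher.eval()
        opt = torch.optim.SGD(student.parameters(), lr=lr, momentum=0.9)
        mse = nn.MSELoss()
        for _ in range(epochs):
            for images in loader:
                with torch.no_grad():
                    t_out = teacher(images)      # teacher predictions
                s_out = student(images)
                loss = mse(s_out, t_out)         # stand-in for a detection distillation loss
                opt.zero_grad()
                loss.backward()
                opt.step()

    def progressive_distillation(student: nn.Module, teachers, loader) -> nn.Module:
        """Run the stages in order so the student adapts gradually instead of
        matching the strongest (and most dissimilar) teacher directly."""
        for teacher in teachers:                 # teachers ordered weakest -> strongest
            distill_one_stage(student, teacher, loader)
        return student

    if __name__ == "__main__":
        # Tiny stand-ins for detectors; the paper's setting would pair, e.g., a
        # Transformer-based teacher with a ResNet-50 RetinaNet student.
        make = lambda: nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 16))
        loader = [torch.randn(4, 3, 32, 32) for _ in range(2)]
        progressive_distillation(make(), [make(), make(), make()], loader)

    In a real setting each call to distill_one_stage would plug in an existing detection distillation loss (feature imitation, logit matching, and so on), which is what the paper means by combining the progressive schedule with existing mechanisms.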

    Pose Guided Attention for Multi-label Fashion Image Classification

    We propose a compact framework with guided attention for multi-label classification in the fashion domain. Our visual semantic attention model (VSAM) is supervised by automatic pose extraction, creating a discriminative feature space. VSAM outperforms the state of the art on an in-house dataset and performs on par with previous work on the DeepFashion dataset, even without using any landmark annotations. Additionally, we show that our semantic attention module is robust to large quantities of incorrect annotations and provides more interpretable results.
    Comment: Published at the ICCV 2019 Workshop on Computer Vision for Fashion, Art and Design
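
    The guided-attention idea can be pictured with a small sketch: a learned spatial attention map re-weights backbone features for multi-label classification and is additionally supervised to resemble an automatically extracted pose heatmap, so no landmark annotations are needed at test time. The module layout, tensor shapes, and loss weighting below are guesses for illustration, not the VSAM architecture itself.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class PoseGuidedAttention(nn.Module):
        def __init__(self, in_channels: int = 256, num_labels: int = 20):
            super().__init__()
            self.attn = nn.Conv2d(in_channels, 1, kernel_size=1)   # spatial attention logits
            self.classifier = nn.Linear(in_channels, num_labels)   # multi-label head

        def forward(self, feats: torch.Tensor):
            attn = torch.sigmoid(self.attn(feats))                 # (B, 1, H, W)
            pooled = (feats * attn).mean(dim=(2, 3))               # attention-weighted pooling
            return self.classifier(pooled), attn

    def training_loss(logits, attn, labels, pose_heatmap, w: float = 0.5):
        """Multi-label BCE plus an auxiliary term pushing the attention map
        toward an externally extracted pose heatmap."""
        cls_loss = F.binary_cross_entropy_with_logits(logits, labels)
        pose_loss = F.mse_loss(attn, pose_heatmap)
        return cls_loss + w * pose_loss

    if __name__ == "__main__":
        model = PoseGuidedAttention()
        feats = torch.randn(2, 256, 14, 14)                        # backbone feature maps
        labels = torch.randint(0, 2, (2, 20)).float()              # multi-label targets
        pose = torch.rand(2, 1, 14, 14)                            # pose heatmap resized to the feature grid
        logits, attn = model(feats)
        training_loss(logits, attn, labels, pose).backward()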

    Multifunctional luminescent nanomaterials from NaLa(MoO₄)₂:Eu³⁺/Tb³⁺ with tunable decay lifetimes, emission colors, and enhanced cell viability

    A facile but effective method has been developed for the large-scale preparation of NaLa(MoO₄)₂ nanorods and microflowers co-doped with Eu³⁺ and Tb³⁺ ions (abbreviated as NLM:Ln³⁺). The as-synthesized nanomaterials possess a pure tetragonal phase, with morphologies tunable from shuttle-like nanorods to microflowers by controlling the reaction temperature and the amount of ethylene glycol used. Consequently, the resulting nanomaterials exhibit superb luminescent emission over the visible region, from red through yellow to green, simply by changing the relative doping ratio of Eu³⁺ to Tb³⁺ ions. A biocompatibility study indicates that the addition of NLM:Ln³⁺ nanomaterials can stimulate the growth of normal human retinal pigment epithelium (ARPE-19) cells. Therefore, the newly developed NaLa(MoO₄)₂ nanomaterials hold potential for a wide range of multifunctional applications, including bioimaging, security protection, optical displays, optoelectronics for information storage, and cell stimulation.

    Aligning Large Multimodal Models with Factually Augmented RLHF

    Large Multimodal Models (LMMs) are built across modalities, and misalignment between the two modalities can result in "hallucination": textual outputs that are not grounded in the multimodal information in context. To address this multimodal misalignment, we adapt Reinforcement Learning from Human Feedback (RLHF) from the text domain to the task of vision-language alignment: human annotators are asked to compare two responses and pinpoint the more hallucinated one, and the vision-language model is trained to maximize the simulated human rewards. We propose a new alignment algorithm called Factually Augmented RLHF, which augments the reward model with additional factual information such as image captions and ground-truth multi-choice options, alleviating the reward hacking phenomenon in RLHF and further improving performance. We also enhance the GPT-4-generated training data (for vision instruction tuning) with previously available human-written image-text pairs to improve the general capabilities of our model. To evaluate the proposed approach in real-world scenarios, we develop a new evaluation benchmark, MMHAL-BENCH, with a special focus on penalizing hallucinations. As the first LMM trained with RLHF, our approach achieves a remarkable improvement on the LLaVA-Bench dataset, reaching 94% of the performance level of the text-only GPT-4 (while previous best methods only reach the 87% level), and a 60% improvement over other baselines on MMHAL-BENCH. We open-source our code, model, and data at https://llava-rlhf.github.io.
    Comment: Preprint
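
    A minimal sketch of the factual-augmentation step, assuming the reward model simply receives the factual text (captions and ground-truth options) concatenated with the question and candidate response, and is trained with the usual pairwise preference loss. The data layout and prompt template are invented for illustration and are not the released LLaVA-RLHF code.

    from dataclasses import dataclass
    from typing import List
    import math

    @dataclass
    class Sample:
        question: str
        response: str
        captions: List[str]        # human-written captions for the image
        gt_options: List[str]      # ground-truth multiple-choice options, if any

    def build_reward_input(sample: Sample) -> str:
        """Prepend the factual context so the reward model can check the response
        against it instead of rewarding fluent but ungrounded answers."""
        facts = " ".join(sample.captions + sample.gt_options)
        return (f"Facts: {facts}\n"
                f"Question: {sample.question}\n"
                f"Response: {sample.response}\n"
                f"Is the response grounded in the facts and helpful?")

    def pairwise_reward_loss(score_chosen: float, score_rejected: float) -> float:
        """Standard RLHF reward-model objective: -log sigmoid(r_chosen - r_rejected)."""
        return -math.log(1.0 / (1.0 + math.exp(-(score_chosen - score_rejected))))

    if __name__ == "__main__":
        s = Sample(
            question="What is the man holding?",
            response="He is holding a red umbrella.",
            captions=["A man holds a blue umbrella on a rainy street."],
            gt_options=["blue umbrella", "newspaper"],
        )
        print(build_reward_input(s))
        print(pairwise_reward_loss(1.3, -0.2))   # smaller loss when the chosen response scores higher

    The only change relative to plain RLHF here is the extra "Facts:" context in build_reward_input; a response that contradicts the captions or options is easier for the reward model to score down, which is the intuition behind mitigating reward hacking.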

    Salidroside Inhibits HMGB1 Acetylation and Release through Upregulation of SirT1 during Inflammation

    HMGB1, a highly conserved nonhistone DNA-binding protein, plays an important role in inflammatory diseases. Once released into the extracellular space, HMGB1 acts as a proinflammatory cytokine that triggers an inflammatory reaction. Our previous study showed that salidroside exerts an anti-inflammatory effect by inhibiting the JAK2-STAT3 signalling pathway. However, whether salidroside inhibits the release of HMGB1 is still unclear. In this study, we aimed to examine the effects of salidroside on HMGB1 release and to investigate the potential molecular mechanisms. In an experimental rat model of sepsis induced by cecal ligation and puncture (CLP), salidroside administration significantly attenuated lung injury and reduced the serum HMGB1 level. In RAW264.7 cells, we investigated the effects of salidroside on LPS-induced HMGB1 release and explored the underlying molecular mechanisms. We found that salidroside significantly inhibited LPS-induced HMGB1 release, and this inhibitory effect correlated with HMGB1 acetylation levels. Mechanistically, salidroside inhibits HMGB1 acetylation through the AMPK-SirT1 pathway. In addition, SirT1 overexpression attenuated LPS-induced HMGB1 acetylation and nucleocytoplasmic translocation. Furthermore, in SirT1 shRNA plasmid-transfected cells, salidroside treatment enhanced SirT1 expression and reduced LPS-activated HMGB1 acetylation and nucleocytoplasmic translocation. Collectively, these results demonstrate that salidroside may reduce HMGB1 release through the AMPK-SirT1 signalling pathway, suppressing HMGB1 acetylation and nucleocytoplasmic translocation.