121 research outputs found
DiffuStereo: High Quality Human Reconstruction via Diffusion-based Stereo Using Sparse Cameras
We propose DiffuStereo, a novel system using only sparse cameras (8 in this
work) for high-quality 3D human reconstruction. At its core is a novel
diffusion-based stereo module, which introduces diffusion models, a type of
powerful generative model, into the iterative stereo matching network. To this
end, we design a new diffusion kernel and additional stereo constraints to
facilitate stereo matching and depth estimation in the network. We further
present a multi-level stereo network architecture to handle high-resolution (up
to 4K) inputs without a prohibitive memory footprint. Given a set of
sparse-view color images of a human, the proposed multi-level diffusion-based
stereo network can produce highly accurate depth maps, which are then converted
into a high-quality 3D human model through an efficient multi-view fusion
strategy. Overall, our method enables automatic reconstruction of human models
with quality on par with high-end dense-view camera rigs, achieved with a much
more lightweight hardware setup. Experiments show that our method outperforms
state-of-the-art methods by a large margin, both qualitatively and
quantitatively.
Comment: Accepted by ECCV 2022
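Stripped to its skeleton, the diffusion-based stereo idea is an iterative
denoising loop over a disparity map. The sketch below is a hypothetical
illustration with a toy stand-in denoiser; the function name, noise schedule,
and update rule are assumptions, not the paper's actual network or diffusion
kernel:

```python
import numpy as np

def diffusion_refine_disparity(init_disp, denoise_fn, num_steps=10,
                               noise_scale=0.1, seed=0):
    """Refine a disparity map with a diffusion-style denoising loop.

    `denoise_fn(disp, t)` stands in for a learned network that predicts a
    refined disparity from the current estimate and the step index; here it
    is a placeholder, not DiffuStereo's model.
    """
    rng = np.random.default_rng(seed)
    disp = init_disp.copy()
    for t in reversed(range(num_steps)):
        # perturb with step-dependent noise, then denoise back toward a
        # cleaner estimate (the generative refinement step)
        noisy = disp + noise_scale * (t / num_steps) * rng.standard_normal(disp.shape)
        disp = denoise_fn(noisy, t)
    return disp

# toy denoiser: pull estimates toward a known target (illustration only)
target = np.full((4, 4), 2.0)
refined = diffusion_refine_disparity(
    np.zeros((4, 4)),
    denoise_fn=lambda d, t: d + 0.5 * (target - d),
)
```

With the toy denoiser, each iteration halves the gap to the target, so the
loop converges even from a zero initialization; in the paper this role is
played by the learned stereo network under additional stereo constraints.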
Coarse-to-Fine: Learning Compact Discriminative Representation for Single-Stage Image Retrieval
Image retrieval aims to find images in a database that are visually similar
to a query image. Two-stage methods following the retrieve-and-rerank paradigm
have achieved excellent performance, but their separate local and global
modules are inefficient for real-world applications. To better trade off
retrieval efficiency and accuracy, some approaches fuse global and local
features into a joint representation to perform single-stage image retrieval.
However, these methods remain challenged by the variety of conditions they
must handle, e.g., background, occlusion and viewpoint. In this work, we
design a Coarse-to-Fine framework to learn a Compact Discriminative
representation (CFCD) for end-to-end single-stage image retrieval, requiring
only image-level labels.
Specifically, we first design a novel adaptive softmax-based loss which
dynamically tunes its scale and margin within each mini-batch and increases
them progressively to strengthen supervision during training and intra-class
compactness. Furthermore, we propose a mechanism that attentively selects
prominent local descriptors and infuses fine-grained semantic relations into the
global representation via a hard negative sampling strategy to optimize
inter-class distinctiveness at a global scale. Extensive experimental results
have demonstrated the effectiveness of our method, which achieves
state-of-the-art single-stage image retrieval performance on benchmarks such as
Revisited Oxford and Revisited Paris. Code is available at
https://github.com/bassyess/CFCD.
Comment: Accepted to ICCV 2023
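The adaptive scale-and-margin idea can be sketched as an ArcFace-style margin
softmax whose scale s and margin m grow with training progress. The linear
schedule, value ranges, and function name below are illustrative assumptions,
not CFCD's published formulation:

```python
import numpy as np

def adaptive_arcface_logits(features, weights, labels, progress,
                            scale_range=(16.0, 64.0), margin_range=(0.1, 0.5)):
    """ArcFace-style logits whose scale and margin grow with training progress.

    `progress` in [0, 1]; the schedule and ranges are illustrative
    assumptions, not CFCD's published values.
    """
    s = scale_range[0] + progress * (scale_range[1] - scale_range[0])
    m = margin_range[0] + progress * (margin_range[1] - margin_range[0])
    # cosine similarity between L2-normalised features and class weights
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = f @ w.T
    theta = np.arccos(np.clip(cos, -1.0, 1.0))
    logits = s * cos
    rows = np.arange(len(labels))
    # additive angular margin on the target class tightens intra-class compactness
    logits[rows, labels] = s * np.cos(theta[rows, labels] + m)
    return logits

rng = np.random.default_rng(0)
feats = rng.standard_normal((4, 8))
w = rng.standard_normal((10, 8))
labels = np.array([0, 1, 2, 3])
early = adaptive_arcface_logits(feats, w, labels, progress=0.0)  # weak supervision
late = adaptive_arcface_logits(feats, w, labels, progress=1.0)   # strong supervision
```

Increasing s and m within and across mini-batches progressively sharpens the
softmax and widens the angular penalty, matching the abstract's description of
strengthening supervision during training.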
Phosphorus-doped porous carbons as efficient electrocatalysts for oxygen reduction
Efficient electrocatalysts for the oxygen reduction reaction (ORR) play a critical role in the performance of fuel cells and metal–air batteries. In this study, we report a facile synthesis of phosphorus (P)-doped porous carbon as a highly active electrocatalyst for the ORR. Phosphorus-doped porous carbon was prepared by simultaneous doping and activation of carbon with phosphoric acid (H3PO4) in the presence of Co. Both phosphorus and cobalt were found to play significant roles in improving the catalytic activity of carbon for the ORR. The as-prepared phosphorus-doped porous carbon exhibited considerable catalytic activity for the ORR, as evidenced by rotating ring-disk electrode studies. At the same mass loading, the Tafel slope of the phosphorus-doped porous carbon electrocatalyst is comparable to that of commercial Pt/C catalysts (20 wt% Pt on Vulcan XC-72, Johnson Matthey), with stability superior to Pt/C in alkaline solutions.
DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior
We present DreamCraft3D, a hierarchical 3D content generation method that
produces high-fidelity and coherent 3D objects. We tackle the problem by
leveraging a 2D reference image to guide the stages of geometry sculpting and
texture boosting. A central focus of this work is to address the consistency
issue that existing works encounter. To sculpt geometries that render
coherently, we perform score distillation sampling via a view-dependent
diffusion model. This 3D prior, alongside several training strategies,
prioritizes the geometry consistency but compromises the texture fidelity. We
further propose Bootstrapped Score Distillation to specifically boost the
texture. We train a personalized diffusion model, Dreambooth, on the augmented
renderings of the scene, imbuing it with 3D knowledge of the scene being
optimized. The score distillation from this 3D-aware diffusion prior provides
view-consistent guidance for the scene. Notably, through an alternating
optimization of the diffusion prior and 3D scene representation, we achieve
mutually reinforcing improvements: the optimized 3D scene aids in training the
scene-specific diffusion model, which offers increasingly view-consistent
guidance for 3D optimization. The optimization is thus bootstrapped and leads
to substantial texture boosting. With tailored 3D priors throughout the
hierarchical generation, DreamCraft3D generates coherent 3D objects with
photorealistic renderings, advancing the state-of-the-art in 3D content
generation. Code available at https://github.com/deepseek-ai/DreamCraft3D.
Comment: Project Page: https://mrtornado24.github.io/DreamCraft3D
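The alternating "bootstrapped" loop the abstract describes can be summarized
as follows; every name and both update steps are hypothetical stand-ins, not
DreamCraft3D's implementation:

```python
def bootstrapped_optimization(scene, diffusion_prior, render_views, finetune,
                              distill_step, num_rounds=3, steps_per_round=100):
    """Alternate between refining the 3D scene with score distillation from
    the current diffusion prior and fine-tuning that prior on the scene's own
    renderings, so each side improves the other.

    All callables are placeholders for the paper's components.
    """
    for _ in range(num_rounds):
        for _ in range(steps_per_round):
            scene = distill_step(scene, diffusion_prior)      # 3D optimisation
        renders = render_views(scene)                         # augmented renderings
        diffusion_prior = finetune(diffusion_prior, renders)  # personalise the prior
    return scene, diffusion_prior

# toy instantiation: "scene" and "prior" are scalars pulled toward each other
scene, prior = bootstrapped_optimization(
    scene=0.0,
    diffusion_prior=10.0,
    render_views=lambda s: s,
    finetune=lambda p, r: 0.5 * (p + r),
    distill_step=lambda s, p: s + 0.1 * (p - s),
)
```

In the toy version the two quantities converge toward each other, mirroring
the mutually reinforcing improvement the abstract claims for the scene and
its scene-specific diffusion model.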
Control4D: Dynamic Portrait Editing by Learning 4D GAN from 2D Diffusion-based Editor
Recent years have witnessed considerable achievements in editing images with
text instructions. When applying these editors to dynamic scene editing, the
new-style scene tends to be temporally inconsistent due to the frame-by-frame
nature of these 2D editors. To tackle this issue, we propose Control4D, a novel
approach for high-fidelity and temporally consistent 4D portrait editing.
Control4D is built upon an efficient 4D representation with a 2D
diffusion-based editor. Instead of using direct supervision from the editor,
our method learns a 4D GAN from it and avoids inconsistent supervision
signals. Specifically, we employ a discriminator to learn the generation
distribution based on the edited images and then update the generator with the
discrimination signals. For more stable training, multi-level information is
extracted from the edited images and used to facilitate the learning of the
generator. Experimental results show that Control4D surpasses previous
approaches and achieves more photo-realistic and temporally consistent 4D
editing performance. The link to our project website is
https://control4darxiv.github.io.
Coherent interface strengthening of ultrahigh pressure heat-treated Mg-Li-Y alloys.
Achieving a good strength-ductility balance in Mg alloys has always been a crucial issue for the widespread application of Mg-based structural materials. Herein, an unexpected double-stage strengthening phenomenon was discovered in Mg-8Li-1Y (wt.%) alloys through high-pressure (6 GPa) heat treatments over a range of 700-1300°C. Remarkably, the yield strength is improved without loss of ductility. The low-temperature strengthening is mainly driven by the formation of a large volume fraction of nanoscale contraction twins, whereas the high-temperature strengthening is ascribed to the presence of dense nano-sized stacking faults. Both coherent interfaces contribute effectively to high mechanical strength without any tradeoff in ductility.
D3G: Exploring Gaussian Prior for Temporal Sentence Grounding with Glance Annotation
Temporal sentence grounding (TSG) aims to locate a specific moment from an
untrimmed video given a natural language query. Weakly supervised methods
still show a large performance gap compared to fully supervised ones, while
the latter require laborious timestamp annotations. In this study, we aim to
reduce the annotation cost while keeping performance competitive with fully
supervised methods for the TSG task. To achieve this
goal, we investigate a recently proposed glance-supervised temporal sentence
grounding task, which requires only single frame annotation (referred to as
glance annotation) for each query. Under this setup, we propose a Dynamic
Gaussian prior based Grounding framework with Glance annotation (D3G), which
consists of a Semantic Alignment Group Contrastive Learning module (SA-GCL) and
a Dynamic Gaussian prior Adjustment module (DGA). Specifically, SA-GCL samples
reliable positive moments from a 2D temporal map via jointly leveraging
Gaussian prior and semantic consistency, which contributes to aligning the
positive sentence-moment pairs in the joint embedding space. Moreover, to
alleviate the annotation bias resulting from glance annotation and model
complex queries consisting of multiple events, we propose the DGA module, which
adjusts the distribution dynamically to approximate the ground truth of target
moments. Extensive experiments on three challenging benchmarks verify the
effectiveness of the proposed D3G. It outperforms the state-of-the-art weakly
supervised methods by a large margin and narrows the performance gap compared
to fully supervised methods. Code is available at
https://github.com/solicucu/D3G.
Comment: Accepted to ICCV 2023
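The core idea of weighting candidate moments by a Gaussian centred on the
single annotated frame can be sketched in a few lines. This is a minimal 1D
illustration under assumed names; D3G's actual prior lives on a 2D temporal
proposal map and is adjusted dynamically during training:

```python
import numpy as np

def gaussian_moment_prior(num_clips, glance_idx, sigma):
    """Weight candidate clip indices by a Gaussian centred on the glance frame.

    A sketch of using a Gaussian prior to favour moments near a single
    annotated frame (glance annotation).
    """
    centres = np.arange(num_clips)
    w = np.exp(-0.5 * ((centres - glance_idx) / sigma) ** 2)
    return w / w.sum()  # normalise to a sampling distribution

def sample_positive_moments(prior, k, seed=0):
    """Draw k distinct candidate positive clips according to the prior."""
    rng = np.random.default_rng(seed)
    return rng.choice(len(prior), size=k, replace=False, p=prior)

prior = gaussian_moment_prior(num_clips=32, glance_idx=10, sigma=3.0)
positives = sample_positive_moments(prior, k=5)
```

Sampled positives cluster around the glance frame, which is the intuition
behind SA-GCL's reliable positive-moment sampling; the DGA module would then
reshape this distribution as training progresses.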
Unified and Dynamic Graph for Temporal Character Grouping in Long Videos
Video temporal character grouping locates appearing moments of major
characters within a video according to their identities. To this end, recent
works have evolved from unsupervised clustering to graph-based supervised
clustering. However, graph-based methods are built on the premise of a fixed
affinity graph, which brings many inexact connections. Besides, they extract
multi-modal features with several separate models, which is unfriendly to
deployment. In this
paper, we present a unified and dynamic graph (UniDG) framework for temporal
character grouping. This is accomplished, firstly, by a unified representation
network that learns representations of multiple modalities within the same
space while still preserving each modality's uniqueness. Secondly,
we present a dynamic graph clustering where the neighbors of different
quantities are dynamically constructed for each node via a cyclic matching
strategy, leading to a more reliable affinity graph. Thirdly, a progressive
association method is introduced to exploit spatial and temporal contexts among
different modalities, allowing multi-modal clustering results to be well fused.
As current datasets only provide pre-extracted features, we evaluate our UniDG
method on a collected dataset named MTCG, which contains each character's
appearing clips of face and body and speaking voice tracks. We also evaluate
our key components on existing clustering and retrieval datasets to verify the
generalization ability. Experimental results demonstrate that our method
achieves promising results and outperforms several state-of-the-art approaches.
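One way to build a per-node neighbour set of variable size, in the spirit of
the cyclic matching the abstract mentions, is mutual k-nearest-neighbour
filtering: keep an edge only when both endpoints select each other. This is a
hedged sketch under assumed names; UniDG's published strategy may differ in
detail:

```python
import numpy as np

def mutual_knn_neighbours(affinity, k):
    """Keep an edge i->j only when i and j are in each other's top-k lists.

    Each node ends up with a variable number of reliable neighbours instead
    of a fixed-affinity graph with inexact connections.
    """
    n = affinity.shape[0]
    a = affinity.copy()
    np.fill_diagonal(a, -np.inf)          # no self-edges
    topk = np.argsort(-a, axis=1)[:, :k]  # each node's k closest candidates
    knn_sets = [set(row) for row in topk]
    return [
        sorted(j for j in knn_sets[i] if i in knn_sets[j])
        for i in range(n)
    ]

# two tight pairs and one outlier: the outlier keeps no mutual neighbour
aff = np.array([
    [1.0, 0.9, 0.1, 0.1, 0.0],
    [0.9, 1.0, 0.1, 0.1, 0.0],
    [0.1, 0.1, 1.0, 0.8, 0.0],
    [0.1, 0.1, 0.8, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 1.0],
])
neigh = mutual_knn_neighbours(aff, k=1)
```

The outlier node selects a neighbour but is selected by no one, so it is left
unconnected rather than forced into an inexact edge, which is the reliability
property a dynamic graph is after.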
Dual-Functional PLGA Nanoparticles Co-Loaded with Indocyanine Green and Resiquimod for Prostate Cancer Treatment
Purpose: With the advance of screening techniques, there is a growing number of low-risk and intermediate-risk prostate cancer (PCa) cases, which remain a serious threat to men's health. To obtain better efficacy, growing interest has been directed toward emerging treatments such as immunotherapy and focal therapy. However, few studies offer guidance on whether and how to combine these modalities against PCa. This study was designed to develop dual-functional nanoparticles (NPs) that combine photothermal therapy (PTT) with immunotherapy and to determine their anti-tumor efficacy for PCa treatment.
Methods: Using a double emulsion technique, the drug nanocarrier poly(lactic-co-glycolic acid) (PLGA) was applied for co-loading of a fluorescent dye, indocyanine green (ICG), and a toll-like receptor 7/8 (TLR7/8) agonist, resiquimod (R848), to synthesize PLGA-ICG-R848 NPs. Next, we determined their characteristic features and evaluated whether they inhibited cell viability in multiple PCa cell lines. After treatment with PLGA-ICG-R848, the maturation markers of bone marrow-derived dendritic cells (BMDCs) were detected by flow cytometry. By establishing a subcutaneous xenograft model of mouse PCa, we explored both the anti-tumor effect and the immune response following NPs-based laser ablation.
Results: With a mean diameter of 157.7 nm, PLGA-ICG-R848 NPs exhibited no cytotoxic effect in PCa cells, but they significantly decreased RM9 cell viability to (3.9 +/- 1.0)% after laser irradiation. Moreover, PLGA-ICG-R848 promoted BMDC maturation, with significantly elevated proportions of CD11c+CD86+ and CD11c+CD80+ cells. Following PLGA-ICG-R848-based laser ablation in vivo, decreased bioluminescent signals indicated significant inhibition of PCa growth, while the ratio of splenic natural killer (NK) cells was (3.96 +/- 1.88)% in the PLGA-ICG-R848 group compared with (0.99 +/- 0.10)% in the PBS group, revealing an enhanced immune response against PCa.
Conclusion: Under laser irradiation, the dual-functional PLGA-ICG-R848 NPs exhibit anti-tumor efficacy for PCa treatment by combining PTT with immunotherapy.