278 research outputs found

    Efficient Region-Aware Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis

    This paper presents ER-NeRF, a novel conditional Neural Radiance Fields (NeRF) based architecture for talking portrait synthesis that concurrently achieves fast convergence, real-time rendering, and state-of-the-art performance with a small model size. Our idea is to explicitly exploit the unequal contribution of spatial regions to guide talking portrait modeling. Specifically, to improve the accuracy of dynamic head reconstruction, a compact and expressive NeRF-based Tri-Plane Hash Representation is introduced by pruning empty spatial regions with three planar hash encoders. For speech audio, we propose a Region Attention Module to generate region-aware condition features via an attention mechanism. Unlike existing methods that use an MLP-based encoder to learn the cross-modal relation implicitly, the attention mechanism builds an explicit connection between audio features and spatial regions to capture the priors of local motions. Moreover, a direct and fast Adaptive Pose Encoding is introduced to optimize the head-torso separation problem by mapping the complex transformation of the head pose into spatial coordinates. Extensive experiments demonstrate that our method renders higher-fidelity, audio-lip-synchronized talking portrait videos with realistic details and high efficiency compared to previous methods. Comment: Accepted by ICCV 202

    Fast Updating Truncated SVD for Representation Learning with Sparse Matrices

    Updating a truncated Singular Value Decomposition (SVD) is crucial in representation learning, especially when dealing with large-scale data matrices that continuously evolve in practical scenarios. Aligning SVD-based models with fast-paced updates becomes increasingly important. Existing methods for updating truncated SVDs employ Rayleigh-Ritz projection procedures, where projection matrices are augmented based on the original singular vectors. However, these methods are inefficient because the update matrix becomes dense and the projection is applied to all singular vectors. To address these limitations, we introduce a novel method for dynamically approximating the truncated SVD of a sparse and temporally evolving matrix. Our approach leverages sparsity in the orthogonalization process of augmented matrices and uses an extended decomposition to independently store projections in the column space of singular vectors. Numerical experiments demonstrate an efficiency improvement of an order of magnitude compared to previous methods, achieved while maintaining precision comparable to existing approaches.

    Robust Synthetic-to-Real Transfer for Stereo Matching

    With advancements in domain generalized stereo matching networks, models pre-trained on synthetic data demonstrate strong robustness to unseen domains. However, few studies have investigated the robustness after fine-tuning them in real-world scenarios, during which the domain generalization ability can be seriously degraded. In this paper, we explore fine-tuning stereo matching networks without compromising their robustness to unseen domains. Our motivation stems from comparing Ground Truth (GT) versus Pseudo Label (PL) for fine-tuning: GT degrades, but PL preserves the domain generalization ability. Empirically, we find the difference between GT and PL implies valuable information that can regularize networks during fine-tuning. We also propose a framework to utilize this difference for fine-tuning, consisting of a frozen Teacher, an exponential moving average (EMA) Teacher, and a Student network. The core idea is to utilize the EMA Teacher to measure what the Student has learned and dynamically improve GT and PL for fine-tuning. We integrate our framework with state-of-the-art networks and evaluate its effectiveness on several real-world datasets. Extensive experiments show that our method effectively preserves the domain generalization ability during fine-tuning. Comment: Accepted at CVPR 202
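
    As a rough illustration of the EMA Teacher mentioned in the abstract above, the sketch below shows a generic exponential-moving-average weight update. The function name `ema_update`, the decay values, and the flat-list weights are assumptions made for illustration only; the abstract does not state the paper's actual update rule or hyperparameters.

```python
from typing import List


def ema_update(teacher: List[float], student: List[float], decay: float = 0.99) -> List[float]:
    """Return new teacher weights: decay * teacher + (1 - decay) * student."""
    if len(teacher) != len(student):
        raise ValueError("teacher and student must have the same number of weights")
    if not 0.0 <= decay <= 1.0:
        raise ValueError("decay must lie in [0, 1]")
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


# Hypothetical toy weights: the teacher drifts toward the student over time.
teacher = [0.0, 1.0]
student = [1.0, 1.0]
for _ in range(3):
    teacher = ema_update(teacher, student, decay=0.5)
print(teacher)
```

    A decay close to 1 makes the teacher change slowly, so it acts as a smoothed history of the student rather than a copy of its latest state.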

    Nasopharyngeal carcinoma with non-squamous phenotype may be a variant of nasopharyngeal squamous cell carcinoma after inhibition of EGFR/PI3K/AKT/mTOR pathway

    Nasopharyngeal carcinoma (NPC) is a malignant tumor that develops in the nasopharyngeal epithelium and typically shows squamous differentiation. The squamous phenotype is evident in immunohistochemistry (IHC), with diffuse nuclear positivity for p63 and p40. Nonetheless, a few NPCs identified by clinicopathological diagnosis do not exhibit the squamous phenotype; these are currently referred to as non-squamous immunophenotype nasopharyngeal carcinomas (NSNPCs). In previous work, we revealed similarities between the histological appearance, etiology, and gene alterations of NSNPC and conventional NPC. According to ultrastructural findings, NSNPC still falls under the category of undifferentiated non-keratinized squamous cell carcinoma. According to a retrospective investigation, NSNPC has an excellent prognosis and a low level of malignancy. Based on this prior research, we investigated the molecular mechanism by which NSNPC does not express the squamous phenotype, as well as its biological behavior. IHC was used to determine the expression of EGFR, PI3K, AKT, p-AKT, mTOR, p-mTOR, Notch, STAT3 and p-STAT3 in a total of 20 NSNPC tissue samples and 20 classic NPC tissue samples. We obtained human NPC cell lines (CNE-2, 5-8F) and transfected them with an EGFR overexpression plasmid and shRNAs. mRNA and protein expression in the cells were measured by qRT-PCR and Western blotting. Cell biological behavior was assessed using the CCK-8 assay, cell migration assay, and cell invasion assay. By immunohistochemistry, EGFR, PI3K, p-AKT and p-mTOR proteins showed low expression in NSNPC tissues compared with classical NPC. In the classical NPC cell lines CNE-2 and 5-8F, overexpression of EGFR up-regulated the expression of p63 through the PI3K/AKT/mTOR pathway and promoted the proliferation, migration, and invasion of nasopharyngeal carcinoma cells. Conversely, knockdown of EGFR down-regulated p63 expression through the PI3K/AKT/mTOR pathway and inhibited the proliferation, migration, and invasion of nasopharyngeal carcinoma cells. The lack of p63 expression in NSNPC was linked with the inhibition of the EGFR/PI3K/AKT/mTOR pathway, and NSNPC may be a variant of classical NPC.

    Regulations of the key mediators in inflammation and atherosclerosis by Aspirin in human macrophages

    Although its role in preventing secondary cardiovascular complications is well established, how acetylsalicylic acid (ASA, aspirin) regulates certain key molecules in atherogenesis is still not known. Considering the role of matrix metalloproteinase-9 (MMP-9) in destabilizing atherosclerotic plaques, the roles of the scavenger receptor class BI (SR-BI) and ATP-binding cassette transporter A1 (ABCA1) in promoting cholesterol efflux from foam cells at the plaques, and the role of NF-κB in the overall inflammation related to atherosclerosis, we addressed whether these molecules are all related to a common mechanism that may be regulated by ASA. We investigated the effect of ASA on the expression and activity of these molecules in THP-1 macrophages. Our results showed that ASA inhibited MMP-9 mRNA expression and decreased MMP-9 activity in the cell culture supernatants. In addition, it inhibited the nuclear translocation of the NF-κB p65 subunit, and thus the activity of this inflammatory molecule. In contrast, ASA induced the expression of ABCA1 and SR-BI, two molecules known to reduce the progression of atherosclerosis, at both the mRNA and protein levels. It also stimulated cholesterol efflux from macrophages. These data suggest that ASA may alleviate symptoms of atherosclerosis by two potential mechanisms: maintaining plaque stability by inhibiting the activities of the inflammatory molecules MMP-9 and NF-κB, and increasing cholesterol efflux by inducing the expression of ABCA1 and SR-BI.

    STPrivacy: Spatio-Temporal Privacy-Preserving Action Recognition

    Existing methods of privacy-preserving action recognition (PPAR) mainly focus on frame-level (spatial) privacy removal through 2D CNNs. Unfortunately, they have two major drawbacks. First, they may compromise temporal dynamics in input videos, which are critical for accurate action recognition. Second, they are vulnerable to practical attack scenarios where attackers probe for privacy from an entire video rather than individual frames. To address these issues, we propose a novel framework, STPrivacy, to perform video-level PPAR. For the first time, we introduce vision Transformers into PPAR by treating a video as a tubelet sequence, and accordingly design two complementary mechanisms, i.e., sparsification and anonymization, to remove privacy from a spatio-temporal perspective. Specifically, our privacy sparsification mechanism applies adaptive token selection to discard action-irrelevant tubelets. Then, our anonymization mechanism implicitly manipulates the remaining action tubelets to erase privacy in the embedding space through adversarial learning. These mechanisms provide significant advantages in terms of privacy preservation for human eyes and action-privacy trade-off adjustment during deployment. We additionally contribute the first two large-scale PPAR benchmarks, VP-HMDB51 and VP-UCF101, to the community. Extensive evaluations on them, as well as two other tasks, validate the effectiveness and generalization capability of our framework.

    MangaGAN: Unpaired Photo-to-Manga Translation Based on The Methodology of Manga Drawing

    Manga is a globally popular comic form that originated in Japan, which typically employs black-and-white stroke lines and geometric exaggeration to describe humans' appearances, poses, and actions. In this paper, we propose MangaGAN, the first method based on Generative Adversarial Network (GAN) for unpaired photo-to-manga translation. Inspired by how experienced manga artists draw manga, MangaGAN generates the geometric features of a manga face with a designed GAN model and delicately translates each facial region into the manga domain with a tailored multi-GAN architecture. For training MangaGAN, we construct a new dataset collected from a popular manga work, containing manga facial features, landmarks, bodies, and so on. Moreover, to produce high-quality manga faces, we further propose a structural smoothing loss to smooth stroke lines and avoid noisy pixels, and a similarity preserving module to improve the similarity between the photo and manga domains. Extensive experiments show that MangaGAN can produce high-quality manga faces that preserve both the facial similarity and a popular manga style, and that it outperforms other related state-of-the-art methods. Comment: 17 page