900 research outputs found

    MetaPortrait: Identity-Preserving Talking Head Generation with Fast Personalized Adaptation

    In this work, we propose an ID-preserving talking head generation framework that advances previous methods in two aspects. First, as opposed to interpolating from sparse flow, we claim that dense landmarks are crucial to achieving accurate geometry-aware flow fields. Second, inspired by face-swapping methods, we adaptively fuse the source identity during synthesis, so that the network better preserves the key characteristics of the image portrait. Although the proposed model surpasses prior methods in generation fidelity on established benchmarks, personalized fine-tuning is usually still needed to make talking head generation fit for real-world use. However, this process is so computationally demanding that it is unaffordable to standard users. To solve this, we propose a fast adaptation model based on a meta-learning approach; the learned model can be adapted into a high-quality personalized model in as little as 30 seconds. Finally, a spatial-temporal enhancement module is proposed to improve fine details while ensuring temporal coherency. Extensive experiments demonstrate the significant superiority of our approach over the state of the art in both one-shot and personalized settings.
    Comment: CVPR 2023, project page: https://meta-portrait.github.io
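    To make the fast-adaptation idea concrete, below is a minimal, self-contained sketch of a Reptile-style meta-learning loop for personalizing a generator. This is an assumption about the general recipe, not the paper's released code: `TinyGenerator`, the toy MSE reconstruction loss, and all hyperparameters are illustrative stand-ins.

```python
# Reptile-style meta-learning sketch: adapt a copy of the meta-model to one
# identity, then nudge the meta-weights toward the adapted weights.
import copy
import torch
import torch.nn as nn

class TinyGenerator(nn.Module):
    """Stand-in for a talking-head generator: maps a driving code to a frame code."""
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 128), nn.ReLU(), nn.Linear(128, dim))

    def forward(self, z):
        return self.net(z)

def reptile_step(meta_model, person_batches, inner_steps=5, inner_lr=1e-4, meta_lr=0.1):
    """One Reptile outer step on a single identity's data."""
    adapted = copy.deepcopy(meta_model)
    opt = torch.optim.Adam(adapted.parameters(), lr=inner_lr)
    for z, target in person_batches[:inner_steps]:
        loss = nn.functional.mse_loss(adapted(z), target)  # toy reconstruction loss
        opt.zero_grad()
        loss.backward()
        opt.step()
    # Outer update: theta <- theta + meta_lr * (theta_adapted - theta)
    with torch.no_grad():
        for p, q in zip(meta_model.parameters(), adapted.parameters()):
            p.add_(meta_lr * (q - p))

meta_model = TinyGenerator()
fake_person = [(torch.randn(8, 64), torch.randn(8, 64)) for _ in range(5)]
reptile_step(meta_model, fake_person)
```

    At test time, the same inner loop is run briefly on a new person's footage; because the meta-weights already sit close to many personalized optima, a few steps suffice, which is what makes second-scale adaptation plausible.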

    Storytelling with salient stills

    Thesis (M.S.)--Massachusetts Institute of Technology, Program in Media Arts & Sciences, 1996. Includes bibliographical references (p. 59-63). Michael J. Massey. M.S.

    Applications of Graphene Quantum Dots in Biomedical Sensors

    Due to the rising rates of cancer, cardiovascular diseases, neurodegenerative disorders, autoimmune diseases, and a plethora of infections across the globe, it is essential to introduce strategies that can rapidly and specifically detect ultralow concentrations of relevant biomarkers, pathogens, toxins, and pharmaceuticals in biological matrices. Considering these pathophysiologies, much research has focused on fabricating biosensors for early diagnosis and treatment using nanomaterials such as quantum dots (QDs). These nanomaterials effectively improve sensor performance with respect to reproducibility, selectivity, and sensitivity. In particular, graphene quantum dots (GQDs), which are essentially graphene fragments of nanometer size, exhibit distinctive features: they act as attractive fluorophores and excellent electro-catalysts owing to their photo-stability, water solubility, biocompatibility, non-toxicity, and low cost, making them favorable candidates for a wide range of novel biomedical applications. Herein, we review about 300 biomedical studies reported over the last five years, covering the state of the art as well as some pioneering ideas on the prominent role of GQDs, especially in the development of optical, electrochemical, and photoelectrochemical biosensors. Additionally, we outline the ideal properties of GQDs, their eclectic methods of synthesis, and the general principles behind several biosensing techniques.
    Funding: DFG, 428780268, "Biomimetische Rezeptoren auf NanoMIP-Basis zur Virenerkennung und -entfernung mittels integrierter Ansätze" (Biomimetic NanoMIP-based receptors for virus detection and removal via integrated approaches).
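    As a small worked illustration of one readout principle common to fluorescent GQD sensors, the sketch below fits the Stern-Volmer quenching model F0/F = 1 + Ksv·[Q] to calibration data and inverts it to estimate an unknown concentration. The numbers are invented for illustration and do not come from the review.

```python
# Stern-Volmer calibration for a fluorescence-quenching sensor (toy data).
import numpy as np

conc = np.array([0.0, 0.5, 1.0, 2.0, 4.0])        # analyte concentration, uM
ratio = np.array([1.00, 1.09, 1.21, 1.43, 1.82])  # measured F0/F intensity ratio

# Linear fit of F0/F vs [Q]; the slope is the Stern-Volmer constant Ksv.
ksv, intercept = np.polyfit(conc, ratio, 1)
print(f"Ksv = {ksv:.3f} per uM (intercept {intercept:.3f}, ideally 1)")

# Invert the calibration to estimate an unknown sample's concentration.
unknown_ratio = 1.30
print(f"estimated [Q] = {(unknown_ratio - intercept) / ksv:.2f} uM")
```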

    SUR-adapter: Enhancing Text-to-Image Pre-trained Diffusion Models with Large Language Models

    Diffusion models, which have emerged as popular text-to-image generation models, can produce high-quality, content-rich images guided by textual prompts. However, existing models show limited semantic understanding and commonsense reasoning when the input prompts are concise narratives, resulting in low-quality image generation. To improve their capacity for narrative prompts, we propose a simple yet effective parameter-efficient fine-tuning approach called the Semantic Understanding and Reasoning adapter (SUR-adapter) for pre-trained diffusion models. To reach this goal, we first collect and annotate a new dataset, SURD, which consists of more than 57,000 semantically corrected multi-modal samples. Each sample contains a simple narrative prompt, a complex keyword-based prompt, and a high-quality image. We then align the semantic representation of narrative prompts to the complex prompts and transfer knowledge from large language models (LLMs) to our SUR-adapter via knowledge distillation, so that it acquires powerful semantic understanding and reasoning capabilities and builds a high-quality textual semantic representation for text-to-image generation. We conduct experiments integrating multiple LLMs and popular pre-trained diffusion models to show the effectiveness of our approach in enabling diffusion models to understand and reason over concise natural language without image-quality degradation. Our approach makes text-to-image diffusion models easier to use, with a better user experience, and has the potential to further advance the development of user-friendly text-to-image generation models by bridging the semantic gap between simple narrative prompts and complex keyword-based prompts. The code is released at https://github.com/Qrange-group/SUR-adapter.
    Comment: accepted by ACM MM 2023
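    To make the alignment-plus-distillation idea concrete, here is a minimal sketch in which a small residual adapter is trained to pull simple-prompt embeddings toward the matched complex-prompt embeddings and an LLM feature. The `Adapter` module, the equal loss weighting, and the random tensors standing in for frozen-encoder outputs are all assumptions for illustration, not the released SUR-adapter code.

```python
# Residual adapter on top of a frozen text encoder, trained with two
# alignment/distillation terms: match the complex-prompt embedding and
# match a projected LLM representation (the teacher).
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Lightweight residual adapter over frozen text-encoder embeddings."""
    def __init__(self, dim=768, hidden=256):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))

    def forward(self, text_emb):
        return text_emb + self.proj(text_emb)  # residual keeps the original semantics

adapter = Adapter()
opt = torch.optim.Adam(adapter.parameters(), lr=1e-4)

# Stand-ins for frozen-encoder outputs on one (simple, complex) prompt pair.
simple_emb = torch.randn(4, 768)   # embedding of the narrative prompt
complex_emb = torch.randn(4, 768)  # embedding of the keyword-based prompt
llm_feat = torch.randn(4, 768)     # projected LLM representation (teacher)

adapted = adapter(simple_emb)
loss = (nn.functional.mse_loss(adapted, complex_emb)
        + nn.functional.mse_loss(adapted, llm_feat))  # distillation terms
opt.zero_grad()
loss.backward()
opt.step()
```

    Because the adapter is residual and the diffusion backbone stays frozen, this kind of fine-tuning is parameter-efficient and can leave image quality largely untouched, which matches the claim in the abstract.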