8 research outputs found

    ChatFace: Chat-Guided Real Face Editing via Diffusion Latent Space Manipulation

    Full text link
    Editing real facial images is a crucial task in computer vision with significant demand in various real-world applications. While GAN-based methods, especially when combined with CLIP, have shown potential for manipulating images, their ability to reconstruct real images is limited by the difficulty of GAN inversion. Diffusion-based methods achieve faithful image reconstruction, but effectively manipulating fine-grained facial attributes with textual instructions remains challenging. To address these issues and enable convenient manipulation of real facial images, we propose a novel approach that conducts text-driven image editing in the semantic latent space of a diffusion model. By aligning the temporal features of the diffusion model with the semantic condition during the generative process, we introduce a stable manipulation strategy that performs precise zero-shot manipulation effectively. Furthermore, we develop an interactive system named ChatFace, which combines the zero-shot reasoning ability of large language models to perform efficient manipulations in the diffusion semantic latent space. This system enables users to perform complex multi-attribute manipulations through dialogue, opening up new possibilities for interactive image editing. Extensive experiments confirm that our approach outperforms previous methods and enables precise editing of real facial images, making it a promising candidate for real-world applications. Project page: https://dongxuyue.github.io/chatface
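    The core idea of editing in a semantic latent space can be illustrated with a minimal sketch: shift a latent code along an attribute direction scaled by an edit strength. All names here are hypothetical; ChatFace's actual pipeline involves a diffusion model's semantic encoder and its generative process, which this toy example does not reproduce.

```python
import numpy as np

def edit_latent(h, direction, alpha):
    """Shift a semantic latent code `h` along a unit-normalized attribute
    direction, scaled by edit strength `alpha`. Illustrative sketch only;
    `h` and `direction` stand in for a diffusion model's semantic latents."""
    d = direction / np.linalg.norm(direction)
    return h + alpha * d

# Toy example: a 4-dimensional latent nudged along one attribute axis.
h = np.zeros(4)
smile_direction = np.array([2.0, 0.0, 0.0, 0.0])  # hypothetical attribute axis
edited = edit_latent(h, smile_direction, alpha=1.5)
print(edited)
```

In a chat-guided system, a language model would map a dialogue turn ("make the face smile slightly") to a choice of `direction` and `alpha`, then the edited latent would be decoded by the generative model.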

    Video-Bench: A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models

    Full text link
    Video-based large language models (Video-LLMs) have been recently introduced, targeting both fundamental improvements in perception and comprehension, and a diverse range of user inquiries. In pursuit of the ultimate goal of achieving artificial general intelligence, a truly intelligent Video-LLM should not only see and understand the surroundings, but also possess human-level commonsense and make well-informed decisions for the users. To guide the development of such a model, the establishment of a robust and comprehensive evaluation system becomes crucial. To this end, this paper proposes Video-Bench, a new comprehensive benchmark along with a toolkit specifically designed for evaluating Video-LLMs. The benchmark comprises 10 meticulously crafted tasks, evaluating the capabilities of Video-LLMs across three distinct levels: Video-exclusive Understanding, Prior Knowledge-based Question-Answering, and Comprehension and Decision-making. In addition, we introduce an automatic toolkit tailored to process model outputs for various tasks, facilitating the calculation of metrics and generating convenient final scores. We evaluate 8 representative Video-LLMs using Video-Bench. The findings reveal that current Video-LLMs still fall considerably short of achieving human-like comprehension and analysis of real-world videos, offering valuable insights for future research directions. The benchmark and toolkit are available at: https://github.com/PKU-YuanGroup/Video-Bench
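    The toolkit's role, computing per-task metrics and aggregating them into a final score, can be sketched as follows. The data schema here (answers already parsed into prediction/answer pairs) is an assumption; the real toolkit also handles parsing raw model outputs, which this sketch omits.

```python
def score_tasks(results):
    """Compute per-task accuracy and a simple mean final score.
    `results` maps task name -> list of (prediction, answer) pairs.
    Hypothetical schema; the actual toolkit's aggregation may differ."""
    per_task = {}
    for task, pairs in results.items():
        correct = sum(pred == ans for pred, ans in pairs)
        per_task[task] = correct / len(pairs)
    final = sum(per_task.values()) / len(per_task)
    return per_task, final

# Toy run over two tasks with multiple-choice answers.
per_task, final = score_tasks({
    "Video-exclusive Understanding": [("A", "A"), ("B", "C")],
    "Prior Knowledge QA": [("C", "C")],
})
print(per_task, final)
```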

    LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment

    Full text link
    Video-language (VL) pretraining has achieved remarkable improvement in multiple downstream tasks. However, the current VL pretraining framework is hard to extend to multiple modalities (N modalities, N>=3) beyond vision and language. We thus propose LanguageBind, taking language as the bind across different modalities, because the language modality is well-explored and contains rich semantics. Specifically, we freeze the language encoder acquired by VL pretraining, then train encoders for other modalities with contrastive learning. As a result, all modalities are mapped to a shared feature space, implementing multi-modal semantic alignment. While LanguageBind ensures that we can extend VL modalities to N modalities, we also need a high-quality dataset with alignment data pairs centered on language. We thus propose VIDAL-10M, a dataset with Video, Infrared, Depth, Audio and their corresponding Language. In our VIDAL-10M, all videos are from short video platforms with complete semantics rather than truncated segments from long videos, and all the video, depth, infrared, and audio modalities are aligned to their textual descriptions. LanguageBind has achieved superior performance on a wide range of 15 benchmarks covering video, audio, depth, and infrared. Moreover, multiple experiments have provided evidence for the effectiveness of LanguageBind in achieving indirect alignment and complementarity among diverse modalities. Code: https://github.com/PKU-YuanGroup/LanguageBind. Comment: Accepted by ICLR 202
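    The contrastive objective that aligns a trainable modality encoder to the frozen language encoder is typically a symmetric InfoNCE loss over a batch of matched (modality, language) pairs. A minimal NumPy sketch of that loss, assuming embeddings have already been produced by the two encoders (the temperature value and function names are illustrative, not taken from the LanguageBind codebase):

```python
import numpy as np

def info_nce(modality_emb, language_emb, temperature=0.07):
    """Symmetric InfoNCE loss between L2-normalized embeddings from a
    trainable modality encoder and a frozen language encoder.
    Matched pairs sit on the diagonal of the similarity matrix."""
    a = modality_emb / np.linalg.norm(modality_emb, axis=1, keepdims=True)
    b = language_emb / np.linalg.norm(language_emb, axis=1, keepdims=True)
    logits = a @ b.T / temperature
    n = logits.shape[0]

    def xent(m):
        # Cross-entropy of each row's softmax against the diagonal target.
        m = m - m.max(axis=1, keepdims=True)
        logp = m - np.log(np.exp(m).sum(axis=1, keepdims=True))
        return -logp[np.arange(n), np.arange(n)].mean()

    # Average the modality->language and language->modality directions.
    return 0.5 * (xent(logits) + xent(logits.T))
```

Because the language side is frozen, minimizing this loss pulls each new modality's embeddings toward the fixed language feature space, which is what lets all N modalities end up mutually aligned through language.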

    Regenerated woody plants influence soil microbial communities in a subtropical forest

    Get PDF
    10 pages, 4 figures, 3 tables, references. Supplementary data to this article can be found online at https://doi.org/10.1016/j.apsoil.2023.104890. Forests are critical for supporting multiple ecosystem services such as climate change mitigation. Microbial diversity in soil provides important functions to maintain and regenerate forest ecosystems, yet a critical knowledge gap remains in identifying the linkage between attributes of regenerated woody plant (RWP) communities and the diversity patterns of soil microbial communities in subtropical plantations. Here, we investigated the changes in soil microbial communities and plant traits in a nine-hectare Chinese fir (Cunninghamia lanceolata; CF) plantation to assess how non-planted RWP communities regulate soil bacterial and fungal diversity, and to further explore the potential mechanisms that structure their interaction. Our study revealed that soil bacterial richness was positively associated with RWP richness, whereas soil fungal richness was negatively associated with RWP basal area. Meanwhile, RWP richness was positively correlated with ectomycorrhizal (ECM) fungal richness but negatively correlated with the richness of both pathogenic and saprotrophic fungi, suggesting that the RWP-fungal richness relationship was trophic guild-specific. Soil microbial community beta diversity (i.e., dissimilarity in community composition) was strongly coupled with both RWP beta diversity and the heterogeneity of RWP basal area.
Our study highlights the importance of community-level RWP plant attributes for the regulation of microbial biodiversity in plantation systems, which should be considered in future forest management programs. This work was funded by the National Key Research and Development Program of China (2021YFD2201301 and 2022YFF1303003), the National Natural Science Foundation of China (U22A20612), and the Key Project of Jiangxi Province Natural Science Foundation of China (20224ACB205003). Peer reviewed.

    An association of smoking with serum urate and gout: A health paradox

    No full text