ChatFace: Chat-Guided Real Face Editing via Diffusion Latent Space Manipulation
Editing real facial images is a crucial task in computer vision, with significant demand in various real-world applications. While GAN-based methods have shown potential for manipulating images, especially when combined with CLIP, they are limited in their ability to reconstruct real images because GAN inversion remains challenging. Diffusion-based methods achieve faithful image reconstruction, but effectively manipulating fine-grained facial attributes with textual instructions remains difficult. To address these issues and facilitate convenient manipulation of real facial images, we propose a novel approach that conducts text-driven image editing in the semantic latent space of a diffusion model. By aligning the temporal features of the diffusion model with the semantic condition during the generative process, we introduce a stable manipulation strategy that performs precise zero-shot manipulation. Furthermore, we develop an interactive system named ChatFace, which combines the zero-shot reasoning ability of large language models with efficient manipulation in the diffusion semantic latent space. This system enables users to perform complex multi-attribute manipulations through dialogue, opening up new possibilities for interactive image editing. Extensive experiments confirm that our approach outperforms previous methods and enables precise editing of real facial images, making it a promising candidate for real-world applications.
Project page: https://dongxuyue.github.io/chatface
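The abstract describes shifting the diffusion model's semantic latent toward a text-derived direction at each step of the generative process. The sketch below illustrates that general idea only; every name in it (invert, semantic_latent, denoise_step, text_direction) is an assumed placeholder, not the ChatFace API.

```python
# Hypothetical sketch of text-guided editing in a diffusion model's
# semantic latent space. All model methods below are assumed placeholders.

def edit_in_semantic_space(model, image, text_direction, strength=0.8, steps=50):
    """Shift the semantic latent toward a text-derived direction at each
    denoising step, so the edit stays consistent across timesteps."""
    x_t = model.invert(image, steps=steps)      # e.g., DDIM inversion of the real image
    for t in reversed(range(steps)):
        h = model.semantic_latent(x_t, t)       # temporal feature at step t
        h_edit = h + strength * text_direction  # align with the semantic condition
        x_t = model.denoise_step(x_t, t, condition=h_edit)
    return x_t
```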
Video-Bench: A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models
Video-based large language models (Video-LLMs) have been recently introduced,
targeting both fundamental improvements in perception and comprehension, and a
diverse range of user inquiries. In pursuit of the ultimate goal of achieving
artificial general intelligence, a truly intelligent Video-LLM should not only see and understand its surroundings, but also possess human-level commonsense and make well-informed decisions for users. To guide the
development of such a model, the establishment of a robust and comprehensive
evaluation system becomes crucial. To this end, this paper proposes
\textit{Video-Bench}, a new comprehensive benchmark along with a toolkit
specifically designed for evaluating Video-LLMs. The benchmark comprises 10
meticulously crafted tasks, evaluating the capabilities of Video-LLMs across
three distinct levels: Video-exclusive Understanding, Prior Knowledge-based
Question-Answering, and Comprehension and Decision-making. In addition, we
introduce an automatic toolkit tailored to process model outputs for various
tasks, facilitating the calculation of metrics and generating convenient final
scores. We evaluate 8 representative Video-LLMs using \textit{Video-Bench}. The
findings reveal that current Video-LLMs still fall considerably short of
achieving human-like comprehension and analysis of real-world videos, offering
valuable insights for future research directions. The benchmark and toolkit are
available at: \url{https://github.com/PKU-YuanGroup/Video-Bench}.
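As described, the accompanying toolkit processes raw model outputs per task and aggregates them into final scores across the three capability levels. The following is a minimal sketch of that kind of aggregation; the JSON record format, the task names, and the naive answer-matching rule are illustrative assumptions, not the actual toolkit's behavior.

```python
# Hypothetical sketch of a Video-Bench-style scoring pipeline.
import json
from collections import defaultdict

# Illustrative task-to-level grouping (placeholder task names).
LEVELS = {
    "video_exclusive": ["summarization", "abnormal_detection"],
    "prior_knowledge_qa": ["tv_qa", "music_qa"],
    "comprehension_decision": ["driving_decision", "3d_scene"],
}

def score(predictions_path):
    """predictions_path: JSON list of {"task", "answer", "label"} records."""
    with open(predictions_path) as f:
        records = json.load(f)
    correct, total = defaultdict(int), defaultdict(int)
    for r in records:
        total[r["task"]] += 1
        # Naive matching: does the reference option appear in the model output?
        if r["label"].lower() in r["answer"].lower():
            correct[r["task"]] += 1
    # Average per-task accuracies within each capability level.
    return {
        level: sum(correct[t] / total[t] for t in tasks if total[t]) / len(tasks)
        for level, tasks in LEVELS.items()
    }
```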
LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
Video-language (VL) pretraining has achieved remarkable improvements on multiple downstream tasks. However, the current VL pretraining framework is hard to extend to multiple modalities (N modalities, N>=3) beyond vision and language. We thus propose LanguageBind, which uses language as the bind across modalities, because the language modality is well explored and contains rich semantics. Specifically, we freeze the language encoder acquired
by VL pretraining, then train encoders for other modalities with contrastive
learning. As a result, all modalities are mapped to a shared feature space,
implementing multi-modal semantic alignment. While LanguageBind ensures that VL modalities can be extended to N modalities, a high-quality dataset with alignment pairs centered on language is also needed. We thus propose VIDAL-10M, comprising Video, Infrared, Depth, Audio, and their corresponding Language. In VIDAL-10M, all videos are from short video platforms with
complete semantics rather than truncated segments from long videos, and all the
video, depth, infrared, and audio modalities are aligned to their textual
descriptions. LanguageBind has achieved superior performance on a wide range of
15 benchmarks covering video, audio, depth, and infrared. Moreover, multiple
experiments have provided evidence for the effectiveness of LanguageBind in
achieving indirect alignment and complementarity among diverse modalities. Code address: https://github.com/PKU-YuanGroup/LanguageBind. Comment: Accepted by ICLR 2024.
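The training recipe described in the abstract, a frozen VL-pretrained language encoder with contrastively trained encoders for each additional modality, can be sketched with a CLIP-style InfoNCE loss. The encoder objects and batch layout below are assumed placeholders, not the released LanguageBind code.

```python
# Minimal sketch of the LanguageBind idea: keep the pretrained language
# encoder frozen and align each extra-modality encoder to it contrastively.
import torch
import torch.nn.functional as F

def infonce(feat_a, feat_b, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings."""
    feat_a = F.normalize(feat_a, dim=-1)
    feat_b = F.normalize(feat_b, dim=-1)
    logits = feat_a @ feat_b.t() / temperature
    targets = torch.arange(len(logits), device=logits.device)
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

def train_step(modality_encoder, language_encoder, batch, optimizer):
    # Language encoder stays frozen: language is the "bind" that every
    # other modality (depth, infrared, audio, ...) is aligned to.
    with torch.no_grad():
        text_emb = language_encoder(batch["text"])
    mod_emb = modality_encoder(batch["modality"])
    loss = infonce(mod_emb, text_emb)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```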
Regenerated woody plants influence soil microbial communities in a subtropical forest
Supplementary data to this article can be found online at https://doi.org/10.1016/j.apsoil.2023.104890.

Forests are critical for supporting multiple ecosystem services such as climate change mitigation. Microbial diversity in soil provides important functions to maintain and regenerate forest ecosystems, yet a critical knowledge gap remains in identifying the linkage between attributes of regenerated woody plant (RWP) communities and the diversity patterns of soil microbial communities in subtropical plantations. Here, we investigated changes in soil microbial communities and plant traits in a nine-hectare Chinese fir (Cunninghamia lanceolata) plantation to assess how non-planted RWP communities regulate soil bacterial and fungal diversity, and to further explore the potential mechanisms that structure their interaction. Our study revealed that soil bacterial richness was positively associated with RWP richness, whereas soil fungal richness was negatively associated with RWP basal area. Meanwhile, RWP richness was positively correlated with ectomycorrhizal (ECM) fungal richness but negatively correlated with the richness of both pathogenic and saprotrophic fungi, suggesting that the RWP-fungal richness relationship is trophic-guild-specific. Soil microbial community beta diversity (i.e., dissimilarity in community composition) was strongly coupled with both RWP beta diversity and the heterogeneity of RWP basal area. Our study highlights the importance of community-level RWP attributes for the regulation of microbial biodiversity in plantation systems, which should be considered in future forest management programs.

This work was funded by the National Key Research and Development Program of China (2021YFD2201301 and 2022YFF1303003), the National Natural Science Foundation of China (U22A20612), and the Key Project of Jiangxi Province Natural Science Foundation of China (20224ACB205003). Peer reviewed.
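For readers unfamiliar with how such richness associations are typically quantified, here is a tiny illustrative example (not the authors' analysis code) of correlating per-plot RWP richness with soil bacterial richness, using synthetic data.

```python
# Illustrative only: a rank correlation between plant and microbial
# richness across plots, with synthetic data standing in for field data.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
# Hypothetical per-plot values: RWP species richness and bacterial richness.
rwp_richness = rng.integers(5, 30, size=40)
bacterial_richness = 800 + 12 * rwp_richness + rng.normal(0, 60, size=40)

rho, p = spearmanr(rwp_richness, bacterial_richness)
print(f"Spearman rho={rho:.2f}, p={p:.3g}")  # a positive association, as reported
```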