13 research outputs found

    TokenFlow: Consistent Diffusion Features for Consistent Video Editing

    The generative AI revolution has recently expanded to videos. Nevertheless, current state-of-the-art video models are still lagging behind image models in terms of visual quality and user control over the generated content. In this work, we present a framework that harnesses the power of a text-to-image diffusion model for the task of text-driven video editing. Specifically, given a source video and a target text-prompt, our method generates a high-quality video that adheres to the target text, while preserving the spatial layout and motion of the input video. Our method is based on a key observation that consistency in the edited video can be obtained by enforcing consistency in the diffusion feature space. We achieve this by explicitly propagating diffusion features based on inter-frame correspondences, readily available in the model. Thus, our framework does not require any training or fine-tuning, and can work in conjunction with any off-the-shelf text-to-image editing method. We demonstrate state-of-the-art editing results on a variety of real-world videos. Webpage: https://diffusion-tokenflow.github.io
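
    The feature-propagation idea can be illustrated with a small sketch. The snippet below is not the authors' implementation; the cosine-similarity matching, tensor shapes, and the propagate_features helper are illustrative assumptions about how edited keyframe features might be copied to the remaining frames via nearest-neighbour correspondences in diffusion feature space.

```python
# Hedged sketch of feature propagation via nearest-neighbour correspondences.
import torch
import torch.nn.functional as F

def propagate_features(key_feats, frame_feats):
    """key_feats: (K, C, H, W) features of edited keyframes.
    frame_feats: (T, C, H, W) features of all source frames.
    Returns (T, C, H, W): each frame token replaced by its nearest keyframe token."""
    K, C, H, W = key_feats.shape
    T = frame_feats.shape[0]
    key_tokens = key_feats.permute(0, 2, 3, 1).reshape(-1, C)    # (K*H*W, C)
    frm_tokens = frame_feats.permute(0, 2, 3, 1).reshape(-1, C)  # (T*H*W, C)
    # cosine similarity between every frame token and every keyframe token
    sim = F.normalize(frm_tokens, dim=1) @ F.normalize(key_tokens, dim=1).T
    nn_idx = sim.argmax(dim=1)        # inter-frame correspondence
    out = key_tokens[nn_idx]          # propagate the matched keyframe features
    return out.reshape(T, H, W, C).permute(0, 3, 1, 2)

# toy usage
keys = torch.randn(2, 64, 32, 32)
frames = torch.randn(8, 64, 32, 32)
print(propagate_features(keys, frames).shape)  # torch.Size([8, 64, 32, 32])
```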

    MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation

    Recent advances in text-to-image generation with diffusion models present transformative capabilities in image quality. However, user controllability of the generated image and fast adaptation to new tasks remain open challenges, currently addressed mostly by costly and lengthy re-training and fine-tuning, or by ad-hoc adaptations to specific image generation tasks. In this work, we present MultiDiffusion, a unified framework that enables versatile and controllable image generation, using a pre-trained text-to-image diffusion model, without any further training or fine-tuning. At the center of our approach is a new generation process, based on an optimization task that binds together multiple diffusion generation processes with a shared set of parameters or constraints. We show that MultiDiffusion can be readily applied to generate high-quality and diverse images that adhere to user-provided controls, such as a desired aspect ratio (e.g., panorama) and spatial guiding signals, ranging from tight segmentation masks to bounding boxes. Project webpage: https://multidiffusion.github.io
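
    When the per-window objectives are unweighted least squares, the binding of multiple diffusion paths has a simple closed form: averaging the per-window denoising results wherever they overlap. The sketch below assumes a generic denoise_step callable and a panorama-shaped latent; it illustrates the fusion step, not the released implementation.

```python
# Hedged sketch of one MultiDiffusion-style fused denoising step over a wide latent.
import torch

def multidiffusion_step(latent, t, denoise_step, window=64, stride=32):
    _, _, H, W = latent.shape
    acc = torch.zeros_like(latent)
    cnt = torch.zeros_like(latent)
    for x0 in range(0, W - window + 1, stride):
        crop = latent[:, :, :, x0:x0 + window]
        out = denoise_step(crop, t)               # one step of the pretrained model on a window
        acc[:, :, :, x0:x0 + window] += out
        cnt[:, :, :, x0:x0 + window] += 1
    # per-pixel average = closed-form solution of the least-squares binding
    return acc / cnt.clamp(min=1)

# toy usage with an identity "model"
lat = torch.randn(1, 4, 64, 256)
print(multidiffusion_step(lat, t=0, denoise_step=lambda x, t: x).shape)
```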

    Space-Time Diffusion Features for Zero-Shot Text-Driven Motion Transfer

    We present a new method for text-driven motion transfer - synthesizing a video that complies with an input text prompt describing the target objects and scene while maintaining an input video's motion and scene layout. Prior methods are confined to transferring motion across two subjects within the same or closely related object categories and are applicable only to limited domains (e.g., humans). In this work, we consider a significantly more challenging setting in which the target and source objects differ drastically in shape and fine-grained motion characteristics (e.g., translating a jumping dog into a dolphin). To this end, we leverage a pre-trained and fixed text-to-video diffusion model, which provides us with generative and motion priors. The pillar of our method is a new space-time feature loss derived directly from the model. This loss guides the generation process to preserve the overall motion of the input video while complying with the target object in terms of shape and fine-grained motion traits. Project page: https://diffusion-motion-transfer.github.io
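
    One way to picture a space-time feature loss is to compare how pooled diffusion features evolve over time in the source and generated videos, rather than comparing the features themselves. The paper's exact loss is not reproduced here; the snippet below is a sketch under that assumption.

```python
# Hedged sketch of a space-time feature loss on frozen T2V diffusion features.
import torch
import torch.nn.functional as F

def spacetime_feature_loss(gen_feats, src_feats):
    """gen_feats, src_feats: (T, C, H, W) features extracted from a frozen video diffusion model."""
    g = gen_feats.mean(dim=(2, 3))   # (T, C) spatially pooled per-frame descriptors
    s = src_feats.mean(dim=(2, 3))
    # match frame-to-frame feature displacements rather than raw features, so the
    # target object's appearance may change while the overall motion is preserved
    return F.mse_loss(g[1:] - g[:-1], s[1:] - s[:-1])

loss = spacetime_feature_loss(torch.randn(16, 128, 32, 32), torch.randn(16, 128, 32, 32))
print(loss.item())
```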

    Text2LIVE: Text-Driven Layered Image and Video Editing

    We present a method for zero-shot, text-driven appearance manipulation in natural images and videos. Given an input image or video and a target text prompt, our goal is to edit the appearance of existing objects (e.g., an object's texture) or augment the scene with visual effects (e.g., smoke, fire) in a semantically meaningful manner. We train a generator using an internal dataset of training examples, extracted from a single input (image or video and target text prompt), while leveraging an external pre-trained CLIP model to establish our losses. Rather than directly generating the edited output, our key idea is to generate an edit layer (color+opacity) that is composited over the original input. This allows us to constrain the generation process and maintain high fidelity to the original input via novel text-driven losses that are applied directly to the edit layer. Our method neither relies on a pre-trained generator nor requires user-provided edit masks. We demonstrate localized, semantic edits on high-resolution natural images and videos across a variety of objects and scenes. Project page: https://text2live.github.io
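
    The layered-editing idea itself reduces to alpha compositing: the generator predicts an RGBA edit layer and the output is that layer blended over the untouched input. The snippet below sketches only the compositing step; the generator and the CLIP-based losses are omitted.

```python
# Hedged sketch of compositing an edit layer (color + opacity) over the source.
import torch

def composite(edit_rgb, edit_alpha, source_rgb):
    """edit_rgb: (3, H, W) in [0, 1], edit_alpha: (1, H, W) in [0, 1], source_rgb: (3, H, W)."""
    return edit_alpha * edit_rgb + (1.0 - edit_alpha) * source_rgb

src = torch.rand(3, 256, 256)
rgb, alpha = torch.rand(3, 256, 256), torch.rand(1, 256, 256)
edited = composite(rgb, alpha, src)   # high fidelity to the input wherever alpha is near zero
```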

    Lumiere: A Space-Time Diffusion Model for Video Generation

    We introduce Lumiere -- a text-to-video diffusion model designed for synthesizing videos that portray realistic, diverse and coherent motion -- a pivotal challenge in video synthesis. To this end, we introduce a Space-Time U-Net architecture that generates the entire temporal duration of the video at once, through a single pass in the model. This is in contrast to existing video models which synthesize distant keyframes followed by temporal super-resolution -- an approach that inherently makes global temporal consistency difficult to achieve. By deploying both spatial and (importantly) temporal down- and up-sampling and leveraging a pre-trained text-to-image diffusion model, our model learns to directly generate a full-frame-rate, low-resolution video by processing it in multiple space-time scales. We demonstrate state-of-the-art text-to-video generation results, and show that our design easily facilitates a wide range of content creation tasks and video editing applications, including image-to-video, video inpainting, and stylized generation. Webpage: https://lumiere-video.github.io/ | Video: https://www.youtube.com/watch?v=wxLr02Dz2S
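
    A minimal illustration of space-time down-sampling, the ingredient that lets a single U-Net pass process the whole clip at several temporal scales, is a 3D convolution with stride 2 in both time and space. The module below is illustrative only and is not Lumiere's architecture.

```python
# Hedged sketch of a space-time down-sampling block for video latents.
import torch
import torch.nn as nn

class SpaceTimeDown(nn.Module):
    def __init__(self, ch):
        super().__init__()
        # stride 2 in time *and* space compresses the clip along both axes
        self.conv = nn.Conv3d(ch, ch, kernel_size=3, stride=2, padding=1)

    def forward(self, x):          # x: (B, C, T, H, W)
        return self.conv(x)

x = torch.randn(1, 32, 16, 64, 64)      # a 16-frame clip
print(SpaceTimeDown(32)(x).shape)       # torch.Size([1, 32, 8, 32, 32])
```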

    A reference map of potential determinants for the human serum metabolome

    The serum metabolome contains a plethora of biomarkers and causative agents of various diseases, some of which are endogenously produced and some that have been taken up from the environment(1). The origins of specific compounds are known, including metabolites that are highly heritable(2,3), or those that are influenced by the gut microbiome(4), by lifestyle choices such as smoking(5), or by diet(6). However, the key determinants of most metabolites are still poorly understood. Here we measured the levels of 1,251 metabolites in serum samples from a unique and deeply phenotyped healthy human cohort of 491 individuals. We applied machine-learning algorithms to predict metabolite levels in held-out individuals on the basis of host genetics, gut microbiome, clinical parameters, diet, lifestyle and anthropometric measurements, and obtained statistically significant predictions for more than 76% of the profiled metabolites. Diet and microbiome had the strongest predictive power, and each explained hundreds of metabolites, in some cases accounting for more than 50% of the observed variance. We further validated microbiome-related predictions by showing a high replication rate in two geographically independent cohorts(7,8) that were not available to us when we trained the algorithms. We used feature attribution analysis(9) to reveal specific dietary and bacterial interactions. We further demonstrate that some of these interactions might be causal, as some metabolites that we predicted to be positively associated with bread were found to increase after a randomized clinical trial of bread intervention. Overall, our results reveal potential determinants of more than 800 metabolites, paving the way towards a mechanistic understanding of how metabolites change under different conditions and towards the design of interventions for manipulating the levels of circulating metabolites. The levels of 1,251 metabolites are measured in 475 phenotyped individuals, and machine-learning algorithms reveal that diet and the microbiome are the determinants with the strongest predictive power for the levels of these metabolites.
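
    The prediction setup can be sketched in a few lines: fit a per-metabolite regressor on one feature group at a time and report held-out R² as the "explained variance". The snippet below uses synthetic stand-in data, and scikit-learn gradient boosting is an assumption about the model class; it mirrors the study's design, not its code.

```python
# Hedged sketch: per-metabolite prediction from different feature groups with held-out R^2.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 400
diet = rng.normal(size=(n, 20))          # stand-ins for dietary features
microbiome = rng.normal(size=(n, 50))    # stand-ins for taxa abundances
# synthetic metabolite driven by one dietary and one microbial feature
metabolite = diet[:, 0] + 0.5 * microbiome[:, 0] + rng.normal(scale=0.5, size=n)

for name, X in [("diet", diet), ("microbiome", microbiome),
                ("diet+microbiome", np.hstack([diet, microbiome]))]:
    r2 = cross_val_score(GradientBoostingRegressor(), X, metabolite, cv=5, scoring="r2")
    print(f"{name:18s} held-out R^2 = {r2.mean():.2f}")
```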

    Whole-cell segmentation of tissue images with human-level performance using large-scale data annotation and deep learning

    Understanding the spatial organization of tissues is of critical importance for both basic and translational research. While recent advances in tissue imaging are opening an exciting new window into the biology of human tissues, interpreting the data that they create is a significant computational challenge. Cell segmentation, the task of uniquely identifying each cell in an image, remains a substantial barrier for tissue imaging, as existing approaches are inaccurate or require a substantial amount of manual curation to yield useful results. Here, we addressed the problem of cell segmentation in tissue imaging data through large-scale data annotation and deep learning. We constructed TissueNet, an image dataset containing >1 million paired whole-cell and nuclear annotations for tissue images from nine organs and six imaging platforms. We created Mesmer, a deep learning-enabled segmentation algorithm trained on TissueNet that performs nuclear and whole-cell segmentation in tissue imaging data. We demonstrated that Mesmer has better speed and accuracy than previous methods, generalizes to the full diversity of tissue types and imaging platforms in TissueNet, and achieves human-level performance for whole-cell segmentation. Mesmer enabled the automated extraction of key cellular features, such as subcellular localization of protein signal, which was challenging with previous approaches. We further showed that Mesmer could be adapted to harness cell lineage information present in highly multiplexed datasets. We used this enhanced version to quantify cell morphology changes during human gestation. All underlying code and models are released with permissive licenses as a community resource.
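
    For readers who want to try whole-cell segmentation, the released deepcell-tf package exposes Mesmer as an application object; the usage below is a sketch of its documented interface (two-channel nuclear + membrane input, integer label mask output), with a random array standing in for a real tissue image.

```python
# Hedged usage sketch of the Mesmer application from deepcell-tf.
import numpy as np
from deepcell.applications import Mesmer

app = Mesmer()
# batch of one image, shape (1, H, W, 2): channel 0 = nuclear, channel 1 = membrane
img = np.random.rand(1, 256, 256, 2).astype(np.float32)
labels = app.predict(img, image_mpp=0.5, compartment="whole-cell")
print(labels.shape)  # (1, 256, 256, 1) integer whole-cell labels
```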

    Environment dominates over host genetics in shaping human gut microbiota

    Human gut microbiome composition is shaped by multiple factors, but the relative contribution of host genetics remains elusive. Here we examine genotype and microbiome data from 1,046 healthy individuals with several distinct ancestral origins who share a relatively common environment, and demonstrate that the gut microbiome is not significantly associated with genetic ancestry, and that host genetics have a minor role in determining microbiome composition. We show that, by contrast, there are significant similarities in the compositions of the microbiomes of genetically unrelated individuals who share a household, and that over 20% of the inter-person microbiome variability is associated with factors related to diet, drugs and anthropometric measurements. We further demonstrate that microbiome data significantly improve the prediction accuracy for many human traits, such as glucose and obesity measures, compared to models that use only host genetic and environmental data. These results suggest that microbiome alterations aimed at improving clinical outcomes may be carried out across diverse genetic backgrounds.
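
    The claim that microbiome data improve trait prediction is, operationally, a model comparison: predict a trait from host and environmental covariates alone, add microbiome features, and compare held-out R². The snippet below illustrates that comparison on synthetic data; the feature sets and ridge model are stand-ins, not the study's pipeline.

```python
# Hedged sketch: added predictive value of microbiome features for a host trait.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n = 500
host = rng.normal(size=(n, 10))          # stand-ins for genetic + anthropometric covariates
microbiome = rng.normal(size=(n, 30))    # stand-ins for taxa abundances
trait = 0.3 * host[:, 0] + microbiome[:, :3].sum(axis=1) + rng.normal(size=n)

base = cross_val_score(Ridge(), host, trait, cv=5, scoring="r2").mean()
full = cross_val_score(Ridge(), np.hstack([host, microbiome]), trait, cv=5, scoring="r2").mean()
print(f"host only R^2 = {base:.2f}   host + microbiome R^2 = {full:.2f}")
```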