
    Diffusion-Based Scene Graph to Image Generation with Masked Contrastive Pre-Training

    Generating images from graph-structured inputs, such as scene graphs, is uniquely challenging due to the difficulty of aligning nodes and connections in graphs with objects and their relations in images. Most existing methods address this challenge by using scene layouts, which are image-like representations of scene graphs designed to capture the coarse structures of scene images. Because scene layouts are manually crafted, their alignment with images may not be fully optimized, causing suboptimal compliance between the generated images and the original scene graphs. To tackle this issue, we propose to learn scene graph embeddings by directly optimizing their alignment with images. Specifically, we pre-train an encoder to extract both global and local information from scene graphs that is predictive of the corresponding images, relying on two loss functions: a masked autoencoding loss and a contrastive loss. The former trains embeddings by reconstructing randomly masked image regions, while the latter trains embeddings to discriminate between compliant and non-compliant images according to the scene graph. Given these embeddings, we build a latent diffusion model to generate images from scene graphs. The resulting method, called SGDiff, allows for the semantic manipulation of generated images by modifying scene graph nodes and connections. On the Visual Genome and COCO-Stuff datasets, we demonstrate that SGDiff outperforms state-of-the-art methods, as measured by both the Inception Score and Fréchet Inception Distance (FID) metrics. We will release our source code and trained models at https://github.com/YangLing0818/SGDiff.
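
    To make the two pre-training objectives concrete, the sketch below combines a masked-region reconstruction term with a CLIP-style graph-image contrastive term. It is a minimal illustration under assumed module names and tensor shapes (decoder, proj_g, proj_i), not SGDiff's released implementation.

```python
# Minimal sketch of the two pre-training losses described above, assuming hypothetical
# callables (decoder, proj_g, proj_i) and batch-first tensors; this is not SGDiff's
# released code, only an illustration of masked autoencoding + contrastive alignment.
import torch
import torch.nn.functional as F

def pretraining_losses(graph_emb, image_emb, image_patches, mask,
                       decoder, proj_g, proj_i, temperature=0.07):
    # graph_emb: (B, D) scene-graph embeddings, image_emb: (B, D) image embeddings,
    # image_patches: (B, N, P) flattened image patches, mask: (B, N) True where masked.

    # 1) Masked autoencoding: reconstruct the masked image regions from the graph embedding.
    recon = decoder(graph_emb)                              # (B, N, P) predicted patches
    mae_loss = F.mse_loss(recon[mask], image_patches[mask])

    # 2) Contrastive alignment: matched (graph, image) pairs are positives, all other
    #    images in the batch serve as non-compliant negatives (InfoNCE).
    g = F.normalize(proj_g(graph_emb), dim=-1)
    v = F.normalize(proj_i(image_emb), dim=-1)
    logits = g @ v.t() / temperature                        # (B, B) similarity matrix
    targets = torch.arange(g.size(0), device=g.device)
    contrastive_loss = 0.5 * (F.cross_entropy(logits, targets) +
                              F.cross_entropy(logits.t(), targets))
    return mae_loss + contrastive_loss
```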

    The Snowflake Hypothesis: Training Deep GNN with One Node One Receptive field

    Despite demonstrating considerable promise in graph representation learning tasks, Graph Neural Networks (GNNs) face significant issues with over-fitting and over-smoothing as they go deeper, much like deep models in computer vision. In this work, we conduct a systematic study of deep GNN research trajectories. Our findings indicate that the current success of deep GNNs primarily stems from (I) the adoption of innovations from CNNs, such as residual/skip connections, or (II) tailor-made aggregation algorithms like DropEdge. However, these algorithms often lack intrinsic interpretability and indiscriminately treat all nodes within a given layer in the same manner, thereby failing to capture the nuanced differences among nodes. To this end, we introduce the Snowflake Hypothesis -- a novel paradigm underpinning the concept of "one node, one receptive field". The hypothesis draws inspiration from the unique and individualistic pattern of each snowflake, proposing a corresponding uniqueness in the receptive field of each node in a GNN. We employ the simplest gradient and node-level cosine distance as guiding principles to regulate the aggregation depth for each node, and conduct comprehensive experiments covering: (1) different training schemes; (2) various shallow and deep GNN backbones; (3) various numbers of layers (8, 16, 32, 64) on multiple benchmarks (six graphs, including dense graphs with millions of nodes); and (4) comparisons with different aggregation strategies. The results demonstrate that our hypothesis can serve as a universal operator for a range of tasks and displays tremendous potential for deep GNNs: it can be applied to various GNN frameworks, enhances their effectiveness when operating at depth, and guides the selection of the optimal network depth in an explainable and generalizable way.
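
    The per-node receptive-field idea can be made concrete with a small sketch: each node stops aggregating once its representation stops changing between layers, as measured by node-level cosine distance. The stopping rule, threshold, and equal hidden width across layers are illustrative assumptions, not the paper's exact criterion.

```python
# Illustrative per-node aggregation-depth control ("one node, one receptive field"),
# assuming PyG-style message-passing layers that all share the same hidden width.
import torch
import torch.nn.functional as F

def snowflake_forward(x, convs, edge_index, threshold=0.01):
    # x: (num_nodes, hidden) node features; convs: list of message-passing layers.
    active = torch.ones(x.size(0), dtype=torch.bool, device=x.device)  # nodes still expanding
    h = x
    for conv in convs:
        h_new = F.relu(conv(h, edge_index))
        # Node-level cosine distance between consecutive layer representations.
        dist = 1.0 - F.cosine_similarity(h_new, h, dim=-1)
        # Freeze nodes whose representation has effectively stopped changing.
        active = active & (dist > threshold)
        # Active nodes adopt the deeper representation; frozen nodes keep their current one.
        h = torch.where(active.unsqueeze(-1), h_new, h)
    return h
```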

    CLOP: Video-and-Language Pre-Training with Knowledge Regularizations

    Video-and-language pre-training has shown promising results for learning generalizable representations. Most existing approaches model video and text implicitly, without considering explicit structural representations of the multi-modal content. We denote this form of representation as structural knowledge, which expresses rich semantics at multiple granularities. Related works have proposed object-aware approaches that inject similar knowledge as inputs; however, existing methods usually fail to effectively utilize such knowledge as regularizations to shape a superior cross-modal representation space. To this end, we propose a Cross-modaL knOwledge-enhanced Pre-training (CLOP) method with Knowledge Regularizations. Our method has two key designs: 1) a simple yet effective Structural Knowledge Prediction (SKP) task to pull together the latent representations of similar videos; and 2) a novel Knowledge-guided sampling approach for Contrastive Learning (KCL) to push apart cross-modal hard negative samples. We evaluate our method on four text-video retrieval tasks and one multi-choice QA task. The experiments show clear improvements, outperforming prior works by a substantial margin. Besides, we provide ablations and insights into how our methods affect the latent representation space, demonstrating the value of incorporating knowledge regularizations into video-and-language pre-training. Comment: ACM Multimedia 2022 (MM'22)
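
    As a rough illustration of knowledge-guided negative sampling, the sketch below re-weights the negative terms of an InfoNCE loss by a structural-knowledge similarity matrix, so that knowledge-similar but unmatched pairs act as hard negatives. The weighting scheme and argument names are assumptions for illustration, not CLOP's exact formulation.

```python
# Hedged sketch of knowledge-guided contrastive learning: structural-knowledge
# similarity up-weights hard negatives in an InfoNCE loss. Not CLOP's actual code.
import torch
import torch.nn.functional as F

def knowledge_guided_contrastive(video_emb, text_emb, knowledge_sim,
                                 temperature=0.05, alpha=1.0):
    # video_emb, text_emb: (B, D); knowledge_sim: (B, B) structural-knowledge similarity
    # in [0, 1] (e.g., overlap of extracted objects/actions) between samples.
    v = F.normalize(video_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = v @ t.t() / temperature                          # (B, B) cross-modal similarities
    eye = torch.eye(logits.size(0), dtype=torch.bool, device=logits.device)
    # Knowledge-similar but unmatched pairs become harder negatives; positives are untouched.
    neg_weight = 1.0 + alpha * knowledge_sim.masked_fill(eye, 0.0)
    weighted = logits + neg_weight.log()                      # diagonal weight is log(1) = 0
    targets = torch.arange(logits.size(0), device=logits.device)
    return 0.5 * (F.cross_entropy(weighted, targets) +
                  F.cross_entropy(weighted.t(), targets))
```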

    Construction of a Global Knowledge Input-Output Table

    This paper describes the construction of the Knowledge Input-Output (KIO) table developed as part of the RETHINK project. Using PATSTAT data covering forty years of patents from across the globe, the KIO table provides the number of patent applications across ten major patenting countries and the rest of the world, broken down into 131 technology classifications. It further provides a network of patent citations, indicating how patents build on existing knowledge and contribute to further innovation. In addition to describing the KIO table's construction, we provide a number of stylized facts on patenting activity and the citation network. These facts illustrate the lessons that can be learned from patent citation data while also identifying potential pitfalls in their use.
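
    To make the table's structure concrete, here is an illustrative aggregation of PATSTAT-style records into application counts by country and technology class plus a citation-flow network. The column names (appln_id, country, tech_class, cited_appln_id) are assumed; this is not the RETHINK project's actual pipeline.

```python
# Illustrative sketch of a KIO-like aggregation from PATSTAT-style records.
import pandas as pd

def build_kio_table(applications: pd.DataFrame, citations: pd.DataFrame):
    # Count patent applications per country and technology classification.
    kio_counts = (applications
                  .groupby(["country", "tech_class"])
                  .size()
                  .rename("n_applications")
                  .reset_index())

    # Citation network: edges from citing to cited application, annotated with the
    # (country, tech_class) of both endpoints to trace knowledge flows.
    meta = applications.set_index("appln_id")[["country", "tech_class"]]
    edges = (citations
             .join(meta, on="appln_id")                           # citing side
             .join(meta, on="cited_appln_id", rsuffix="_cited"))  # cited side
    flows = (edges
             .groupby(["country", "tech_class", "country_cited", "tech_class_cited"])
             .size()
             .rename("n_citations")
             .reset_index())
    return kio_counts, flows
```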

    The Unintended Consequences of Trade Protection on the Environment

    We analyze the impact of a rise in protectionism on environmental regulation. Using the 2018 US-China trade war as a quasi-natural experiment, we find that higher exposure to tariffs leads to less stringent regulation targets in China, increasing air pollution and carbon emissions. Politically motivated changes in environmental policies rationalize our results: the central government and local party secretaries relax environmental regulations to mitigate the negative consequences of tariffs for polluting industries. We find heterogeneous effects depending on politicians' characteristics: younger, recently appointed, and more connected local politicians are more likely to ease environmental regulation. This policy reaction benefits politicians: prefectures with the most considerable easing in environmental regulation manage to curb the negative economic consequences of the trade war, while their mayors have a relatively larger probability of promotion. This paper presents the first empirical evidence of political incentives to manipulate environmental regulation to curb negative economic shocks.

    Doped Lead Fluoride Chloride Crystals for the HHCAL Detector Concept


    One-step preparation of robust elastic plastic polyvinyl chloride sponges with a layered structure for highly efficient separation of water-in-oil emulsions

    To address the environmental pollution and human health issues caused by oily wastewater and PVC plastic waste, a practical zero-waste solution has been developed. In this study, PVC sponges with superlipophilic and superhydrophobic properties were prepared from recycled PVC food wrap via vapor-induced phase inversion, without the use of any additives. The sponge effectively separates oil and water. The pore size of the PVC sponges could be adjusted by varying the PVC concentration and solvent ratio, which led to improvements in pore density, specific surface area, porosity, oil sorption capacity, and emulsion separation performance. The emulsion separation experiment demonstrated that the 7 wt% PVC sponge (7-0-1) can efficiently separate oil from a water-in-oil emulsion, with excellent separation efficiency and a flux of 161.5 L·m⁻²·h⁻¹·bar⁻¹. Moreover, the sponge exhibits impressive elastic recovery, flexibility, self-cleaning, and mechanical strength. Remarkably, even after recycling, the sponge maintains its hydrophobicity and emulsion separation performance. This hydrophobic sponge has great potential for mass production and oil-water separation, such as in oil spill accidents.