Diffusion-Based Scene Graph to Image Generation with Masked Contrastive Pre-Training
Generating images from graph-structured inputs, such as scene graphs, is
uniquely challenging due to the difficulty of aligning nodes and connections in
graphs with objects and their relations in images. Most existing methods
address this challenge by using scene layouts, which are image-like
representations of scene graphs designed to capture the coarse structures of
scene images. Because scene layouts are manually crafted, the alignment with
images may not be fully optimized, causing suboptimal compliance between the
generated images and the original scene graphs. To tackle this issue, we
propose to learn scene graph embeddings by directly optimizing their alignment
with images. Specifically, we pre-train an encoder to extract both global and
local information from scene graphs that are predictive of the corresponding
images, relying on two loss functions: masked autoencoding loss and contrastive
loss. The former trains embeddings by reconstructing randomly masked image
regions, while the latter trains embeddings to discriminate between compliant
and non-compliant images according to the scene graph. Given these embeddings,
we build a latent diffusion model to generate images from scene graphs. The
resulting method, called SGDiff, allows for the semantic manipulation of
generated images by modifying scene graph nodes and connections. On the Visual
Genome and COCO-Stuff datasets, we demonstrate that SGDiff outperforms
state-of-the-art methods, as measured by both the Inception Score and Fréchet
Inception Distance (FID) metrics. We will release our source code and trained
models at https://github.com/YangLing0818/SGDiff.
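The contrastive objective described in the abstract above, which trains embeddings to separate compliant from non-compliant images, can be illustrated with a generic InfoNCE-style loss. This is a minimal sketch, not SGDiff's actual implementation: the function names, the use of cosine similarity, the temperature value, and the toy two-dimensional embeddings are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def contrastive_loss(graph_embs, image_embs, temperature=0.1):
    """Mean InfoNCE loss. Assumption: graph_embs[i] is paired with image_embs[i],
    and every other image in the batch acts as a non-compliant negative."""
    total = 0.0
    for i, g in enumerate(graph_embs):
        logits = [cosine(g, img) / temperature for img in image_embs]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denominator - logits[i]
    return total / len(graph_embs)

# Toy embeddings (hypothetical): two scene graphs and two images.
graphs = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]   # image i matches graph i
swapped = [[0.1, 0.9], [0.9, 0.1]]   # image i matches the other graph

loss_aligned = contrastive_loss(graphs, aligned)
loss_swapped = contrastive_loss(graphs, swapped)
```

The loss is small when each graph embedding is closest to its own image and large when the pairing is swapped, which is the behavior a contrastive pre-training term is meant to reward.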
The Snowflake Hypothesis: Training Deep GNN with One Node One Receptive field
Despite Graph Neural Networks demonstrating considerable promise in graph
representation learning tasks, GNNs face significant over-fitting and
over-smoothing issues as they go deeper, as do models in the computer vision
realm. In this work, we conduct a systematic study of deeper GNN research
trajectories. Our findings indicate that the current success of deep GNNs
primarily stems from (I) the adoption of innovations from CNNs, such as
residual/skip connections, or (II) tailor-made aggregation algorithms such as
DropEdge. However, these algorithms often lack intrinsic interpretability and
indiscriminately treat all nodes within a given layer in a similar manner,
thereby failing to capture the nuanced differences among various nodes. To this
end, we introduce the Snowflake Hypothesis -- a novel paradigm underpinning the
concept of ``one node, one receptive field''. The hypothesis draws inspiration
from the unique and individualistic patterns of each snowflake, proposing a
corresponding uniqueness in the receptive fields of nodes in the GNNs.
We employ the simplest gradient and node-level cosine distance as guiding
principles to regulate the aggregation depth for each node, and conduct
comprehensive experiments covering: (1) different training schemes; (2)
various shallow and deep GNN backbones; (3) various numbers of layers (8,
16, 32, 64) on multiple benchmarks (six graphs, including dense graphs with
millions of nodes); and (4) comparisons with different aggregation strategies. The
observational results demonstrate that our hypothesis can serve as a universal
operator for a range of tasks, and it displays tremendous potential on deep
GNNs. It can be applied to various GNN frameworks, enhancing their effectiveness
at greater depths and guiding the selection of the optimal network depth
in an explainable and generalizable way.
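The "one node, one receptive field" idea in the abstract above can be sketched as per-node early stopping of neighborhood aggregation. This is only an illustration under stated assumptions: the stopping rule (freeze a node once the cosine distance between its representation and the mean of its neighborhood drops below a threshold), the mean aggregator, the threshold value, and the toy graph are hypothetical, and the gradient-based criterion the abstract also mentions is omitted.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)

def aggregate_with_node_depths(features, neighbors, max_layers=8, threshold=0.01):
    """Mean-aggregate neighbor features layer by layer, freezing each node once
    another aggregation step would barely change the direction of its vector.
    Returns the final representations and the depth each node reached."""
    h = {node: list(x) for node, x in features.items()}
    depth = {node: 0 for node in features}
    active = set(features)
    for layer in range(1, max_layers + 1):
        new_h = dict(h)
        frozen = set()
        for node in active:
            group = [h[node]] + [h[m] for m in neighbors[node]]
            mean = [sum(col) / len(group) for col in zip(*group)]
            if cosine_distance(h[node], mean) < threshold:
                frozen.add(node)      # assumed rule: stop aggregating for this node
            else:
                new_h[node] = mean    # node takes one more aggregation step
                depth[node] = layer
        h = new_h
        active -= frozen
        if not active:
            break
    return h, depth

# Toy graph (hypothetical): a path a-b-c plus an isolated node d.
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 0.0], "d": [1.0, 1.0]}
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b"], "d": []}
final_h, depth = aggregate_with_node_depths(features, neighbors)
```

In this toy run the isolated node stops immediately while the connected nodes keep aggregating for at least one layer, so different nodes end up with different effective receptive fields.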
CLOP: Video-and-Language Pre-Training with Knowledge Regularizations
Video-and-language pre-training has shown promising results for learning
generalizable representations. Most existing approaches usually model video and
text in an implicit manner, without considering explicit structural
representations of the multi-modal content. We refer to such
representations as structural knowledge, which expresses rich semantics at
multiple granularities. There are related works that propose object-aware
approaches to inject similar knowledge as inputs. However, the existing methods
usually fail to effectively utilize such knowledge as regularizations to shape
a superior cross-modal representation space. To this end, we propose a
Cross-modaL knOwledge-enhanced Pre-training (CLOP) method with Knowledge
Regularizations. Our method has two key designs: 1) a simple yet effective
Structural Knowledge Prediction (SKP) task to pull together the latent
representations of similar videos; and 2) a novel Knowledge-guided sampling
approach for Contrastive Learning (KCL) to push apart cross-modal hard negative
samples. We evaluate our method on four text-video retrieval tasks and one
multi-choice QA task. The experiments show clear improvements, outperforming
prior works by a substantial margin. In addition, we provide ablations and insights
into how our methods affect the latent representation space, demonstrating the
value of incorporating knowledge regularizations into video-and-language
pre-training. Comment: ACM Multimedia 2022 (MM'22)
Construction of a Global Knowledge Input-Output Table
This paper describes the construction of the Knowledge Input-Output (KIO) table, built as part of the RETHINK project. Using forty years of PATSTAT patent data from across the globe, the KIO table provides information on the number of patent applications across ten major patenting countries and the rest of the world and across 131 technology classifications. It further provides a network of patent citations, thus indicating how patents build on existing knowledge and contribute to further innovation. In addition to describing the KIO's construction, we provide a number of stylized facts on patenting activity and the citation network. These facts illustrate the lessons that can be learned from patent citation data while also identifying potential pitfalls in their use.
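The structure of such a table can be sketched on toy data: application counts per (country, technology class) cell, plus citation flows from the cited cell to the citing cell. This is an illustrative sketch only; the record layout, the identifiers, the country and class codes, and the direction convention for flows are assumptions, not the KIO table's actual schema or PATSTAT's field names.

```python
from collections import Counter

# Hypothetical patent records: (patent_id, country, technology_class).
patents = [
    ("P1", "US", "T001"),
    ("P2", "DE", "T001"),
    ("P3", "US", "T002"),
    ("P4", "ROW", "T002"),  # "ROW" stands in for the rest of the world
]

# Hypothetical citations: (citing_patent_id, cited_patent_id).
citations = [("P3", "P1"), ("P4", "P1"), ("P4", "P2")]

# Map each patent to its (country, technology class) cell.
cell_of = {pid: (country, tech) for pid, country, tech in patents}

# Number of applications per cell.
application_counts = Counter(cell_of.values())

# Knowledge flows, assumed to run from the cited cell (source) to the citing cell (user).
citation_flows = Counter(
    (cell_of[cited], cell_of[citing]) for citing, cited in citations
)
```

Each key of `citation_flows` plays the role of one input-output entry: how often knowledge from one country and technology class is cited by patents in another.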
The Unintended Consequences of Trade Protection on the Environment
We analyze the impact of a rise in protectionism on environmental regulation. Using the 2018 US-China trade war as a quasi-natural experiment, we find that higher exposure to tariffs leads to less stringent regulation targets in China, increasing air pollution and carbon emissions. Politically motivated changes in environmental policies rationalize our results: the central government and local party secretaries relax environmental regulations to mitigate the negative consequences of tariffs for polluting industries. We find heterogeneous effects depending on politicians' characteristics: younger, recently appointed, and more connected local politicians are more likely to ease environmental regulation. This policy reaction benefits politicians: prefectures with the largest easing in environmental regulation manage to curb the negative economic consequences of the trade war, while their mayors have a relatively larger probability of promotion. This paper presents the first empirical evidence of political incentives to manipulate environmental regulation to curb negative economic shocks.
One-step preparation of robust elastic plastic polyvinyl chloride sponges with a layered structure for highly efficient separation of water-in-oil emulsions
To address the environmental pollution and human health issues caused by oily wastewater and PVC plastic waste, a practical zero-waste solution has been developed. In this study, PVC sponges with superlipophilic and superhydrophobic properties were prepared from recycled PVC food wrap using vapor-induced phase inversion, without the use of any additives. The sponge effectively separates oil and water. The pore size of the PVC sponges could be adjusted by varying the PVC concentration and solvent ratio, which led to improvements in pore density, specific surface area, porosity, oil sorption capacity, and emulsion separation performance. The emulsion separation experiment demonstrated that the 7 wt% PVC sponge (7-0-1) can efficiently separate oil from a water-in-oil emulsion, with excellent separation efficiency and a flux of 161.5 L·m⁻²·h⁻¹·bar⁻¹. Moreover, the sponge exhibits impressive properties such as elastic recovery, flexibility, self-cleaning, and mechanical strength. Remarkably, even after recycling, the sponge maintains its hydrophobicity and emulsion separation performance. This hydrophobic sponge has great potential for mass production and for oil–water separation, such as in oil spill accidents.