PolyDiffuse: Polygonal Shape Reconstruction via Guided Set Diffusion Models
This paper presents PolyDiffuse, a novel structured reconstruction algorithm
that transforms visual sensor data into polygonal shapes with Diffusion Models
(DM), an emerging technique in the rapidly growing field of generative AI, by
formulating reconstruction as a generation process conditioned on sensor data.
The task of structured reconstruction poses two fundamental challenges to DM:
1) A structured geometry is a "set" (e.g., a set of polygons for a floorplan
geometry), so a single sample has multiple different but equivalent
representations, making the denoising highly ambiguous; and 2) A
"reconstruction" task has a single solution, so the initial noise needs to
be chosen carefully, whereas any initial noise works for a generation task. Our
technical contribution is the introduction of a Guided Set Diffusion Model
where 1) the forward diffusion process learns guidance networks to control
noise injection so that one representation of a sample remains distinct from
its other permutation variants, thus resolving denoising ambiguity; and 2) the
reverse denoising process reconstructs polygonal shapes, initialized and
directed by the guidance networks, as a conditional generation process subject
to the sensor data. We have evaluated our approach on reconstructing two types
of polygonal shapes: floorplans as sets of polygons and HD maps for autonomous
cars as sets of polylines. Through extensive experiments on standard
benchmarks, we demonstrate that PolyDiffuse significantly advances the current
state of the art and enables broader practical applications. Comment: Project page: https://poly-diffuse.github.io
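Challenge 1) can be made concrete with a small count. A minimal sketch, assuming each polygon is stored as an ordered list of at least three distinct 2-D vertices: every cyclic rotation of the list, and every rotation of its reversal, traces the same closed shape, so one polygon has 2N equivalent encodings, and a set of M polygons can additionally be listed in M! orders. The helper names are hypothetical and not part of PolyDiffuse.

```python
from math import factorial

Vertex = tuple[float, float]


def equivalent_encodings(polygon: list[Vertex]) -> set[tuple[Vertex, ...]]:
    """Return every vertex ordering that traces the same closed polygon.

    Each cyclic rotation (choice of starting vertex) and each traversal
    direction (clockwise vs counter-clockwise) gives a different list
    describing the identical shape.
    """
    encodings = set()
    n = len(polygon)
    for direction in (polygon, polygon[::-1]):
        for start in range(n):
            encodings.add(tuple(direction[start:] + direction[:start]))
    return encodings


def floorplan_encoding_count(vertex_counts: list[int]) -> int:
    """Count encodings of a set of polygons whose vertices are distinct.

    The polygons can be listed in any order (M! ways), and each polygon
    independently has 2 * N_i vertex orderings.
    """
    count = factorial(len(vertex_counts))
    for n in vertex_counts:
        count *= 2 * n
    return count


room = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]
print(len(equivalent_encodings(room)))      # 8 = 2 * 4
print(floorplan_encoding_count([4, 4, 6]))  # 3! * 8 * 8 * 12 = 4608
```

Every one of these encodings is a valid denoising target for the same floorplan, which is the ambiguity the guidance networks are meant to resolve.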
On time-consistent equilibrium stopping under aggregation of diverse discount rates
This paper studies the central planner's decision making on behalf of a group
of members with diverse discount rates. In the context of optimal stopping, we
work with a smooth aggregation preference to incorporate all heterogeneous
discount rates with an attitude function that reflects the aggregation rule, in
the same spirit as ambiguity aversion in the smooth ambiguity preference
proposed in Klibanoff et al. (2005). The resulting optimal stopping problem is
time inconsistent, for which we develop an iterative approach using consistent
planning and characterize all time-consistent equilibria as fixed points of an
operator in the setting of one-dimensional diffusion processes. We provide some
sufficient conditions on both the underlying models and the attitude function
under which the smallest equilibrium is the optimal equilibrium, in which the
attitude function becomes equivalent to the diversity-neutral linear
aggregation rule. When the sufficient condition on the attitude function is
violated, we illustrate through various examples that the characterization of
the optimal equilibrium may differ significantly from existing results for
an individual agent, as it then depends sensitively on the attitude function
and the distribution of the diverse discount rates.
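The time inconsistency mentioned above can already be seen in a deterministic toy case. The sketch assumes an aggregation of the smooth-ambiguity form phi^{-1}(sum_i w_i * phi(exp(-r_i * t) * g)) over finitely many discount rates r_i with weights w_i; this exact form, the rates, weights, and payoffs, and the helper names are illustrative assumptions rather than the paper's model, which works with one-dimensional diffusions and random stopping times. With the linear (diversity-neutral) attitude function the aggregate discount is a mixture of exponentials, and two distinct rates can produce a preference reversal: the choice planned from afar differs from the one made when the moment arrives.

```python
import math
from typing import Callable, Sequence


def linear(v: float) -> float:
    """Diversity-neutral attitude function (identity)."""
    return v


def aggregated_value(
    payoff: float,
    stop_time: float,
    rates: Sequence[float],
    weights: Sequence[float],
    phi: Callable[[float], float] = linear,
    phi_inv: Callable[[float], float] = linear,
) -> float:
    """phi^{-1}(sum_i w_i * phi(exp(-r_i * t) * payoff)) for a deterministic stop time t.

    With the identity attitude function this equals payoff * D(t), where
    D(t) = sum_i w_i * exp(-r_i * t) is a mixture of exponential discount functions.
    """
    inner = sum(w * phi(math.exp(-r * stop_time) * payoff) for r, w in zip(rates, weights))
    return phi_inv(inner)


def prefers_waiting(
    lead: float,
    delay: float,
    small: float,
    large: float,
    rates: Sequence[float],
    weights: Sequence[float],
) -> bool:
    """Viewed `lead` time units before the decision point, is stopping `delay`
    later for `large` preferred to stopping at the decision point for `small`?"""
    wait = aggregated_value(large, lead + delay, rates, weights)
    stop = aggregated_value(small, lead, rates, weights)
    return wait > stop


# Hypothetical group: one patient member and one impatient member, equal weights.
rates, weights = [0.01, 1.0], [0.5, 0.5]
print(prefers_waiting(10.0, 1.0, 1.0, 1.4, rates, weights))  # True: plan to wait
print(prefers_waiting(0.0, 1.0, 1.0, 1.4, rates, weights))   # False: stop when the time comes
```

With a single discount rate the ratio of the two values is the same at every lead time, so no reversal occurs; the heterogeneity of the rates is what breaks time consistency and motivates the consistent-planning equilibrium approach.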
Predicting Gene Ontology Function of Human MicroRNAs by Integrating Multiple Networks
MicroRNAs (miRNAs) have been shown to play significant roles in many human biological processes. Inferring the functions of miRNAs is an important strategy for understanding disease pathogenesis at the molecular level. In this paper, we propose an integrated model, PmiRGO, to infer the gene ontology (GO) functions of miRNAs by integrating multiple data sources, including the expression profiles of miRNAs, miRNA-target interactions, and protein-protein interactions (PPI). PmiRGO first builds a global network composed of three networks. Then, it employs DeepWalk to learn latent representations of this global heterogeneous network as network features. Finally, SVM-based models are applied to assign GO terms to miRNAs. The experimental results show that PmiRGO performs significantly better than existing state-of-the-art methods in terms of Fmax. A case study further demonstrates the feasibility of using PmiRGO to annotate the potential functions of miRNAs.
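DeepWalk learns node representations by running truncated random walks over a graph and treating each walk as a "sentence" for a skip-gram model. A minimal sketch of the walk-generation step, assuming a small undirected toy graph whose node names are hypothetical placeholders for miRNAs and proteins; the walk count, walk length, and seed are illustrative, not PmiRGO's settings.

```python
import random
from collections import defaultdict


def build_graph(edges):
    """Undirected adjacency lists; here node type lives only in the node name."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return dict(adj)


def random_walks(adj, walks_per_node, walk_length, seed=0):
    """DeepWalk-style truncated uniform random walks from every node."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    walks = []
    for _ in range(walks_per_node):
        rng.shuffle(nodes)  # new start order on every pass over the nodes
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbours = adj[walk[-1]]
                if not neighbours:  # isolated node: stop this walk early
                    break
                walk.append(rng.choice(neighbours))  # uniform step to a neighbour
            walks.append(walk)
    return walks


# Toy heterogeneous network (hypothetical nodes): miRNA co-expression,
# miRNA-target, and protein-protein edges merged into one graph.
edges = [
    ("miR-A", "miR-B"),  # miRNA-miRNA expression similarity
    ("miR-A", "P1"),     # miRNA-target interaction
    ("miR-B", "P2"),     # miRNA-target interaction
    ("P1", "P2"),        # protein-protein interaction
]
adj = build_graph(edges)
walks = random_walks(adj, walks_per_node=2, walk_length=5, seed=42)
print(len(walks))  # 8 walks: 2 passes over 4 nodes
```

The resulting walks would then be passed to a skip-gram model (for example, gensim's `Word2Vec`), and the learned miRNA vectors used as features for the SVM-based GO-term classifiers.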
Dual Defense: Adversarial, Traceable, and Invisible Robust Watermarking against Face Swapping
Malicious applications of deep forgery, exemplified by face swapping,
have introduced security threats such as misinformation dissemination and
identity fraud. While some research has proposed the use of robust watermarking
methods to trace the copyright of facial images for post-event traceability,
these methods cannot effectively prevent the generation of forgeries at the
source and curb their dissemination. To address this problem, we propose a
novel comprehensive active defense mechanism that combines traceability and
adversariality, called Dual Defense. Dual Defense invisibly embeds a single
robust watermark within the target face to actively respond to sudden cases of
malicious face swapping. It disrupts the output of the face swapping model
while maintaining the integrity of watermark information throughout the entire
dissemination process. This allows for watermark extraction at any stage of
image tracking for traceability. Specifically, we introduce a watermark
embedding network based on an original-domain feature impersonation attack. This
network learns robust adversarial features of target facial images and embeds
watermarks, seeking a well-balanced trade-off between watermark invisibility,
adversariality, and traceability through perceptual adversarial encoding
strategies. Extensive experiments demonstrate that Dual Defense achieves the
best overall defense success rates and exhibits promising universality in
anti-face-swapping tasks and in generalization across datasets. It maintains
impressive adversariality and traceability in both original and robust
settings, surpassing current forgery defense methods that possess only one of
these capabilities, such as CMUA-Watermark, Anti-Forgery, FakeTagger, and
PGD-based methods.
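One common way to express a trade-off among invisibility, adversariality, and traceability is a weighted sum of loss terms. The sketch below is a generic illustration of that idea, not the Dual Defense loss (the abstract does not specify its terms): images are flattened lists of floats, the face-swap outputs and watermark-decoder logits are assumed to be given, and the function names and weights are hypothetical.

```python
import math
from typing import Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared difference between two flattened images."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def bit_bce(bits: Sequence[int], logits: Sequence[float]) -> float:
    """Binary cross-entropy between embedded watermark bits and decoder logits."""
    return sum(softplus(-z) if b else softplus(z) for b, z in zip(bits, logits)) / len(bits)


def dual_objective(
    face: Sequence[float],
    watermarked: Sequence[float],
    swap_of_face: Sequence[float],
    swap_of_watermarked: Sequence[float],
    bits: Sequence[int],
    decoded_logits: Sequence[float],
    w_invisible: float = 1.0,  # hypothetical trade-off weights
    w_adversarial: float = 0.5,
    w_trace: float = 1.0,
) -> float:
    """Loss to minimise: stay close to the original face (invisibility),
    push the face-swap output away from its clean counterpart (adversariality),
    and keep the embedded bits decodable (traceability)."""
    invisibility = mse(face, watermarked)
    adversarial = -mse(swap_of_face, swap_of_watermarked)  # larger disruption lowers the loss
    traceability = bit_bce(bits, decoded_logits)
    return w_invisible * invisibility + w_adversarial * adversarial + w_trace * traceability
```

In a real system these terms would be computed on tensors inside an autodiff framework, with the weights tuned to balance the three goals.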
BabyWalk: Going Farther in Vision-and-Language Navigation by Taking Baby Steps
Learning to follow instructions is of fundamental importance to autonomous
agents for vision-and-language navigation (VLN). In this paper, we study how an
agent can navigate long paths when learning from a corpus that consists of
shorter ones. We show that existing state-of-the-art agents do not generalize
well. To this end, we propose BabyWalk, a new VLN agent that learns to
navigate by decomposing long instructions into shorter ones (BabySteps) and
completing them sequentially. A specially designed memory buffer is used by the
agent to turn its past experiences into contexts for future steps. The learning
process is composed of two phases. In the first phase, the agent uses imitation
learning from demonstration to accomplish BabySteps. In the second phase, the
agent uses curriculum-based reinforcement learning to maximize rewards on
navigation tasks with increasingly longer instructions. We create two new
benchmark datasets (of long navigation tasks) and use them in conjunction with
existing ones to examine BabyWalk's generalization ability. Empirical results
show that BabyWalk achieves state-of-the-art results on several metrics and,
in particular, follows long instructions better. The code and the
datasets are released on our project page https://github.com/Sha-Lab/babywalk. Comment: Accepted by ACL 202
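The two ideas named above, cutting a long instruction into BabySteps and training on progressively longer tasks, can be sketched as follows. This assumes a naive one-sentence-per-BabyStep segmentation and a curriculum whose stage k uses runs of k consecutive BabySteps; both are illustrative assumptions, not BabyWalk's actual segmentation or curriculum, and the function names are hypothetical.

```python
import re


def split_into_babysteps(instruction: str) -> list[str]:
    """Hypothetical segmentation: one BabyStep per sentence."""
    return [s.strip() for s in re.split(r"[.!?]", instruction) if s.strip()]


def curriculum_stages(instructions: list[str], max_steps: int) -> list[list[list[str]]]:
    """Stage k holds every run of k consecutive BabySteps, so the
    training tasks grow longer from one stage to the next."""
    segmented = [split_into_babysteps(text) for text in instructions]
    stages = []
    for k in range(1, max_steps + 1):
        stage = [
            steps[i:i + k]
            for steps in segmented
            for i in range(len(steps) - k + 1)  # empty when k > len(steps)
        ]
        stages.append(stage)
    return stages


instruction = "Walk past the sofa. Turn left at the kitchen. Stop by the fridge."
stages = curriculum_stages([instruction], max_steps=3)
for k, stage in enumerate(stages, start=1):
    print(k, len(stage))  # 1 3, then 2 2, then 3 1
```

Under this assumed schedule, an agent would first learn single BabySteps by imitation and then be fine-tuned with reinforcement learning stage by stage on the longer runs.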
LOGEN: Few-shot Logical Knowledge-Conditioned Text Generation with Self-training
Natural language generation from structured data mainly focuses on
surface-level descriptions, suffering from uncontrollable content selection and
low fidelity. Previous works leverage logical forms to facilitate logical
knowledge-conditioned text generation. Though achieving remarkable progress,
they are data-hungry, which makes their adoption in real-world applications
with limited data challenging. To this end, this paper proposes a unified
framework for logical knowledge-conditioned text generation in the few-shot
setting. With only a few seed logical forms (e.g., 20/100 shot), our approach
leverages self-training and samples pseudo logical forms based on content and
structure consistency. Experimental results demonstrate that our approach
obtains better few-shot performance than the baselines. Comment: Work in progress
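Self-training follows a generic loop: fit a model on the seed examples, pseudo-label unlabelled inputs, keep only pseudo-labels that pass a consistency check, and refit. A minimal, framework-agnostic sketch; `fit`, `keep`, and the toy nearest-neighbour model are hypothetical stand-ins for LOGEN's logical-form generator and its content/structure consistency filter.

```python
from typing import Callable, List, Tuple, TypeVar

X = TypeVar("X")
Y = TypeVar("Y")


def self_train(
    seed: List[Tuple[X, Y]],
    unlabeled: List[X],
    fit: Callable[[List[Tuple[X, Y]]], Callable[[X], Y]],
    keep: Callable[[X, Y], bool],
    rounds: int = 3,
) -> Callable[[X], Y]:
    """Generic self-training with a consistency filter on pseudo-labels."""
    labelled = list(seed)
    pool = list(unlabeled)
    model = fit(labelled)
    for _ in range(rounds):
        accepted, rejected = [], []
        for x in pool:
            y = model(x)  # pseudo-label
            if keep(x, y):  # consistency filter
                accepted.append((x, y))
            else:
                rejected.append(x)
        if not accepted:  # nothing new passed the filter
            break
        labelled.extend(accepted)
        pool = rejected  # retry rejected inputs next round
        model = fit(labelled)  # refit on seeds plus accepted pseudo-labels
    return model


def nearest_neighbour_fit(data):
    """Toy 'model': label a point by its nearest labelled neighbour."""
    points = list(data)
    return lambda x: min(points, key=lambda p: abs(p[0] - x))[1]


model = self_train([(0, "a"), (10, "b")], [1, 2, 9, 8, 5],
                   nearest_neighbour_fit, keep=lambda x, y: x != 5)
print(model(3), model(7))  # a b
```

The filter is the key design choice: accepting every pseudo-label lets early mistakes feed back into training, whereas a consistency check keeps only outputs the model can justify.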
Democratizing Pathological Image Segmentation with Lay Annotators via Molecular-empowered Learning
Multi-class cell segmentation in high-resolution Giga-pixel whole slide
images (WSI) is critical for various clinical applications. Training such an AI
model typically requires labor-intensive pixel-wise manual annotation from
experienced domain experts (e.g., pathologists). Moreover, such annotation is
error-prone when differentiating fine-grained cell types (e.g., podocyte and
mesangial cells) with the naked eye. In this study, we assess the
feasibility of democratizing pathological AI deployment by using only lay
annotators (annotators without medical domain knowledge). The contribution of
this paper is threefold: (1) We propose a molecular-empowered learning scheme
for multi-class cell segmentation using partial labels from lay annotators; (2)
The proposed method integrates Giga-pixel level molecular-morphology
cross-modality registration, molecular-informed annotation, and a
molecular-oriented segmentation model, achieving significantly better
performance with 3 lay annotators than with 2 experienced pathologists;
(3) A deep corrective learning (learning with imperfect labels) method is
proposed to further improve the segmentation performance using partially
annotated noisy data. In the experiments, our learning method achieved
F1 = 0.8496 using molecular-informed annotations from lay annotators, which is
better than the F1 = 0.7051 obtained with conventional morphology-based
annotations from experienced pathologists. Our method democratizes the
development of deep pathological segmentation models down to the lay-annotator
level, which in turn allows the learning process to scale up like a non-medical computer
vision task. The official implementation and cell annotations are publicly
available at https://github.com/hrlblab/MolecularEL
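F1 measures the overlap between predicted and reference labels, F1 = 2TP / (2TP + FP + FN). A minimal pixel-wise sketch for one class, assuming flattened integer label maps; the `ignore` value for unannotated pixels mirrors the partial-label setting but is an assumption here, and the paper may compute or average F1 differently (for example across classes or at the cell level).

```python
from typing import Optional, Sequence


def per_class_f1(
    pred: Sequence[int],
    truth: Sequence[int],
    cls: int,
    ignore: Optional[int] = None,
) -> float:
    """Pixel-wise F1 = 2TP / (2TP + FP + FN) for one class.

    Pixels whose ground-truth label equals `ignore` (e.g. unannotated
    pixels in a partial label map) are skipped.
    """
    tp = fp = fn = 0
    for p, t in zip(pred, truth):
        if ignore is not None and t == ignore:
            continue
        if p == cls and t == cls:
            tp += 1
        elif p == cls:
            fp += 1
        elif t == cls:
            fn += 1
    denom = 2 * tp + fp + fn
    # Class absent from both maps: treat as a perfect match (one common convention).
    return 2 * tp / denom if denom else 1.0


pred = [1, 1, 0, 2, 2, 0]
truth = [1, 0, 0, 2, 1, 255]  # 255 marks an unannotated pixel
print(per_class_f1(pred, truth, cls=1, ignore=255))  # 0.5
```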
- …