SD-MVS: Segmentation-Driven Deformation Multi-View Stereo with Spherical Refinement and EM optimization
In this paper, we introduce Segmentation-Driven Deformation Multi-View Stereo
(SD-MVS), a method that can effectively tackle challenges in 3D reconstruction
of textureless areas. We are the first to adopt the Segment Anything Model
(SAM) to distinguish semantic instances in scenes and further leverage these
constraints for pixelwise patch deformation on both matching cost and
propagation. Concurrently, we propose a unique refinement strategy that
combines spherical coordinates and gradient descent on normals with a pixelwise
search interval on depths, significantly improving the completeness of the
reconstructed 3D model. Furthermore, we adopt the Expectation-Maximization (EM)
algorithm to alternately optimize the aggregate matching cost and
hyperparameters, effectively mitigating the problem of parameters being
excessively dependent on empirical tuning. Evaluations on the ETH3D
high-resolution multi-view stereo benchmark and the Tanks and Temples dataset
demonstrate that our method achieves state-of-the-art results with lower time
consumption.
Comment: 10 pages, 9 figures, published to AAAI202
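The spherical refinement idea can be illustrated with a minimal sketch: parameterizing a surface normal by its two spherical angles means any gradient-descent step on the angles yields a vector that is still unit-length by construction, unlike stepping directly in Cartesian coordinates. The function names and the fixed learning rate below are illustrative only, not the paper's implementation.

```python
import math

def normal_to_spherical(n):
    """Convert a unit normal (x, y, z) to spherical angles (theta, phi)."""
    x, y, z = n
    theta = math.acos(max(-1.0, min(1.0, z)))  # polar angle, clamped for safety
    phi = math.atan2(y, x)                     # azimuthal angle
    return theta, phi

def spherical_to_normal(theta, phi):
    """Convert spherical angles back to a unit normal vector."""
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))

def refine_normal(theta, phi, grad, lr=0.01):
    """One gradient-descent step on the two angles. Because only the
    angles change, the recovered normal is always a unit vector."""
    g_theta, g_phi = grad
    return theta - lr * g_theta, phi - lr * g_phi
```

In a full pipeline the gradient would come from the matching cost; here it is just an input, since the point is only that the unit-norm constraint is enforced for free by the parameterization.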
Phonemic Adversarial Attack against Audio Recognition in Real World
Recently, adversarial attacks for audio recognition have attracted much
attention. However, most of the existing studies mainly rely on the
coarse-grain audio features at the instance level to generate adversarial
noises, which leads to expensive generation time costs and weak universal
attacking ability. Motivated by the observations that all audio speech consists
of fundamental phonemes, this paper proposes a phonemic adversarial tack (PAT)
paradigm, which attacks the fine-grain audio features at the phoneme level
commonly shared across audio instances, to generate phonemic adversarial
noises, enjoying the more general attacking ability with fast generation speed.
Specifically, to accelerate generation, a phoneme-density-balanced sampling
strategy is introduced that estimates the phoneme density and selects fewer but
phonemically richer audio instances as the training data, which substantially
alleviates the heavy dependency on a large training dataset. Moreover, to
promote universal attacking ability, the
phonemic noise is optimized in an asynchronous way with a sliding window, which
enhances the phoneme diversity and thus well captures the critical fundamental
phonemic patterns. By conducting extensive experiments, we comprehensively
investigate the proposed PAT framework and demonstrate that it outperforms the
SOTA baselines by large margins (i.e., at least 11X speed-up and 78% attacking
ability improvement).
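The density-balanced sampling step can be sketched as a simple greedy selection: rare phonemes contribute more to an instance's score, so the chosen subset covers the phoneme inventory with few examples. The data format and scoring rule below are hypothetical, intended only to convey the balancing idea.

```python
from collections import Counter

def balanced_sample(instances, k):
    """Greedily pick k instances whose phoneme content best balances the
    overall phoneme density. `instances` maps an instance id to its list
    of phonemes (a hypothetical representation)."""
    # estimate phoneme density over the whole pool
    density = Counter(p for phones in instances.values() for p in phones)

    def score(phones):
        # rarer phonemes contribute more, favoring phonemically rich instances
        return sum(1.0 / density[p] for p in set(phones))

    ranked = sorted(instances, key=lambda i: score(instances[i]), reverse=True)
    return ranked[:k]
```

With this kind of scoring, a small sampled subset can still expose the optimizer to most of the phoneme inventory, which is the property the paper relies on to shrink the training set.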
Robust Unstructured Knowledge Access in Conversational Dialogue with ASR Errors
Performance of spoken language understanding (SLU) can be degraded with
automatic speech recognition (ASR) errors. We propose a novel approach to
improve SLU robustness by randomly corrupting clean training text with an ASR
error simulator, followed by self-correcting the errors and minimizing the
target classification loss in a joint manner. In the proposed error simulator,
we leverage confusion networks generated from an ASR decoder without human
transcriptions to generate a variety of error patterns for model training. We
evaluate our approach on the DSTC10 challenge targeted for knowledge-grounded
task-oriented conversational dialogues with ASR errors. Experimental results
show the effectiveness of our proposed approach, boosting the knowledge-seeking
turn detection (KTD) F1 significantly from 0.9433 to 0.9904. Knowledge cluster
classification is boosted from 0.7924 to 0.9333 in Recall@1. After knowledge
document re-ranking, our approach shows significant improvement in all
knowledge selection metrics, from 0.7358 to 0.7806 in Recall@1, from 0.8301 to
0.9333 in Recall@5, and from 0.7798 to 0.8460 in MRR@5 on the test set. In the
recent DSTC10 evaluation, our approach demonstrates significant improvement in
knowledge selection, boosting Recall@1 from 0.495 to 0.7144 compared to the
official baseline. Our source code is released on GitHub:
https://github.com/yctam/dstc10_track2_task2.git.
Comment: 7 pages, 2 figures. Accepted at ICASSP 202
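The error-simulation step can be sketched as follows: given per-word confusion sets (in the paper these come from ASR confusion networks; here they are a hand-written stand-in), clean training text is randomly corrupted so the downstream model sees ASR-like noise. The corruption probability and data format are illustrative assumptions.

```python
import random

def corrupt(tokens, confusions, p=0.2, rng=None):
    """Randomly replace tokens with plausible ASR misrecognitions.
    `confusions` maps a word to candidate confusions (a hypothetical
    stand-in for confusion-network alternatives)."""
    rng = rng or random.Random(0)
    out = []
    for tok in tokens:
        if tok in confusions and rng.random() < p:
            out.append(rng.choice(confusions[tok]))  # inject an ASR-style error
        else:
            out.append(tok)                          # keep the clean token
    return out
```

Training on such corrupted text, paired with a self-correction objective, is what gives the SLU model its robustness; only substitution errors are sketched here, while a real simulator would also model insertions and deletions.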
Adversarial Examples in the Physical World: A Survey
Deep neural networks (DNNs) have demonstrated high vulnerability to
adversarial examples. Besides the attacks in the digital world, the practical
implications of adversarial examples in the physical world present significant
challenges and safety concerns. However, current research on physical
adversarial examples (PAEs) lacks a comprehensive understanding of their unique
characteristics, leaving them poorly characterized and understood. In this
paper, we address this gap by thoroughly examining the characteristics of PAEs
within a practical workflow encompassing training, manufacturing, and
re-sampling processes. By analyzing the links between physical adversarial
attacks, we identify manufacturing and re-sampling as the primary sources of
distinct attributes and particularities in PAEs. Leveraging this knowledge, we
develop a comprehensive analysis and classification framework for PAEs based on
their specific characteristics, covering over 100 studies on physical-world
adversarial examples. Furthermore, we investigate defense strategies against
PAEs and identify open challenges and opportunities for future research. We aim
to provide a fresh, thorough, and systematic understanding of PAEs, thereby
promoting the development of robust adversarial learning and its application in
open-world scenarios.
Comment: Adversarial examples, physical-world scenarios, attacks and defense
Robustness-enhanced Uplift Modeling with Adversarial Feature Desensitization
Uplift modeling has shown very promising results in online marketing.
However, most existing works are prone to the robustness challenge in some
practical applications. In this paper, we first present a possible explanation
for the above phenomenon. We verify that there is a feature sensitivity problem
in online marketing using different real-world datasets, where the perturbation
of some key features will seriously affect the performance of the uplift model
and even cause the opposite trend. To solve the above problem, we propose a
novel robustness-enhanced uplift modeling framework with adversarial feature
desensitization (RUAD). Specifically, our RUAD can more effectively alleviate
the feature sensitivity of the uplift model through two customized modules,
including a feature selection module with joint multi-label modeling to
identify a key subset from the input features and an adversarial feature
desensitization module using adversarial training and soft interpolation
operations to enhance the robustness of the model against this selected subset
of features. Finally, we conduct extensive experiments on a public dataset and
a real product dataset to verify the effectiveness of our RUAD in online
marketing. In addition, we also demonstrate the robustness of RUAD to feature
sensitivity, as well as its compatibility with different uplift models.
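The soft-interpolation part of the desensitization module can be sketched as below: sensitive features are mixed toward another sample and given a small bounded perturbation, so the model learns not to over-rely on their exact values. The surrogate "gradient sign" is a random placeholder and every name here is hypothetical, not the RUAD implementation.

```python
import random

def desensitize(x, x_other, sensitive_idx, alpha=0.5, eps=0.1, rng=None):
    """Soft-interpolate the sensitive features of x toward another sample
    and add a small adversarial-style perturbation. In real adversarial
    training the sign would come from the loss gradient; a random sign
    stands in for it here."""
    rng = rng or random.Random(0)
    out = list(x)
    for i in sensitive_idx:
        mixed = alpha * x[i] + (1 - alpha) * x_other[i]  # soft interpolation
        out[i] = mixed + eps * rng.choice([-1.0, 1.0])   # bounded perturbation
    return out
```

Non-sensitive features pass through untouched, reflecting that the feature-selection module first isolates the key subset before desensitization is applied.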
Attack Deterministic Conditional Image Generative Models for Diverse and Controllable Generation
Existing generative adversarial network (GAN) based conditional image
generative models typically produce fixed output for the same conditional
input, which is unreasonable for highly subjective tasks, such as large-mask
image inpainting or style transfer. On the other hand, GAN-based diverse image
generative methods require retraining or fine-tuning the network, or designing
complex noise-injection functions, which are computationally expensive,
task-specific, or struggle to generate high-quality results. Given that many
deterministic conditional image generative models have been able to produce
high-quality yet fixed results, we raise an intriguing question: is it possible
for pre-trained deterministic conditional image generative models to generate
diverse results without changing network structures or parameters? To answer
this question, we re-examine the conditional image generation tasks from the
perspective of adversarial attack and propose a simple and efficient plug-in
projected gradient descent (PGD) like method for diverse and controllable image
generation. The key idea is attacking the pre-trained deterministic generative
models by adding a micro perturbation to the input condition. In this way,
diverse results can be generated without any adjustment of network structures
or fine-tuning of the pre-trained models. In addition, we can also control the
diverse results to be generated by specifying the attack direction according to
a reference text or image. Our work opens the door to applying adversarial
attack to low-level vision tasks, and experiments on various conditional image
generation tasks demonstrate the effectiveness and superiority of the proposed
method.
Comment: 9 pages, 7 figures, accepted by AAAI2
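The PGD-like loop described above can be sketched in a framework-agnostic way: the generator stays frozen, only the conditional input is perturbed by signed gradient steps, and each step is projected back into a small L-infinity ball so the condition stays semantically close to the original. `grad_fn` is a hypothetical callable returning the gradient of some diversity objective with respect to the condition.

```python
def pgd_condition(cond, grad_fn, steps=10, step_size=0.01, eps=0.03):
    """PGD-like perturbation of the conditional input of a frozen
    generator. `grad_fn(cond)` returns the gradient of a diversity (or
    reference-guided) objective w.r.t. the condition."""
    orig = list(cond)
    x = list(cond)
    for _ in range(steps):
        g = grad_fn(x)
        # signed-gradient ascent step on the condition
        x = [xi + step_size * (1.0 if gi > 0 else -1.0)
             for xi, gi in zip(x, g)]
        # project back into the L-infinity eps-ball around the original
        x = [max(oi - eps, min(oi + eps, xi)) for oi, xi in zip(orig, x)]
    return x
```

Different random starts or different objective directions (e.g. guided by a reference text or image embedding) then yield different perturbed conditions, and hence diverse outputs, without touching the generator's weights.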
MIR2: Towards Provably Robust Multi-Agent Reinforcement Learning by Mutual Information Regularization
Robust multi-agent reinforcement learning (MARL) necessitates resilience to
uncertain or worst-case actions by unknown allies. Existing max-min
optimization techniques in robust MARL seek to enhance resilience by training
agents against worst-case adversaries, but this becomes intractable as the
number of agents grows, leading to exponentially increasing worst-case
scenarios. Attempts to simplify this complexity often yield overly pessimistic
policies, inadequate robustness across scenarios and high computational
demands. Unlike these approaches, humans naturally learn adaptive and resilient
behaviors without the necessity of preparing for every conceivable worst-case
scenario. Motivated by this, we propose MIR2, which trains policies in routine
scenarios and minimizes Mutual Information as Robust Regularization.
Theoretically, we frame robustness as an inference problem and prove that
minimizing mutual information between histories and actions implicitly
maximizes a lower bound on robustness under certain assumptions. Further
analysis reveals that our proposed approach prevents agents from overreacting
to others through an information bottleneck and aligns the policy with a robust
action prior. Empirically, our MIR2 displays even greater resilience against
worst-case adversaries than max-min optimization in StarCraft II, Multi-agent
Mujoco and rendezvous. This superiority is consistent when deployed in a
challenging real-world robot swarm control scenario. See code and demo videos
in the Supplementary Materials.
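The regularized quantity can be made concrete with a small sketch: over a finite sample of (history, action) pairs, the empirical mutual information measures how much an agent's actions depend on its observation history, which is exactly what MIR2 penalizes. The discrete, count-based estimator below is only an illustration; the paper operates on continuous policy representations.

```python
import math
from collections import Counter

def mutual_information(pairs):
    """Empirical mutual information (in nats) between discrete histories
    and actions, estimated from a sample of (history, action) pairs."""
    n = len(pairs)
    joint = Counter(pairs)                 # joint counts of (h, a)
    ph = Counter(h for h, _ in pairs)      # marginal counts of histories
    pa = Counter(a for _, a in pairs)      # marginal counts of actions
    mi = 0.0
    for (h, a), c in joint.items():
        p_ha = c / n
        mi += p_ha * math.log(p_ha / ((ph[h] / n) * (pa[a] / n)))
    return mi
```

Adding such a term to the training loss pushes the policy toward actions that depend less on the exact history, which is the information-bottleneck effect the analysis attributes to MIR2's robustness.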