259 research outputs found
Towards the Transferable Audio Adversarial Attack via Ensemble Methods
In recent years, deep learning (DL) models have achieved significant progress
in many domains, such as autonomous driving, facial recognition, and speech
recognition. However, the vulnerability of deep learning models to adversarial
attacks has raised serious concerns in the community because of their
insufficient robustness and generalization. Also, transferable attacks have
become a prominent method for black-box attacks. In this work, we explore the
potential factors that impact adversarial examples (AEs) transferability in
DL-based speech recognition. We also discuss the vulnerability of different DL
systems and the irregular nature of decision boundaries. Our results show a
remarkable difference in the transferability of AEs between speech and images:
data relevance is low for images but high for speech recognition.
Motivated by dropout-based ensemble approaches, we propose random gradient
ensembles and dynamic gradient-weighted ensembles, and we evaluate the impact
of ensembles on the transferability of AEs. The results show that the AEs
created by both approaches are valid for transfer to the black box API.
Comment: Submitted to Cybersecurity journal 202
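The abstract above names two gradient-ensemble ideas without giving formulas. The sketch below is only one plausible reading, not the authors' method: the dropout-style subset selection, the loss-proportional weighting, and every function name (`random_gradient_ensemble`, `weighted_gradient_ensemble`, `sign_step`) are assumptions made for illustration. Gradients are plain lists of floats, one list per surrogate model.

```python
import random

def random_gradient_ensemble(grads, keep_prob, rng):
    """Average the gradients of a random subset of surrogate models.

    Assumption: each model is kept independently with probability
    keep_prob, in the spirit of dropout.
    """
    kept = [g for g in grads if rng.random() < keep_prob]
    if not kept:  # always keep at least one model
        kept = [rng.choice(grads)]
    dim = len(kept[0])
    return [sum(g[i] for g in kept) / len(kept) for i in range(dim)]

def weighted_gradient_ensemble(grads, attack_losses):
    """Combine gradients with weights proportional to each model's loss.

    Assumption: attack_losses are positive and lower is better for the
    attacker, so models that are still hard to fool get more weight.
    """
    total = sum(attack_losses)
    weights = [loss / total for loss in attack_losses]
    dim = len(grads[0])
    return [sum(w * g[i] for w, g in zip(weights, grads)) for i in range(dim)]

def sign_step(x, grad, step_size):
    """One signed-gradient descent step on the attack loss."""
    return [xi - step_size * ((gi > 0) - (gi < 0)) for xi, gi in zip(x, grad)]
```

In a full attack these helpers would be called once per iteration, with the weights recomputed from the current losses at every step; that outer loop is omitted here.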
Conception, Screenplay, and Narrative Development of the Fiction Short Film “Vecinos extraños”
This work presents the conception, screenplay, and narrative development of the
original audiovisual fiction short film “Vecinos extraños”. Throughout the
master's thesis (TFM) report, the reader can consult the method followed to make
the short film (including the formation of the film crew and the filming
equipment), as well as a detailed breakdown of the pre-production, shooting, and
post-production phases. We then describe the personal experience that producing
this thesis has meant for each of the authors (Feng Yuxuan and Qin Xuyang), as
the culmination of a year immersed in the Máster de Cine, Comunicación e
Industria Audiovisual at the Universidad de Valladolid. Finally, the reader can
consult the complete screenplay of “Vecinos extraños” in professional screenplay
format and, of course, watch the short film through the link provided.
Departamento de Historia Moderna, Contemporánea y de América, Periodismo y Comunicación Audiovisual y Publicidad
Máster en Cine, comunicación e Industria Audiovisua
LLaMA Rider: Spurring Large Language Models to Explore the Open World
Recently, various studies have leveraged Large Language Models (LLMs) to help
decision-making and planning in environments, and try to align the LLMs'
knowledge with the world conditions. Nonetheless, the capacity of LLMs to
continuously acquire environmental knowledge and adapt in an open world remains
uncertain. In this paper, we propose an approach to spur LLMs to explore the
open world, gather experiences, and learn to improve their task-solving
capabilities. In this approach, a multi-round feedback-revision mechanism is
utilized to encourage LLMs to actively select appropriate revision actions
guided by feedback information from the environment. This facilitates
exploration and enhances the model's performance. Besides, we integrate
sub-task relabeling to assist LLMs in maintaining consistency in sub-task
planning and help the model learn the combinatorial nature between tasks,
enabling it to complete a wider range of tasks through training based on the
acquired exploration experiences. By evaluation in Minecraft, an open-ended
sandbox world, we demonstrate that our approach LLaMA-Rider enhances the
efficiency of the LLM in exploring the environment, and effectively improves
the LLM's ability to accomplish more tasks through fine-tuning with merely 1.3k
instances of collected data, showing minimal training costs compared to the
baseline using reinforcement learning.
Comment: 18 page
Integrating Relation Constraints with Neural Relation Extractors
Recent years have seen rapid progress in identifying predefined relationships
between entity pairs using neural networks (NNs). However, such models often
make predictions for each entity pair individually and thus often fail to
resolve the inconsistency among different predictions, which can be
characterized by discrete relation constraints. These constraints are often
defined over combinations of entity-relation-entity triples, since the relations
often lack explicitly well-defined type and cardinality requirements. In
this paper, we propose a unified framework to integrate relation constraints
with NNs by introducing a new loss term, ConstraintLoss. Particularly, we
develop two efficient methods to capture how well the local predictions from
multiple instance pairs satisfy the relation constraints. Experiments on both
English and Chinese datasets show that our approach can help NNs learn from
discrete relation constraints to reduce inconsistency among local predictions,
and outperform popular neural relation extraction (NRE) models, even those
enhanced with extra post-processing. Our source code and datasets will be released at
https://github.com/PKUYeYuan/Constraint-Loss-AAAI-2020.
Comment: Accepted to AAAI-202
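The idea of adding a constraint term to the training loss can be sketched as follows. This is a minimal illustration under stated assumptions, not the ConstraintLoss defined in the paper: the penalty form (probability mass placed on relation pairs declared incompatible), the weighting factor `lam`, and the names `constraint_penalty` and `total_loss` are all hypothetical.

```python
def constraint_penalty(probs_a, probs_b, incompatible):
    """Probability mass that two local predictions put on forbidden pairs.

    probs_a, probs_b: predicted relation distributions (lists of floats)
    for two instance pairs that share an entity.
    incompatible: set of (relation_index_a, relation_index_b) pairs that
    the constraints forbid from holding together (an assumed encoding).
    """
    return sum(probs_a[ra] * probs_b[rb] for ra, rb in incompatible)

def total_loss(base_loss, probs_a, probs_b, incompatible, lam):
    """Base extraction loss plus a weighted constraint penalty."""
    return base_loss + lam * constraint_penalty(probs_a, probs_b, incompatible)
```

The penalty is zero when the two predictions never place mass on a forbidden combination, so a consistent model pays only the base loss.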
How Many Answers Should I Give? An Empirical Study of Multi-Answer Reading Comprehension
The multi-answer phenomenon, where a question may have multiple answers
scattered in the document, can be well handled by humans but remains
challenging for machine reading comprehension (MRC) systems. Despite recent
progress in multi-answer MRC, a systematic analysis of how this phenomenon
arises and how to better address it. In this work, we design a taxonomy to
categorize commonly-seen multi-answer MRC instances, with which we inspect
three multi-answer datasets and analyze where the multi-answer challenge comes
from. We further analyze how well different paradigms of current multi-answer
MRC models deal with different types of multi-answer instances. We find that
some paradigms capture well the key information in the questions while others
better model the relationship between questions and contexts. We thus explore
strategies to make the best of the strengths of different paradigms.
Experiments show that generation models can be a promising platform to
incorporate different paradigms. Our annotations and code are released for
further research.
Comment: Findings of ACL 202
Controlling Text-to-Image Diffusion by Orthogonal Finetuning
Large text-to-image diffusion models have impressive capabilities in
generating photorealistic images from text prompts. How to effectively guide or
control these powerful models to perform different downstream tasks becomes an
important open problem. To tackle this challenge, we introduce a principled
finetuning method -- Orthogonal Finetuning (OFT), for adapting text-to-image
diffusion models to downstream tasks. Unlike existing methods, OFT can provably
preserve hyperspherical energy which characterizes the pairwise neuron
relationship on the unit hypersphere. We find that this property is crucial for
preserving the semantic generation ability of text-to-image diffusion models.
To improve finetuning stability, we further propose Constrained Orthogonal
Finetuning (COFT) which imposes an additional radius constraint to the
hypersphere. Specifically, we consider two important text-to-image finetuning
tasks: subject-driven generation where the goal is to generate subject-specific
images given a few images of a subject and a text prompt, and controllable
generation where the goal is to enable the model to take in additional control
signals. We empirically show that our OFT framework outperforms existing
methods in generation quality and convergence speed.
Comment: NeurIPS 2023 (43 pages, 34 figures, project page: https://oft.wyliu.com/
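The property the abstract relies on, that one orthogonal transform applied to every neuron leaves their pairwise angles unchanged, can be checked on a toy case. The sketch below assumes 2-D neurons and a plain rotation as the orthogonal map; how the orthogonal matrix is parametrized and trained is not stated in the abstract and is not modeled here. The names `rotate`, `cosine`, and `pairwise_cosines` are illustrative.

```python
import math

def rotate(neuron, theta):
    """Apply a 2-D rotation (an orthogonal transform) to one neuron."""
    x, y = neuron
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y)

def cosine(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def pairwise_cosines(neurons):
    """Cosines for every unordered pair of neurons."""
    n = len(neurons)
    return [cosine(neurons[i], neurons[j])
            for i in range(n) for j in range(i + 1, n)]

# Toy "pretrained" neurons and their "finetuned" versions, all rotated
# by the same angle (0.7 rad is an arbitrary illustrative value).
pretrained = [(1.0, 0.0), (1.0, 1.0), (-2.0, 0.5)]
finetuned = [rotate(w, 0.7) for w in pretrained]
```

Comparing `pairwise_cosines(pretrained)` with `pairwise_cosines(finetuned)` shows the same values up to floating-point error, which is why any quantity that depends only on the angles between normalized neurons is unchanged by such a transform.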
Metasurface-Based Free-Space Multi-port Beam Splitter with Arbitrary Power Ratio
A beam splitter (BS) is one of the most critical building blocks in optical
systems. Despite various reported attempts to miniaturize the conventional cube
BS with flat-type BSs, it remains a challenge to realize an ultrathin
optical BS with multi-port output, non-uniform splitting ratio and steerable
outgoing directions. Herein, we demonstrate a free-space optical
multi-port beam splitter (MPBS) based on a polarization-independent
all-dielectric metasurface. By applying an optimized phase-pattern paradigm via
a gradient-descent-based iterative algorithm to amorphous silicon (a-Si)
metasurfaces, we have prepared a variety of MPBS samples with arbitrarily
predetermined output port number (2~7), power ratio and spatial distribution of
output beams. The experimental results reveal that the fabricated MPBSs could
achieve high total splitting efficiency (TSE, above 74.7%) and beam-splitting
fidelity (similarity, above 78.4%) within the bandwidth of 100 nm (1500~1600
nm). We envision that such MPBSs could provide great flexibility for integrated
optical systems and diverse applications.