Auxiliary Learning as an Asymmetric Bargaining Game
Auxiliary learning is an effective method for enhancing the generalization
capabilities of trained models, particularly when dealing with small datasets.
However, this approach may present several difficulties: (i) optimizing
multiple objectives can be more challenging, and (ii) it is unclear how to
balance the auxiliary tasks so that they best assist the main task. In this
work, we propose a novel approach, named AuxiNash, for balancing tasks in
auxiliary learning by formalizing the problem as a generalized bargaining game
with
asymmetric task bargaining power. Furthermore, we describe an efficient
procedure for learning the bargaining power of tasks based on their
contribution to the performance of the main task and derive theoretical
guarantees for its convergence. Finally, we evaluate AuxiNash on multiple
multi-task benchmarks and find that it consistently outperforms competing
methods. Comment: ICML 202
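The core idea can be illustrated with a small numerical sketch. With per-task gradients g_i and bargaining powers p_i, the joint update direction d = Σ alpha_i g_i is chosen so that g_i^T d = p_i / alpha_i; uniform powers recover the symmetric Nash bargaining solution. The damped fixed-point solver below, and all names in it, are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

def asymmetric_nash_direction(G, p, iters=100, eps=1e-8):
    """Approximate the asymmetric Nash bargaining update direction.

    G : (k, d) matrix whose rows are per-task gradients g_i.
    p : (k,) positive bargaining powers (higher p_i = more influence).

    Seeks weights alpha with d = G^T alpha satisfying g_i^T d = p_i / alpha_i,
    via a damped fixed-point iteration (illustrative solver only).
    """
    K = G @ G.T                                   # pairwise gradient products
    alpha = np.full(G.shape[0], 1.0 / G.shape[0])
    for _ in range(iters):
        alpha = 0.5 * (alpha + p / np.maximum(K @ alpha, eps))
    return G.T @ alpha, alpha
```

For orthonormal gradients the weights converge to alpha_i = sqrt(p_i), so a task's influence on the joint direction grows smoothly with its bargaining power.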
Equivariant Architectures for Learning in Deep Weight Spaces
Designing machine learning architectures for processing neural networks in
their raw weight matrix form is a newly introduced research direction.
Unfortunately, the unique symmetry structure of deep weight spaces makes this
design very challenging. If successful, such architectures would be capable of
performing a wide range of intriguing tasks, from adapting a pre-trained
network to a new domain to editing objects represented as functions (INRs or
NeRFs). As a first step towards this goal, we present here a novel network
architecture for learning in deep weight spaces. It takes as input a
concatenation of weights and biases of a pre-trained MLP and processes it using
a composition of layers that are equivariant to the natural permutation
symmetry of the MLP's weights: changing the order of neurons in an intermediate
layer of the MLP does not affect the function it represents. We provide a full
characterization of all affine equivariant and invariant layers for these
symmetries and show how these layers can be implemented using three basic
operations: pooling, broadcasting, and fully connected layers applied to the
input in an appropriate manner. We demonstrate the effectiveness of our
architecture and its advantages over natural baselines in a variety of learning
tasks. Comment: ICML 202
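The three basic operations can be seen in the simplest special case of such a layer, a DeepSets-style permutation-equivariant linear map (a minimal sketch, not the paper's full weight-space architecture; the function and weight names are placeholders): each neuron's features pass through one linear map, and a mean-pooled summary is broadcast back to every neuron through another.

```python
import numpy as np

def equivariant_linear(X, W_self, W_pool):
    """Permutation-equivariant linear layer over a set of neurons.

    X : (n_neurons, d_in) per-neuron features.
    Combines a per-neuron linear map (fully connected), a mean over neurons
    (pooling), and a copy of that summary to every neuron (broadcasting).
    """
    pooled = X.mean(axis=0, keepdims=True)   # pooling over the neuron axis
    return X @ W_self + pooled @ W_pool      # broadcast summary + per-neuron map
```

Permuting the input rows permutes the output rows identically, which is exactly the equivariance property the architecture is built from.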
DisCLIP: Open-Vocabulary Referring Expression Generation
Referring Expression Generation (REG) aims to produce textual descriptions
that unambiguously identify specific objects within a visual scene.
Traditionally, this has been achieved through supervised learning methods,
which perform well on specific data distributions but often struggle to
generalize to new images and concepts. To address this issue, we present a
novel approach for REG, named DisCLIP, short for discriminative CLIP. We build
on CLIP, a large-scale visual-semantic model, to guide an LLM to generate a
contextual description of a target concept in an image while avoiding other
distracting concepts. Notably, this optimization happens at inference time and
does not require additional training or tuning of learned parameters. We
measure the quality of the generated text by evaluating the capability of a
receiver model to accurately identify the described object within the scene. To
achieve this, we use a frozen zero-shot comprehension module as a critic of
our generated referring expressions. We evaluate DisCLIP on multiple referring
expression benchmarks through human evaluation and show that it significantly
outperforms previous methods on out-of-domain datasets. Our results highlight
the potential of using pre-trained visual-semantic models for generating
high-quality contextual descriptions.
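The receiver-based check can be sketched as a contrastive softmax over image regions. The toy function below stands in for the real pipeline, where frozen CLIP encoders score candidates while guiding an LLM's decoding; the function name, temperature, and plain vectors in place of CLIP embeddings are all illustrative assumptions:

```python
import numpy as np

def discriminative_score(text_emb, target_emb, distractor_embs, tau=0.1):
    """Probability that a candidate description refers to the target region
    rather than any distractor, from CLIP-style embedding similarities."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    sims = np.array([cos(text_emb, target_emb)]
                    + [cos(text_emb, d) for d in distractor_embs])
    logits = sims / tau
    probs = np.exp(logits - logits.max())   # stable softmax over regions
    probs /= probs.sum()
    return probs[0]                          # mass assigned to the target
```

A candidate is discriminative when the receiver puts most of its probability mass on the target region; steering generation by this score is what pushes the text away from distracting concepts.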
Multi-Task Learning as a Bargaining Game
In multi-task learning (MTL), a joint model is trained to simultaneously make
predictions for several tasks. Joint training reduces computation costs and
improves data efficiency; however, since the gradients of these different tasks
may conflict, training a joint model for MTL often yields lower performance
than its corresponding single-task counterparts. A common method for
alleviating this issue is to combine per-task gradients into a joint update
direction using a particular heuristic. In this paper, we propose viewing the
gradient combination step as a bargaining game, where tasks negotiate to reach
an agreement on a joint direction of parameter update. Under certain
assumptions, the bargaining problem has a unique solution, known as the Nash
Bargaining Solution, which we propose to use as a principled approach to
multi-task learning. We describe a new MTL optimization procedure, Nash-MTL,
and derive theoretical guarantees for its convergence. Empirically, we show
that Nash-MTL achieves state-of-the-art results on multiple MTL benchmarks in
various domains. Comment: ICML 202
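Numerically, the Nash bargaining solution is the direction d = G^T alpha whose weights satisfy g_i^T d = 1/alpha_i, which makes the joint update invariant to per-task gradient scale. A rough sketch with an illustrative damped fixed-point solver (the solver and names are assumptions, not the paper's algorithm):

```python
import numpy as np

def nash_mtl_direction(G, iters=100, eps=1e-8):
    """Approximate the Nash bargaining update direction for MTL.

    G : (k, d) matrix whose rows are per-task gradients g_i.
    Seeks weights alpha with d = G^T alpha and g_i^T d = 1 / alpha_i,
    via a damped fixed-point iteration (illustrative solver only).
    """
    K = G @ G.T                                   # pairwise gradient products
    alpha = np.full(G.shape[0], 1.0 / G.shape[0])
    for _ in range(iters):
        alpha = 0.5 * (alpha + 1.0 / np.maximum(K @ alpha, eps))
    return G.T @ alpha
```

For two orthogonal gradients of norms 1 and 2, the solver down-weights the larger gradient so both tasks contribute equally to the final direction, in contrast to naive gradient averaging, which would let the larger task dominate.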