Does Invariant Graph Learning via Environment Augmentation Learn Invariance?
Invariant graph representation learning aims to learn the invariance among
data from different environments for out-of-distribution generalization on
graphs. As the graph environment partitions are usually expensive to obtain,
augmenting the environment information has become the de facto approach.
However, the usefulness of the augmented environment information has never been
verified. In this work, we find that it is fundamentally impossible to learn
invariant graph representations via environment augmentation without additional
assumptions. Therefore, we develop a set of minimal assumptions, including
variation sufficiency and variation consistency, for feasible invariant graph
learning. We then propose a new framework Graph invAriant Learning Assistant
(GALA). GALA incorporates an assistant model that needs to be sensitive to
graph environment changes or distribution shifts. The correctness of the proxy
predictions by the assistant model hence can differentiate the variations in
spurious subgraphs. We show that extracting the maximally invariant subgraph to
the proxy predictions provably identifies the underlying invariant subgraph for
successful OOD generalization under the established minimal assumptions.
Extensive experiments on datasets including DrugOOD with various graph
distribution shifts confirm the effectiveness of GALA.
Comment: NeurIPS 2023, 34 pages, 35 figures
Positional Information Matters for Invariant In-Context Learning: A Case Study of Simple Function Classes
In-context learning (ICL) refers to the ability of a model to condition on a
few in-context demonstrations (input-output examples of the underlying task) to
generate the answer for a new query input, without updating parameters. Despite
the impressive ICL ability of LLMs, it has also been found that ICL in LLMs is
sensitive to input demonstrations and limited to short context lengths. To
understand the limitations and principles for successful ICL, we conduct an
investigation with ICL linear regression of transformers. We characterize
several Out-of-Distribution (OOD) cases for ICL inspired by realistic LLM ICL
failures and compare transformers with DeepSet, a simple yet powerful
architecture for ICL. Surprisingly, DeepSet outperforms transformers across a
variety of distribution shifts, implying that preserving permutation invariance
symmetry to input demonstrations is crucial for OOD ICL. This phenomenon
points to a fundamental requirement of ICL, which we term ICL invariance.
Nevertheless, the positional encodings in LLMs break ICL invariance. To
this end, we further evaluate transformers with identical positional encodings
and find that preserving ICL invariance in transformers achieves state-of-the-art
performance across various ICL distribution shifts.
Comment: Ongoing work; preliminary version
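The permutation-invariance property described in this abstract can be illustrated with a minimal sketch. It assumes a DeepSet-style model that encodes each demonstration independently and sum-pools the result. The encoder `phi`, the least-squares readout, and the position-weighted contrast function are hypothetical choices made for illustration and are not taken from the paper.

```python
import math

def phi(x, y):
    """Hypothetical per-demonstration encoder: features of one (x, y) pair."""
    return [x * y, x * x]

def deepset_predict(demos, query):
    """Sum-pool the encoded demonstrations, then read out a prediction.

    For 1-D linear regression through the origin, the pooled features give
    the least-squares slope sum(x*y) / sum(x*x). This readout is an
    illustrative assumption, not the paper's architecture.
    """
    encoded = [phi(x, y) for x, y in demos]
    # math.fsum returns a correctly rounded sum, so the pooled value
    # does not depend on the order of the demonstrations.
    sum_xy = math.fsum(e[0] for e in encoded)
    sum_xx = math.fsum(e[1] for e in encoded)
    return (sum_xy / sum_xx) * query

def position_weighted_predict(demos, query):
    """Contrast case: weights each demonstration by its position (1, 2, ...),
    so reordering the demonstrations can change the prediction."""
    sum_xy = math.fsum((i + 1) * x * y for i, (x, y) in enumerate(demos))
    sum_xx = math.fsum(x * x for x, _ in demos)
    return (sum_xy / sum_xx) * query

demos = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
reordered = list(reversed(demos))
print(deepset_predict(demos, 4.0), deepset_predict(reordered, 4.0))
print(position_weighted_predict(demos, 4.0), position_weighted_predict(reordered, 4.0))
```

The sum-pooled predictor returns the same value for both orderings, while the position-weighted variant does not, which is the kind of order sensitivity the abstract attributes to positional encodings.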
FOCUS: Fairness via Agent-Awareness for Federated Learning on Heterogeneous Data
Federated learning (FL) allows agents to jointly train a global model without
sharing their local data. However, due to the heterogeneous nature of local
data, it is challenging to optimize or even define fairness of the trained
global model for the agents. For instance, existing work usually considers
accuracy equity as fairness for different agents in FL, which is limited,
especially under the heterogeneous setting, since it is intuitively "unfair" to
force agents with high-quality data to achieve similar accuracy to those who
contribute low-quality data, which may discourage the agents from participating
in FL. In this work, we propose a formal FL fairness definition, fairness via
agent-awareness (FAA), which takes different contributions of heterogeneous
agents into account. Under FAA, the performance of agents with high-quality
data will not be sacrificed merely because of the existence of a large number of
agents with low-quality data. In addition, we propose a fair FL training
algorithm based on agent clustering (FOCUS) to achieve fairness in FL measured
by FAA. Theoretically, we prove the convergence and optimality of FOCUS under
mild conditions for linear and general convex loss functions with bounded
smoothness. We also prove that FOCUS always achieves higher fairness in terms
of FAA compared with standard FedAvg under both linear and general convex loss
functions. Empirically, we show that on four FL datasets, including synthetic
data, images, and texts, FOCUS achieves significantly higher fairness in terms
of FAA while maintaining competitive prediction accuracy compared with FedAvg
and state-of-the-art fair FL algorithms.
Transportation dynamics on networks of mobile agents
Most existing works on transportation dynamics focus on networks of a fixed
structure, but networks whose nodes are mobile have become widespread, such as
cell-phone networks. We introduce a model to explore the basic physics of
transportation on mobile networks. Of particular interest is the dependence of
the throughput on the speed of agent movement and on the communication range. Our
computations reveal a hierarchical dependence for the former while, for the
latter, we find an algebraic power law between the throughput and the
communication range with an exponent determined by the speed. We develop a
physical theory based on the Fokker-Planck equation to explain these phenomena.
Our findings provide insights into complex transportation dynamics arising
commonly in natural and engineering systems.
Beyond Unfolding: Exact Recovery of Latent Convex Tensor Decomposition under Reshuffling
Exact recovery of tensor decomposition (TD) methods is a desirable property
in both unsupervised learning and scientific data analysis. The numerical
defects of TD methods, however, limit their practical applications on
real-world data. As an alternative, convex tensor decomposition (CTD) was
proposed to alleviate these problems, but its exact-recovery property has not
been properly addressed so far. To this end, we focus on latent convex tensor
decomposition (LCTD), a CTD model widely used in practice, and rigorously prove
a sufficient condition for its exact-recovery property. Furthermore, we show
that this property can also be achieved by a more general model than LCTD. In
the new model, we generalize the classic tensor (un-)folding into reshuffling
operation, a more flexible mapping to relocate the entries of the matrix into a
tensor. Armed with the reshuffling operations and exact-recovery property, we
explore a totally novel application for (generalized) LCTD, i.e., image
steganography. Experimental results on synthetic data validate our theory, and
results on image steganography show that our method outperforms the
state-of-the-art methods.
Comment: AAAI-202
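The relation between folding and reshuffling mentioned in this abstract can be sketched as follows. The sketch assumes one common row-major folding convention and treats a reshuffling as an arbitrary bijection from matrix positions to tensor positions. The function names `fold` and `reshuffle` and the `index_map` argument are hypothetical, and the paper's exact definition may differ.

```python
from itertools import product

def fold(matrix, shape):
    """Fold an I x (J*K) matrix into an I x J x K tensor.

    Assumed convention: matrix entry (i, j*K + k) goes to tensor entry (i, j, k).
    """
    I, J, K = shape
    return [[[matrix[i][j * K + k] for k in range(K)]
             for j in range(J)]
            for i in range(I)]

def reshuffle(matrix, shape, index_map):
    """Place matrix entry (r, c) at tensor position index_map[(r, c)].

    index_map is a dict from matrix positions to tensor positions and must
    hit every tensor position exactly once (a bijection).
    """
    I, J, K = shape
    all_positions = set(product(range(I), range(J), range(K)))
    if len(index_map) != I * J * K or set(index_map.values()) != all_positions:
        raise ValueError("index_map must be a bijection onto the tensor indices")
    tensor = [[[None] * K for _ in range(J)] for _ in range(I)]
    for (r, c), (i, j, k) in index_map.items():
        tensor[i][j][k] = matrix[r][c]
    return tensor

matrix = [[0, 1, 2, 3],
          [4, 5, 6, 7]]
shape = (2, 2, 2)

# Folding is recovered as the special case where the map follows the
# folding convention above; any other bijection gives a different layout.
folding_map = {(i, j * 2 + k): (i, j, k)
               for i, j, k in product(range(2), repeat=3)}
print(fold(matrix, shape))
print(reshuffle(matrix, shape, folding_map))
```

Under these assumptions, folding is one particular choice of `index_map`, which is the sense in which a reshuffling operation is the more flexible mapping.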
Privileged Prior Information Distillation for Image Matting
Performance of trimap-free image matting methods is limited when trying to
decouple the deterministic and undetermined regions, especially in scenes
where foregrounds are semantically ambiguous, chromaless, or highly
transmissive. In this paper, we propose a novel framework named Privileged
Prior Information Distillation for Image Matting (PPID-IM) that can effectively
transfer privileged prior environment-aware information to improve the
performance of students in solving hard foregrounds. The prior information of
trimap regulates only the teacher model during the training stage, while not
being fed into the student network during actual inference. In order to achieve
effective privileged cross-modality (i.e. trimap and RGB) information
distillation, we introduce a Cross-Level Semantic Distillation (CLSD) module
that reinforces the trimap-free students with more knowledgeable semantic
representations and environment-aware information. We also propose an
Attention-Guided Local Distillation module that efficiently transfers
privileged local attributes from the trimap-based teacher to trimap-free
students for the guidance of local-region optimization. Extensive experiments
demonstrate the effectiveness and superiority of our PPID framework on the task
of image matting. In addition, our trimap-free IndexNet-PPID surpasses the
other competing state-of-the-art methods by a large margin, especially in
scenarios with chromaless, weakly textured, or irregular objects.
Comment: 15 pages, 7 figures
Cooperative advertising strategy selection problem for considering pricing and advertising decisions in a two-period online supply chain
This article studies the cooperative advertising problem of a two-period online supply chain consisting of a manufacturer and an online retail platform. The manufacturer provides national advertising in the first period to build the brand image and increase awareness of the product, and the online retail platform provides platform advertising in both periods to sell the product to consumers on its platform. The manufacturer and the online retail platform may choose among different cooperative advertising strategies for national and platform advertising: a one-way subsidy strategy, a two-way subsidy strategy, and a revenue-share strategy. We formulate a Stackelberg game model that takes price and advertising effect into account and analyze how profit is influenced under the different cooperative advertising strategies. We find that under the revenue-share strategy, the manufacturer provides a higher subsidy rate for the online retail platform's advertising than under the other cooperative advertising strategies. Interestingly, there are conditions under which the total profit is higher when only the manufacturer contributes a percentage of the platform advertising and the online retail platform makes no contribution to the national advertising than it is under the revenue-share strategy, even though the cooperative relationship between the manufacturer and the online retail platform is closer under the revenue-share strategy.
Multi-dark-state resonances in cold multi-Zeeman-sublevel atoms
We present our experimental and theoretical studies of multi-dark-state
resonances (MDSRs) generated in a unique cold rubidium atomic system with only
one coupling laser beam. Such MDSRs are caused by different transition
strengths of the strong coupling beam connecting different Zeeman sublevels.
Controlling the transparency windows in such electromagnetically induced
transparency system can have potential applications in multi-wavelength optical
communication and quantum information processing.
Comment: 11 pages, 4 figures