172 research outputs found
On the Finite-Time Complexity and Practical Computation of Approximate Stationarity Concepts of Lipschitz Functions
We report a practical finite-time algorithmic scheme to compute approximately
stationary points for nonconvex nonsmooth Lipschitz functions. In particular,
we are interested in two kinds of approximate stationarity notions for
nonconvex nonsmooth problems, namely Goldstein approximate stationarity (GAS)
and near-approximate stationarity (NAS). For GAS, our scheme removes the
unrealistic subgradient selection oracle assumption in (Zhang et al., 2020,
Assumption 1) and computes GAS with the same finite-time complexity. For NAS,
Davis & Drusvyatskiy (2019) showed that -weakly convex functions admit
finite-time computation, while Tian & So (2021) provided the matching
impossibility results of dimension-free finite-time complexity for first-order
methods. Complementing these developments, in this paper we isolate a new
class of functions that can be Clarke irregular (and thus no longer weakly
convex) and show that our new algorithmic scheme can compute NAS points for
functions in that class within finite time. To demonstrate the wide
applicability of our new theoretical framework, we show that ρ-margin SVM,
1-layer, and 2-layer ReLU neural networks, all being Clarke irregular,
satisfy our new conditions.
Comment: 20 pages, 3 figures, ICML 202
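To give a concrete feel for the Goldstein notion discussed above, the following is a minimal gradient-sampling sketch in the spirit of such schemes: sample (sub)gradients in a small ball around the current point, take the minimal-norm element of their convex hull as a descent direction, and stop once its norm is small. The ball sampling, the Frank-Wolfe solver for the minimal-norm element, and the toy objective are illustrative assumptions, not the paper's actual algorithm or its oracle model.

```python
import numpy as np

def min_norm_in_hull(grads, iters=200):
    """Frank-Wolfe on the simplex: min_w ||sum_i w_i g_i||^2.
    Returns an (approximate) minimal-norm element of conv{grads}."""
    G = np.asarray(grads)                       # shape (m, d), one gradient per row
    w = np.ones(len(G)) / len(G)
    for _ in range(iters):
        v = G @ (w @ G)                         # v_i = <g_i, Gw>: gradient wrt w
        i = int(np.argmin(v))                   # best simplex vertex
        d = -w.copy(); d[i] += 1.0              # direction e_i - w
        Gd = G[i] - w @ G
        denom = Gd @ Gd
        if denom < 1e-16:
            break
        gamma = np.clip(-(w @ G) @ Gd / denom, 0.0, 1.0)   # exact line search
        w = w + gamma * d
    return w @ G

def goldstein_descent(f_grad, x0, delta=1e-2, eps=1e-3, m=20, max_iter=500, seed=0):
    """Sample (sub)gradients near x, step against the minimal-norm element of
    their convex hull, stop when its norm falls below eps (heuristic check)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        # box sampling of radius delta as a cheap stand-in for the delta-ball
        pts = x + delta * rng.uniform(-1.0, 1.0, size=(m, x.size))
        g = min_norm_in_hull([f_grad(p) for p in pts])
        if np.linalg.norm(g) <= eps:
            return x, g
        x = x - delta * g / np.linalg.norm(g)   # normalized step of length delta
    return x, g

# toy nonsmooth example: f(x) = |x_0| + 0.5 * x_1**2
f_grad = lambda x: np.array([np.sign(x[0]), x[1]])
x_hat, g_hat = goldstein_descent(f_grad, np.array([1.0, -2.0]))
```

The returned pair is an approximate stationarity certificate only in the sampled, heuristic sense of this sketch.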
ViCor: Bridging Visual Understanding and Commonsense Reasoning with Large Language Models
In our work, we explore the synergistic capabilities of pre-trained
vision-and-language models (VLMs) and large language models (LLMs) for visual
commonsense reasoning (VCR). We categorize the problem of VCR into visual
commonsense understanding (VCU) and visual commonsense inference (VCI). For
VCU, which involves perceiving the literal visual content, pre-trained VLMs
exhibit strong cross-dataset generalization. On the other hand, in VCI, where
the goal is to infer conclusions beyond image content, VLMs face difficulties.
We find that a baseline where VLMs provide perception results (image captions)
to LLMs leads to improved performance on VCI. However, we identify a challenge
with VLMs' passive perception, which often misses crucial context information,
leading to incorrect or uncertain reasoning by LLMs. To mitigate this issue, we
suggest a collaborative approach where LLMs, when uncertain about their
reasoning, actively direct VLMs to concentrate on and gather relevant visual
elements to support potential commonsense inferences. In our method, named
ViCor, pre-trained LLMs serve as problem classifiers to analyze the problem
category, VLM commanders to leverage VLMs differently based on the problem
classification, and visual commonsense reasoners to answer the question; the
VLMs perform visual recognition and understanding. We evaluate our framework on
two VCR benchmark datasets and outperform all other methods that do not require
in-domain supervised fine-tuning.
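A hedged sketch of the collaborative control flow described above follows. The class name ViCorSketch, the prompt strings, and the LLM/VLM call signatures are placeholders invented for illustration (not ViCor's actual interface); the toy lambdas only stand in for real model backends so the snippet runs.

```python
from dataclasses import dataclass
from typing import Callable, List

# Placeholder model interfaces; in practice these would wrap LLM / VLM APIs.
LLM = Callable[[str], str]
VLM = Callable[[str, str], str]   # (image_path, instruction) -> text

@dataclass
class ViCorSketch:
    llm: LLM
    vlm: VLM

    def answer(self, image: str, question: str, choices: List[str]) -> str:
        # 1. LLM as problem classifier: literal understanding (VCU) vs. inference (VCI).
        category = self.llm(f"Classify as VCU or VCI: {question}").strip()

        # 2. VLM as passive perceiver: a generic caption of the image.
        caption = self.vlm(image, "Describe the image.")

        if category == "VCU":
            # Literal visual content: let the VLM answer directly from the image.
            return self.vlm(image, f"{question} Options: {choices}")

        # 3. VCI: LLM reasons over the caption and reports its confidence.
        draft = self.llm(f"Context: {caption}\nQuestion: {question}\n"
                         f"Options: {choices}\nAnswer and say CONFIDENT or UNSURE.")
        if "UNSURE" in draft:
            # 4. LLM as VLM commander: actively ask for the missing visual evidence.
            query = self.llm(f"What visual detail would resolve: {question}?")
            evidence = self.vlm(image, query)
            draft = self.llm(f"Context: {caption}\nEvidence: {evidence}\n"
                             f"Question: {question}\nOptions: {choices}\nAnswer.")
        return draft

# Toy stand-ins so the sketch runs without any model backend.
toy = ViCorSketch(llm=lambda p: "VCI" if "Classify" in p else "UNSURE: option A",
                  vlm=lambda img, instr: f"[VLM output for '{instr}' on {img}]")
print(toy.answer("kitchen.jpg", "Why is the person holding a towel?", ["A", "B"]))
```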
The Benefits of Being Distributional: Small-Loss Bounds for Reinforcement Learning
While distributional reinforcement learning (RL) has demonstrated empirical
success, the question of when and why it is beneficial has remained unanswered.
In this work, we provide one explanation for the benefits of distributional RL
through the lens of small-loss bounds, which scale with the instance-dependent
optimal cost. If the optimal cost is small, our bounds are stronger than those
from non-distributional approaches. As a warmup, we show that learning the cost
distribution leads to small-loss regret bounds in contextual bandits (CB), and
we find that distributional CB empirically outperforms the state-of-the-art on
three challenging tasks. For online RL, we propose a distributional
version-space algorithm that constructs confidence sets using maximum
likelihood estimation, and we prove that it achieves small-loss regret in
tabular MDPs and enjoys small-loss PAC bounds in latent variable models.
Building on similar insights, we propose a distributional offline RL algorithm
based on the pessimism principle and prove that it enjoys small-loss PAC
bounds, which exhibit a novel robustness property. For both online and offline
RL, our results establish the first theoretical benefits of learning
distributions even when only the mean is needed for making decisions.
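As a loose illustration of the contextual-bandit warmup, the sketch below fits a categorical cost distribution per arm by maximum likelihood while the decision rule still uses only the estimated mean. It is a plain greedy rule with hypothetical names, not the paper's confidence-set (version-space) construction.

```python
import numpy as np

def mle_categorical(costs, support):
    """MLE of a categorical cost distribution over a finite support (empirical frequencies)."""
    counts = np.array([(np.asarray(costs) == v).sum() for v in support], dtype=float)
    return counts / max(counts.sum(), 1.0)

class DistributionalBanditSketch:
    """Per-arm categorical cost model; decisions still use only the mean."""
    def __init__(self, n_arms, support=(0.0, 0.5, 1.0)):
        self.support = np.array(support)
        self.history = [[] for _ in range(n_arms)]

    def select(self):
        means = []
        for i, costs in enumerate(self.history):
            if not costs:
                return i                        # unexplored arm: try it first
            p = mle_categorical(costs, self.support)
            means.append(p @ self.support)      # decision uses only the mean...
        return int(np.argmin(means))

    def update(self, arm, cost):
        self.history[arm].append(cost)          # ...but the full distribution is estimated

# toy run: arm 0 incurs small cost most of the time (the "small-loss" regime)
rng = np.random.default_rng(0)
bandit = DistributionalBanditSketch(n_arms=2)
for t in range(200):
    a = bandit.select()
    cost = rng.choice([0.0, 0.5, 1.0], p=[0.9, 0.08, 0.02] if a == 0 else [0.3, 0.4, 0.3])
    bandit.update(a, cost)
```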
An Adaptive Incremental Gradient Method With Support for Non-Euclidean Norms
Stochastic variance reduced methods have shown strong performance in solving
finite-sum problems. However, these methods usually require users to
manually tune the step-size, which is time-consuming or even infeasible for
some large-scale optimization tasks. To overcome this problem, we propose and
analyze several novel adaptive variants of the popular SAGA algorithm.
In particular, we design a variant of the Barzilai-Borwein step-size that is tailored
for the incremental gradient method to ensure memory efficiency and fast
convergence. We establish its convergence guarantees under general settings
that allow non-Euclidean norms in the definition of smoothness and the
composite objectives, which cover a broad range of applications in machine
learning. We improve the analysis of SAGA to support non-Euclidean norms, which
fills a gap in existing work. Numerical experiments on standard datasets
demonstrate the competitive performance of the proposed algorithm compared with
existing variance-reduced methods and their adaptive variants.
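The sketch below illustrates the general idea of combining SAGA with an epoch-wise Barzilai-Borwein step-size, reusing the gradient table's running average in place of a full gradient; the particular BB formula, the constants, and the least-squares toy problem are assumptions for illustration, not the variant analyzed in the paper.

```python
import numpy as np

def saga_bb(grad_i, n, x0, epochs=30, eta0=1e-2, seed=0):
    """SAGA with an (illustrative) epoch-wise Barzilai-Borwein step-size.

    grad_i(i, x): gradient of the i-th component function at x.
    The BB step is computed from snapshot iterates and the average of the
    stored gradient table, which SAGA already maintains."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    table = np.array([grad_i(i, x) for i in range(n)])    # stored component gradients
    avg = table.mean(axis=0)
    eta = eta0
    x_prev, g_prev = x.copy(), avg.copy()
    for _ in range(epochs):
        for _ in range(n):
            i = rng.integers(n)
            g_new = grad_i(i, x)
            x = x - eta * (g_new - table[i] + avg)        # SAGA update
            avg = avg + (g_new - table[i]) / n            # keep the table average in sync
            table[i] = g_new
        # BB1-style step-size from snapshot differences (recomputed once per epoch)
        s, y = x - x_prev, avg - g_prev
        if abs(s @ y) > 1e-12:
            eta = (s @ s) / (n * abs(s @ y))
        x_prev, g_prev = x.copy(), avg.copy()
    return x

# toy finite-sum: least squares  f(x) = (1/n) sum_i 0.5 * (a_i @ x - b_i)^2
rng = np.random.default_rng(1)
A, x_true = rng.normal(size=(200, 5)), rng.normal(size=5)
b = A @ x_true + 0.01 * rng.normal(size=200)
grad_i = lambda i, x: (A[i] @ x - b[i]) * A[i]
x_hat = saga_bb(grad_i, n=200, x0=np.zeros(5))
```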
Efficient Private SCO for Heavy-Tailed Data via Clipping
We consider stochastic convex optimization for heavy-tailed data with the
guarantee of being differentially private (DP). Prior work on this problem is
restricted to the gradient descent (GD) method, which is inefficient for
large-scale problems. In this paper, we resolve this issue and derive the first
high-probability bounds for the private stochastic method with clipping. For
general convex problems, we derive excess population risks
$\tilde{O}\left(\frac{d^{1/7}\sqrt{\ln\frac{(n\epsilon)^2}{\beta d}}}{(n\epsilon)^{2/7}}\right)$ and
$\tilde{O}\left(\frac{d^{1/7}\ln\frac{(n\epsilon)^2}{\beta d}}{(n\epsilon)^{2/7}}\right)$
under the bounded or unbounded domain assumption, respectively (here $n$ is the
sample size, $d$ is the dimension of the data, $\beta$ is the confidence level,
and $\epsilon$ is the privacy level). Then, we extend our analysis to the
strongly convex case and the non-smooth case (which covers generalized smooth
objectives with Hölder-continuous gradients). We establish new excess risk
bounds without the bounded domain
assumption. The results above achieve lower excess risks and gradient
complexities than existing methods in their corresponding cases. Numerical
experiments are conducted to corroborate the theoretical improvements.
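For intuition about the clipping mechanism referenced above, here is a minimal sketch of clipped, noisy minibatch SGD on a heavy-tailed toy problem. The clip level, noise scale, and step size are placeholders; obtaining an actual (ε, δ)-DP guarantee requires calibrating the noise to the privacy budget, which this sketch does not do.

```python
import numpy as np

def clipped_private_sgd(grad_i, n, x0, clip=1.0, sigma=1.0, eta=0.05,
                        n_steps=500, batch=10, seed=0):
    """Illustrative clipped, noisy SGD: clip each per-sample gradient to norm
    `clip`, average, and add Gaussian noise of scale sigma*clip/batch.
    The constants are placeholders, not privacy-calibrated values."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        idx = rng.choice(n, size=batch, replace=False)
        clipped = []
        for i in idx:
            g = grad_i(i, x)
            g_norm = np.linalg.norm(g)
            clipped.append(g * min(1.0, clip / max(g_norm, 1e-12)))   # per-sample clipping
        noise = rng.normal(scale=sigma * clip / batch, size=x.shape)  # Gaussian noise
        x = x - eta * (np.mean(clipped, axis=0) + noise)
    return x

# toy heavy-tailed regression: Student-t noise on the labels
rng = np.random.default_rng(2)
A, x_true = rng.normal(size=(500, 4)), rng.normal(size=4)
b = A @ x_true + rng.standard_t(df=2.1, size=500)        # heavy-tailed noise
grad_i = lambda i, x: (A[i] @ x - b[i]) * A[i]
x_priv = clipped_private_sgd(grad_i, n=500, x0=np.zeros(4))
```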
- …