Physarum Powered Differentiable Linear Programming Layers and Applications
Consider a learning algorithm that involves an internal call to an
optimization routine such as a generalized eigenvalue problem, a cone
programming problem or even sorting. Integrating such a method as layers within
a trainable deep network in a numerically stable way is not simple -- for
instance, only recently, strategies have emerged for eigendecomposition and
differentiable sorting. We propose an efficient and differentiable solver for
general linear programming problems which can be used in a plug and play manner
within deep neural networks as a layer. Our development is inspired by a
fascinating but not widely used link between the dynamics of slime mold (Physarum)
and mathematical optimization schemes such as steepest descent. We describe our
development and demonstrate the use of our solver in a video object
segmentation task and meta-learning for few-shot learning. We review the
relevant known results and provide a technical analysis of the solver's
applicability to our use cases. Our solver performs comparably to a
customized projected gradient descent method on the first task and outperforms
the very recently proposed differentiable CVXPY solver on the second task.
Experiments show that our solver converges quickly without the need for a
feasible initial point. Interestingly, our scheme is easy to implement and can
easily serve as a layer whenever a learning procedure, within a larger network,
needs a fast approximate solution to an LP.
Differentiable Outlier Detection Enable Robust Deep Multimodal Analysis
Often, deep network models are purely inductive during training and while
performing inference on unseen data. Thus, when such models are used for
predictions, it is well known that they often fail to capture the semantic
information and implicit dependencies that exist among objects (or concepts) on
a population level. Moreover, it is still unclear how domain or prior modal
knowledge can be specified in a backpropagation-friendly manner, especially in
large-scale and noisy settings. In this work, we propose an end-to-end vision
and language model incorporating explicit knowledge graphs. We also introduce
an interactive out-of-distribution (OOD) layer based on an implicit network
operator. The layer is used to filter out noise introduced by the external knowledge base.
In practice, we apply our model to several vision and language downstream tasks
including visual question answering, visual reasoning, and image-text retrieval
on different datasets. Our experiments show that it is possible to design
models that perform comparably to state-of-the-art results with significantly
fewer samples and less training time.
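As a rough illustration of the filtering idea only (the abstract does not spell out the implicit-operator construction, so the gating form, the class name KnowledgeOODGate, and all dimensions below are hypothetical), one could score each retrieved knowledge embedding against the fused vision-language state and softly suppress low-confidence entries:

```python
import torch
import torch.nn as nn

class KnowledgeOODGate(nn.Module):
    # Hypothetical sketch: score external knowledge-graph embeddings against the
    # fused vision-language state and softly mask entries that look
    # out-of-distribution; the paper's implicit-operator OOD layer may differ.
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, fused, knowledge):
        # fused: (B, D) multimodal state; knowledge: (B, K, D) retrieved KG embeddings
        B, K, D = knowledge.shape
        paired = torch.cat([fused.unsqueeze(1).expand(B, K, D), knowledge], dim=-1)
        gate = torch.sigmoid(self.score(paired))   # (B, K, 1) per-entry confidence
        return knowledge * gate                    # low-confidence entries are shrunk toward 0

# usage sketch
layer = KnowledgeOODGate(dim=256)
filtered = layer(torch.randn(4, 256), torch.randn(4, 8, 256))
```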
Flag Aggregator: Scalable Distributed Training under Failures and Augmented Losses using Convex Optimization
Modern ML applications increasingly rely on complex deep learning models and
large datasets. There has been an exponential growth in the amount of
computation needed to train the largest models. Therefore, to scale computation
and data, these models are inevitably trained in a distributed manner in
clusters of nodes, and their updates are aggregated before being applied to the
model. However, a distributed setup is prone to Byzantine failures of
individual nodes, components, and software. With data augmentation added to
these settings, there is a critical need for robust and efficient aggregation
systems. We define the quality of workers as reconstruction ratios, and
formulate aggregation as a maximum likelihood estimation procedure using Beta
densities. We show that the regularized form of the log-likelihood with respect
to the subspace can be approximately solved with an iterative least-squares
solver, and provide convergence guarantees using recent convex optimization landscape
results. Our empirical findings demonstrate that our approach significantly
enhances the robustness of state-of-the-art Byzantine resilient aggregators. We
evaluate our method in a distributed setup with a parameter server, and show
simultaneous improvements in communication efficiency and accuracy across
various tasks. The code is publicly available at
https://github.com/hamidralmasi/FlagAggregator
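A rough sketch of the reconstruction-ratio idea is shown below; it uses a plain SVD as a stand-in for the paper's regularized iterative least-squares solver, and the subspace rank, the weighting scheme, and the function name flag_style_aggregate are illustrative assumptions.

```python
import numpy as np

def flag_style_aggregate(grads, k=2):
    # Hedged sketch: estimate a rank-k subspace of the stacked worker gradients
    # (SVD here as a stand-in for the paper's iterative least-squares solver),
    # score each worker by its reconstruction ratio ||P g|| / ||g||, and return
    # the ratio-weighted average of the gradients.
    G = np.stack(grads, axis=1)                          # (d, n), one column per worker
    U, _, _ = np.linalg.svd(G, full_matrices=False)
    P = U[:, :k] @ U[:, :k].T                            # projector onto the top-k subspace
    ratios = np.array([np.linalg.norm(P @ g) / (np.linalg.norm(g) + 1e-12) for g in grads])
    weights = ratios / ratios.sum()
    return G @ weights                                   # aggregated gradient, shape (d,)

# usage sketch: aggregate gradients from six workers
rng = np.random.default_rng(0)
grads = [rng.normal(size=100) for _ in range(6)]
print(flag_style_aggregate(grads).shape)
```

In the abstract's formulation, the subspace is instead obtained by approximately maximizing a regularized Beta log-likelihood with an iterative least-squares solver, which is what the convergence guarantees refer to.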
Accelerated Neural Network Training with Rooted Logistic Objectives
Many neural networks deployed in real-world scenarios are trained using
cross-entropy-based loss functions. From the optimization perspective, it is
known that the behavior of first-order methods such as gradient descent
crucially depends on the separability of the dataset. In fact, even in the
simplest case of binary classification, the rate of convergence depends on two
factors: (1) the condition number of the data matrix, and (2) the separability
of the dataset. Without further pre-processing techniques such as
over-parametrization or data augmentation, separability is an intrinsic
quantity of the data distribution under consideration. We focus on the
landscape design of the logistic function and derive a novel sequence of
strictly convex functions that are at least as strict as the logistic loss. The
minimizers of these functions coincide with the minimum-norm solution wherever
possible. The strict convexity of the derived functions also makes them suitable
for finetuning state-of-the-art models and applications. In our empirical
analysis, we apply the proposed rooted logistic objective to multiple deep
models, e.g., fully-connected neural networks and transformers, on various
classification benchmarks. Our results illustrate that training with the rooted
loss converges faster and yields performance improvements.
Furthermore, we illustrate the use of our rooted loss in generative-modeling
downstream applications, such as finetuning a StyleGAN model with the rooted
loss. The code implementing our losses and models can be
found here for open source software development purposes:
https://anonymous.4open.science/r/rooted_loss
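One plausible reading of a rooted logistic term, sketched below purely for illustration, applies a k-th root to the standard logistic expression; the exact functional form, the exponent k, and the function name rooted_logistic_loss are assumptions and may differ from the paper's definition.

```python
import torch
import torch.nn.functional as F

def rooted_logistic_loss(logits, labels, k=4.0):
    # Illustrative sketch (assumed form, not necessarily the paper's definition):
    # replace log(1 + exp(-y*z)) with  k * ((1 + exp(-y*z))**(1/k) - 1),
    # which recovers the standard logistic loss as k -> infinity.
    # Labels are in {-1, +1}; softplus keeps the computation numerically stable,
    # since (1 + exp(-m))**(1/k) = exp(softplus(-m) / k).
    margin = labels * logits
    return (k * (torch.exp(F.softplus(-margin) / k) - 1.0)).mean()

# usage sketch: drop-in replacement for a binary logistic loss on {-1, +1} labels
logits = torch.randn(8, requires_grad=True)
labels = torch.tensor([1., -1., 1., 1., -1., 1., -1., 1.])
loss = rooted_logistic_loss(logits, labels, k=4.0)
loss.backward()
```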
Optimizing Nondecomposable Data Dependent Regularizers via Lagrangian Reparameterization offers Significant Performance and Efficiency Gains
Data-dependent regularization is known to benefit a wide variety of problems
in machine learning. Often, these regularizers cannot be easily decomposed into
a sum over a finite number of terms, e.g., a sum over individual example-wise
terms. The area under the ROC curve (AUCROC) and precision at a fixed recall
(P@R) are prominent examples that are used in many
applications. We find that for most medium- to large-sized datasets, scalability
issues severely limit our ability to leverage the benefits of such
regularizers. Importantly, the key technical impediment, despite some recent
progress, is that such objectives remain difficult to optimize via
backpropagation procedures. While an efficient general-purpose strategy for
this problem still remains elusive, in this paper, we show that for many
data-dependent nondecomposable regularizers that are relevant in applications,
sizable gains in efficiency are possible with minimal code-level changes; in
other words, no specialized tools or numerical schemes are needed. Our
procedure involves a reparameterization followed by a partial dualization --
this leads to a formulation that has provably cheap projection operators. We
present a detailed analysis of runtime and convergence properties of our
algorithm. On the experimental side, we show that a direct use of our scheme
significantly improves the state-of-the-art IoU measures reported for the
MSCOCO Stuff segmentation dataset.
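To make the partial-dualization idea concrete, the sketch below treats precision at a fixed recall as a constrained problem with a single dual variable whose projection is simply a clamp onto the nonnegative reals, which is the sense in which such projections are cheap; the smooth surrogate counts, step sizes, and the helper name pr_lagrangian_step are illustrative assumptions rather than the paper's exact reparameterization.

```python
import torch

def pr_lagrangian_step(scores, labels, lam, target_recall=0.9, lr_lam=0.01, tau=1.0):
    # Hedged sketch of partial dualization for a nondecomposable target
    # (precision at a fixed recall): lam enforces the recall constraint and its
    # projection is just a clamp to [0, inf).  Surrogate counts and step sizes
    # are illustrative choices, not the paper's exact formulation.
    probs = torch.sigmoid(scores / tau)              # smooth "predicted positive" indicator
    tp = (probs * labels).sum()                      # surrogate true positives
    fp = (probs * (1.0 - labels)).sum()              # surrogate false positives
    recall = tp / labels.sum().clamp(min=1.0)
    loss = fp + lam * (target_recall - recall)       # primal term, lam held fixed
    with torch.no_grad():                            # dual ascent + cheap projection
        lam = torch.clamp(lam + lr_lam * (target_recall - recall), min=0.0)
    return loss, lam

# usage sketch: one primal/dual step on random scores
scores = torch.randn(32, requires_grad=True)
labels = (torch.rand(32) > 0.5).float()
lam = torch.tensor(0.0)
loss, lam = pr_lagrangian_step(scores, labels, lam)
loss.backward()
```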