63 research outputs found
Accelerated Neural Network Training with Rooted Logistic Objectives
Many neural networks deployed in real-world scenarios are trained with
cross-entropy-based loss functions. From an optimization perspective, it is
known that the behavior of first-order methods such as gradient descent
depends crucially on the separability of the dataset. In fact, even in the
simplest case of binary classification, the rate of convergence depends on two
factors: (1) the condition number of the data matrix, and (2) the separability
of the dataset. Absent further pre-processing techniques such as
over-parametrization, data augmentation, etc., separability is an intrinsic
quantity of the data distribution under consideration. We focus on the
landscape design of the logistic function and derive a novel sequence of {\em
strictly} convex functions that are at least as strictly convex as the logistic
loss. The minimizers of these functions coincide with the minimum-norm solution
wherever possible. The strict convexity of the derived functions can be
leveraged to finetune state-of-the-art models and applications. In our
empirical analysis, we apply the proposed rooted logistic objective to multiple
deep models, e.g., fully-connected neural networks and transformers, on various
classification benchmarks. Our results illustrate that training with the rooted
loss converges faster and yields performance improvements. Furthermore, we
illustrate applications of our novel rooted loss in generative-modeling
downstream applications, such as finetuning a StyleGAN model with the rooted
loss. The code implementing our losses and models can be found here for
open-source software development purposes:
https://anonymous.4open.science/r/rooted_loss
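The abstract's point that gradient descent's convergence rate depends on the separability of the data can be seen in a minimal sketch. This is illustrative only, not the paper's rooted objective: plain gradient descent on the standard logistic loss over a hypothetical two-point separable dataset, where a larger margin leaves a lower loss after the same number of steps.

```python
import numpy as np

def logistic_loss(w, X, y):
    # mean logistic loss: log(1 + exp(-y * <w, x>))
    z = y * (X @ w)
    return np.mean(np.log1p(np.exp(-z)))

def grad(w, X, y):
    # gradient of the mean logistic loss: -mean(y * sigmoid(-y <w,x>) * x)
    z = y * (X @ w)
    s = 1.0 / (1.0 + np.exp(z))          # sigmoid(-z)
    return -(X.T @ (y * s)) / len(y)

def run_gd(margin, steps=200, lr=0.5):
    # two-point linearly separable dataset whose margin we control
    X = np.array([[margin], [-margin]])
    y = np.array([1.0, -1.0])
    w = np.zeros(1)
    for _ in range(steps):
        w -= lr * grad(w, X, y)
    return logistic_loss(w, X, y)

small = run_gd(margin=0.1)   # poorly separated data
large = run_gd(margin=1.0)   # well separated data
```

With identical step sizes and iteration budgets, the well-separated run ends at a much lower loss, matching the abstract's observation that separability is an intrinsic bottleneck for first-order methods.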
Evaluation of colorectal cancer subtypes and cell lines using deep learning
Colorectal cancer (CRC) is a common cancer with a high mortality rate and a rising incidence rate in the developed world. Molecular profiling techniques have been used to better understand the variability between tumors and disease models such as cell lines. To maximize the translatability and clinical relevance of in vitro studies, the selection of optimal cancer models is imperative. We have developed a deep learning-based method to measure the similarity between CRC tumors and disease models such as cancer cell lines. Our method efficiently leverages multiomics data sets containing copy number alterations, gene expression, and point mutations, and learns latent factors that describe the data in lower dimensions. These latent factors represent the patterns that are clinically relevant and explain the variability of molecular profiles across tumors and cell lines. Using these, we propose refined CRC subtypes and provide best-matching cell lines for different subtypes. These findings are relevant to patient stratification and to the selection of cell lines for early-stage drug discovery pipelines, biomarker discovery, and target identification.
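The pipeline described above — embedding multiomics profiles into shared latent factors and comparing tumors to cell lines in that space — can be sketched in simplified form. This is a hypothetical stand-in using truncated SVD rather than the authors' deep learning method; all array names and sizes are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# toy stand-ins for three omics blocks (samples x features):
# copy-number alterations, gene expression, point mutations
n_samples = 30
cna  = rng.normal(size=(n_samples, 40))
expr = rng.normal(size=(n_samples, 100))
mut  = rng.integers(0, 2, size=(n_samples, 25)).astype(float)

def standardize(M):
    # z-score each feature so the blocks are comparable
    mu, sd = M.mean(axis=0), M.std(axis=0) + 1e-8
    return (M - mu) / sd

# concatenate standardized blocks; extract latent factors via truncated SVD
X = np.hstack([standardize(b) for b in (cna, expr, mut)])
U, S, Vt = np.linalg.svd(X, full_matrices=False)
k = 5
latent = U[:, :k] * S[:k]            # k-dimensional embedding per sample

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# similarity between one "tumor" sample and one "cell line" sample
sim = cosine(latent[0], latent[1])
```

The design choice — comparing samples in a shared low-dimensional latent space rather than in raw feature space — is what makes tumor-to-cell-line matching tractable across heterogeneous omics modalities.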
Mechanistic Mode Connectivity
We study neural network loss landscapes through the lens of mode
connectivity, the observation that minimizers of neural networks retrieved via
training on a dataset are connected via simple paths of low loss. Specifically,
we ask the following question: are minimizers that rely on different mechanisms
for making their predictions connected via simple paths of low loss? We provide
a definition of mechanistic similarity as shared invariances to input
transformations and demonstrate that lack of linear connectivity between two
models implies they use dissimilar mechanisms for making their predictions.
Relevant to practice, this result helps us demonstrate that naive fine-tuning
on a downstream dataset can fail to alter a model's mechanisms, e.g.,
fine-tuning can fail to eliminate a model's reliance on spurious attributes.
Our analysis also motivates a method for targeted alteration of a model's
mechanisms, named connectivity-based fine-tuning (CBFT), which we analyze using
several synthetic datasets for the task of reducing a model's reliance on
spurious attributes.
Comment: Accepted at ICML, 202
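Probing linear connectivity between two minimizers amounts to evaluating the loss along the straight line between their weight vectors. A minimal sketch on a convex logistic-regression toy problem (where convexity guarantees a zero barrier; the paper's interest is in nonconvex neural networks, where a nonzero barrier signals mechanistically dissimilar solutions):

```python
import numpy as np

def loss(w, X, y):
    # mean logistic loss for a linear classifier
    z = y * (X @ w)
    return np.mean(np.log1p(np.exp(-z)))

def train(w0, X, y, lr=0.3, steps=300):
    # plain gradient descent from the given initialization
    w = w0.copy()
    for _ in range(steps):
        z = y * (X @ w)
        s = 1.0 / (1.0 + np.exp(z))          # sigmoid(-z)
        w += lr * (X.T @ (y * s)) / len(y)
    return w

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = np.sign(X[:, 0] + 0.3 * rng.normal(size=100))

# two minimizers obtained from different random initializations
w_a = train(rng.normal(size=2), X, y)
w_b = train(rng.normal(size=2), X, y)

# evaluate the loss along the linear path (1 - t) * w_a + t * w_b
ts = np.linspace(0, 1, 21)
path = [loss((1 - t) * w_a + t * w_b, X, y) for t in ts]
barrier = max(path) - max(path[0], path[-1])
```

For this convex problem the barrier is (numerically) zero; for deep networks, the abstract's result says a strictly positive barrier implies the two minimizers rely on dissimilar prediction mechanisms.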