Relating Regularization and Generalization through the Intrinsic Dimension of Activations
Given a pair of models with similar training set performance, it is natural
to assume that the model that possesses simpler internal representations would
exhibit better generalization. In this work, we provide empirical evidence for
this intuition through an analysis of the intrinsic dimension (ID) of model
activations, which can be thought of as the minimal number of factors of
variation in the model's representation of the data. First, we show that common
regularization techniques uniformly decrease the last-layer ID (LLID) of
validation set activations for image classification models and show how this
strongly affects generalization performance. We also investigate how excessive
regularization decreases a model's ability to extract features from data in
earlier layers, leading to a negative effect on validation accuracy even while
LLID continues to decrease and training accuracy remains near-perfect. Finally,
we examine the LLID over the course of training of models that exhibit
grokking. We observe that well after training accuracy saturates, when models
``grok'' and validation accuracy suddenly improves from random to perfect,
there is a co-occurring sudden drop in LLID, providing further insight into
the dynamics of sudden generalization.
Comment: NeurIPS 2022 OPT and HITY workshops
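As a concrete illustration of measuring the ID of activations, below is a minimal sketch of the TwoNN estimator (Facco et al., 2017), a standard choice for this task; the function name and the choice of estimator are illustrative assumptions, since the abstract does not state exactly which estimator or preprocessing the authors use.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def twonn_id(activations: np.ndarray) -> float:
    """TwoNN intrinsic-dimension estimate (Facco et al., 2017): uses only
    the ratio of each point's 2nd- to 1st-nearest-neighbour distance."""
    nn = NearestNeighbors(n_neighbors=3).fit(activations)
    dists, _ = nn.kneighbors(activations)
    r1, r2 = dists[:, 1], dists[:, 2]      # column 0 is the point itself
    mu = r2 / r1
    mu = mu[np.isfinite(mu) & (mu > 1.0)]  # drop duplicates / degenerate pairs
    return len(mu) / np.sum(np.log(mu))    # maximum-likelihood estimate
```

Calling this on an (examples x width) matrix of validation-set last-layer activations, and tracking the value as regularization strength or training time varies, reproduces the kind of LLID curves the abstract describes.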
CaloMan: Fast generation of calorimeter showers with density estimation on learned manifolds
Precision measurements and new physics searches at the Large Hadron Collider
require efficient simulations of particle propagation and interactions within
the detectors. The most computationally expensive simulations involve
calorimeter showers. Advances in deep generative modelling - particularly in
the realm of high-dimensional data - have opened the possibility of generating
realistic calorimeter showers orders of magnitude more quickly than
physics-based simulation. However, the high-dimensional representation of
showers belies the relative simplicity and structure of the underlying physical
laws. This phenomenon is yet another example of the manifold hypothesis from
machine learning, which states that high-dimensional data is supported on
low-dimensional manifolds. We thus propose modelling calorimeter showers first
by learning their manifold structure, and then estimating the density of data
across this manifold. Learning manifold structure reduces the dimensionality of
the data, which enables fast training and generation when compared with
competing methods.
Comment: Accepted to the Machine Learning and the Physical Sciences Workshop at NeurIPS 2022
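The two-step recipe can be sketched as follows: learn the manifold with a dimension-reducing model, then fit a density in the learned low-dimensional coordinates and generate by sampling and decoding. In this sketch a plain autoencoder and a Gaussian mixture stand in for the paper's flow-based components, and the shower and latent sizes are placeholder assumptions.

```python
import torch
import torch.nn as nn
from sklearn.mixture import GaussianMixture

D, LATENT = 504, 12  # hypothetical voxelised-shower and latent dimensions

class ShowerAE(nn.Module):
    """Step 1: an autoencoder whose bottleneck learns the shower manifold."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(D, 128), nn.ReLU(), nn.Linear(128, LATENT))
        self.dec = nn.Sequential(nn.Linear(LATENT, 128), nn.ReLU(), nn.Linear(128, D))

    def forward(self, x):
        return self.dec(self.enc(x))

def fit(showers: torch.Tensor, epochs: int = 200):
    ae = ShowerAE()
    opt = torch.optim.Adam(ae.parameters(), lr=1e-3)
    for _ in range(epochs):                # step 1: learn the manifold
        opt.zero_grad()
        loss = nn.functional.mse_loss(ae(showers), showers)
        loss.backward()
        opt.step()
    with torch.no_grad():                  # step 2: density on the manifold
        latents = ae.enc(showers).numpy()
    density = GaussianMixture(n_components=10).fit(latents)
    return ae, density

def generate(ae, density, n: int) -> torch.Tensor:
    """Sample the low-dimensional density, then decode back to shower space."""
    z, _ = density.sample(n)
    with torch.no_grad():
        return ae.dec(torch.as_tensor(z, dtype=torch.float32))
```

Because both training and sampling happen in the LATENT-dimensional space rather than the D-dimensional voxel space, this is where the speed-up over full-dimensional generative models comes from.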
The Union of Manifolds Hypothesis and its Implications for Deep Generative Modelling
Deep learning has had tremendous success at learning low-dimensional
representations of high-dimensional data. This success would be impossible if
there was no hidden low-dimensional structure in data of interest; this
existence is posited by the manifold hypothesis, which states that the data
lies on an unknown manifold of low intrinsic dimension. In this paper, we argue
that this hypothesis does not properly capture the low-dimensional structure
typically present in data. Assuming the data lies on a single manifold implies
intrinsic dimension is identical across the entire data space, and does not
allow for subregions of this space to have a different number of factors of
variation. To address this deficiency, we put forth the union of manifolds
hypothesis, which accommodates the existence of non-constant intrinsic
dimensions. We empirically verify this hypothesis on commonly-used image
datasets, finding that indeed, intrinsic dimension should be allowed to vary.
We also show that classes with higher intrinsic dimensions are harder to
classify, and how this insight can be used to improve classification accuracy.
We then turn our attention to the impact of this hypothesis in the context of
deep generative models (DGMs). Most current DGMs struggle to model datasets
with several connected components and/or varying intrinsic dimensions. To
tackle these shortcomings, we propose clustered DGMs, where we first cluster
the data and then train a DGM on each cluster. We show that clustered DGMs can
model multiple connected components with different intrinsic dimensions, and
empirically outperform their non-clustered counterparts without increasing
computational requirements.
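The clustered-DGM construction reduces to a short recipe: partition the data, train one generative model per partition, and sample clusters in proportion to their size. In the sketch below, `train_dgm` and the `.sample(n)` interface are hypothetical stand-ins for whichever DGM family is used; the paper does not prescribe KMeans specifically, so the clustering choice here is an assumption.

```python
import numpy as np
from sklearn.cluster import KMeans

def fit_clustered_dgm(data: np.ndarray, n_clusters: int, train_dgm):
    """Cluster first, then train one generative model per cluster.
    `train_dgm` is a hypothetical callback: it takes one cluster's data and
    returns a fitted model exposing .sample(n) (stand-in for any DGM)."""
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(data)
    models, weights = [], []
    for k in range(n_clusters):
        cluster = data[labels == k]
        models.append(train_dgm(cluster))  # each DGM fits one component,
        weights.append(len(cluster))       # which may have its own intrinsic dim
    return models, np.array(weights) / len(data)

def sample_clustered(models, weights, n: int, seed: int = 0) -> np.ndarray:
    """Pick clusters in proportion to their size, then sample each DGM."""
    counts = np.random.default_rng(seed).multinomial(n, weights)
    return np.concatenate([m.sample(c) for m, c in zip(models, counts) if c > 0])
```

Each per-cluster model only has to capture one connected component, so it is free to settle on that component's own intrinsic dimension rather than a single compromise value for the whole dataset.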
Edoxaban versus warfarin in patients with atrial fibrillation
BACKGROUND: Edoxaban is a direct oral factor Xa inhibitor with proven antithrombotic effects. The long-term efficacy and safety of edoxaban as compared with warfarin in patients with atrial fibrillation is not known.
METHODS: We conducted a randomized, double-blind, double-dummy trial comparing two once-daily regimens of edoxaban with warfarin in 21,105 patients with moderate-to-high-risk atrial fibrillation (median follow-up, 2.8 years). The primary efficacy end point was stroke or systemic embolism. Each edoxaban regimen was tested for noninferiority to warfarin during the treatment period. The principal safety end point was major bleeding.
RESULTS: The annualized rate of the primary end point during treatment was 1.50% with warfarin (median time in the therapeutic range, 68.4%), as compared with 1.18% with high-dose edoxaban (hazard ratio, 0.79; 97.5% confidence interval [CI], 0.63 to 0.99; P<0.001 for noninferiority) and 1.61% with low-dose edoxaban (hazard ratio, 1.07; 97.5% CI, 0.87 to 1.31; P=0.005 for noninferiority). In the intention-to-treat analysis, there was a trend favoring high-dose edoxaban versus warfarin (hazard ratio, 0.87; 97.5% CI, 0.73 to 1.04; P=0.08) and an unfavorable trend with low-dose edoxaban versus warfarin (hazard ratio, 1.13; 97.5% CI, 0.96 to 1.34; P=0.10). The annualized rate of major bleeding was 3.43% with warfarin versus 2.75% with high-dose edoxaban (hazard ratio, 0.80; 95% CI, 0.71 to 0.91; P<0.001) and 1.61% with low-dose edoxaban (hazard ratio, 0.47; 95% CI, 0.41 to 0.55; P<0.001). The corresponding annualized rates of death from cardiovascular causes were 3.17% versus 2.74% (hazard ratio, 0.86; 95% CI, 0.77 to 0.97; P=0.01) and 2.71% (hazard ratio, 0.85; 95% CI, 0.76 to 0.96; P=0.008), and the corresponding rates of the key secondary end point (a composite of stroke, systemic embolism, or death from cardiovascular causes) were 4.43% versus 3.85% (hazard ratio, 0.87; 95% CI, 0.78 to 0.96; P=0.005) and 4.23% (hazard ratio, 0.95; 95% CI, 0.86 to 1.05; P=0.32).
CONCLUSIONS: Both once-daily regimens of edoxaban were noninferior to warfarin with respect to the prevention of stroke or systemic embolism and were associated with significantly lower rates of bleeding and death from cardiovascular causes. (Funded by Daiichi Sankyo Pharma Development; ENGAGE AF-TIMI 48 ClinicalTrials.gov number, NCT00781391.)
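A detail worth making explicit: a hazard ratio above 1.0 can still satisfy a noninferiority criterion, which is why low-dose edoxaban (hazard ratio 1.07) has P=0.005 for noninferiority. The test only requires the upper bound of the 97.5% CI to stay below a prespecified margin. The sketch below illustrates this check; the 1.38 margin is an assumption drawn from the trial's published design and is not stated in the abstract above.

```python
def noninferior(ci_upper: float, margin: float = 1.38) -> bool:
    """Noninferiority on the hazard-ratio scale: the upper bound of the
    97.5% CI must fall below the prespecified margin. The 1.38 default is
    an assumption (the trial's published margin), not given in the abstract."""
    return ci_upper < margin

# Upper 97.5% CI bounds reported in the abstract for the primary end point:
print(noninferior(0.99))  # high-dose edoxaban -> True (noninferior)
print(noninferior(1.31))  # low-dose edoxaban  -> True (noninferior)
```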