Getting ViT in Shape: Scaling Laws for Compute-Optimal Model Design
Scaling laws have been recently employed to derive compute-optimal model size
(number of parameters) for a given compute duration. We advance and refine such
methods to infer compute-optimal model shapes, such as width and depth, and
successfully implement this in vision transformers. Our shape-optimized vision
transformer, SoViT, achieves results competitive with models that exceed twice
its size, despite being pre-trained with an equivalent amount of compute. For
example, SoViT-400m/14 achieves 90.3% fine-tuning accuracy on ILSVRC2012,
surpassing the much larger ViT-g/14 and approaching ViT-G/14 under identical
settings, while also incurring less than half the inference cost. We conduct a thorough
evaluation across multiple tasks, such as image classification, captioning, VQA
and zero-shot transfer, demonstrating the effectiveness of our model across a
broad range of domains and identifying limitations. Overall, our findings
challenge the prevailing approach of blindly scaling up vision models and pave
a path for more informed scaling.
Comment: 10 pages, 7 figures, 9 tables. Version 2: Layout fixe
Transient receptor potential melastatin 7 cation channel kinase: new player in angiotensin II–induced hypertension
Transient receptor potential melastatin 7 (TRPM7) is a bifunctional protein comprising a magnesium (Mg2+)/cation channel and a kinase domain. We previously demonstrated that vasoactive agents regulate vascular TRPM7. Whether TRPM7 plays a role in the pathophysiology of hypertension and associated cardiovascular dysfunction is unknown. We studied TRPM7 kinase–deficient mice (TRPM7Δkinase; heterozygous for TRPM7 kinase) and wild-type (WT) mice infused with angiotensin II (Ang II; 400 ng/kg per minute, 4 weeks). TRPM7 kinase expression was lower in heart and aorta from TRPM7Δkinase versus WT mice, effects that were further reduced by Ang II infusion. Plasma Mg2+ was lower in TRPM7Δkinase versus WT mice in basal and stimulated conditions. Ang II increased blood pressure in both strains with exaggerated responses in TRPM7Δkinase versus WT groups (P<0.05). Acetylcholine-induced vasorelaxation was reduced in Ang II–infused TRPM7Δkinase mice, an effect associated with Akt and endothelial nitric oxide synthase downregulation. Vascular cell adhesion molecule–1 expression was increased in Ang II–infused TRPM7 kinase–deficient mice. The TRPM7 kinase targets calpain and annexin-1 were activated by Ang II in WT but not in TRPM7Δkinase mice. Echocardiographic and histopathologic analysis demonstrated cardiac hypertrophy and left ventricular dysfunction in Ang II–treated groups. In TRPM7 kinase–deficient mice, Ang II–induced cardiac functional and structural effects were amplified compared with WT counterparts. Our data demonstrate that in TRPM7Δkinase mice, Ang II–induced hypertension is exaggerated, cardiac remodeling and left ventricular dysfunction are amplified, and endothelial function is impaired. These processes are associated with hypomagnesemia, blunted TRPM7 kinase expression/signaling, endothelial nitric oxide synthase downregulation, and proinflammatory vascular responses.
Our findings identify TRPM7 kinase as a novel player in Ang II–induced hypertension and associated vascular and target organ damage.
Tuning computer vision models with task rewards
Misalignment between model predictions and intended usage can be detrimental
for the deployment of computer vision models. The issue is exacerbated when the
task involves complex structured outputs, as it becomes harder to design
procedures which address this misalignment. In natural language processing,
this is often addressed using reinforcement learning techniques that align
models with a task reward. We adopt this approach and show its surprising
effectiveness across multiple computer vision tasks, such as object detection,
panoptic segmentation, colorization and image captioning. We believe this
approach has the potential to be widely useful for better aligning models with
a diverse range of computer vision tasks.
Comment: 11 page
How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers
Vision Transformers (ViT) have been shown to attain highly competitive
performance for a wide range of vision applications, such as image
classification, object detection and semantic image segmentation. In comparison
to convolutional neural networks, the Vision Transformer's weaker inductive
bias is generally found to cause an increased reliance on model regularization
or data augmentation ("AugReg" for short) when training on smaller training
datasets. We conduct a systematic empirical study in order to better understand
the interplay between the amount of training data, AugReg, model size and
compute budget. As one result of this study we find that the combination of
increased compute and AugReg can yield models with the same performance as
models trained on an order of magnitude more training data: we train ViT models
of various sizes on the public ImageNet-21k dataset which either match or
outperform their counterparts trained on the larger, but not publicly available
JFT-300M dataset.
Comment: Andreas, Alex, Xiaohua and Lucas contributed equally. We release more
than 50'000 ViT models trained under diverse settings on various datasets. We
believe this to be a treasure trove for model analysis. Available at
https://github.com/google-research/vision_transformer and
https://github.com/rwightman/pytorch-image-model
Direct Observation of Dynamic Symmetry Breaking above Room Temperature in Methylammonium Lead Iodide Perovskite
Lead halide perovskites such as methylammonium lead triiodide (MAPI) have
outstanding optical and electronic properties for photovoltaic applications,
yet a full understanding of how this solution processable material works so
well is currently missing. Previous research has revealed that MAPI possesses
multiple forms of static disorder regardless of preparation method, which is
surprising in light of its excellent performance. Using high energy resolution
inelastic X-ray (HERIX) scattering, we measure phonon dispersions in MAPI and
find direct evidence for another form of disorder in single crystals: large
amplitude anharmonic zone-edge rotational instabilities of the PbI₆ octahedra
that persist to room temperature and above, left over from structural phase
transitions that take place tens to hundreds of degrees below. Phonon
calculations show that the orientations of the methylammonium couple strongly
and cooperatively to these modes. The result is a non-centrosymmetric,
instantaneous local structure, which we observe in atomic pair distribution
function (PDF) measurements. This local symmetry breaking is unobservable by
Bragg diffraction, but can explain key material properties such as the
structural phase sequence, ultra low thermal transport, and large minority
charge carrier lifetimes despite moderate carrier mobility.
Comment: 30 pages, 11 figure