786 research outputs found

    Getting ViT in Shape: Scaling Laws for Compute-Optimal Model Design

    Scaling laws have recently been employed to derive the compute-optimal model size (number of parameters) for a given compute duration. We advance and refine such methods to infer compute-optimal model shapes, such as width and depth, and successfully implement this in vision transformers. Our shape-optimized vision transformer, SoViT, achieves results competitive with models that exceed twice its size, despite being pre-trained with an equivalent amount of compute. For example, SoViT-400m/14 achieves 90.3% fine-tuning accuracy on ILSVRC2012, surpassing the much larger ViT-g/14 and approaching ViT-G/14 under identical settings, at less than half the inference cost. We conduct a thorough evaluation across multiple tasks, such as image classification, captioning, VQA and zero-shot transfer, demonstrating the effectiveness of our model across a broad range of domains and identifying limitations. Overall, our findings challenge the prevailing approach of blindly scaling up vision models and pave a path for more informed scaling. Comment: 10 pages, 7 figures, 9 tables. Version 2: Layout fixe

    Transient receptor potential melastatin 7 cation channel kinase: new player in angiotensin II–induced hypertension

    Transient receptor potential melastatin 7 (TRPM7) is a bifunctional protein comprising a magnesium (Mg2+)/cation channel and a kinase domain. We previously demonstrated that vasoactive agents regulate vascular TRPM7. Whether TRPM7 plays a role in the pathophysiology of hypertension and associated cardiovascular dysfunction is unknown. We studied TRPM7 kinase–deficient mice (TRPM7Δkinase; heterozygous for TRPM7 kinase) and wild-type (WT) mice infused with angiotensin II (Ang II; 400 ng/kg per minute, 4 weeks). TRPM7 kinase expression was lower in heart and aorta from TRPM7Δkinase versus WT mice and was further reduced by Ang II infusion. Plasma Mg2+ was lower in TRPM7Δkinase versus WT mice in basal and stimulated conditions. Ang II increased blood pressure in both strains, with exaggerated responses in TRPM7Δkinase versus WT groups (P<0.05). Acetylcholine-induced vasorelaxation was reduced in Ang II–infused TRPM7Δkinase mice, an effect associated with Akt and endothelial nitric oxide synthase downregulation. Vascular cell adhesion molecule–1 expression was increased in Ang II–infused TRPM7 kinase–deficient mice. The TRPM7 kinase targets calpain and annexin-1 were activated by Ang II in WT but not in TRPM7Δkinase mice. Echocardiographic and histopathologic analysis demonstrated cardiac hypertrophy and left ventricular dysfunction in Ang II–treated groups. In TRPM7 kinase–deficient mice, Ang II–induced cardiac functional and structural effects were amplified compared with WT counterparts. Our data demonstrate that in TRPM7Δkinase mice, Ang II–induced hypertension is exaggerated, cardiac remodeling and left ventricular dysfunction are amplified, and endothelial function is impaired. These processes are associated with hypomagnesemia, blunted TRPM7 kinase expression/signaling, endothelial nitric oxide synthase downregulation, and proinflammatory vascular responses. Our findings identify TRPM7 kinase as a novel player in Ang II–induced hypertension and associated vascular and target organ damage.

    Tuning computer vision models with task rewards

    Misalignment between model predictions and intended usage can be detrimental to the deployment of computer vision models. The issue is exacerbated when the task involves complex structured outputs, as it becomes harder to design procedures that address this misalignment. In natural language processing, this is often addressed using reinforcement learning techniques that align models with a task reward. We adopt this approach and show its surprising effectiveness across multiple computer vision tasks, such as object detection, panoptic segmentation, colorization and image captioning. We believe this approach has the potential to be widely useful for better aligning models with a diverse range of computer vision tasks. Comment: 11 page

    How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers

    Vision Transformers (ViT) have been shown to attain highly competitive performance for a wide range of vision applications, such as image classification, object detection and semantic image segmentation. In comparison to convolutional neural networks, the Vision Transformer's weaker inductive bias is generally found to cause an increased reliance on model regularization or data augmentation ("AugReg" for short) when training on smaller datasets. We conduct a systematic empirical study to better understand the interplay between the amount of training data, AugReg, model size and compute budget. As one result of this study, we find that the combination of increased compute and AugReg can yield models with the same performance as models trained on an order of magnitude more training data: we train ViT models of various sizes on the public ImageNet-21k dataset which either match or outperform their counterparts trained on the larger, but not publicly available, JFT-300M dataset. Comment: Andreas, Alex, Xiaohua and Lucas contributed equally. We release more than 50'000 ViT models trained under diverse settings on various datasets. We believe this to be a treasure trove for model analysis. Available at https://github.com/google-research/vision_transformer and https://github.com/rwightman/pytorch-image-model

    Direct Observation of Dynamic Symmetry Breaking above Room Temperature in Methylammonium Lead Iodide Perovskite

    Lead halide perovskites such as methylammonium lead triiodide (MAPI) have outstanding optical and electronic properties for photovoltaic applications, yet a full understanding of how this solution-processable material works so well is currently missing. Previous research has revealed that MAPI possesses multiple forms of static disorder regardless of preparation method, which is surprising in light of its excellent performance. Using high energy resolution inelastic X-ray (HERIX) scattering, we measure phonon dispersions in MAPI and find direct evidence for another form of disorder in single crystals: large-amplitude anharmonic zone-edge rotational instabilities of the PbI_6 octahedra that persist to room temperature and above, left over from structural phase transitions that take place tens to hundreds of degrees below. Phonon calculations show that the orientations of the methylammonium couple strongly and cooperatively to these modes. The result is a non-centrosymmetric, instantaneous local structure, which we observe in atomic pair distribution function (PDF) measurements. This local symmetry breaking is unobservable by Bragg diffraction, but can explain key material properties such as the structural phase sequence, ultra-low thermal transport, and large minority charge carrier lifetimes despite moderate carrier mobility. Comment: 30 pages, 11 figure