27 research outputs found

    InceptionNeXt: When Inception Meets ConvNeXt

    Inspired by the long-range modeling ability of ViTs, large-kernel convolutions have recently been widely studied and adopted to enlarge the receptive field and improve model performance, as in the remarkable work ConvNeXt, which employs 7x7 depthwise convolution. Although such a depthwise operator consumes only a few FLOPs, it largely harms model efficiency on powerful computing devices due to its high memory access costs. For example, ConvNeXt-T has similar FLOPs to ResNet-50 but achieves only 60% of its throughput when trained on A100 GPUs with full precision. Although reducing the kernel size of ConvNeXt can improve speed, it results in significant performance degradation. It is still unclear how to speed up large-kernel-based CNN models while preserving their performance. To tackle this issue, inspired by Inceptions, we propose to decompose large-kernel depthwise convolution into four parallel branches along the channel dimension, i.e. a small square kernel, two orthogonal band kernels, and an identity mapping. With this new Inception depthwise convolution, we build a series of networks, namely InceptionNeXt, which not only enjoy high throughput but also maintain competitive performance. For instance, InceptionNeXt-T achieves 1.6x higher training throughput than ConvNeXt-T, as well as a 0.2% top-1 accuracy improvement on ImageNet-1K. We anticipate InceptionNeXt can serve as an economical baseline for future architecture design to reduce carbon footprint. Code is available at https://github.com/sail-sg/inceptionnext.
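
    The four-branch decomposition described above can be illustrated with a little arithmetic. The sketch below is not taken from the InceptionNeXt code: the 3x3 square kernel, the band length of 11, the branch ratio of 1/8, and all function names are assumptions chosen for the example. It counts depthwise weights only, which is a crude proxy, since the abstract attributes the slowdown to memory access cost rather than to parameters or FLOPs.

```python
# Illustrative sketch only: kernel sizes and the branch ratio below are
# assumptions chosen for the example, not values stated in the abstract.

def split_channels(channels, branch_ratio=0.125):
    """Split channels into (identity, square, band_w, band_h) groups."""
    conv_channels = int(channels * branch_ratio)
    identity_channels = channels - 3 * conv_channels
    return identity_channels, conv_channels, conv_channels, conv_channels


def depthwise_weights(channels, kernel_h, kernel_w):
    """Weight count of a depthwise convolution (one filter per channel, no bias)."""
    return channels * kernel_h * kernel_w


def inception_depthwise_weights(channels, square=3, band=11, branch_ratio=0.125):
    """Total weights of the four-branch decomposition."""
    _identity, c_sq, c_w, c_h = split_channels(channels, branch_ratio)
    return (
        depthwise_weights(c_sq, square, square)  # small square kernel
        + depthwise_weights(c_w, 1, band)        # horizontal band kernel
        + depthwise_weights(c_h, band, 1)        # vertical band kernel
        # the identity branch has no weights
    )


channels = 96
baseline = depthwise_weights(channels, 7, 7)        # full 7x7 depthwise conv
decomposed = inception_depthwise_weights(channels)  # four-branch variant
print(baseline, decomposed)  # prints: 4704 372
```

    Under these assumed settings, 60 of the 96 channels pass through the identity branch untouched, so only 36 channels are read and written by a convolution at all.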

    MetaFormer Is Actually What You Need for Vision

    Transformers have shown great potential in computer vision tasks. A common belief is that their attention-based token mixer module contributes most to their competence. However, recent works show that the attention-based module in Transformers can be replaced by spatial MLPs and the resulting models still perform quite well. Based on this observation, we hypothesize that the general architecture of the Transformers, rather than the specific token mixer module, is more essential to the model's performance. To verify this, we deliberately replace the attention module in Transformers with an embarrassingly simple spatial pooling operator that conducts only basic token mixing. Surprisingly, we observe that the derived model, termed PoolFormer, achieves competitive performance on multiple computer vision tasks. For example, on ImageNet-1K, PoolFormer achieves 82.1% top-1 accuracy, surpassing the well-tuned Vision Transformer/MLP-like baselines DeiT-B/ResMLP-B24 by 0.3%/1.1% accuracy with 35%/52% fewer parameters and 50%/62% fewer MACs. The effectiveness of PoolFormer verifies our hypothesis and urges us to initiate the concept of "MetaFormer", a general architecture abstracted from Transformers without specifying the token mixer. Based on extensive experiments, we argue that MetaFormer is the key player in achieving superior results for recent Transformer and MLP-like models on vision tasks. This work calls for more future research dedicated to improving MetaFormer instead of focusing on the token mixer modules. Additionally, our proposed PoolFormer could serve as a starting baseline for future MetaFormer architecture design. Code is available at https://github.com/sail-sg/poolformer. Comment: CVPR 2022 (Oral).
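
    The idea of a block that is agnostic to its token mixer, with plain spatial pooling as one possible mixer, can be sketched in a few lines. This is a minimal illustration and not the PoolFormer implementation: the function names, the 3x3 window, the clipped borders, and the bare residual connection are assumptions, and the normalisation and channel MLP of a full block are left out.

```python
# Illustrative sketch only: function names, the 3x3 window, and the plain
# residual connection are assumptions, not details from the abstract.

def average_pool_mixer(grid, window=3):
    """Mix tokens by replacing each one with the mean of its neighbourhood.

    `grid` is a 2D list of scalar token features; border windows are
    clipped to the grid, so only in-bounds neighbours are averaged.
    """
    height, width = len(grid), len(grid[0])
    radius = window // 2
    mixed = []
    for row in range(height):
        mixed_row = []
        for col in range(width):
            neighbours = [
                grid[r][c]
                for r in range(max(0, row - radius), min(height, row + radius + 1))
                for c in range(max(0, col - radius), min(width, col + radius + 1))
            ]
            mixed_row.append(sum(neighbours) / len(neighbours))
        mixed.append(mixed_row)
    return mixed


def metaformer_block(grid, token_mixer):
    """Apply any token mixer with a residual connection.

    The block does not care which mixer it receives; normalisation and the
    channel MLP of a full block are omitted to keep the sketch short.
    """
    mixed = token_mixer(grid)
    return [
        [x + m for x, m in zip(row, mixed_row)]
        for row, mixed_row in zip(grid, mixed)
    ]


tokens = [[0.0, 0.0, 0.0],
          [0.0, 9.0, 0.0],
          [0.0, 0.0, 0.0]]
print(metaformer_block(tokens, average_pool_mixer)[1][1])  # prints: 10.0
```

    Passing a different callable as `token_mixer` (attention, a spatial MLP, or the identity) changes the mixing step without touching the block itself, which is the abstraction the abstract calls MetaFormer.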

    MYC activation cooperates with Vhl and Ink4a/Arf loss to induce clear cell renal cell carcinoma

    Renal carcinoma is a common and aggressive malignancy whose histopathogenesis is incompletely understood and that is largely resistant to cytotoxic chemotherapy. We present two mouse models of kidney cancer that recapitulate the genomic alterations found in human papillary (pRCC) and clear cell RCC (ccRCC), the most common RCC subtypes. MYC activation results in highly penetrant pRCC tumours (MYC), while MYC activation, when combined with Vhl and Cdkn2a (Ink4a/Arf) deletion (VIM), produces kidney tumours that approximate human ccRCC. RNAseq of the mouse tumours demonstrates that MYC tumours resemble Type 2 pRCC, which is known to harbour MYC activation. Furthermore, VIM tumours more closely simulate human ccRCC. Based on their high penetrance, short latency, and histologic fidelity, these models of papillary and clear cell RCC should be significant contributions to the field of kidney cancer research.

    Inception Transformer

    Recent studies show that the Transformer has a strong capability for building long-range dependencies, yet is incompetent at capturing the high frequencies that predominantly convey local information. To tackle this issue, we present a novel and general-purpose Inception Transformer, or iFormer for short, that effectively learns comprehensive features with both high- and low-frequency information in visual data. Specifically, we design an Inception mixer to explicitly graft the advantages of convolution and max-pooling for capturing high-frequency information onto Transformers. Different from recent hybrid frameworks, the Inception mixer brings greater efficiency through a channel splitting mechanism that adopts a parallel convolution/max-pooling path and a self-attention path as high- and low-frequency mixers, while having the flexibility to model discriminative information scattered within a wide frequency range. Considering that bottom layers play a greater role in capturing high-frequency details while top layers play a greater role in modeling low-frequency global information, we further introduce a frequency ramp structure, i.e. gradually decreasing the dimensions fed to the high-frequency mixer and increasing those fed to the low-frequency mixer, which can effectively trade off high- and low-frequency components across different layers. We benchmark the iFormer on a series of vision tasks, and showcase that it achieves impressive performance on image classification, COCO detection and ADE20K segmentation. For example, our iFormer-S reaches a top-1 accuracy of 83.4% on ImageNet-1K, 3.6% higher than DeiT-S, and even slightly better than the much bigger model Swin-B (83.3%) with only 1/4 of the parameters and 1/3 of the FLOPs. Code and models will be released at https://github.com/sail-sg/iFormer.
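
    The frequency ramp described above, in which fewer channels go to the high-frequency mixer and more to the low-frequency mixer as depth increases, can be sketched as a per-layer channel split. This is only an illustration: the linear schedule, the start and end fractions, and the function name are assumptions and not the configuration used by iFormer.

```python
# Illustrative sketch only: the linear schedule and the start/end fractions
# are assumptions for the example, not the configuration used by iFormer.

def frequency_ramp(channels, num_layers, high_start=0.75, high_end=0.25):
    """Return (high_freq_channels, low_freq_channels) for each layer.

    The share of channels sent to the high-frequency mixer falls linearly
    from `high_start` in the first layer to `high_end` in the last one;
    the remaining channels go to the low-frequency (attention) mixer.
    """
    splits = []
    for layer in range(num_layers):
        progress = layer / (num_layers - 1) if num_layers > 1 else 0.0
        high_fraction = high_start + (high_end - high_start) * progress
        high = round(channels * high_fraction)
        splits.append((high, channels - high))
    return splits


for layer, (high, low) in enumerate(frequency_ramp(channels=96, num_layers=5)):
    print(layer, high, low)
```

    With these assumed values the split moves from 72/24 (high/low) in the first layer to 24/72 in the last, while every layer still uses all 96 channels.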

    Asynchronous mixing of kidney progenitor cells potentiates nephrogenesis in organoids

    No full text
    A fundamental challenge in emulating kidney tissue formation through directed differentiation of human pluripotent stem cells is that kidney development is iterative. To reproduce the asynchronous mix of differentiation states found in the fetal kidney, we combined cells differentiated at different times in the same organoid. Asynchronous mixing promoted nephrogenesis, and heterochronic organoids were well vascularized when engrafted under the kidney capsule. Micro-CT and injection of a circulating vascular marker demonstrated that engrafted kidney tissue was connected to the systemic circulation by 2 weeks after engraftment. Proximal tubule glucose uptake was confirmed, but despite these promising measures of graft function, overgrowth of stromal cells prevented long-term study. We propose that this is a technical feature of the engraftment procedure rather than a specific shortcoming of the directed differentiation, because kidney organoids derived from primary cells and whole embryonic kidneys develop similar stromal overgrowth when engrafted under the kidney capsule.

    A Performance Evaluation of Local Features for Image-Based 3D Reconstruction

    No full text

    Aerodynamik eines stumpfen Kegels in reagierender Ueberschallstroemung

    No full text
    Aerodynamics of a blunted cone in reacting hypersonic flows. The flow around a planetary probe under hypervelocity re-entry conditions has two idiosyncrasies not present in the (cold) hypersonic flows of conventional test facilities: the strong dissociation reactions occurring behind the bow shock wave and the freezing of the chemical reactions of the flow by the rapid expansion at the shoulder of the probe. The aim of the present study was to understand the relative importance of the two phenomena for the total heat and pressure loads on a planetary probe and its possible payload. For the experimental study, an instrumented blunted 140° cone was tested in the High Enthalpy Shock Tunnel (HEG) of the DLR in Goettingen. Numerical calculations were performed with a Thin-Layer Navier-Stokes code which is capable of simulating chemical and thermal nonequilibrium flows. The density contours of the flowfield obtained with the holographic interferometer demonstrate the fast chemical freezing of the flow at the shoulder; nevertheless, the extent of the non-equilibrium region behind the bow shock wave was overpredicted by the numerical calculations. For the forebody loads the prediction methods were reliable, whereas in the wake of the model considerable discrepancies between experimental and numerical results have been observed. (orig.) 102 figs., 21 tabs., 146 refs. Available from FIZ Karlsruhe / FIZ - Fachinformationszentrum Karlsruhe / TIB - Technische Informationsbibliothek. SIGLE; DE; German.