3 research outputs found

    Language Grounded QFormer for Efficient Vision Language Understanding

    Full text link
    Large-scale pretraining and instruction tuning have been successful for training general-purpose language models with broad competencies. However, extending to general-purpose vision-language models is challenging due to the distributional diversity of visual inputs. A recent line of work explores vision-language instruction tuning, taking inspiration from the Query Transformer (QFormer) approach proposed in BLIP-2 for bridging frozen modalities. However, these approaches rely heavily on large-scale multimodal pretraining for representation learning before eventual finetuning, incurring a large computational overhead, poor scaling, and limited accessibility. To that end, we propose a more efficient method for QFormer-based vision-language alignment and demonstrate that our strategy improves the efficiency of vision-language pretraining compared to existing baselines.
    Comment: Preprint under review
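
    The QFormer idea referenced in this abstract can be summarized in a few lines: a small set of learnable query tokens cross-attends to features from a frozen vision encoder, and the resulting queries are projected into the embedding space of a frozen language model. The following is a minimal PyTorch sketch of that bridging pattern, not the paper's implementation; all module names, layer counts, and dimensions are illustrative assumptions.

    ```python
    # Hypothetical QFormer-style bridge (illustrative sketch, not the paper's code).
    import torch
    import torch.nn as nn

    class QFormerBridge(nn.Module):
        def __init__(self, num_queries=32, dim=768, num_heads=12, lm_dim=2048):
            super().__init__()
            # The learnable queries are the only trainable interface between
            # the frozen vision encoder and the frozen language model.
            self.queries = nn.Parameter(torch.randn(1, num_queries, dim) * 0.02)
            self.self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
            self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
            self.norm1 = nn.LayerNorm(dim)
            self.norm2 = nn.LayerNorm(dim)
            self.proj = nn.Linear(dim, lm_dim)  # map queries into the LM embedding space

        def forward(self, image_feats):
            # image_feats: (batch, num_patches, dim) from a frozen vision encoder
            q = self.queries.expand(image_feats.size(0), -1, -1)
            q = self.norm1(q + self.self_attn(q, q, q)[0])
            q = self.norm2(q + self.cross_attn(q, image_feats, image_feats)[0])
            return self.proj(q)  # (batch, num_queries, lm_dim) soft visual prompt

    # Usage: prepend the projected queries to the frozen LM's input embeddings.
    vision_out = torch.randn(4, 196, 768)  # e.g. ViT patch features (assumed shape)
    prefix = QFormerBridge()(vision_out)   # shape: (4, 32, 2048)
    ```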

    Room-temperature ferromagnetism in graphitic petal arrays

    No full text
    We report room-temperature ferromagnetism of graphitic petal arrays grown on Si substrates by microwave plasma chemical vapor deposition without a catalyst. The samples have been characterized by Raman and X-ray photoelectron spectroscopy to confirm the absence of possible ferromagnetic impurities. The petals exhibit ferromagnetic hysteresis with a saturation magnetization of ~4.67 emu cm⁻³ and a coercivity of ~105 Oe at 300 K, comparable to the reported behavior of few-layer graphene. Upon O₂ annealing, the saturation magnetization and coercivity decreased to 2.1 emu cm⁻³ and ~75 Oe, respectively. The ferromagnetism is believed to arise from edge defects and vacancies in the petals.

    Octree Representation Improves Data Fidelity of Cardiac CT Images and Convolutional Neural Network Semantic Segmentation of Left Atrial and Ventricular Chambers

    No full text
    Purpose: To assess whether octree representation and octree-based convolutional neural networks (CNNs) improve the segmentation accuracy of three-dimensional images.
    Materials and Methods: Cardiac CT angiographic examinations from 100 patients (mean age, 67 years ± 17 [standard deviation]; 60 men) performed between June 2012 and June 2018, with semantic segmentations of the left ventricular (LV) and left atrial (LA) blood pools at the end-diastolic and end-systolic cardiac phases, were retrospectively evaluated. Image quality (root mean square error [RMSE]) and segmentation fidelity (global Dice and border Dice coefficients) metrics of the octree representation were compared with spatial downsampling over a range of memory footprints. Fivefold cross-validation was used to train an octree-based CNN and CNNs with spatial downsampling at four levels of image compression or spatial downsampling. The semantic segmentation performance of the octree-based CNN (OctNet) was compared with that of U-Nets with spatial downsampling.
    Results: Octrees provided high image and segmentation fidelity (median RMSE, 1.34 HU; LV Dice coefficient, 0.970; LV border Dice coefficient, 0.843) with a reduced memory footprint (87.5% reduction). Spatial downsampling to the same memory footprint had lower data fidelity (median RMSE, 12.96 HU; LV Dice coefficient, 0.852; LV border Dice coefficient, 0.310). OctNet improved the border segmentation Dice coefficient (LV, 0.612; LA, 0.636) compared with the best-performing U-Net with spatial downsampling (Dice coefficients: LV, 0.579; LA, 0.592).
    Conclusion: Octree-based representations can reduce the memory footprint and improve segmentation border accuracy.
    Keywords: CT, Cardiac, Segmentation, Supervised Learning, Convolutional Neural Network (CNN), Deep Learning Algorithms, Machine Learning Algorithms
    © RSNA, 2021
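
    The memory saving reported above follows from the defining property of an octree: spatially uniform blocks collapse into single leaves, so storage is spent only where the volume actually varies (for example, at chamber borders). Below is a minimal, illustrative Python sketch of that idea, not the OctNet implementation; the toy volume and function names are assumptions.

    ```python
    # Illustrative octree construction over a 3D volume (not the paper's code).
    import numpy as np

    def build_octree(vol, min_size=1):
        """Recursively split a cubic, power-of-two-sided volume into octants;
        uniform blocks become single leaves instead of dense voxel grids."""
        if vol.min() == vol.max() or vol.shape[0] <= min_size:
            return {"leaf": True, "value": int(vol.flat[0]), "shape": vol.shape}
        h = vol.shape[0] // 2
        children = [build_octree(vol[z:z + h, y:y + h, x:x + h], min_size)
                    for z in (0, h) for y in (0, h) for x in (0, h)]
        return {"leaf": False, "children": children}

    def count_leaves(node):
        """Number of stored blocks; compare against the dense voxel count."""
        return 1 if node["leaf"] else sum(count_leaves(c) for c in node["children"])

    # Toy "segmentation": mostly background with one solid foreground cube.
    vol = np.zeros((64, 64, 64), dtype=np.uint8)
    vol[16:32, 16:32, 16:32] = 1
    tree = build_octree(vol)
    print(count_leaves(tree), "octree leaves vs", vol.size, "dense voxels")  # 15 vs 262144
    ```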