3 research outputs found

    Language Grounded QFormer for Efficient Vision Language Understanding

    Full text link
    Large-scale pretraining and instruction tuning have been successful for training general-purpose language models with broad competencies. However, extending to general-purpose vision-language models is challenging due to the distributional diversity of visual inputs. A recent line of work explores vision-language instruction tuning, taking inspiration from the Query Transformer (QFormer) approach proposed in BLIP-2 for bridging frozen modalities. However, these approaches rely heavily on large-scale multimodal pretraining for representation learning before eventual finetuning, incurring a large computational overhead, poor scaling, and limited accessibility. To that end, we propose a more efficient method for QFormer-based vision-language alignment and demonstrate that our strategy improves the efficiency of vision-language pretraining compared to existing baselines.
    Comment: Preprint under review
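
    The QFormer idea referenced in this abstract can be summarized in a few lines: a small set of learnable query tokens cross-attends to features from a frozen vision encoder, and the resulting queries are projected into the embedding space of a frozen language model. The following is a minimal PyTorch sketch of that bridging pattern, not the paper's implementation; all module names, layer counts, and dimensions are illustrative assumptions.

    ```python
    # Hypothetical QFormer-style bridge (illustrative sketch, not the paper's code).
    import torch
    import torch.nn as nn

    class QFormerBridge(nn.Module):
        def __init__(self, num_queries=32, dim=768, num_heads=12, lm_dim=2048):
            super().__init__()
            # The learnable queries are the only trainable interface between
            # the frozen vision encoder and the frozen language model.
            self.queries = nn.Parameter(torch.randn(1, num_queries, dim) * 0.02)
            self.self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
            self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
            self.norm1 = nn.LayerNorm(dim)
            self.norm2 = nn.LayerNorm(dim)
            self.proj = nn.Linear(dim, lm_dim)  # map queries into the LM embedding space

        def forward(self, image_feats):
            # image_feats: (batch, num_patches, dim) from a frozen vision encoder
            q = self.queries.expand(image_feats.size(0), -1, -1)
            q = self.norm1(q + self.self_attn(q, q, q)[0])
            q = self.norm2(q + self.cross_attn(q, image_feats, image_feats)[0])
            return self.proj(q)  # (batch, num_queries, lm_dim) soft visual prompt

    # Usage: prepend the projected queries to the frozen LM's input embeddings.
    vision_out = torch.randn(4, 196, 768)  # e.g. ViT patch features (assumed shape)
    prefix = QFormerBridge()(vision_out)   # shape: (4, 32, 2048)
    ```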

    Room-temperature ferromagnetism in graphitic petal arrays

    No full text
    We report room-temperature ferromagnetism of graphitic petal arrays grown on Si substrates by microwave plasma chemical vapor deposition without a catalyst. The samples have been characterized by Raman and X-ray photoelectron spectroscopy to confirm the absence of possible ferromagnetic impurities. The petals exhibit ferromagnetic hysteresis with a saturation magnetization of ~4.67 emu cm⁻³ and a coercivity of ~105 Oe at 300 K, comparable to the reported behavior of few-layer graphene. Upon O₂ annealing, the saturation magnetization and coercivity decreased to 2.1 emu cm⁻³ and ~75 Oe, respectively. The ferromagnetism is believed to arise from edge defects and vacancies in the petals.

    Octree Representation Improves Data Fidelity of Cardiac CT Images and Convolutional Neural Network Semantic Segmentation of Left Atrial and Ventricular Chambers

    No full text
    Purpose: To assess whether octree representation and octree-based convolutional neural networks (CNNs) improve the segmentation accuracy of three-dimensional images.
    Materials and Methods: Cardiac CT angiographic examinations from 100 patients (mean age, 67 years ± 17 [standard deviation]; 60 men) performed between June 2012 and June 2018, with semantic segmentations of the left ventricular (LV) and left atrial (LA) blood pools at the end-diastolic and end-systolic cardiac phases, were retrospectively evaluated. Image quality (root mean square error [RMSE]) and segmentation fidelity (global Dice and border Dice coefficients) metrics of the octree representation were compared with spatial downsampling over a range of memory footprints. Fivefold cross-validation was used to train an octree-based CNN and CNNs with spatial downsampling at four levels of image compression or spatial downsampling. The semantic segmentation performance of the octree-based CNN (OctNet) was compared with that of U-Nets with spatial downsampling.
    Results: Octrees provided high image and segmentation fidelity (median RMSE, 1.34 HU; LV Dice coefficient, 0.970; LV border Dice coefficient, 0.843) with a reduced memory footprint (87.5% reduction). Spatial downsampling to the same memory footprint had lower data fidelity (median RMSE, 12.96 HU; LV Dice coefficient, 0.852; LV border Dice coefficient, 0.310). OctNet improved the border segmentation Dice coefficient (LV, 0.612; LA, 0.636) compared with the best-performing U-Net with spatial downsampling (Dice coefficients: LV, 0.579; LA, 0.592).
    Conclusion: Octree-based representations can reduce the memory footprint and improve segmentation border accuracy.
    Keywords: CT, Cardiac, Segmentation, Supervised Learning, Convolutional Neural Network (CNN), Deep Learning Algorithms, Machine Learning Algorithms
    © RSNA, 2021
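
    The memory saving reported above follows from the defining property of an octree: spatially uniform blocks collapse into single leaves, so storage is spent only where the volume actually varies (for example, at chamber borders). Below is a minimal, illustrative Python sketch of that idea, not the OctNet implementation; the toy volume and function names are assumptions.

    ```python
    # Illustrative octree construction over a 3D volume (not the paper's code).
    import numpy as np

    def build_octree(vol, min_size=1):
        """Recursively split a cubic, power-of-two-sided volume into octants;
        uniform blocks become single leaves instead of dense voxel grids."""
        if vol.min() == vol.max() or vol.shape[0] <= min_size:
            return {"leaf": True, "value": int(vol.flat[0]), "shape": vol.shape}
        h = vol.shape[0] // 2
        children = [build_octree(vol[z:z + h, y:y + h, x:x + h], min_size)
                    for z in (0, h) for y in (0, h) for x in (0, h)]
        return {"leaf": False, "children": children}

    def count_leaves(node):
        """Number of stored blocks; compare against the dense voxel count."""
        return 1 if node["leaf"] else sum(count_leaves(c) for c in node["children"])

    # Toy "segmentation": mostly background with one solid foreground cube.
    vol = np.zeros((64, 64, 64), dtype=np.uint8)
    vol[16:32, 16:32, 16:32] = 1
    tree = build_octree(vol)
    print(count_leaves(tree), "octree leaves vs", vol.size, "dense voxels")  # 15 vs 262144
    ```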