    Revisiting Classifier: Transferring Vision-Language Models for Video Recognition

    Transferring knowledge from task-agnostic pre-trained deep models to downstream tasks is an important topic in computer vision research. With the growth of computational capacity, open-source vision-language pre-trained models are now available that are large in both model architecture and amount of training data. In this study, we focus on transferring knowledge for video classification tasks. Conventional methods randomly initialize the linear classifier head for vision classification, but they leave the use of the text encoder for downstream visual recognition tasks unexplored. In this paper, we revise the role of the linear classifier and replace the classifier with different knowledge from the pre-trained model. We utilize the well-pretrained language model to generate good semantic targets for efficient transfer learning. The empirical study shows that our method improves both the performance and the training speed of video classification, with a negligible change in the model. Our simple yet effective tuning paradigm achieves state-of-the-art performance and efficient training in various video recognition scenarios, i.e., zero-shot, few-shot, and general recognition. In particular, our paradigm achieves a state-of-the-art accuracy of 87.8% on Kinetics-400, and also surpasses previous methods by 20-50% absolute top-1 accuracy under zero-shot and few-shot settings on five popular video datasets. Code and models can be found at https://github.com/whwu95/Text4Vis. Comment: Accepted by AAAI-2023. Camera Ready Version
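
The idea of replacing a randomly initialized linear head with targets produced by a text encoder can be illustrated with a minimal sketch. It is not the authors' implementation: the toy vectors below stand in for text-encoder outputs and a video feature, and the names `l2_normalize` and `classify` are hypothetical. Each class is scored by the cosine similarity between the video feature and that class's fixed text embedding.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def classify(video_feature, class_text_embeddings):
    """Score each class by cosine similarity with its text embedding.

    The text embeddings play the role of classifier weights; they are
    assumed to be precomputed and kept fixed rather than randomly initialized.
    """
    v = l2_normalize(video_feature)
    scores = [
        sum(a * b for a, b in zip(v, l2_normalize(t)))
        for t in class_text_embeddings
    ]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores

# Toy stand-ins (assumptions): one embedding per class name, one video feature.
class_text_embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
video_feature = [0.2, 0.9, 0.1]

pred, scores = classify(video_feature, class_text_embeddings)
print(pred, [round(s, 3) for s in scores])
```

Because the class vectors come from class names rather than from training labels, the same scoring function can in principle be applied to unseen classes, which is what makes the zero-shot setting possible.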

    DISPERSION-DRIVEN ISOMERISM IN THE GAS PHASE: THEORETICAL AND MICROWAVE SPECTROSCOPIC STUDY OF ALLYL ISOCYANATE

    The pure rotational spectrum of allyl isocyanate (CH2=CHCH2NCO) was studied using chirped pulse and Balle-Flygare Fourier transform microwave (FTMW) spectroscopy. Besides the previously reported gauche conformer (S. Maiti, A. I. Jaman, and R. N. Nandi, J. Mol. Spectrosc. 158, 8-13 (1993)), the lowest energy conformer was identified for the first time with the assistance of quantum-chemical calculations performed at the B3LYP-D3(BJ) and MP2 levels of theory with Dunning's cc-pVQZ basis set. The assignments were confirmed by the resolved hyperfine structure due to the ^{14}N quadrupole moment and the spectra of the corresponding ^{13}C, ^{15}N and ^{18}O singly substituted isotopologues in natural abundance. Rotational transitions of the most stable conformer revealed a tunneling splitting due to the interconversion motion between its two mirror images, and the tunneling path was established theoretically. In addition, benchmark calculations of various density functionals with and without dispersion corrections were carried out to investigate the effect of the short-range dispersion energy on the conformational structures.

    ROTATIONAL SPECTRA AND STRUCTURAL DETERMINATION OF HCCNCS

    The ground state of HCCNCS, prepared by high voltage electric discharge of a gas mixture of acetylene and CH3NCS in neon during supersonic expansion, was studied using both chirped pulse Fourier transform microwave (cp-FTMW) and Balle-Flygare FTMW spectrometers. The pure rotational spectra were measured for the parent, ^{34}S, and three ^{13}C isotopologues in natural abundance, and the ^{14}N nuclear quadrupole hyperfine structure was resolved. The observed spectra are consistent with a linear or quasilinear ground state of HCCNCS. The corresponding rotational constants were used to derive the substitution (r_s) and effective ground state (r_0) geometries. Supporting calculations at the MP2/cc-pVQZ and CCSD(T)/cc-pVQZ (expanded basis cc-pV(Q+d)Z for sulfur) levels of theory reveal that the potential energy surface is virtually flat around the minimum and yield an equilibrium structure (r_e) that is consistent with experiment.
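
For context, the step from rotational constants to a substitution geometry can be written out for the simplest case of a strictly linear molecule. This is the standard textbook relation (Kraitchman's equation), not a formula taken from the abstract. The symbols $I$, $I'$, $M$, $\Delta m$ and $z$ are introduced here for illustration, and treating HCCNCS as rigidly linear is an assumption, since the abstract says only "linear or quasilinear".

```latex
% Moment of inertia from the rotational constant B (in frequency units):
I = \frac{h}{8\pi^{2} B}

% Single isotopic substitution: parent mass M, mass change \Delta m,
% parent moment I, substituted moment I'.
\mu = \frac{M\,\Delta m}{M + \Delta m},
\qquad
|z| = \sqrt{\frac{I' - I}{\mu}}
```

Here $|z|$ is the distance of the substituted atom from the center of mass of the parent molecule. Differences between the $|z|$ values of neighboring atoms give r_s bond lengths, so each atom to be located needs its own isotopologue.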

    THE MOLECULAR STRUCTURE OF MONOFLUOROBENZALDEHYDES

    The pure rotational spectra of 2- and 3-fluorobenzaldehyde have been investigated using a chirped pulse Fourier transform microwave (FTMW) spectrometer in the range of 8-18 GHz and a Balle-Flygare FTMW spectrometer in the range of 4-26 GHz. As in a previous study of monofluorobenzaldehydes (José L. Alonso and Rosa M. Villamañán, J. Chem. Soc., Faraday Trans. 2, 1989, 85(2), 137-149), only transitions due to a single planar conformer were observed for 2-fluorobenzaldehyde (O-trans), whereas two planar conformers (O-trans and O-cis) of 3-fluorobenzaldehyde were confirmed. Transitions due to the seven unique ^{13}C isotopologues of each of the three molecules have been observed for the first time. Their rotational constants were used to derive the effective ground state (r_0) and substitution (r_s) structures. The results compare favourably with the equilibrium (r_e) geometries, which were determined following geometry optimization at the MP2/aug-cc-pVTZ level of theory.

    TransHP: Image Classification with Hierarchical Prompting

    This paper explores a hierarchical prompting mechanism for the hierarchical image classification (HIC) task. Unlike prior HIC methods, our hierarchical prompting is the first to explicitly inject ancestor-class information as a tokenized hint that benefits descendant-class discrimination. We think it imitates human visual recognition well, i.e., humans may use the ancestor class as a prompt to draw focus to the subtle differences among descendant classes. We model this prompting mechanism as a Transformer with Hierarchical Prompting (TransHP). TransHP consists of three steps: 1) learning a set of prompt tokens to represent the coarse (ancestor) classes, 2) predicting the coarse class of the input image on the fly at an intermediate block, and 3) injecting the prompt token of the predicted coarse class into the intermediate feature. Though the parameters of TransHP remain the same for all input images, the injected coarse-class prompt conditions (modifies) the subsequent feature extraction and encourages a dynamic focus on relatively subtle differences among the descendant classes. Extensive experiments show that TransHP improves image classification in accuracy (e.g., improving ViT-B/16 by +2.83% ImageNet classification accuracy), training data efficiency (e.g., +12.69% improvement under 10% ImageNet training data), and model explainability. Moreover, TransHP also performs favorably against prior HIC methods, showing that TransHP exploits the hierarchical information well.
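
The three steps listed in the abstract can be sketched in a few lines. This is a toy illustration under stated assumptions, not the TransHP architecture. The prompt tokens are fixed vectors here rather than learned, and two details the abstract does not specify are assumed: the coarse class is predicted by a dot product between the mean-pooled intermediate tokens and each prompt token, and injection is done by appending the chosen prompt to the token sequence. The function names are hypothetical.

```python
def mean_pool(tokens):
    """Average a list of equal-length token vectors."""
    n = len(tokens)
    return [sum(t[i] for t in tokens) / n for i in range(len(tokens[0]))]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def inject_coarse_prompt(tokens, prompt_tokens):
    """Steps 2 and 3: predict the coarse class, then inject its prompt token.

    tokens        : intermediate-block token features (list of vectors)
    prompt_tokens : one vector per coarse class (step 1; learned in practice)
    """
    pooled = mean_pool(tokens)
    scores = [dot(pooled, p) for p in prompt_tokens]
    coarse = max(range(len(scores)), key=scores.__getitem__)
    # Later blocks would attend over the original tokens plus the prompt.
    return coarse, tokens + [prompt_tokens[coarse]]

# Toy data (assumptions): two coarse classes, two intermediate tokens.
prompt_tokens = [[1.0, 0.0], [0.0, 1.0]]
tokens = [[0.1, 0.8], [0.3, 0.6]]

coarse, conditioned = inject_coarse_prompt(tokens, prompt_tokens)
print(coarse, conditioned)
```

The model weights are identical for every input. Only the injected token differs, which is how the abstract describes the conditioning of subsequent feature extraction.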

    Sense: Model Hardware Co-design for Accelerating Sparse CNN on Systolic Array

    Sparsity is an intrinsic property of convolutional neural networks (CNNs) and is worth exploiting in CNN accelerators, but the extra processing comes with hardware overhead, so many architectures gain only a minor benefit. Meanwhile, systolic arrays have become increasingly competitive for CNN acceleration because of their high spatiotemporal locality and low hardware overhead. However, the irregularity of sparsity induces an imbalanced workload under the rigid systolic dataflow, causing performance degradation. Thus, this paper proposes a systolic-array-based architecture, called Sense, for sparse CNN acceleration by model-hardware co-design, achieving a large performance improvement. To balance input feature map (IFM) and weight loads across the Processing Element (PE) array, we apply channel clustering to gather IFMs with approximately equal sparsity for array computation, and co-design a load-balancing weight pruning method that keeps the sparsity ratio of each kernel at a certain value with little accuracy loss, improving PE utilization and overall performance. Additionally, Adaptive Dataflow Configuration is applied to determine the computing strategy based on the storage ratio of IFMs and weights, reducing DRAM access by 1.17x-1.8x compared with Swallow and further reducing system energy consumption. The whole design is implemented on a Zynq ZCU102 at 200 MHz and performs at 471, 34, 53 and 191 images/s for AlexNet, VGG-16, ResNet-50 and GoogleNet, respectively. Compared with the sparse systolic-array-based accelerators Swallow, FESA and SPOTS, Sense achieves 1x-2.25x, 1.95x-2.5x and 1.17x-2.37x performance improvement on these CNNs, respectively, with reasonable overhead. Comment: 14 pages, 29 figures, 6 tables, IEEE Transactions on Very Large Scale Integration (VLSI) Systems
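
The load-balancing idea, keeping every kernel at the same sparsity ratio so that each processing element receives an equal number of non-zero weights, can be illustrated with per-kernel magnitude pruning. This is a minimal sketch under assumptions. The abstract does not state the pruning criterion, so removing the smallest-magnitude weights is a choice made here, and `prune_kernel` is a hypothetical name.

```python
def prune_kernel(weights, sparsity):
    """Zero the smallest-magnitude weights so a fixed fraction is pruned.

    weights  : flat list of one kernel's weights
    sparsity : fraction of weights to set to zero (0.0 to 1.0)
    """
    n_prune = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned

# Toy kernels (assumption): two flattened kernels of four weights each.
kernels = [[0.5, -0.1, 0.3, 0.05], [0.01, 0.9, -0.7, 0.2]]
pruned_kernels = [prune_kernel(k, sparsity=0.5) for k in kernels]
nonzeros = [sum(1 for w in k if w != 0.0) for k in pruned_kernels]
print(pruned_kernels, nonzeros)
```

Applying the same ratio to each kernel, instead of one global threshold across the whole layer, is what equalizes the non-zero count per kernel. A global threshold could leave some kernels dense and others nearly empty.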