Revisiting Classifier: Transferring Vision-Language Models for Video Recognition
Transferring knowledge from task-agnostic pre-trained deep models for
downstream tasks is an important topic in computer vision research. Along with
the growth of computational capacity, we now have open-source vision-language
pre-trained models that are large in both model architecture and amount of
data. In this study, we focus on transferring knowledge for video
classification tasks. Conventional methods randomly initialize the linear
classifier head for vision classification, leaving the use of the text
encoder for downstream visual recognition tasks unexplored. In this paper, we
revisit the role of the linear classifier and replace the classifier with
different knowledge from the pre-trained model. We use the well-pretrained
language model to generate good semantic targets for efficient transfer
learning. The empirical study shows that our method improves both the
performance and the training speed of video classification, with a negligible
change in the model. Our simple yet effective tuning paradigm achieves
state-of-the-art performance and efficient training on various video
recognition scenarios, i.e., zero-shot, few-shot, and general recognition. In
particular, our paradigm achieves the state-of-the-art accuracy of 87.8% on
Kinetics-400, and also surpasses previous methods by 20-50% absolute top-1
accuracy under zero-shot and few-shot settings on five popular video datasets.
Code and models can be found at https://github.com/whwu95/Text4Vis.
Comment: Accepted by AAAI-2023. Camera Ready Version
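The idea of replacing a randomly initialized linear head with targets from a text encoder can be illustrated roughly as follows. This is a minimal sketch, not the authors' implementation: the random `text_embeddings` stand in for frozen text-encoder outputs of the class names, and the function names, the cosine-similarity scoring and the `temperature` value are all assumptions made for illustration.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    # Scale vectors to unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def text_classifier_logits(video_features, text_embeddings, temperature=0.01):
    # video_features: (N, D), text_embeddings: (C, D) -> logits: (N, C).
    # The text embeddings play the role of the classifier weight matrix.
    v = l2_normalize(video_features)
    t = l2_normalize(text_embeddings)
    return (v @ t.T) / temperature

rng = np.random.default_rng(0)
num_classes, dim = 5, 16

# Hypothetical stand-in for frozen text-encoder outputs, one row per class name.
text_embeddings = rng.normal(size=(num_classes, dim))

# Toy "video features" that lie close to the embedding of their true class.
labels = np.array([0, 3, 4])
video_features = text_embeddings[labels] + 0.01 * rng.normal(size=(3, dim))

logits = text_classifier_logits(video_features, text_embeddings)
predictions = logits.argmax(axis=1)
```

Because the classifier weights come from text, the same scoring function can in principle be applied to class names unseen during training, which is what makes a zero-shot setting possible in this kind of design.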
Artificial Intelligence for Crystal Growth and Characterization
[no abstract available]
DISPERSION-DRIVEN ISOMERISM IN THE GAS PHASE: THEORETICAL AND MICROWAVE SPECTROSCOPIC STUDY OF ALLYL ISOCYANATE
The pure rotational spectrum of allyl isocyanate (CH2=CHCH2NCO) was studied using chirped pulse and Balle-Flygare Fourier transform microwave (FTMW) spectroscopy. Besides the previously reported gauche conformer (S. Maiti, A. I. Jaman, and R. N. Nandi, J. Mol. Spectrosc. 158, 8-13 (1993)), the lowest energy conformer was identified for the first time with the assistance of quantum-chemical calculations performed at the B3LYP-D3(BJ) and MP2 levels of theory with Dunning's cc-pVQZ basis set. The assignments were confirmed by the resolved hyperfine structure due to the N quadrupole moment and the spectra of the corresponding C, N and O singly substituted isotopologues in natural abundance. Rotational transitions of the most stable conformer revealed a tunneling splitting due to the interconversion motion between its two mirror images, and the tunneling path was established theoretically. In addition, benchmark calculations of various density functionals with and without dispersion corrections were carried out to investigate the effect of the short-range dispersion energy on the conformational structures.
ROTATIONAL SPECTRA AND STRUCTURAL DETERMINATION OF HCCNCS
The ground state of HCCNCS, prepared by high voltage electric discharge of a gas mixture of acetylene and CH3NCS in neon during supersonic expansion, was studied using both chirped pulse Fourier transform microwave (cp-FTMW) and Balle-Flygare FTMW spectrometers. The pure rotational spectra were measured for the parent, S, and three C isotopologues in natural abundance and the N nuclear quadrupole hyperfine structure was resolved. The observed spectra are consistent with a linear or quasilinear ground state of HCCNCS. The corresponding rotational constants were used to derive the substitution (r_s) and effective ground state (r_0) geometries. Supporting calculations at the MP2/cc-pVQZ and CCSD(T)/cc-pVQZ (expanded basis cc-pV(Q+d)Z for sulfur) levels of theory reveal that the potential energy surface is virtually flat around the minimum and yield an equilibrium structure (r_e) that is consistent with experiment.
THE MOLECULAR STRUCTURE OF MONOFLUOROBENZALDEHYDES
The pure rotational spectra of 2- and 3-fluorobenzaldehyde have been investigated using a chirped pulse Fourier transform microwave (FTMW) spectrometer in the range of 8-18 GHz and a Balle-Flygare FTMW spectrometer in the range of 4-26 GHz. As in a previous study of monofluorobenzaldehydes (José L. Alonso and Rosa M. Villamañán, J. Chem. Soc., Faraday Trans. 2, 1989, 85(2), 137-149), only transitions due to a single planar conformer were observed for 2-fluorobenzaldehyde (O-trans) whereas two planar conformers (O-trans and O-cis) of 3-fluorobenzaldehyde were confirmed. Transitions due to the seven unique C isotopologues of each of the three molecules have been observed for the first time. Their rotational constants were used to derive the effective ground state (r_0) and substitution (r_s) structures. The results compare favourably with the equilibrium (r_e) geometries which were determined following geometry optimization at the MP2/aug-cc-pVTZ level of theory.
TransHP: Image Classification with Hierarchical Prompting
This paper explores a hierarchical prompting mechanism for the hierarchical
image classification (HIC) task. Unlike prior HIC methods, our
hierarchical prompting is the first to explicitly inject ancestor-class
information as a tokenized hint that benefits the descendant-class
discrimination. We believe it closely imitates human visual recognition, i.e.,
humans may use the ancestor class as a prompt to focus on the subtle
differences among descendant classes. We build this prompting mechanism into a
Transformer with Hierarchical Prompting (TransHP). TransHP consists of three
steps: 1) learning a set of prompt tokens to represent the coarse (ancestor)
classes, 2) predicting the coarse class of the input image on the fly at an
intermediate block, and 3) injecting the prompt token of the predicted coarse
class into the intermediate feature. Though the parameters of TransHP remain
the same for all input images, the injected coarse-class prompt conditions
(modifies) the subsequent feature extraction and encourages a dynamic focus on
relatively subtle differences among the descendant classes. Extensive
experiments show that TransHP improves image classification in accuracy (e.g.,
improving ViT-B/16 by +2.83% ImageNet classification accuracy), training data
efficiency (e.g., +12.69% improvement under 10% ImageNet training data), and
model explainability. Moreover, TransHP also performs favorably against prior
HIC methods, showing that TransHP exploits the hierarchical information well.
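The three steps described in this abstract can be sketched on toy arrays as follows. This is only an illustration under stated assumptions, not the TransHP code: mean pooling before the coarse prediction, a single linear `coarse_head`, and injection by appending the chosen prompt as an extra token are all hypothetical choices, and in a real model the prompt tokens would be learned rather than random.

```python
import numpy as np

def inject_coarse_prompt(tokens, prompt_tokens, coarse_head):
    # tokens:        (T, D) intermediate features of one image
    # prompt_tokens: (K, D) one prompt token per coarse (ancestor) class (step 1)
    # coarse_head:   (D, K) hypothetical linear head for the coarse prediction
    pooled = tokens.mean(axis=0)                    # (D,) assumed pooling
    coarse_logits = pooled @ coarse_head            # (K,)
    coarse_class = int(coarse_logits.argmax())      # step 2: predict coarse class
    prompt = prompt_tokens[coarse_class][None, :]   # (1, D)
    # Step 3: inject the selected prompt; here it is appended as an extra token
    # so that later blocks can attend to it.
    return np.concatenate([tokens, prompt], axis=0), coarse_class

rng = np.random.default_rng(0)
num_tokens, dim, num_coarse = 8, 4, 3
tokens = rng.normal(size=(num_tokens, dim))
prompt_tokens = rng.normal(size=(num_coarse, dim))
coarse_head = rng.normal(size=(dim, num_coarse))

conditioned, coarse_class = inject_coarse_prompt(tokens, prompt_tokens, coarse_head)
```

The network parameters are the same for every image, but the token that gets injected depends on the predicted coarse class, so the blocks after the injection point see a different input for images from different ancestor classes.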
Sense: Model Hardware Co-design for Accelerating Sparse CNN on Systolic Array
Sparsity is an intrinsic property of convolutional neural networks (CNNs) and
is worth exploiting in CNN accelerators, but the extra processing incurs hardware
overhead, so many architectures gain only a minor benefit.
Meanwhile, systolic arrays have become increasingly competitive for CNN
acceleration owing to their high spatiotemporal locality and low hardware overhead.
However, the irregularity of sparsity induces imbalanced workload under the
rigid systolic dataflow, causing performance degradation. Thus, this paper
proposes a systolic-array-based architecture, called Sense, for sparse CNN
acceleration by model-hardware co-design, achieving a large performance
improvement. To balance input feature map (IFM) and weight loads across the
processing element (PE) array, we apply channel clustering to gather IFMs with
similar sparsity for array computation, and co-design a load-balancing
weight pruning method that keeps the sparsity ratio of each kernel at a certain
value with little accuracy loss, improving PE utilization and overall
performance. Additionally, Adaptive Dataflow Configuration is applied to
determine the computing strategy based on the storage ratio of IFMs and
weights, reducing DRAM access by 1.17x-1.8x compared with Swallow and further
reducing system energy consumption. The whole design is implemented on a
Zynq ZCU102 at 200 MHz and achieves 471, 34, 53 and 191 images/s for
AlexNet, VGG-16, ResNet-50 and GoogleNet, respectively. Compared against the sparse
systolic-array-based accelerators Swallow, FESA and SPOTS, Sense achieves
1x-2.25x, 1.95x-2.5x and 1.17x-2.37x performance improvement on these CNNs,
respectively, with reasonable overhead.
Comment: 14 pages, 29 figures, 6 tables, IEEE TRANSACTIONS ON VERY LARGE SCALE
INTEGRATION (VLSI) SYSTEMS
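One way to picture a pruning scheme that keeps the sparsity ratio of every kernel at the same value is the sketch below. It is an assumption-laden illustration, not the method from the paper: it treats each output filter as a "kernel", uses plain magnitude pruning as the selection criterion, and omits any fine-tuning to recover accuracy. The function name and tensor layout are hypothetical.

```python
import numpy as np

def prune_kernels_to_fixed_sparsity(weights, sparsity):
    # weights: (out_channels, in_channels, kh, kw); sparsity: fraction in [0, 1].
    # Zero the smallest-magnitude weights of each output kernel so that every
    # kernel ends up with the same number of nonzeros (a balanced workload).
    out_channels = weights.shape[0]
    flat = weights.reshape(out_channels, -1).copy()
    num_prune = int(round(sparsity * flat.shape[1]))
    if num_prune > 0:
        # Column indices of the num_prune smallest |w| in each row (kernel).
        idx = np.argpartition(np.abs(flat), num_prune - 1, axis=1)[:, :num_prune]
        np.put_along_axis(flat, idx, 0.0, axis=1)
    return flat.reshape(weights.shape)

rng = np.random.default_rng(0)
weights = rng.normal(size=(4, 2, 3, 3))          # 4 kernels, 18 weights each
pruned = prune_kernels_to_fixed_sparsity(weights, sparsity=0.5)
nonzeros_per_kernel = (pruned.reshape(4, -1) != 0).sum(axis=1)
```

With an identical nonzero count in every kernel, each processing element that is assigned one kernel has the same number of multiply-accumulate operations to perform, which is the load-balancing property the abstract refers to.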