Optical Flow Guided Feature: A Fast and Robust Motion Representation for Video Action Recognition
Motion representation plays a vital role in human action recognition in
videos. In this study, we introduce a novel compact motion representation for
video action recognition, named Optical Flow guided Feature (OFF), which
enables the network to distill temporal information through a fast and robust
approach. The OFF is derived from the definition of optical flow and is
orthogonal to the optical flow. The derivation also provides theoretical
support for using the difference between two frames. By directly calculating
pixel-wise spatiotemporal gradients of the deep feature maps, the OFF could be
embedded in any existing CNN based video action recognition framework with only
a slight additional cost. It enables the CNN to extract spatiotemporal
information, especially the temporal information between frames simultaneously.
This simple but powerful idea is validated by experimental results. The network
with OFF fed only by RGB inputs achieves a competitive accuracy of 93.3% on
UCF-101, which is comparable with the result obtained by two streams (RGB and
optical flow), but is 15 times faster in speed. Experimental results also show
that OFF is complementary to other motion modalities such as optical flow. When
the proposed method is plugged into the state-of-the-art video action
recognition framework, it has 96.0% and 74.2% accuracy on UCF-101 and HMDB-51
respectively. The code for this project is available at
https://github.com/kevin-ssy/Optical-Flow-Guided-Feature. Comment: CVPR 2018.
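The core computation the abstract describes, pixel-wise spatial gradients of a feature map combined with the temporal difference between two frames' features, can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation; the function name and the finite-difference gradient choice are assumptions.

```python
import numpy as np

def optical_flow_guided_feature(f_t, f_t1):
    """Sketch of an OFF-style representation for one feature channel.

    f_t, f_t1: (H, W) feature maps from two consecutive frames.
    Returns a (3, H, W) stack: x-gradient, y-gradient, temporal difference.
    """
    # Spatial gradients of the current frame's feature map
    gx = np.gradient(f_t, axis=1)
    gy = np.gradient(f_t, axis=0)
    # Temporal difference between the two frames' feature maps
    dt = f_t1 - f_t
    return np.stack([gx, gy, dt], axis=0)
```

In a CNN framework these three components would be computed on deep feature maps at negligible cost, since they involve only subtractions of already-computed activations.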
Revisiting Classifier: Transferring Vision-Language Models for Video Recognition
Transferring knowledge from task-agnostic pre-trained deep models for
downstream tasks is an important topic in computer vision research. Along with
the growth of computational capacity, we now have open-source vision-language
pre-trained models in large scales of the model architecture and amount of
data. In this study, we focus on transferring knowledge for video
classification tasks. Conventional methods randomly initialize the linear
classifier head for vision classification, leaving the use of the text encoder
for downstream visual recognition tasks unexplored. In this paper, we revisit
the role of the linear classifier and replace it with knowledge from the
pre-trained model: we utilize the well-pretrained language model to generate
good semantic targets for efficient transfer learning. The empirical study
shows that our method improves both the
performance and the training speed of video classification, with a negligible
change in the model. Our simple yet effective tuning paradigm achieves
state-of-the-art performance and efficient training on various video
recognition scenarios, i.e., zero-shot, few-shot, and general recognition. In
particular, our paradigm achieves a state-of-the-art accuracy of 87.8% on
Kinetics-400 and surpasses previous methods by 20-50% absolute top-1 accuracy
under zero-shot and few-shot settings on five popular video datasets. Code and
models can be found at https://github.com/whwu95/Text4Vis. Comment: Accepted by
AAAI-2023. Camera Ready Version.
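The idea of replacing a randomly initialized classifier head with frozen text-encoder embeddings of the class names can be sketched as follows. This is a minimal NumPy illustration under assumed shapes; the function name is hypothetical and not taken from the Text4Vis codebase.

```python
import numpy as np

def classify_with_text_targets(video_feat, class_text_embeds):
    """Use frozen text embeddings of class names as classifier weights.

    video_feat: (D,) visual feature from the video encoder.
    class_text_embeds: (C, D) text-encoder embeddings, one row per class name.
    Returns the index of the best-matching class.
    """
    # L2-normalize both sides so logits are cosine similarities
    v = video_feat / np.linalg.norm(video_feat)
    t = class_text_embeds / np.linalg.norm(class_text_embeds, axis=1, keepdims=True)
    logits = t @ v  # (C,) cosine similarity per class
    return int(np.argmax(logits))
```

Because the classifier weights come pre-formed from the language model rather than being learned from scratch, training only needs to align the visual features with fixed semantic targets, which is consistent with the faster training the abstract reports.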
Do Managers In Chinese Family Firms Learn From The Market? Evidence From Chinese Private Placement
Recent empirical papers report managers’ learning in merger and acquisition (M&A) decisions, and family control is central in many countries. Does learning exist in family firms’ financing decisions? Based on announced private placements by Chinese family firms, we investigate the relation between managers’ final decisions in family firms and the market reaction to the announcement. Our analysis suggests a non-linear relation between managers’ learning and family control. Managers generally learn from the market when making final decisions, but family involvement can reduce this probability. Supplementary testing indicates that managers in family firms with low ownership are less likely to learn from the market than those in family firms with high ownership. Further analysis suggests that corporate governance can influence managers’ learning: when ownership is low, family members’ participation in purchasing the placed shares and their service as top managers make managers’ learning less likely. Independent directors in family firms do not play their due role in supervising the behavior of managers and large shareholders.
Market Feedback And Managers’ Decisions In Private Placement – Evidence From Chinese Family Firms
What effect does market feedback have on managers’ private placement decisions in family firms? Drawing on information asymmetry, agency theory, and corporate governance theory, we investigate the relationship between managers’ final decisions and market feedback to the announcement. We find that managers in family firms incorporate market feedback into decision-making, and that their willingness to do so is affected by many external factors. Managers tend to listen to the market when family firms are non-high-tech, when family members participate in purchasing the placed shares, when family members serve as managers, and when the separation of control rights from ownership is small.
Energy-Based Models For Speech Synthesis
Recently there has been a lot of interest in non-autoregressive (non-AR)
models for speech synthesis, such as FastSpeech 2 and diffusion models. Unlike
AR models, these models do not have autoregressive dependencies among outputs
which makes inference efficient. This paper expands the range of available
non-AR models with another member called energy-based models (EBMs). The paper
describes how noise contrastive estimation, which relies on the comparison
between positive and negative samples, can be used to train EBMs. It proposes a
number of strategies for generating effective negative samples, including using
high-performing AR models. It also describes how sampling from EBMs can be
performed using Langevin Markov chain Monte Carlo (MCMC). The use of Langevin
MCMC makes it possible to draw connections between EBMs and currently popular
diffusion models. Experiments on the LJSpeech dataset show that the proposed
approach offers improvements over Tacotron 2.
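Langevin MCMC draws samples from an EBM by repeatedly stepping down the energy gradient while injecting Gaussian noise, i.e. x_{k+1} = x_k - eps * dE/dx(x_k) + sqrt(2*eps) * z_k. A minimal sketch follows; the step size, iteration count, and function name are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def langevin_sample(grad_energy, x0, step=0.1, n_steps=200, rng=None):
    """Sample from p(x) proportional to exp(-E(x)) via Langevin dynamics.

    grad_energy: callable returning dE/dx at a point.
    x0: (D,) starting point of the chain.
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    x = x0.copy()
    for _ in range(n_steps):
        # Gradient descent step on the energy plus scaled Gaussian noise
        x = x - step * grad_energy(x) + np.sqrt(2.0 * step) * rng.standard_normal(x.shape)
    return x
```

For a quadratic energy E(x) = x**2 / 2 this chain has the standard normal as its stationary distribution, which is the kind of correspondence that links EBM sampling to the iterative denoising used by diffusion models.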
TransGrasp: Grasp Pose Estimation of a Category of Objects by Transferring Grasps from Only One Labeled Instance
Grasp pose estimation is an important issue for robots to interact with the
real world. However, most existing methods require exact 3D object models
available beforehand or a large amount of grasp annotations for training. To
avoid these problems, we propose TransGrasp, a category-level grasp pose
estimation method that predicts grasp poses of a category of objects by
labeling only one object instance. Specifically, we perform grasp pose transfer
across a category of objects based on their shape correspondences and propose a
grasp pose refinement module to further fine-tune grasp pose of grippers so as
to ensure successful grasps. Experiments demonstrate the effectiveness of our
method on achieving high-quality grasps with the transferred grasp poses. Our
code is available at https://github.com/yanjh97/TransGrasp. Comment: Accepted
to the European Conference on Computer Vision (ECCV) 2022.
Reform of Training Ways of Engineering Practice and Innovation Ability for Petroleum Engineering Students
To improve students’ ability in engineering practice and innovation, the Petroleum Engineering Experiment Teaching Center reformed its training methods, including construction of a two-way practice teaching system and reforms of the experimental teaching organization and management mode, the experimental teaching mode, the experimental teaching contents, the experimental teaching assessment methods, and the form and content of students’ extracurricular scientific activities. These reforms foster independent learning and independent experimentation, training students’ engineering and innovative capabilities, and have achieved fruitful results in practice.
[μ-2,2′-Dimethyl-2,2′-(p-phenylene)dipropyl]bis[chloridobis(2-methyl-2-phenylpropyl)tin(IV)]
The molecular structure of the title compound, [Sn2(C10H13)4(C14H20)Cl2], is a binuclear centrosymmetric complex, in which the Sn atoms are four-coordinated by three C atoms and one Cl atom in a distorted tetrahedral geometry