229 research outputs found

    Optical Flow Guided Feature: A Fast and Robust Motion Representation for Video Action Recognition

    Motion representation plays a vital role in human action recognition in videos. In this study, we introduce a novel compact motion representation for video action recognition, named Optical Flow guided Feature (OFF), which enables the network to distill temporal information through a fast and robust approach. The OFF is derived from the definition of optical flow and is orthogonal to the optical flow. The derivation also provides theoretical support for using the difference between two frames. By directly calculating pixel-wise spatiotemporal gradients of the deep feature maps, the OFF can be embedded in any existing CNN-based video action recognition framework with only a slight additional cost. It enables the CNN to extract spatial and temporal information simultaneously, in particular the temporal information between frames. This simple but powerful idea is validated by experimental results. The network with OFF fed only with RGB inputs achieves a competitive accuracy of 93.3% on UCF-101, which is comparable with the result obtained by two streams (RGB and optical flow), but is 15 times faster. Experimental results also show that OFF is complementary to other motion modalities such as optical flow. When the proposed method is plugged into the state-of-the-art video action recognition framework, it achieves 96.0% and 74.2% accuracy on UCF-101 and HMDB-51, respectively. The code for this project is available at https://github.com/kevin-ssy/Optical-Flow-Guided-Feature.
    Comment: CVPR 2018. Code available at https://github.com/kevin-ssy/Optical-Flow-Guided-Feature
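    The abstract describes OFF as pixel-wise spatiotemporal gradients of feature maps: spatial derivatives of one frame's features stacked with the temporal difference to the next frame. A minimal NumPy sketch of that idea (central differences stand in for the Sobel filters used in the paper; the function name and shapes are illustrative, not the authors' API):

    ```python
    import numpy as np

    def optical_flow_guided_feature(f1, f2):
        """Sketch of OFF for a single-channel (H, W) feature map pair
        from two consecutive frames: stack spatial gradients of the
        current frame's features with the temporal frame difference."""
        # Spatial gradients via central differences (the paper uses Sobel filters).
        gy, gx = np.gradient(f1)
        # Temporal gradient: element-wise difference between the two frames.
        gt = f2 - f1
        # OFF concatenates the three gradient channels.
        return np.stack([gx, gy, gt], axis=0)

    f1 = np.random.rand(8, 8).astype(np.float32)
    f2 = np.random.rand(8, 8).astype(np.float32)
    off = optical_flow_guided_feature(f1, f2)
    print(off.shape)  # (3, 8, 8)
    ```

    Because the operation is just differencing, it adds almost no cost on top of the backbone CNN, which is the efficiency argument the abstract makes against computing full optical flow.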

    Revisiting Classifier: Transferring Vision-Language Models for Video Recognition

    Transferring knowledge from task-agnostic pre-trained deep models to downstream tasks is an important topic in computer vision research. With the growth of computational capacity, open-source vision-language models pre-trained at large scale, in both model architecture and amount of data, are now available. In this study, we focus on transferring knowledge for video classification tasks. Conventional methods randomly initialize the linear classifier head for vision classification, leaving the use of the text encoder for downstream visual recognition tasks unexplored. In this paper, we revise the role of the linear classifier and replace it with knowledge from the pre-trained model: we utilize the well-pretrained language model to generate good semantic targets for efficient transfer learning. The empirical study shows that our method improves both the performance and the training speed of video classification, with a negligible change in the model. Our simple yet effective tuning paradigm achieves state-of-the-art performance and efficient training in various video recognition scenarios, i.e., zero-shot, few-shot, and general recognition. In particular, our paradigm achieves state-of-the-art accuracy of 87.8% on Kinetics-400, and also surpasses previous methods by 20~50% absolute top-1 accuracy under zero-shot and few-shot settings on five popular video datasets. Code and models can be found at https://github.com/whwu95/Text4Vis .
    Comment: Accepted by AAAI-2023. Camera-ready version
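    The core idea, replacing a randomly initialized classifier head with semantic targets from a pre-trained text encoder, can be sketched as using (frozen) text embeddings of the class names as the classifier weight matrix, so logits become cosine similarities. A minimal sketch under that assumption (all names and shapes are hypothetical, not the Text4Vis API):

    ```python
    import numpy as np

    def text_initialized_classifier(video_feats, class_text_embeds):
        """Sketch: classify video features against text-encoder embeddings
        of the class names instead of a randomly initialized linear head."""
        # L2-normalize both sides so the logits are cosine similarities.
        v = video_feats / np.linalg.norm(video_feats, axis=-1, keepdims=True)
        t = class_text_embeds / np.linalg.norm(class_text_embeds, axis=-1, keepdims=True)
        return v @ t.T  # (batch, num_classes) similarity logits

    feats = np.random.rand(4, 512)    # features from a visual encoder (toy data)
    texts = np.random.rand(400, 512)  # e.g. one embedding per Kinetics-400 class name
    logits = text_initialized_classifier(feats, texts)
    print(logits.shape)  # (4, 400)
    ```

    Because the class weights carry semantics rather than random noise, such a head needs no per-class training data, which is consistent with the zero-shot results the abstract reports.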

    Do Managers In Chinese Family Firms Learn From The Market? Evidence From Chinese Private Placement

    Recent empirical papers report managers’ learning in merger and acquisition (M&A) decisions, and family control is central in many countries. Does learning exist in family firms’ financing decisions? Based on announced private placements by Chinese family firms, we investigate the relation between managers’ final decisions in family firms and the market reaction to the announcement. Our analysis suggests a non-linear relation between managers’ learning and family control. Managers generally learn from the market when making final decisions, but family involvement can reduce this probability. Supplementary tests indicate that managers in family firms with low ownership are less likely to learn from the market than those in family firms with high ownership. Further analysis suggests that corporate governance can influence managers’ learning: family members’ participation in purchasing the placed shares, and their service as top managers, make managers’ learning less likely when ownership is low. Independent directors in family firms do not play their due role in supervising the behavior of managers and large shareholders.

    Market Feedback And Managers’ Decisions In Private Placement – Evidence From Chinese Family Firms

    What effect does market feedback have on managers’ private placement decisions in family firms? Based on information asymmetry, agency theory, and corporate governance theory, we investigate the relationship between managers’ final decisions and market feedback to the announcement. We find that managers in family firms accept market feedback in decision-making and that their attitude can be affected by many external factors. Managers tend to listen to the market when family firms are non-high-tech, when family members participate in purchasing the placed shares, when family members serve as managers, and when the separation of control rights from ownership is small.

    Energy-Based Models For Speech Synthesis

    Recently there has been a lot of interest in non-autoregressive (non-AR) models for speech synthesis, such as FastSpeech 2 and diffusion models. Unlike AR models, these models have no autoregressive dependencies among outputs, which makes inference efficient. This paper expands the range of available non-AR models with another member, energy-based models (EBMs). The paper describes how noise contrastive estimation, which relies on the comparison between positive and negative samples, can be used to train EBMs. It proposes a number of strategies for generating effective negative samples, including using high-performing AR models. It also describes how sampling from EBMs can be performed using Langevin Markov chain Monte Carlo (MCMC). The use of Langevin MCMC enables connections to be drawn between EBMs and currently popular diffusion models. Experiments on the LJSpeech dataset show that the proposed approach offers improvements over Tacotron 2.
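    Langevin MCMC sampling from an EBM p(x) ∝ exp(−E(x)) is a standard procedure: repeated gradient descent on the energy plus injected Gaussian noise. A toy sketch on a quadratic energy (a standard Gaussian target); the function name and step schedule are illustrative, not the paper's configuration:

    ```python
    import numpy as np

    def langevin_sample(grad_energy, x0, steps=500, step_size=1e-2, rng=None):
        """Sketch of (unadjusted) Langevin MCMC for an EBM p(x) ∝ exp(-E(x)).
        grad_energy: callable returning the gradient of the energy, ∇E(x)."""
        rng = np.random.default_rng(0) if rng is None else rng
        x = x0.copy()
        for _ in range(steps):
            noise = rng.standard_normal(x.shape)
            # Langevin update: descend the energy, then inject scaled noise.
            x = x - step_size * grad_energy(x) + np.sqrt(2 * step_size) * noise
        return x

    # Toy energy E(x) = ||x||^2 / 2, whose density is a standard Gaussian;
    # 1000 independent scalar chains, so the final samples should have std ≈ 1.
    samples = langevin_sample(lambda x: x, np.zeros(1000))
    print(samples.std())
    ```

    The link to diffusion models mentioned in the abstract comes from exactly this update: denoising diffusion sampling is a sequence of noise-perturbed gradient steps of the same form, with a score network standing in for −∇E.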

    TransGrasp: Grasp Pose Estimation of a Category of Objects by Transferring Grasps from Only One Labeled Instance

    Grasp pose estimation is an important problem for robots interacting with the real world. However, most existing methods require exact 3D object models to be available beforehand, or a large amount of grasp annotations for training. To avoid these problems, we propose TransGrasp, a category-level grasp pose estimation method that predicts grasp poses for a category of objects by labeling only one object instance. Specifically, we perform grasp pose transfer across a category of objects based on their shape correspondences, and propose a grasp pose refinement module to further fine-tune the gripper grasp poses so as to ensure successful grasps. Experiments demonstrate the effectiveness of our method in achieving high-quality grasps with the transferred grasp poses. Our code is available at https://github.com/yanjh97/TransGrasp.
    Comment: Accepted to the European Conference on Computer Vision (ECCV) 2022
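    The transfer step can be pictured as mapping grasp contact points from the one labeled source instance to an unlabeled target instance through point correspondences. A toy sketch assuming the two point clouds are already aligned and using nearest neighbours as the correspondence (the paper instead learns dense correspondences from a categorical shape model; names and shapes here are hypothetical):

    ```python
    import numpy as np

    def transfer_grasp(src_points, tgt_points, src_grasp_idx):
        """Sketch of category-level grasp transfer: move grasp contact
        points from a labeled source shape to a target shape via
        nearest-neighbour point correspondences."""
        grasp_pts = src_points[src_grasp_idx]  # (k, 3) contact points on the source
        # For each source contact, find the closest point on the target shape.
        d = np.linalg.norm(tgt_points[None, :, :] - grasp_pts[:, None, :], axis=-1)
        return tgt_points[d.argmin(axis=1)]    # (k, 3) transferred contact points

    rng = np.random.default_rng(0)
    src = rng.random((200, 3))                 # toy source point cloud
    tgt = src + 0.001                          # toy "same-category" target instance
    moved = transfer_grasp(src, tgt, np.array([0, 5, 9]))
    print(moved.shape)  # (3, 3)
    ```

    Transferred contacts inherit any source labeling error plus correspondence error, which is why the abstract pairs the transfer with a refinement module that fine-tunes the gripper pose before execution.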

    Reform of Training Ways of Engineering Practice and Innovation Ability for Petroleum Engineering Students

    To improve students’ ability in engineering practice and innovation, the Petroleum Engineering Experiment Teaching Center reformed its training methods, including construction of a two-way practice teaching system and reform of the experimental teaching organization and management mode, the experimental teaching mode, the experimental teaching contents, the experimental teaching assessment methods, and the form and content of college students’ extracurricular scientific activities. These reforms help to create a favorable environment of independent learning and independent experimentation, training students’ engineering and innovation capabilities, and have achieved fruitful results in practice.

    [μ-2,2′-Dimethyl-2,2′-(p-phenylene)dipropyl]bis[chloridobis(2-methyl-2-phenylpropyl)tin(IV)]

    The molecular structure of the title compound, [Sn2(C10H13)4(C14H20)Cl2], is a binuclear centrosymmetric complex, in which the Sn atoms are four-coordinated by three C atoms and one Cl atom in a distorted tetrahedral geometry.