133,795 research outputs found

    Anchor Loss: Modulating Loss Scale Based on Prediction Difficulty

    Get PDF
    We propose a novel loss function that dynamically re-scales the cross entropy based on prediction difficulty regarding a sample. Deep neural network architectures in image classification tasks struggle to disambiguate visually similar objects. Likewise, in human pose estimation symmetric body parts often confuse the network with assigning indiscriminative scores to them. This is due to the output prediction, in which only the highest confidence label is selected without taking into consideration a measure of uncertainty. In this work, we define the prediction difficulty as a relative property coming from the confidence score gap between positive and negative labels. More precisely, the proposed loss function penalizes the network to avoid the score of a false prediction being significant. To demonstrate the efficacy of our loss function, we evaluate it on two different domains: image classification and human pose estimation. We find improvements in both applications by achieving higher accuracy compared to the baseline methods

    Cross-stitch Networks for Multi-task Learning

    Full text link
    Multi-task learning in Convolutional Networks has displayed remarkable success in the field of recognition. This success can be largely attributed to learning shared representations from multiple supervisory tasks. However, existing multi-task approaches rely on enumerating multiple network architectures specific to the tasks at hand, that do not generalize. In this paper, we propose a principled approach to learn shared representations in ConvNets using multi-task learning. Specifically, we propose a new sharing unit: "cross-stitch" unit. These units combine the activations from multiple networks and can be trained end-to-end. A network with cross-stitch units can learn an optimal combination of shared and task-specific representations. Our proposed method generalizes across multiple tasks and shows dramatically improved performance over baseline methods for categories with few training examples.Comment: To appear in CVPR 2016 (Spotlight

    Efficient Action Detection in Untrimmed Videos via Multi-Task Learning

    Full text link
    This paper studies the joint learning of action recognition and temporal localization in long, untrimmed videos. We employ a multi-task learning framework that performs the three highly related steps of action proposal, action recognition, and action localization refinement in parallel instead of the standard sequential pipeline that performs the steps in order. We develop a novel temporal actionness regression module that estimates what proportion of a clip contains action. We use it for temporal localization but it could have other applications like video retrieval, surveillance, summarization, etc. We also introduce random shear augmentation during training to simulate viewpoint change. We evaluate our framework on three popular video benchmarks. Results demonstrate that our joint model is efficient in terms of storage and computation in that we do not need to compute and cache dense trajectory features, and that it is several times faster than its sequential ConvNets counterpart. Yet, despite being more efficient, it outperforms state-of-the-art methods with respect to accuracy.Comment: WACV 2017 camera ready, minor updates about test time efficienc
    • …
    corecore