
    Semantic Embedding Space for Zero-Shot Action Recognition

    The number of categories for action recognition is growing rapidly. It is thus becoming increasingly hard to collect sufficient training data to learn conventional models for each category. This issue may be ameliorated by the increasingly popular 'zero-shot learning' (ZSL) paradigm. In this framework a mapping is constructed between visual features and a human-interpretable semantic description of each category, allowing categories to be recognised in the absence of any training data. Existing ZSL studies focus primarily on image data and attribute-based semantic representations. In this paper, we address zero-shot recognition in contemporary video action recognition tasks, using a semantic word vector space as the common space in which to embed videos and category labels. This is more challenging because the mapping between the semantic space and the space-time features of videos containing complex actions is more complex and harder to learn. We demonstrate that a simple self-training and data augmentation strategy can significantly improve the efficacy of this mapping. Experiments on human action datasets including HMDB51 and UCF101 demonstrate that our approach achieves state-of-the-art zero-shot action recognition performance. Comment: 5 page

    Chinese loans in Old Vietnamese with a sesquisyllabic phonology

    While consonant clusters, taken broadly to include presyllables, are commonly hypothesized for Old Chinese, little direct evidence is available for establishing the early forms of specific words. This essay examines a hitherto overlooked source: Old Vietnamese, a language substantially attested in a single document, which writes certain words, monosyllabic in modern Vietnamese, in an orthography suggesting sesquisyllabic phonology. For a number of words loaned from Chinese, Old Vietnamese provides the only testimony of the form of the Vietic borrowing. The small list of currently known sesquisyllabic words of Chinese origin attested in this document includes examples of both words with a secure initial Chinese cluster and words with plausible Vietic-internal prefixation.

    BoundaryFace: A mining framework with noise label self-correction for Face Recognition

    Face recognition has made tremendous progress in recent years due to advances in loss functions and the explosive growth in training set size. A properly designed loss is seen as key to extracting discriminative features for classification. Several margin-based losses have been proposed as alternatives to softmax loss in face recognition. However, two issues remain: 1) These losses overlook the importance of hard sample mining for discriminative learning. 2) Label noise is ubiquitous in large-scale datasets and can seriously damage the model's performance. In this paper, starting from the perspective of the decision boundary, we propose a novel mining framework that focuses on the relationship between a sample's ground truth class center and its nearest negative class center. Specifically, a closed-set noise label self-correction module is put forward, making this framework work well on datasets containing a lot of label noise. The proposed method consistently outperforms SOTA methods in various face recognition benchmarks. Training code has been released at https://github.com/SWJTU-3DVision/BoundaryFace. Comment: ECCV 2022. Code available at https://github.com/SWJTU-3DVision/BoundaryFac