Semantic Embedding Space for Zero-Shot Action Recognition
The number of categories for action recognition is growing rapidly. It is
thus becoming increasingly hard to collect sufficient training data to learn
conventional models for each category. This issue may be ameliorated by the
increasingly popular 'zero-shot learning' (ZSL) paradigm. In this framework a
mapping is constructed between visual features and a human interpretable
semantic description of each category, allowing categories to be recognised in
the absence of any training data. Existing ZSL studies focus primarily on image
data, and attribute-based semantic representations. In this paper, we address
zero-shot recognition in contemporary video action recognition tasks, using
semantic word vector space as the common space to embed videos and category
labels. This is more challenging because the mapping between the semantic space
and the space-time features of videos containing complex actions is harder to
learn. We demonstrate that a simple self-training and data
augmentation strategy can significantly improve the efficacy of this mapping.
Experiments on human action datasets including HMDB51 and UCF101 demonstrate
that our approach achieves state-of-the-art zero-shot action recognition
performance.
Comment: 5 pages
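The abstract does not spell out the form of the mapping or of the self-training step, so the following is only a minimal sketch under stated assumptions. It assumes a linear map `W` from visual features to the word-vector space has already been learned on seen classes, classifies each unseen-class video by cosine nearest neighbour to the category label vectors, and implements self-training as a hypothetical transductive step that moves each label prototype to the mean of its `k` nearest embedded test videos. All function names (`project`, `self_train`, `zero_shot_predict`) are illustrative, and the data-augmentation part is not shown.

```python
import math
from typing import List, Sequence

Vector = List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def project(W: Sequence[Sequence[float]], x: Sequence[float]) -> Vector:
    """Embed a visual feature x in the semantic space via an assumed linear map W."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def classify(z: Sequence[float], prototypes: Sequence[Sequence[float]]) -> int:
    """Index of the category prototype with the highest cosine similarity to z."""
    return max(range(len(prototypes)), key=lambda c: cosine(z, prototypes[c]))


def self_train(prototypes, embedded, k: int) -> List[Vector]:
    """Hypothetical self-training: shift each prototype to the mean of its
    k nearest embedded test points (by cosine similarity)."""
    adjusted = []
    for p in prototypes:
        nearest = sorted(embedded, key=lambda z: -cosine(z, p))[:k]
        if not nearest:  # no test data: keep the original label vector
            adjusted.append(list(p))
            continue
        adjusted.append([sum(z[d] for z in nearest) / len(nearest)
                         for d in range(len(p))])
    return adjusted


def zero_shot_predict(W, videos, label_vectors, k: int = 2) -> List[int]:
    """Embed unlabeled videos, self-train the prototypes, then classify."""
    embedded = [project(W, x) for x in videos]
    prototypes = self_train(label_vectors, embedded, k)
    return [classify(z, prototypes) for z in embedded]


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]          # toy identity mapping
    labels = [[1.0, 0.0], [0.0, 1.0]]     # toy word vectors for 2 unseen classes
    videos = [[0.9, 0.2], [0.8, 0.1], [0.1, 0.9], [0.3, 0.7]]
    print(zero_shot_predict(W, videos, labels))  # [0, 0, 1, 1]
```

In this toy setup the identity map stands in for a learned regression; in practice `W` would be fitted on seen-class videos and their label word vectors, and the self-training step only uses the unlabeled test videos, never their labels.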
Chinese loans in Old Vietnamese with a sesquisyllabic phonology
While consonant clusters, taken broadly to include presyllables, are commonly hypothesized for Old Chinese, little direct evidence is available for establishing the early forms of specific words. This essay examines a hitherto overlooked source: Old Vietnamese, a language substantially attested in a single document, which writes certain words, monosyllabic in modern Vietnamese, in an orthography suggesting sesquisyllabic phonology. For a number of words loaned from Chinese, Old Vietnamese provides the only testimony of the form of the Vietic borrowing. The small list of currently known sesquisyllabic words of Chinese origin attested in this document includes examples of both words with a secure initial Chinese cluster and words with plausible Vietic-internal prefixation.
BoundaryFace: A mining framework with noise label self-correction for Face Recognition
Face recognition has made tremendous progress in recent years owing to
advances in loss functions and the explosive growth in training set size. A
properly designed loss is seen as key to extracting discriminative features for
classification. Several margin-based losses have been proposed as alternatives
to softmax loss in face recognition. However, two issues remain: 1) these
losses overlook the importance of hard sample mining for discriminative learning;
2) label noise is ubiquitous in large-scale datasets and can seriously
damage the model's performance. In this paper, starting from the perspective of
decision boundary, we propose a novel mining framework that focuses on the
relationship between a sample's ground truth class center and its nearest
negative class center. Specifically, a closed-set noise label self-correction
module is proposed, enabling the framework to work well on datasets containing
heavy label noise. The proposed method consistently outperforms SOTA methods
on various face recognition benchmarks. Training code has been released at
https://github.com/SWJTU-3DVision/BoundaryFace.
Comment: ECCV 2022.
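To make the decision-boundary view concrete, here is a minimal sketch built on assumptions. It compares a sample's cosine similarity to its ground-truth class center against its nearest negative class center, applying an ArcFace-style additive angular margin `margin` to the ground-truth angle. The three-way rule below ("easy", "hard", "relabel") is a hypothetical simplification for illustration only, not the paper's exact mining or self-correction criterion; the function names and the default margin of 0.5 rad are likewise illustrative.

```python
import math
from typing import List, Sequence, Tuple


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def cosines(feature: Sequence[float], centers: Sequence[Sequence[float]]) -> List[float]:
    """Cosine similarity between a feature and every class center."""
    f = normalize(feature)
    return [sum(a * b for a, b in zip(f, normalize(c))) for c in centers]


def boundary_status(feature, centers, label: int, margin: float = 0.5) -> Tuple[str, int]:
    """Hypothetical rule relating a sample to the margin-penalized boundary.

    Returns ("easy", label), ("hard", label) or ("relabel", nearest_negative).
    """
    cos = cosines(feature, centers)
    # Nearest negative class center: highest cosine among the other classes.
    neg = max((c for c in range(len(centers)) if c != label), key=lambda c: cos[c])
    # Additive angular margin on the ground-truth angle (clamped for acos).
    theta_gt = math.acos(max(-1.0, min(1.0, cos[label])))
    cos_gt_margin = math.cos(min(math.pi, theta_gt + margin))
    if cos[neg] <= cos_gt_margin:
        return "easy", label          # beyond the margin on the correct side
    if cos[neg] > cos[label]:
        return "relabel", neg         # closer to a negative center: treat label as noisy
    return "hard", label              # inside the margin: a hard sample to emphasize


if __name__ == "__main__":
    centers = [[1.0, 0.0], [0.0, 1.0]]
    for f in ([1.0, 0.1], [1.0, 0.8], [0.2, 1.0]):
        print(f, boundary_status(f, centers, label=0))
```

In a real training loop the centers would be the learned classifier weights and this check would run per mini-batch, with hard samples reweighted in the loss and "relabel" samples assigned the nearest negative class.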