32 research outputs found

    Regularized Fine-grained Meta Face Anti-spoofing

    Full text link
    Face presentation attacks have become an increasingly critical concern as face recognition is widely deployed. Many face anti-spoofing methods have been proposed, but most of them ignore the generalization ability to unseen attacks. To overcome this limitation, this work casts face anti-spoofing as a domain generalization (DG) problem and addresses it by developing a new meta-learning framework called Regularized Fine-grained Meta-learning. To let our face anti-spoofing model generalize well to unseen attacks, the proposed framework trains the model to perform well in simulated domain-shift scenarios, which is achieved by finding generalized learning directions in the meta-learning process. Specifically, the proposed framework incorporates the domain knowledge of face anti-spoofing as regularization, so that meta-learning is conducted in a feature space regularized by the supervision of domain knowledge. This makes our model more likely to find generalized learning directions with regularized meta-learning for the face anti-spoofing task. In addition, to further enhance the generalization ability of the model, the proposed framework adopts a fine-grained learning strategy that simultaneously conducts meta-learning in a variety of domain-shift scenarios in each iteration. Extensive experiments on four public datasets validate the effectiveness of the proposed method. Comment: Accepted by AAAI 2020. Code is available at https://github.com/rshaojimmy/AAAI2020-RFMetaFA
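
    The abstract above describes a MAML-style meta-learning loop with a domain-knowledge regularizer. A minimal sketch of one such outer iteration follows, assuming PyTorch, toy shapes, a depth-map regression head as the regularizer, and placeholder domain samplers; it is an illustration of the idea, not the authors' released code.

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        # Shared feature extractor, a functional live/spoof head, and a depth head
        # acting as the domain-knowledge regularizer (all shapes are toy values).
        feature_net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128), nn.ReLU())
        cls_w = nn.Parameter(0.01 * torch.randn(2, 128))
        cls_b = nn.Parameter(torch.zeros(2))
        depth_head = nn.Linear(128, 32 * 32)
        params = list(feature_net.parameters()) + [cls_w, cls_b] + list(depth_head.parameters())
        opt = torch.optim.Adam(params, lr=1e-4)
        inner_lr = 1e-3

        def sample_batch():  # stand-in for drawing a batch from one source domain
            x = torch.randn(8, 3, 32, 32)
            y = torch.randint(0, 2, (8,))
            depth = torch.rand(8, 32 * 32)
            return x, y, depth

        def task_loss(x, y, depth, w, b):
            feat = feature_net(x)
            cls = F.cross_entropy(F.linear(feat, w, b), y)    # live/spoof classification
            reg = F.mse_loss(depth_head(feat), depth)         # regularization by domain knowledge
            return cls + reg

        # Fine-grained meta-learning: within one iteration, each source domain takes a
        # turn as the held-out (meta-test) domain while another serves as meta-train.
        domains = [sample_batch, sample_batch, sample_batch]
        opt.zero_grad()
        for i in range(len(domains)):
            x, y, d = domains[(i + 1) % len(domains)]()       # a meta-train domain
            grads = torch.autograd.grad(task_loss(x, y, d, cls_w, cls_b),
                                        [cls_w, cls_b], create_graph=True)
            fast_w, fast_b = cls_w - inner_lr * grads[0], cls_b - inner_lr * grads[1]
            xt, yt, dt = domains[i]()                         # the meta-test domain
            task_loss(xt, yt, dt, fast_w, fast_b).backward()  # generalized learning direction
        opt.step()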

    Strip-MLP: Efficient Token Interaction for Vision MLP

    Full text link
    Token interaction is one of the core operations in MLP-based models for exchanging and aggregating information between different spatial locations. However, the power of token interaction along the spatial dimension is highly dependent on the spatial resolution of the feature maps, which limits the model's expressive ability, especially in deep layers where the features are down-sampled to a small spatial size. To address this issue, we present a novel method called \textbf{Strip-MLP} to enrich the token interaction power in three ways. Firstly, we introduce a new MLP paradigm called the Strip MLP layer that allows a token to interact with other tokens in a cross-strip manner, enabling tokens in a row (or column) to contribute to the information aggregation in adjacent but different strips of rows (or columns). Secondly, a \textbf{C}ascade \textbf{G}roup \textbf{S}trip \textbf{M}ixing \textbf{M}odule (CGSMM) is proposed to overcome the performance degradation caused by small spatial feature size. The module allows tokens to interact more effectively in both within-patch and cross-patch manners, independent of the feature spatial size. Finally, based on the Strip MLP layer, we propose a novel \textbf{L}ocal \textbf{S}trip \textbf{M}ixing \textbf{M}odule (LSMM) to boost the token interaction power in local regions. Extensive experiments demonstrate that Strip-MLP significantly improves the performance of MLP-based models on small datasets and obtains comparable or even better results on ImageNet. In particular, Strip-MLP models achieve higher average Top-1 accuracy than existing MLP-based models by +2.44\% on Caltech-101 and +2.16\% on CIFAR-100. The source code will be available at \href{https://github.com/Med-Process/Strip_MLP}{https://github.com/Med-Process/Strip\_MLP}.
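
    As a concrete illustration of cross-strip token mixing, the toy module below mixes tokens along each row with a shared MLP and then blends a small window of adjacent rows (a "strip") so that rows contribute to neighbouring strips. This is one plausible reading of the layer described above, written in PyTorch with assumed shapes, not the authors' implementation.

        import torch
        import torch.nn as nn

        class StripMixing(nn.Module):
            def __init__(self, width, strip=3):
                super().__init__()
                self.row_mlp = nn.Linear(width, width)     # token mixing along each row
                self.strip_mix = nn.Conv2d(1, 1, kernel_size=(strip, 1),
                                           padding=(strip // 2, 0), bias=False)  # mix adjacent rows

            def forward(self, x):                          # x: (B, C, H, W)
                b, c, h, w = x.shape
                x = self.row_mlp(x)                        # Linear acts on the last (W) dimension
                x = self.strip_mix(x.reshape(b * c, 1, h, w)).reshape(b, c, h, w)
                return x

        feat = torch.randn(2, 64, 14, 14)                  # a down-sampled feature map
        print(StripMixing(width=14)(feat).shape)           # torch.Size([2, 64, 14, 14])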

    Point-to-Set Distance Metric Learning on Deep Representations for Visual Tracking

    Full text link

    Identification and modulation of electronic band structures of single-phase β-(AlxGa1-x)2O3 alloys grown by laser molecular beam epitaxy

    Get PDF
    Understanding the band structure evolution of (AlxGa1-x)2O3 alloys is of fundamental importance for developing Ga2O3-based power electronic devices and vacuum ultraviolet, super radiation-hard detectors. Here, we report on the bandgap engineering of β-(AlxGa1-x)2O3 thin films and the identification of compositionally dependent electronic band structures by a combination of absorption spectra analyses and density functional theory calculations. Single-phase monoclinic β-(AlxGa1-x)2O3 (0 ≤ x ≤ 0.54) films with a preferred (-201) orientation were grown by laser molecular beam epitaxy with tunable bandgaps ranging from 4.5 to 5.5 eV. The excellent fitting of the absorption spectra by the relation (αhν)^(1/2) ∝ (hν − Eg) unambiguously indicates that β-(AlxGa1-x)2O3 alloys are indirect bandgap semiconductors. Theoretical calculations predict that the indirect nature of β-(AlxGa1-x)2O3 becomes more pronounced with increasing Al composition due to the increased eigenvalue energy gap between the M and U points in the valence band. The experimentally determined indirect bandgap exhibits an almost linear relationship with Al composition, which is consistent with the theoretical calculations and indicates a small bowing effect and good miscibility. The identification and modulation of (AlxGa1-x)2O3 band structures allow rational design of ultra-wide-bandgap oxide heterostructures for applications in power electronics and solar-blind or X-ray detection. This research was supported by the National Key Research and Development Project (Grant No. 2017YFB0403003), the National Natural Science Foundation of China (Grant Nos. 61774081, 61322403, and 11227904), the Natural Science Foundation of Jiangsu Province (Grant Nos. BK20130013 and BK20161401), the Six Talent Peaks Project in Jiangsu Province (2014XXRJ001), the Fundamental Research Funds for the Central Universities (021014380093 and 021014380085), and the Australian Research Council. The computational part of this research was undertaken with the assistance of resources from the National Computational Infrastructure (NCI), which is supported by the Australian Government under the NCRIS program.
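
    The indirect-gap assignment rests on the Tauc relation quoted above, (αhν)^(1/2) ∝ (hν − Eg): plotting (αhν)^(1/2) against photon energy and extrapolating the linear absorption edge to zero yields Eg. The sketch below runs this analysis on synthetic data; the absorption model, energy range, and fitting window are illustrative assumptions.

        import numpy as np

        Eg_true = 4.9                                    # assumed bandgap for the synthetic example (eV)
        hv = np.linspace(4.5, 5.6, 200)                  # photon energy (eV)
        alpha = np.where(hv > Eg_true, (hv - Eg_true) ** 2 / hv, 0.0) * 1e5  # toy indirect-gap absorption

        tauc = np.sqrt(alpha * hv)                       # (alpha * h*nu)^(1/2)
        edge = (hv > Eg_true + 0.05) & (hv < Eg_true + 0.5)   # linear region above the onset
        slope, intercept = np.polyfit(hv[edge], tauc[edge], 1)
        Eg_fit = -intercept / slope                      # x-intercept of the linear extrapolation
        print(f"extracted indirect bandgap: {Eg_fit:.2f} eV")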

    Dense Feature Aggregation and Pruning for RGBT Tracking

    Full text link
    How to perform effective information fusion of different modalities is a core factor in boosting the performance of RGBT tracking. This paper presents a novel deep fusion algorithm based on representations from an end-to-end trained convolutional neural network. To exploit the complementarity of features from all layers, we propose a recursive strategy to densely aggregate these features, yielding robust representations of target objects in each modality. We then propose to prune the densely aggregated features of all modalities in a collaborative way. Specifically, we employ global average pooling and weighted random selection to perform channel scoring and selection, which removes redundant and noisy features to achieve a more robust feature representation. Experimental results on two RGBT tracking benchmark datasets show that our tracker achieves clear state-of-the-art performance against other RGB and RGBT tracking methods. Comment: arXiv admin note: text overlap with arXiv:1811.0985
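
    A toy version of the pruning step described above is sketched below: global average pooling of the absolute feature responses scores each channel of the aggregated RGB and thermal features, and weighted random selection keeps a subset of channels. PyTorch, the tensor shapes, and the keep ratio are illustrative assumptions, not the paper's exact procedure.

        import torch

        def prune_channels(feat_rgb, feat_t, keep_ratio=0.5):
            feat = torch.cat([feat_rgb, feat_t], dim=1)          # (B, C, H, W) aggregated features
            scores = feat.abs().mean(dim=(0, 2, 3))              # global average pooling -> channel scores
            k = int(keep_ratio * scores.numel())
            keep = torch.multinomial(scores / scores.sum(), k)   # weighted random selection (no replacement)
            return feat[:, keep]

        rgb = torch.randn(1, 128, 28, 28)
        thermal = torch.randn(1, 128, 28, 28)
        print(prune_channels(rgb, thermal).shape)                # torch.Size([1, 128, 28, 28])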

    The Ninth Visual Object Tracking VOT2021 Challenge Results

    Get PDF
    Accepted version; peer reviewed.

    Robust MIL-Based Feature Template Learning for Object Tracking

    No full text
    Because of appearance variations, training samples of the tracked targets collected by the online tracker are required for updating the tracking model. However, this often leads to the tracking drift problem because of potentially corrupted samples: 1) contaminated/outlier samples resulting from large variations (e.g., occlusion, illumination), and 2) misaligned samples caused by tracking inaccuracy. Therefore, to reduce tracking drift while maintaining the adaptability of a visual tracker, alleviating these two issues via an effective model learning (updating) strategy is a key problem to be solved. To address these issues, this paper proposes a novel model learning (updating) scheme that aims to simultaneously eliminate the negative effects of both issues within a unified robust feature template learning framework. In particular, the proposed feature template learning framework is capable of: 1) adaptively learning uncontaminated feature templates by separating out contaminated samples, and 2) resolving label ambiguities caused by misaligned samples via a probabilistic multiple instance learning (MIL) model. Experiments on challenging video sequences show that the proposed tracker performs favourably against several state-of-the-art trackers.
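
    The MIL component handles misaligned samples by labelling bags of candidate patches rather than individual patches: a bag is positive if at least one patch covers the target. A minimal noisy-OR sketch of that idea is given below; the instance scorer, feature size, and loss are assumptions for illustration, not the paper's model.

        import torch
        import torch.nn as nn

        scorer = nn.Linear(64, 1)                          # instance-level classifier on assumed 64-d features

        def bag_loss(instance_feats, bag_label):
            p_inst = torch.sigmoid(scorer(instance_feats)).squeeze(-1)  # P(instance is target)
            p_bag = 1.0 - torch.prod(1.0 - p_inst)                      # noisy-OR bag probability
            target = torch.tensor(float(bag_label))
            return nn.functional.binary_cross_entropy(p_bag, target)

        bag = torch.randn(16, 64)                          # 16 candidate patches around the estimated target
        print(bag_loss(bag, 1).item())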

    Bi-Directional Center-Constrained Top-Ranking for Visible Thermal Person Re-Identification

    No full text